This article compares the Bayesian standard for convincing evidence with the frequentist reliance on the p-value as a measure of evidence. The author concludes that a much stricter significance level (0.005 or 0.001, rather than the conventional 0.05) is needed to be consistent with the Bayesian standard.
Johnson VE. Revised standards for statistical evidence. Proceedings of the National Academy of Sciences. 2013. doi:10.1073/pnas.1313476110. Excerpt: “Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggest that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25–50:1, and to 100–200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.” Available at: http://www.pnas.org/content/110/48/19313.
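The correspondence the excerpt describes between evidence thresholds and significance levels can be illustrated with a short calculation. This is a minimal sketch, not code from the paper: it assumes the simplest case, a one-sided z-test with known variance, where the rejection region of the uniformly most powerful Bayesian test with evidence threshold gamma is taken to be z > sqrt(2 ln gamma). The function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

# Assumption: one-sided z-test with known variance, where a Bayes factor
# threshold gamma is taken to correspond to rejecting when z > sqrt(2 * ln(gamma)).
STANDARD_NORMAL = NormalDist()


def alpha_for_evidence_threshold(gamma: float) -> float:
    """Significance level matching a Bayes factor threshold of gamma:1."""
    z_critical = math.sqrt(2.0 * math.log(gamma))
    return 1.0 - STANDARD_NORMAL.cdf(z_critical)


def evidence_threshold_for_alpha(alpha: float) -> float:
    """Bayes factor threshold matching a one-sided test of size alpha."""
    z_critical = STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    return math.exp(z_critical ** 2 / 2.0)


if __name__ == "__main__":
    # Evidence thresholds quoted in the excerpt.
    for gamma in (25, 50, 100, 200):
        print(f"{gamma}:1 -> alpha = {alpha_for_evidence_threshold(gamma):.4f}")
    # Conventional and proposed significance levels.
    for alpha in (0.05, 0.005, 0.001):
        print(f"alpha = {alpha} -> {evidence_threshold_for_alpha(alpha):.1f}:1")
```

Under this assumption, thresholds of 25:1 to 50:1 land near the 0.005 level and 100:1 to 200:1 near the 0.001 level, while the conventional 0.05 level corresponds to odds of only about 4:1. Other test types (for example t-tests or two-sided tests) would need different formulas.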