THE UNIVERSITY OF BRITISH COLUMBIA

Faculty of Pharmaceutical Sciences & UBC Division of Respiratory Medicine
Respiratory Evaluation Sciences Program

Research Theme Methodology

Lowering the P Value Threshold

AUTHORS:

To the Editor: Mr Wayant and colleagues evaluated the effect of lowering the significance threshold from .05 to .005 on major randomized clinical trials (RCTs) published in 2017.(1) The authors reported that 70.7% of primary end points remained significant and suggested that lowering the threshold might address statistical issues such as P-hacking.

The probability that a positive finding in a study is a true positive—the positive predictive value (PPV)—depends on a priori knowledge (ie, prior probability) of the replicability of the study findings.(2) The rationale behind the proposed .005 threshold is that a P value of .05 does not correspond to reasonably high PPVs.(3) Although this might be true for early-phase trials, it does not apply to phase 3 RCTs. Major RCTs have a 69% prior probability of being successfully replicated.(4) Thus, a “positive” RCT would have a 97% chance of being a true positive when the P value is .05 (PPV, 97%).
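The 97% figure can be checked with a short calculation. The sketch below applies Bayes' rule to a significance test, using the 69% prior probability and the two thresholds discussed in this letter. The 80% statistical power is an assumption made here for illustration (the letter does not state the power it used), and the function name is hypothetical.

```python
def positive_predictive_value(prior: float, alpha: float, power: float = 0.80) -> float:
    """Probability that a statistically significant result is a true positive.

    prior: prior probability that the tested effect is real
    alpha: significance threshold (type I error rate)
    power: probability of detecting a real effect (assumed 0.80, not from the letter)
    """
    true_positives = power * prior            # real effects that reach significance
    false_positives = alpha * (1.0 - prior)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Prior of 69% for major RCTs, evaluated at both thresholds.
ppv_05 = positive_predictive_value(prior=0.69, alpha=0.05)
ppv_005 = positive_predictive_value(prior=0.69, alpha=0.005)

print(f"PPV at alpha = .05:  {ppv_05:.3f}")
print(f"PPV at alpha = .005: {ppv_005:.3f}")
```

Under these assumptions the PPV at .05 rounds to 97%, so the stricter threshold can add at most a few percentage points. A different assumed power would shift both values somewhat.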

The original proposal of switching to a .005 P value threshold correctly limited its recommendation to “claims of new discoveries.”(3) These types of studies (eg, basic science, preclinical studies, and early-phase trials) have much lower probabilities of successful replication (as low as 9%(3)) and thus lower PPVs (as low as 53%). An intervention that has made it to a large phase 3 RCT has already been through extensive testing, and most false-positive findings have been ruled out.

Lowering the significance threshold to .005 for large phase 3 trials would require larger and more expensive trials than current standards do, for little added benefit. A trial powered at the .005 threshold would need 70% more participants than one powered at the .05 level to detect the same effect size.(3)
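The size of this increase can be approximated from the usual normal-approximation sample size formula, in which the required number of participants for a fixed effect size is proportional to the square of the sum of the two-sided critical value and the power quantile. The sketch below assumes a two-sided test and 80% power. Both are illustrative assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def relative_sample_size(alpha_new: float, alpha_old: float, power: float = 0.80) -> float:
    """Ratio of required sample sizes for two two-sided significance thresholds.

    Assumes a normal approximation, the same effect size, and the same power
    (0.80 by default, an assumption made for illustration).
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_power = z(power)
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    return n_new / n_old


ratio = relative_sample_size(alpha_new=0.005, alpha_old=0.05)
print(f"Relative sample size: {ratio:.2f}")
print(f"Additional participants: {(ratio - 1) * 100:.0f}%")
```

Under these assumptions the ratio is close to 1.7, in line with the 70% increase cited above.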

Many statisticians regard reliance on a single P value threshold for declaring studies positive or negative as arbitrary and flawed.(4) The American Statistical Association’s statement on P values explicitly rejects using such “bright-line rules” for policy decisions and scientific conclusions.(5) We share these concerns. A P value is a continuous measure of evidence and is thus best interpreted in the context of the study. A significance threshold of any value encourages publication bias and P-hacking and should be avoided when possible. However, if a P value significance threshold has to be used for a phase 3 RCT, .05 is good enough. For now, there is no compelling reason to lower the P value threshold for late-phase RCTs.

References:

  1. Wayant C, Scott J, Vassar M. Evaluation of Lowering the P Value Threshold for Statistical Significance From .05 to .005 in Previously Published Randomized Clinical Trials in Major Medical Journals. JAMA. 2018;320(17):1813. doi:10.1001/jama.2018.12288
  2. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):0696-0701. doi:10.1371/journal.pmed.0020124
  3. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav. 2018;2(1):6-10. doi:10.1038/s41562-017-0189-z
  4. Ioannidis JPA. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218. doi:10.1001/jama.294.2.218
  5. Wasserstein RL, Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016;70(2):129-133. doi:10.1080/00031305.2016.1154108
