Incentive-Compatible Critical Values
authors・Adam McCloskey, Pascal Michaillat
abstract・Scientific research relies heavily on statistical hypothesis testing—for instance, to evaluate theories or to assess the effectiveness of public policies. Such tests are informative when their size is properly controlled, so the frequency of rejecting true null hypotheses (type I error) stays below a prespecified nominal level. Publication bias, however, tends to exaggerate test sizes. Since scientists are typically able to publish only significant results, they have an incentive to conduct studies until reaching significance. Such a process inflates test sizes above nominal significance levels, because the critical values used to determine significance assume that test statistics are constructed from a single study. This paper addresses this problem by constructing critical values that are compatible with scientists' behavior given their incentives. We assume that researchers conduct studies until their test statistic exceeds the critical value, or until the expected benefit from conducting an extra study falls below the cost. When such an incentive-compatible critical value (ICCV) is used to assess significance, researchers may conduct multiple studies; but readers know that a significant result would not occur more often than the nominal significance level when the null hypothesis is true. ICCVs are larger than classical critical values because researchers report the best result from multiple studies. Yet, for a broad range of research processes and calibrations, ICCVs stay within a fairly narrow range.
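The size-inflation logic in the abstract can be illustrated with a deliberately simplified special case: suppose a researcher runs a fixed number n of independent studies and reports the best one. The paper's actual ICCV is derived for an endogenous stopping rule (stop when significant or when the marginal benefit falls below the cost), so the fixed-n calculation below is only a Šidák-style sketch, not the authors' construction; all function and variable names here are my own.

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0, tol=1e-12):
    # Inverse CDF by bisection (the CDF is monotone increasing)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha, n = 0.05, 3  # nominal level; number of studies (hypothetical)

# Classical two-sided critical value, valid for a single study
c_classical = norm_ppf(1 - alpha / 2)  # about 1.96

# If the best of n studies is judged against c_classical, the true size
# is 1 - (1 - alpha)^n, well above the nominal 5% level
size_inflated = 1 - (1 - alpha) ** n  # about 0.143 for n = 3

# Enlarging the critical value so that P(max_i |Z_i| > c) = alpha
# restores size control in this fixed-n special case
c_adjusted = norm_ppf((1 + (1 - alpha) ** (1 / n)) / 2)
size_adjusted = 1 - (2 * norm_cdf(c_adjusted) - 1) ** n  # back to alpha
```

The qualitative pattern matches the abstract: the adjusted critical value exceeds the classical one because only the best of several draws is reported, yet it grows slowly in n, consistent with ICCVs remaining in a fairly narrow range across calibrations.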