Incentive-Compatible Critical Values
authors・Adam McCloskey, Pascal Michaillat
abstract・Statistical hypothesis tests are a cornerstone of scientific research. The tests are informative when their size is properly controlled, so the frequency of rejecting true null hypotheses (type I error) stays below a prespecified nominal level. Publication bias, however, inflates test sizes. Since scientists can typically only publish results that reject the null hypothesis, they have an incentive to continue conducting studies until attaining rejection. Such p-hacking takes many forms: from collecting additional data to examining multiple regression specifications, all in search of statistical significance. The process inflates test sizes above their nominal levels because the critical values used to determine rejection assume that test statistics are constructed from a single study, abstracting from p-hacking. This paper addresses the problem by constructing critical values that are compatible with scientists' behavior given their incentives. We assume that researchers conduct studies until finding a test statistic that exceeds the critical value, or until the benefit from conducting an extra study falls below the cost. We then solve for the incentive-compatible critical value (ICCV). When the ICCV is used to determine rejection, readers can be confident that size is controlled at the desired significance level, and that the researcher's response to the incentives delineated by the critical value is accounted for. Since they allow researchers to search for significance among multiple studies, ICCVs are larger than classical critical values. Yet, for a broad range of researcher behaviors and beliefs, ICCVs lie in a fairly narrow range.
illustration・ICCVs for two-sided hypothesis tests at the 5% significance level, in various situations and calibrations.
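The size-inflation mechanism described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's ICCV construction: here the maximum number of studies is fixed in advance rather than determined by the researcher's costs and benefits, the studies are assumed independent, and each test statistic is assumed standard normal under the null. The function names `size_with_fixed_cv` and `size_controlling_cv` are hypothetical and introduced only for this example.

```python
from statistics import NormalDist

# Assumption: each study yields an independent standard-normal z-statistic
# under the null hypothesis, and the test is two-sided.
_STD_NORMAL = NormalDist()


def size_with_fixed_cv(critical_value: float, max_studies: int) -> float:
    """Probability, under the null, that at least one of `max_studies`
    independent studies produces |z| above `critical_value`."""
    p_single = 2.0 * (1.0 - _STD_NORMAL.cdf(critical_value))
    return 1.0 - (1.0 - p_single) ** max_studies


def size_controlling_cv(alpha: float, max_studies: int) -> float:
    """Critical value that keeps overall size at `alpha` when the researcher
    may run up to `max_studies` independent studies and report any rejection.
    This is the inverse of size_with_fixed_cv, not the paper's ICCV formula."""
    p_single = 1.0 - (1.0 - alpha) ** (1.0 / max_studies)
    return _STD_NORMAL.inv_cdf(1.0 - p_single / 2.0)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        inflated = size_with_fixed_cv(1.96, n)
        adjusted = size_controlling_cv(0.05, n)
        print(f"studies={n}  size at 1.96: {inflated:.3f}  adjusted cv: {adjusted:.3f}")
```

With a single study the adjusted critical value reduces to the classical value of about 1.96. With three studies and the classical critical value, the probability of at least one rejection under the null is roughly 14%, which is the kind of inflation the abstract describes. In the paper the number of studies is an outcome of the researcher's incentives, so the fixed `max_studies` parameter here is only a stand-in for that behavior.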