Abstract:
Background: The p value is the most widely used statistical index for inference in science. A p value greater than 0.05, i.e., a nonsignificant result, however, cannot distinguish between two situations: absence of evidence and evidence of absence. Unfortunately, researchers in psychological science may not interpret p values correctly, leading to possible mistakes in statistical inference based on nonsignificant results. Indeed, Aczel et al. (2019) surveyed empirical studies published in three journals (Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science) and found that about 72% of nonsignificant results were misinterpreted as evidence in favor of the null hypothesis. The misinterpretation of nonsignificant results may have severe consequences. One such consequence is the dismissal of nonsignificant results as null effects, thereby overlooking small but meaningful effects (e.g., Jia et al., 2018). More importantly, misinterpreting nonsignificant results when comparing certain traits (e.g., age, gender) in matched-group clinical trials may create falsely "matched" groups, rendering the estimated effect of the intervention meaningless. As psychological science keeps growing in China, it is important to estimate how nonsignificant results are interpreted in empirical studies published in Chinese journals. However, no such meta-research has been done. To fill this gap, we surveyed 500 empirical papers published in five important Chinese psychological journals to explore the following questions: (1) how often are nonsignificant results reported, i.e., how severe is the publication bias; (2) how do researchers interpret nonsignificant results in their own studies; and (3) when researchers interpret a nonsignificant result as "evidence of absence," do the empirical data provide enough support for the null effect?
Method: Based on our preregistration (https://osf.io/czx6f), we randomly selected empirical research papers published in 2017 and 2018 in five prominent Chinese journals (Acta Psychologica Sinica, Psychological Science, Chinese Journal of Clinical Psychology, Psychological Development and Education, and Psychological and Behavioral Studies). First, according to the publication volume of each journal, we randomly selected 500 empirical research papers. Second, we screened the abstracts of the selected articles and judged whether they contained negative statements. Third, we classified each negative statement into one of four categories (correct frequentist; incorrect frequentist: whole population; incorrect frequentist: current sample; difficult to judge). Finally, we calculated Bayes factors from the t values and sample sizes associated with the nonsignificant results to investigate whether the empirical data provided enough evidence in favor of the null hypothesis.
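The last step can be sketched as follows. The abstract does not state which Bayes factor the authors computed; a common default for t-tests is the JZS Bayes factor (Rouder et al., 2009), shown here for a one-sample design with the conventional Cauchy prior scale r = √2/2. The function name `jzs_bf10` and the choice of prior scale are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n, r=np.sqrt(2) / 2):
    """JZS Bayes factor BF10 for a one-sample t-test (Rouder et al., 2009).

    t : observed t statistic
    n : sample size
    r : scale of the Cauchy prior on effect size (default sqrt(2)/2)

    Returns BF10 (evidence for H1 over H0); BF01 is simply 1 / BF10,
    so BF01 > 10 means strong evidence in favor of the null.
    """
    nu = n - 1
    # Marginal likelihood under H0 (up to a constant shared with H1).
    h0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    # Marginal likelihood under H1: integrate over the g prior
    # (inverse-gamma(1/2, 1/2) mixing distribution of the JZS prior).
    def integrand(g):
        a = 1 + n * g * r**2
        return (a ** -0.5
                * (1 + t**2 / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    h1, _ = integrate.quad(integrand, 0, np.inf)
    return h1 / h0
```

For a nonsignificant result with a small t value, BF01 = 1 / BF10 quantifies how strongly the data actually favor the null, rather than merely failing to reject it; for two-sample designs, n would be replaced by the effective sample size n1*n2/(n1+n2) and nu by n1+n2-2.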
Results: Our survey revealed that: (1) of the 500 empirical papers, 36% (n = 180) mentioned nonsignificant results in their abstracts; (2) these articles contained 236 negative statements referring to the nonsignificant results in the abstracts, and 41% of these statements misinterpreted the nonsignificant results, i.e., the authors inferred that the results provided evidence for the absence of an effect; (3) only 5.1% (n = 2) of the nonsignificant results provided strong evidence in favor of the null hypothesis (BF01 > 10). Compared with Aczel et al. (2019), empirical papers published in Chinese journals reported more nonsignificant results (36% vs. 32%), and researchers made fewer misinterpretations based on nonsignificant results (41% vs. 72%). It is worth noting that one category of statements about nonsignificant results is ambiguous in the Chinese context: "there is no significant difference between condition A and condition B." This statement has two readings: it can be interpreted as another way of saying "statistically nonsignificant," or as "there is no difference between condition A and condition B." The percentage of misinterpreted nonsignificant results rose to 61% under the second reading, compared with 41% under the first.
Conclusion: These results suggest that Chinese researchers need to improve their understanding of nonsignificant results and to use more appropriate statistical methods to extract information from them. More precise wording should also be used in the Chinese context.
From: 王珺
Subject: Psychology >> Statistics in Psychology
Cite as: ChinaXiv:202003.00056 (or this version: ChinaXiv:202003.00056V1)
DOI: 10.12074/202003.00056V1
CSTR: 32003.36.ChinaXiv.202003.00056.V1
TXID: 0875eb08-5238-44e4-b866-e43488d4abab
Recommended reference: 王珺, 宋琼雅, 许岳培, 贾彬彬, 陆春雷, 陈曦, 戴紫旭, 黄之玥, 李振江, 林景希, 罗婉莹, 施赛男, 张莹莹, 臧玉峰, 左西年, 胡传鹏. Interpreting nonsignificant results: a quantitative analysis of 500 empirical studies [解读不显著结果：基于500个实证研究的量化分析]. ChinaXiv (Chinese Academy of Sciences preprint platform). [ChinaXiv:202003.00056V1]