Your conditions: 贾彬彬
  • Interpreting Nonsignificant Results: A Quantitative Analysis Based on 500 Empirical Studies

    Subjects: Psychology >> Social Psychology submitted time 2023-03-28 Cooperative journals: Advances in Psychological Science (《心理科学进展》)

    Abstract: Background: The p-value is the most widely used statistical index for inference in science. A p-value greater than 0.05, i.e., a nonsignificant result, however, cannot distinguish between two cases: the absence of evidence and the evidence of absence. Unfortunately, researchers in psychological science may not interpret p-values correctly, which leads to incorrect inferences. For example, Aczel et al. (2018), after surveying 412 empirical studies published in Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science, found that about 72% of nonsignificant results were misinterpreted as evidence in favor of the null hypothesis. Misinterpretations of nonsignificant results may have severe consequences. One such consequence is missing potentially meaningful effects. Also, in matched-group clinical trials, misinterpreting nonsignificant results may produce falsely "matched" groups, thus threatening the validity of interventions. So far, how nonsignificant results are interpreted in the Chinese psychological literature is unknown. Here we surveyed 500 empirical papers published in five mainstream Chinese psychological journals to address the following questions: (1) how often are nonsignificant results reported; (2) how do researchers interpret nonsignificant results in these published studies; (3) if researchers interpreted nonsignificant results as "evidence for absence," do the empirical data provide enough evidence for null effects? Method: Based on our pre-registration (https://osf.io/czx6f), we first randomly selected 500 empirical papers from all papers published in 2017 and 2018 in five mainstream Chinese psychological journals (Acta Psychologica Sinica, Psychological Science, Chinese Journal of Clinical Psychology, Psychological Development and Education, Psychological and Behavioral Studies). Second, we screened the abstracts of the selected articles to check whether they contained negative statements. 
For the studies whose abstracts contained negative statements, we searched their results sections for nonsignificant statistics and checked whether the corresponding interpretations were correct. More specifically, all of these statements were classified into four categories (Correct-frequentist, Incorrect-frequentist: whole population, Incorrect-frequentist: current sample, Difficult to judge). Finally, we calculated Bayes factors based on the available t values and sample sizes associated with those nonsignificant results. The Bayes factors help us estimate to what extent those results provided evidence for the absence of effects (i.e., the way researchers incorrectly interpreted nonsignificant results). Results: Our survey revealed that: (1) out of 500 empirical papers, 36% of the abstracts (n = 180) contained negative statements; (2) there were 236 negative statements associated with nonsignificant statistics in the selected studies, and 41% of these 236 negative statements misinterpreted nonsignificant results, i.e., the authors inferred that the results provided evidence for the absence of effects; (3) Bayes factor analyses based on the available t values and sample sizes found that only 5.1% (n = 2) of the nonsignificant results provided strong evidence for the absence of effects (BF01 > 10). Compared with the results of Aczel et al. (2018), we found that empirical papers published in Chinese journals contained more negative statements (36% vs. 32%), and that researchers made fewer misinterpretations of nonsignificant results (41% vs. 72%). It is worth noting, however, that there is a category of ambiguous interpretations of nonsignificant results in the Chinese context. More specifically, many statements corresponding to nonsignificant results read "there is no significant difference between condition A and condition B". 
These statements can be understood either as "the difference is not statistically significant", which is correct, or as "there is no difference", which is incorrect. The percentage of misinterpretations of nonsignificant results rose to 64% if we adopted the second reading of these statements, in contrast to 41% under the first reading. Conclusion: Our results suggest that Chinese researchers need to improve their understanding of nonsignificant results and use more appropriate statistical methods to extract information from nonsignificant results. Also, more precise wording should be used in the Chinese context.
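The abstract above computes Bayes factors from t values and sample sizes but does not say which prior was used. As a rough illustration only, the sketch below assumes the default JZS (Cauchy) prior with scale r = 0.707 and evaluates the integral with a plain trapezoid rule; the function name `jzs_bf01` and all of its parameters are hypothetical and are not taken from the paper.

```python
import math


def jzs_bf01(t, n1, n2=None, r=0.707, steps=6000):
    """Approximate a JZS Bayes factor BF01 (null over alternative) from a t value.

    One-sample or paired test when n2 is None, otherwise a two-sample test.
    r is the Cauchy prior scale on the standardized effect (assumed default).
    """
    if n2 is None:
        n_eff, df = n1, n1 - 1
    else:
        n_eff, df = n1 * n2 / (n1 + n2), n1 + n2 - 2

    def integrand(x):
        # Integrate over x = log(g); the trailing factor g is the Jacobian dg = g dx.
        g = math.exp(x)
        # Inverse-gamma(1/2, r^2/2) prior on g, which yields a Cauchy(0, r) prior on the effect.
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        a = 1 + n_eff * g
        like = a ** -0.5 * (1 + t * t / (a * df)) ** (-(df + 1) / 2)
        return like * prior * g

    # Trapezoid rule on a wide log-scale grid; the integrand is negligible at both ends.
    lo, hi = -30.0, 30.0
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    marginal_h1 = total * h
    marginal_h0 = (1 + t * t / df) ** (-(df + 1) / 2)
    return marginal_h0 / marginal_h1


if __name__ == "__main__":
    # Hypothetical inputs: a nonsignificant t value from two groups of 40.
    bf01 = jzs_bf01(0.8, 40, 40)
    print(f"BF01 = {bf01:.2f}; strong evidence for the null: {bf01 > 10}")
```

Under this assumed prior, a result counts as strong evidence for the absence of an effect only when the returned value exceeds 10, the threshold quoted in the abstract.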

  • Evaluating null effect in psychological research: A practical primer

    Subjects: Psychology >> Statistics in Psychology submitted time 2021-04-25

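    Abstract: In psychological research, researchers may need to evaluate a null effect in two situations: first, when they want to infer that an effect is absent; second, when a nonsignificant result appears unexpectedly and they need to determine whether the effect is truly absent or the current data simply fail to provide sufficient evidence. However, the commonly used null hypothesis significance test (NHST) cannot directly evaluate null effects. In recent years, three methods have increasingly been used to evaluate null effects: equivalence testing, Bayesian estimation, and Bayes factors. Within the frequentist framework, equivalence testing tests whether the effect falls within the bounds set by the smallest effect size of interest (SESOI) and uses p-values to infer whether the effect is zero. Within the Bayesian framework, Bayesian estimation compares the highest density interval of the posterior distribution with the region of practical equivalence to infer whether the effect is zero, whereas the Bayes factor evaluates the relative support that the current data provide for the null hypothesis over the alternative hypothesis. This article demonstrates the practical application of the three methods by analyzing two real datasets. Each method has its own characteristics: equivalence testing is logically an extension of NHST and is easy to adopt for researchers trained in traditional statistics; Bayes factors are relatively intuitive to interpret and logically clear; Bayesian estimation is highly flexible and can be extended to a wider range of research questions. These three methods for evaluating null effects may help psychological researchers make sound statistical inferences and research decisions in practice.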
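Two of the three approaches described above can be sketched in a few lines. This is a minimal illustration, not the primer's own code: `tost_z` assumes a large-sample normal approximation in place of the t distribution, the HDI is computed from posterior samples that are assumed to be already available, and all function names, bounds, and decision labels are hypothetical.

```python
import math
from statistics import NormalDist


def tost_z(estimate, se, low, high):
    """Two one-sided tests (TOST) using a large-sample normal approximation.

    Returns the TOST p-value, i.e. the larger of the two one-sided p-values.
    A small value supports the claim that the effect lies inside (low, high).
    """
    z = NormalDist()
    p_lower = 1 - z.cdf((estimate - low) / se)  # tests H0: effect <= low
    p_upper = z.cdf((estimate - high) / se)     # tests H0: effect >= high
    return max(p_lower, p_upper)


def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the posterior samples."""
    s = sorted(samples)
    k = math.ceil(mass * len(s))  # number of samples inside the interval
    i = min(range(len(s) - k + 1), key=lambda j: s[j + k - 1] - s[j])
    return s[i], s[i + k - 1]


def rope_decision(samples, low, high, mass=0.95):
    """Compare the HDI with a region of practical equivalence (ROPE)."""
    lo, hi = hdi(samples, mass)
    if low <= lo and hi <= high:
        return "practically equivalent to zero"
    if hi < low or lo > high:
        return "effect differs from zero"
    return "undecided"


if __name__ == "__main__":
    # Hypothetical numbers: estimated difference 0.02, standard error 0.10,
    # equivalence bounds (SESOI) of -0.30 and 0.30.
    print(tost_z(0.02, 0.10, -0.30, 0.30))
    fake_posterior = [i / 1000 - 0.05 for i in range(101)]
    print(rope_decision(fake_posterior, -0.10, 0.10))
```

The equivalence bounds and the ROPE play the same role in both functions: they encode which effects are considered too small to matter, and that choice has to be justified by the researcher rather than by the data.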

  • Interpreting Nonsignificant Results: A Quantitative Investigation Based on 500 Chinese Psychological Research

    Subjects: Psychology >> Statistics in Psychology submitted time 2020-10-17

    Abstract: The p-value is the most widely used statistical index for inference in science. Unfortunately, researchers in psychological science may not interpret p-values correctly, resulting in possible mistakes in statistical inference. Our specific goal was to estimate how nonsignificant results were interpreted in empirical studies published in Chinese journals. First, we randomly selected 500 empirical research papers published in 2017 and 2018 in five prominent Chinese journals (Acta Psychologica Sinica, Psychological Science, Chinese Journal of Clinical Psychology, Psychological Development and Education, Psychological and Behavioral Studies). Second, we screened the abstracts of the selected articles and judged whether they contained negative statements. Third, we classified each negative statement into one of four categories (Correct-frequentist, Incorrect-frequentist: whole population, Incorrect-frequentist: current sample, Difficult to judge). Finally, we calculated Bayes factors based on the t values and sample sizes associated with the nonsignificant results to investigate whether the empirical data provided enough evidence in favor of the null hypothesis. Our survey revealed that: (1) 36% of the abstracts (n = 180) mentioned nonsignificant results; (2) there were 236 negative statements in the articles that referred to nonsignificant results in their abstracts, and 41% of these negative statements misinterpreted nonsignificant results; (3) 5.1% (n = 2) of the nonsignificant results provided strong evidence in favor of the null hypothesis (BF01 > 10). The results suggest that Chinese researchers need to improve their understanding of nonsignificant results and use more appropriate statistical methods to extract information from nonsignificant results.

  • Calculating Confidence Intervals of Cohen's d and Eta-squared: A Practical Primer

    Subjects: Psychology >> Statistics in Psychology submitted time 2019-04-15

    Abstract: The recent replication crisis in psychology has motivated many researchers to reform the methods they use in research, and reporting effect sizes (ES) and their confidence intervals (CIs) has become a new standard in mainstream journals. However, a practical tutorial for calculating CIs is still lacking. In this primer, we introduce the theoretical basis of the CIs of the two most widely used effect sizes, Cohen's d and η², in plain language. The CIs of both Cohen's d and η² are calculated under the condition that the alternative hypothesis (H1) is true, and both rely on estimating the non-centrality parameters of non-central distributions by iterative approximation: the non-central t-distribution for Cohen's d and the non-central F-distribution for η². We then illustrate how to calculate them in R and JASP with real data. This practical primer may help Chinese psychological researchers understand CIs better and report CIs in their own research.
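The abstract does not give the formulas, so the following is only a sketch of the standard non-centrality approach it appears to describe, written for a two-group Cohen's d and a partial η². The symbols (group sizes n_1 and n_2, degrees of freedom ν, df_1, df_2, and the cumulative distribution functions F_t and F_F of the non-central t and F distributions) are notation assumed here, not taken from the primer.

```latex
% Cohen's d (two independent groups): convert d to the observed t statistic
\[
t_{\mathrm{obs}} = d\,\sqrt{\tilde{n}}, \qquad
\tilde{n} = \frac{n_1 n_2}{n_1 + n_2}, \qquad
\nu = n_1 + n_2 - 2 .
\]
% Find the non-centrality parameters that place t_obs at the upper and lower
% alpha/2 quantiles of the non-central t distribution (solved iteratively)
\[
F_t\!\left(t_{\mathrm{obs}};\, \nu, \lambda_L\right) = 1 - \frac{\alpha}{2}, \qquad
F_t\!\left(t_{\mathrm{obs}};\, \nu, \lambda_U\right) = \frac{\alpha}{2},
\]
% and convert the bounds back to the scale of d
\[
d_L = \frac{\lambda_L}{\sqrt{\tilde{n}}}, \qquad
d_U = \frac{\lambda_U}{\sqrt{\tilde{n}}} .
\]
% Partial eta-squared: the same idea with the non-central F distribution
\[
F_F\!\left(F_{\mathrm{obs}};\, df_1, df_2, \lambda_L\right) = 1 - \frac{\alpha}{2}, \qquad
F_F\!\left(F_{\mathrm{obs}};\, df_1, df_2, \lambda_U\right) = \frac{\alpha}{2},
\]
\[
\eta^2_{L} = \frac{\lambda_L}{\lambda_L + df_1 + df_2 + 1}, \qquad
\eta^2_{U} = \frac{\lambda_U}{\lambda_U + df_1 + df_2 + 1} .
\]
```

Because the bounds λ_L and λ_U have no closed form, they are found numerically, which is the "iterative approximation" the abstract mentions. When F_obs is so small that no non-negative λ_L satisfies the first condition, the lower bound is conventionally set to 0.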