Statistics in Psychology
  • Model construction for intensive longitudinal dyadic data analysis

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2024-04-28

    Abstract: Dyadic studies, in which two interacting persons (a dyad) form the fundamental unit of analysis, are widely used in psychological research on interpersonal phenomena. Integrating such studies with intensive longitudinal designs makes it possible to investigate the dynamics of both individual behaviors and interpersonal effects during social interactions. However, appropriate statistical approaches that can adequately answer the dyadic research questions of interest, given the characteristics of intensive longitudinal data, are still lacking. Through simulation and empirical studies, this project will investigate the construction, extension, and application of appropriate statistical models for intensive longitudinal data from different dyadic designs within the framework of Dynamic Structural Equation Modeling (DSEM).
    Specifically, the research contents include: (1) constructing two actor-partner DSEMs with different detrending approaches and selecting the better model for intensive longitudinal data from the standard dyadic design; (2) developing an appropriate statistical model for intensive longitudinal one-with-many data and extending it to more complex data with time trends; (3) developing an appropriate statistical model for intensive longitudinal round-robin data and extending it to data with time trends; and (4) illustrating the application of the constructed or extended models under the three intensive longitudinal dyadic designs. This project will help psychological research gain a deeper and more scientific understanding of changes in individual behaviors and interpersonal effects in the context of social interactions.

  • Model comparison in cognitive modeling

    Subjects: Psychology >> Cognitive Psychology; Psychology >> Statistics in Psychology. Submitted: 2024-04-17

    Abstract: Cognitive modeling has gained widespread application in psychological research. Model comparison plays a crucial role in cognitive modeling, as researchers need to select the best model for subsequent analysis or latent-variable inference. Model comparison involves considering not only the fit of the models to the data (balancing overfitting and underfitting) but also the complexity of their parameters and mathematical forms. This article categorizes and introduces three major classes of model comparison metrics commonly used in cognitive modeling: goodness-of-fit metrics (such as mean squared error, the coefficient of determination, and ROC curves), cross-validation-based metrics (such as AIC and DIC), and marginal-likelihood-based metrics. The computation and the pros and cons of each metric are discussed, along with practical implementations in R using data from the orthogonal Go/No-Go paradigm. On this basis, the article identifies the suitable contexts for each metric and discusses newer approaches such as model averaging in model comparison.
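
    As a hedged illustration of the metric classes named above (not the article's own Go/No-Go code), the sketch below compares two regression models on simulated data using mean squared error, R^2, and the information criteria AIC and BIC:

    ```r
    # Illustrative sketch (not the article's code): compare a linear
    # and a quadratic model on simulated data.
    set.seed(1)
    n <- 200
    x <- rnorm(n)
    y <- 0.5 * x + 0.3 * x^2 + rnorm(n)   # true data-generating model is quadratic

    m1 <- lm(y ~ x)            # candidate 1: underfits
    m2 <- lm(y ~ x + I(x^2))   # candidate 2: matches the generating process

    # Goodness-of-fit metrics
    c(mse_m1 = mean(residuals(m1)^2), mse_m2 = mean(residuals(m2)^2))
    c(r2_m1 = summary(m1)$r.squared, r2_m2 = summary(m2)$r.squared)

    # Information criteria: fit penalized by the number of parameters
    AIC(m1, m2)
    BIC(m1, m2)
    ```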

  • The implementation of Bayesian ANOVA in JASP: A practical primer

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2024-04-16

    Abstract: The application of Bayesian statistics to hypothesis testing, via Bayes factors, is increasing in psychological science. A Bayes factor quantifies the relative evidence for two competing hypotheses or models, so its value supports a judgment about which hypothesis or model the data favor. The principles and applications of Bayes factors for ANOVA, however, have not yet been introduced in the Chinese literature. We first present the theoretical foundation of Bayesian ANOVA and its calculation rules. We then show how to perform Bayesian ANOVA, and how to interpret and report the results, for five common designs (one-factor between-group, one-factor within-group, two-factor between-group, two-factor within-group, and two-factor mixed) using example data. Theoretically, Bayesian ANOVA is an effective alternative to conventional ANOVA and a powerful vehicle for statistical inference.
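
    For readers who prefer R over JASP's point-and-click interface, a minimal sketch of a one-factor between-group Bayesian ANOVA follows, using the BayesFactor package (the engine underlying JASP's Bayesian ANOVA); the data are simulated for illustration:

    ```r
    # Simulated one-factor between-group example; anovaBF() returns the Bayes
    # factor for the group-effect model against the intercept-only null model.
    library(BayesFactor)

    set.seed(123)
    df <- data.frame(
      score = c(rnorm(30, 100, 15), rnorm(30, 108, 15), rnorm(30, 104, 15)),
      group = factor(rep(c("A", "B", "C"), each = 30))
    )

    anovaBF(score ~ group, data = df)   # BF10 > 1 favors the group effect
    ```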

  • Estimating test reliability of intensive longitudinal studies: Perspectives on multilevel structure and dynamic nature

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology. Submitted: 2023-11-28

    Abstract: With the widespread use of intensive longitudinal studies in psychology and other social sciences, reliability estimation for the tests used in such studies has received increasing attention. Earlier reliability estimation methods drawn from cross-sectional studies or based on generalizability theory have many limitations and are not well suited to intensive longitudinal studies. Considering the two main characteristics of intensive longitudinal data, its multilevel structure and dynamic nature, test reliability in intensive longitudinal studies can be estimated based on multilevel confirmatory factor analysis, dynamic factor analysis, and dynamic structural equation models. The main features and applicable contexts of these three reliability estimation methods are demonstrated with empirical data. Future research could explore reliability estimation methods based on other models, and should pay more attention to the testing and reporting of test reliability in intensive longitudinal studies.
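
    As a much-simplified sketch of the multilevel perspective (not the paper's multilevel CFA or DSEM estimators), the following R code decomposes repeated measures into between- and within-person variance with lme4 and computes an ICC-style reliability for person means; all names and values are illustrative:

    ```r
    # Simplified sketch: a two-level random-intercept model separates stable
    # between-person variance from occasion-to-occasion (within-person) variance.
    library(lme4)

    set.seed(42)
    n_person <- 100; n_time <- 20
    trait <- rnorm(n_person)                      # between-person differences
    dat <- data.frame(
      id = rep(seq_len(n_person), each = n_time),
      y  = rep(trait, each = n_time) + rnorm(n_person * n_time, sd = 1.5)
    )

    fit <- lmer(y ~ 1 + (1 | id), data = dat)
    vc  <- as.data.frame(VarCorr(fit))
    var_between <- vc$vcov[vc$grp == "id"]
    var_within  <- vc$vcov[vc$grp == "Residual"]

    # ICC and the reliability of a person mean aggregated over n_time occasions
    c(icc = var_between / (var_between + var_within),
      rel_person_mean = var_between / (var_between + var_within / n_time))
    ```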

  • Confidence Interval Width Contours: Sample Size Planning for Linear Mixed-Effects Models

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2023-10-07

    Abstract: Hierarchical data, observed frequently in psychological experiments, are usually analyzed with linear mixed-effects models (LMEMs), which can simultaneously account for multiple sources of random effects due to participants, items, and/or predictors. However, it is still unclear how to determine the sample size and number of trials when using LMEMs. Historically, sample size planning was based purely on power analysis; later, the influential article of Maxwell et al. (2008) made clear that sample size planning should consider statistical power and accuracy in parameter estimation (AIPE) simultaneously. In this paper, we derive a confidence-interval-width contour plot, with the code to generate it, that provides power and AIPE information simultaneously. With this plot, sample size requirements in LMEMs based on both power and AIPE criteria can be determined. We also demonstrate how to run sensitivity analyses to assess the impact of the magnitude of the experimental effect size and of the random-slope variance on statistical power, AIPE, and the results of sample size planning.
    There were two sets of sensitivity analyses based on different LMEMs. Sensitivity analysis I investigated how the experimental effect size influenced power, AIPE, and the sample size requirement for a within-subject design, while sensitivity analysis II investigated the impact of the random-slope variance on the optimal sample size, based on power and AIPE analyses for the cross-level interaction effect. Results for binary and continuous between-subject variables were compared. In these sensitivity analyses, two sample-size factors were varied: number of subjects (I = 10, 30, 50, 70, 100, 200, 400, 600, 800) and number of trials (J = 10, 20, 30, 50, 70, 100, 150, 200, 250, 300). The additional manipulated factor was the experimental effect size (standardized coefficient of the experimental condition = 0.2, 0.5, 0.8, in sensitivity analysis I) or the magnitude of the random-slope variance (0.01, 0.09, 0.25, in sensitivity analysis II). A random-slope model was used in sensitivity analysis I, and a random-slope model with a level-2 independent variable in sensitivity analysis II. The data-generating model and the fitted model were the same. Estimation performance was evaluated in terms of convergence rate, power, AIPE for the fixed effect, AIPE for the standard error of the fixed effect, and AIPE for the random effects.
    The results were as follows. First, there were no convergence problems under any condition, except when the random-slope variance was small and a maximal model was used to fit the data. Second, power increased as sample size, number of trials, or effect size increased; however, the number of trials played the key role for the power of the within-subject effect, while sample size mattered more for the power of the cross-level effect. Power was larger for a continuous than for a binary between-subject variable. Third, although the fixed effect was estimated accurately under all simulation conditions, the width of its 95% confidence interval (95% width) was extremely large under some conditions. Lastly, AIPE for the random effects increased as sample size and/or number of trials increased. The residual variance was estimated accurately. As the random-slope variance increased, the accuracy of the random-intercept variance estimates decreased, while the accuracy of the random-slope variance estimates increased.
    In conclusion, if sample size planning is conducted solely on the basis of power analysis, the chosen sample size might not be large enough to obtain accurate estimates of effect sizes. We therefore adopt the rationale of considering statistical power and AIPE together during sample size planning. To shed light on this issue, this article provides a standard procedure, based on the confidence-interval-width contour plot, for recommending the sample size and number of trials when using LMEMs. The plot visualizes the combined effect of sample size and number of trials per participant on the 95% width, power, and AIPE for the random effects. Based on this tool and other empirical considerations, practitioners can make informed choices about how many participants to test, and with how many trials each.
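
    A hedged sketch of the underlying simulation logic (not the authors' published code): for each cell of an I-by-J grid, simulate a within-subject design, fit a random-slope LMEM, and record power and the mean 95% CI width of the fixed effect, from which the contour plot can be drawn. Replication counts and parameter values below are illustrative and kept small for speed:

    ```r
    # Sketch of one grid cell: simulate, fit, and summarize power and CI width.
    library(lme4)

    sim_cell <- function(I, J, beta = 0.5, sd_slope = 0.3, n_rep = 50) {
      res <- replicate(n_rep, {
        x <- rep(c(-0.5, 0.5), each = J / 2)            # within-subject condition
        b <- rnorm(I, beta, sd_slope)                   # subject-specific slopes
        d <- data.frame(id = rep(seq_len(I), each = J), x = rep(x, I))
        d$y <- rnorm(I, 0, 0.5)[d$id] + b[d$id] * d$x + rnorm(nrow(d))
        fit <- suppressWarnings(lmer(y ~ x + (x | id), data = d))
        ci  <- confint(fit, method = "Wald")["x", ]     # 95% CI of the fixed effect
        c(sig = unname(ci[1] > 0 | ci[2] < 0), width = unname(ci[2] - ci[1]))
      })
      c(power = mean(res["sig", ]), mean_95_width = mean(res["width", ]))
    }

    # A small grid; the full contour plot uses a much denser one
    grid <- expand.grid(I = c(10, 30, 50), J = c(10, 20, 30))
    cbind(grid, t(mapply(sim_cell, grid$I, grid$J)))
    ```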
     

  • Practical application of Bayesian linear mixed-effects models in psychology: A primer

    Subjects: Psychology >> Statistics in Psychology; Psychology >> Experimental Psychology. Submitted: 2023-08-11

    Abstract: Compared with traditional statistical methods, Bayesian linear mixed-effects modeling (BLMM) has many advantages in handling the hierarchical structures underlying datasets and in providing more intuitive statistical results. These advantages have popularized BLMM in psychology and other fields. However, there is still a lack of tutorials in China on the practical application of BLMM in psychological studies. Therefore, we first briefly introduce the basic concepts and rationale of BLMM. We then employ a simulated dataset to demonstrate how to understand fixed and random effects, and how to use the popular brms R package to specify BLMM models based on the experimental design. We additionally cover the procedure of pre-specifying priors with prior predictive checks, and the steps of performing hypothesis testing with Bayes factors. BLMM and its extensions, such as generalized BLMM, are highly flexible and capable, and can and should be applied across psychological research.
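
    A minimal sketch of the workflow described above, with hypothetical variable names (rt, condition, subject, item, and a data frame dat) and illustrative priors:

    ```r
    # Hypothetical within-subject dataset: rt by condition, crossed random
    # effects for subjects and items.
    library(brms)

    set.seed(1)
    dat <- data.frame(
      subject   = factor(rep(1:30, each = 20)),
      item      = factor(rep(1:20, times = 30)),
      condition = rep(c(-0.5, 0.5), 300)          # effect-coded condition
    )
    dat$rt <- 500 + 30 * dat$condition + rnorm(600, 0, 50)

    # Illustrative weakly informative priors
    priors <- c(set_prior("normal(0, 50)", class = "b"),
                set_prior("exponential(0.1)", class = "sd"))

    # Prior predictive check: draw from the priors alone, ignoring the data
    ppc <- brm(rt ~ condition + (condition | subject) + (1 | item),
               data = dat, prior = priors, sample_prior = "only")
    pp_check(ppc)

    # Fit, then test the condition effect via the Savage-Dickey Bayes factor
    fit <- brm(rt ~ condition + (condition | subject) + (1 | item),
               data = dat, prior = priors, sample_prior = "yes")
    hypothesis(fit, "condition = 0")   # Evid.Ratio is BF01 for the point null
    ```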

  • Using Excel to calculate Bayes factors: Taking the goodness-of-fit test (chi-square test) as an example

    Subjects: Mathematics >> Statistics and Probability; Psychology >> Statistics in Psychology. Submitted: 2023-07-02

    Abstract: Taking the goodness-of-fit test (chi-square test) as an example, this paper calculates the Bayes factor BF10 of an n-fold Bernoulli test in Excel, using the JASP software as a benchmark. The results showed that, within the range 0.15-0.55 (the rate of samples that are all "true"), the Excel calculations were more accurate, and the differences between the two programs (Excel and JASP) were not statistically significant (p > .3).
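
    The paper implements the computation in Excel; as a hedged reconstruction, the standard closed-form Bayes factor for a binomial test (point null theta0 under H0, Beta(a, b) prior under H1) can be written in a few lines of R:

    ```r
    # BF10 for k successes in n Bernoulli trials; binomial coefficients cancel.
    bf10_binomial <- function(k, n, theta0 = 0.5, a = 1, b = 1) {
      log_m1 <- lbeta(k + a, n - k + b) - lbeta(a, b)        # marginal lik. under H1
      log_m0 <- k * log(theta0) + (n - k) * log(1 - theta0)  # likelihood under H0
      exp(log_m1 - log_m0)
    }

    bf10_binomial(k = 60, n = 100)   # e.g., 60 "true" outcomes in 100 trials
    ```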

  • On the reliability of point estimation of model parameter: taking the CDMs as an example

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology. Submitted: 2023-05-11

    Abstract: Cognitive diagnostic models (CDMs) are psychometric models that have received increasing attention in psychology, education, the social sciences, biology, and many other disciplines. It has been argued that an inappropriate convergence criterion for the MLE-EM (maximum likelihood estimation via the expectation-maximization) algorithm can result in unpredictably distorted model parameter estimates, and thus may yield unstable and misleading conclusions from the fitted CDMs. Although several convergence criteria have been developed, how to specify an appropriate convergence criterion for fitted CDMs remains an unexplored question.
    A comprehensive method for assessing convergence is proposed in this study. To minimize the impact of the model parameter estimation framework, a new framework adopting a multiple-starting-values strategy (mCDM) is introduced. To examine the performance of convergence criteria for MLE-EM in CDMs, a simulation study was conducted under various conditions. Five convergence assessment methods were examined: the maximum absolute change in model parameters, the maximum absolute change in item endorsement probabilities and structural parameters, the absolute change in log-likelihood, the relative change in log-likelihood, and the comprehensive method. The data-generating models were a saturated CDM and a hierarchical CDM. The number of items was set to J = 16 and 32. Three sample sizes were considered: 500, 1000, and 4000. Three convergence tolerance values were used: 10^-4, 10^-6, and 10^-8. The simulated response data were fitted by the saturated CDM using both mCDM and the R package GDINA, with the maximum number of iterations set to 50,000.
    Simulation results suggest that:
    (1) The saturated CDM converged under all conditions. However, the actual number of iterations exceeded 30,000 under some conditions, which implies that when the predefined maximum number of iterations is less than 30,000, the MLE-EM algorithm may stop prematurely.
    (2) The model parameter estimation framework affected the performance of the convergence criteria. The performance of the convergence criteria under the mCDM framework was comparable or superior to that under the GDINA framework.
    (3) Among the convergence tolerance values considered in this study, 10^-8 consistently performed best in reaching the maximum log-likelihood, and 10^-4 performed worst. Compared with all other convergence assessment methods, the comprehensive method in general performed best, especially under the mCDM framework. The maximum absolute change in model parameters performed similarly to the comprehensive method, but its good performance was not guaranteed. By contrast, the relative change in log-likelihood performed worst under both the mCDM and GDINA frameworks.
    These simulation results suggest that the most appropriate convergence criterion for MLE-EM in CDMs is the comprehensive method with tolerance 10^-8 under the mCDM framework. Results from a real-data analysis also demonstrated the good performance of the proposed comprehensive method and the mCDM framework.
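
    As an illustrative sketch of the stopping rules being compared (not the authors' mCDM implementation), the function below evaluates several convergence criteria from two consecutive EM iterations and combines them in the spirit of the comprehensive method:

    ```r
    # Evaluate several stopping rules given two consecutive EM iterations.
    check_convergence <- function(par_new, par_old, ll_new, ll_old, tol = 1e-8) {
      flags <- c(
        max_abs_par_change = max(abs(par_new - par_old)) < tol,
        abs_ll_change      = abs(ll_new - ll_old) < tol,
        rel_ll_change      = abs((ll_new - ll_old) / ll_old) < tol
      )
      c(flags, comprehensive = all(flags))   # stop only when all rules agree
    }

    # Example values for iterations t-1 and t (illustrative numbers)
    check_convergence(par_new = c(0.312, 0.118), par_old = c(0.310, 0.119),
                      ll_new = -5230.11, ll_old = -5230.15, tol = 1e-4)
    ```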
     

  • Scaling methods of second-order latent growth models and their comparable first-order latent growth models

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2023-03-29

    Abstract:

    Latent growth models (LGMs) are a powerful tool for analyzing longitudinal data and have attracted the attention of scholars in psychology and other social science disciplines. For a latent variable measured by multiple indicators, we can establish both a univariate LGM (also called a first-order LGM) based on composite scores and a latent-variable LGM (also called a second-order LGM) based on the indicators. The two model types are special cases of first-order and second-order factor models, respectively. In either case, we need to scale the factors, that is, to specify their origin and unit. Under the condition of strong measurement invariance across time, the estimation of growth parameters in second-order LGMs depends on the scaling method for the factors/latent variables. There are three scaling methods: the scaled-indicator method (also called the marker-variable identification method), the effect-coding method (also called the effect-coding identification method), and the latent-standardization method.

    The existing latent-standardization method depends on the reliability of the scaled indicator or of the composite scores at the first time point. In this paper, we propose a practicable two-step latent-standardization method. In the first step, a CFA with strong measurement invariance is conducted, fixing the mean and variance of the latent variable at the first time point to 0 and 1, respectively. In the second step, the loadings estimated in the first step are used to establish the second-order LGM. If the standardization is based on the scaled-indicator method, the loading of the scaled indicator is fixed to the value obtained in the first step, and its intercept is fixed to the sample mean of the scaled indicator at the first time point. If the standardization is based on the effect-coding method, the sum of the loadings is constrained to the sum of the loadings obtained in the first step, and the sum of the intercepts is constrained to the sum of the sample means of all indicators at the first time point. We also propose a standardization procedure for first-order LGMs based on composite scores: first standardize the composite scores at the first time point and apply the same linear transformation to the composite scores at the other time points, then establish the first-order LGM, which is comparable with the second-order LGM scaled by the latent-standardization method.

    The scaling methods for second-order LGMs and their comparable first-order LGMs are systematically summarized. Their comparability is illustrated by modeling empirical data from a Moral Evasion Questionnaire. For the scaled-indicator method, second-order LGMs and their comparable first-order LGMs differ considerably in parameter estimates (especially when the reliability of the scaled indicator is low). For the effect-coding method, the two are relatively close in parameter estimates. When the latent variable at the first time point is standardized, the mean of the intercept factor of the first-order LGM is close to 0 and not statistically significant; so is the mean of the intercept factor of the second-order LGM under the effect-coding method, but the means under the two scaled-indicator methods are statistically significant and differ from each other.

    Based on these results, the effect-coding method is recommended for scaling and standardizing second-order LGMs; the comparable first-order LGMs are then those based on composite scores and their standardized versions. For either a first-order or a second-order LGM, the standardized results obtained by modeling composite total scores and composite mean scores are identical.
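
    A hedged lavaan sketch of the recommended effect-coding identification for a second-order LGM (three indicators, three waves, strong measurement invariance) follows; the variable names y11-y33 and the linear time codes are hypothetical:

    ```r
    library(lavaan)

    model <- '
      # First-order factors; loadings invariant across time (strong invariance)
      eta1 =~ l1*y11 + l2*y21 + l3*y31
      eta2 =~ l1*y12 + l2*y22 + l3*y32
      eta3 =~ l1*y13 + l2*y23 + l3*y33

      # Invariant indicator intercepts
      y11 ~ i1*1;  y21 ~ i2*1;  y31 ~ i3*1
      y12 ~ i1*1;  y22 ~ i2*1;  y32 ~ i3*1
      y13 ~ i1*1;  y23 ~ i2*1;  y33 ~ i3*1

      # Effect-coding constraints: loadings average to 1, intercepts to 0
      l1 + l2 + l3 == 3
      i1 + i2 + i3 == 0

      # Second-order linear growth on the latent variables
      icept =~ 1*eta1 + 1*eta2 + 1*eta3
      slope =~ 0*eta1 + 1*eta2 + 2*eta3
      icept ~ 1
      slope ~ 1
      eta1 ~ 0*1;  eta2 ~ 0*1;  eta3 ~ 0*1   # means carried by the growth factors
    '
    # fit <- sem(model, data = dat)   # dat: wide-format data with y11..y33 (assumed)
    ```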

  • CCTE: A Database of Chinese COVID-19 Terms

    Subjects: Psychology >> Cognitive Psychology; Psychology >> Experimental Psychology; Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology; Psychology >> Other Disciplines of Psychology; Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics; Other Disciplines >> Synthetic discipline. Submitted: 2023-02-08

    Abstract: Objective: To establish a multi-dimensional, standardized lexical database of COVID-19-related terms and words, which may facilitate COVID-19-related research in domains such as psychology, psychiatry, and neuroscience. Methods: Construction of the database followed the established methods of affective word databases in China and abroad; a dot-detection task using words from the database as experimental materials was administered to participants suspected of having COVID-19 phobia to measure attentional bias and thereby test the validity of the database. Results: 196 COVID-19-related words and 99 neutral words were included. The words were then classified and rated on six dimensions, establishing a standardized database of Chinese COVID-19-related terms. The words showed good reliability and internal consistency. In addition, validity was tested through the dot-detection task: participants with COVID-19 fear and those without showed a significant attentional bias toward COVID-19-related words. Limitations: The initial sample size is small, and applications of the database need further development. Conclusions: The database of Chinese COVID-19 terms has good reliability and internal consistency, and can serve as material for future COVID-19-related research.

  • Model Construction and Sample Size Planning for Mixed-Effects Location-Scale Models

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2023-01-31

    Abstract: With the growing depth of psychological research and the development of data collection techniques, interest in mixed-effects location-scale models (MELSM) has increased drastically. When residual variances are heterogeneous, these models can include predictors at different levels, helping to explore relationships among traits and to investigate inter- and intra-individual variability, as well as their explanatory variables, simultaneously. This project includes both simulation and empirical studies. In detail, its main contents are: 1) comparing and selecting candidate models based on Bayesian fit indices to construct the MELSM; 2) planning sample size for the MELSM according to both power analysis and accuracy-in-parameter-estimation analysis; 3) extending the sample size planning method for the MELSM to better frame considerations of uncertainty; and 4) developing an R package for the MELSM and illustrating its application in empirical psychological studies. We hope these statistical models will be widely implemented, ultimately enhancing the reproducibility and replicability of psychological studies.
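
    A minimal sketch of a MELSM specification in the brms R package, whose distributional syntax lets both the location and the (log) residual scale carry predictors and person-level random effects; the variable names (affect, stress, id) and data are hypothetical:

    ```r
    # Hypothetical daily-diary data: affect predicted by stress; both the mean
    # and the residual variability differ across persons.
    library(brms)

    set.seed(2)
    dat <- data.frame(id = factor(rep(1:50, each = 30)), stress = rnorm(1500))
    dat$affect <- 2 + 0.3 * dat$stress + rnorm(1500, 0, exp(0.2 * dat$stress))

    melsm <- bf(
      affect ~ stress + (1 + stress | id),   # location (mean) submodel
      sigma  ~ stress + (1 | id)             # scale submodel, log link by default
    )
    fit <- brm(melsm, data = dat, chains = 4, cores = 4)
    summary(fit)
    ```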

  • Sequential Bayes Factor Analysis: Balancing Informativeness and Efficiency in Designing Experiments

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2022-12-31

    Abstract:

    The key to experimental design is to balance informativeness and efficiency. However, power analysis focuses only on informativeness and is difficult to implement. Sequential Bayes factor analysis takes advantage of the Bayes factor's ability to quantify evidence and reaches a trade-off between informativeness and efficiency by setting Bayes factor criteria and analyzing the data sequentially during collection. The present primer demonstrates how to perform the three steps of sequential Bayes factor analysis using the open-source software JASP and R. This method accommodates practical issues in real research and is easy to implement, helping researchers design more efficient experiments.
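
    A minimal sketch of the sequential procedure in R with the BayesFactor package; the evidence thresholds (BF10 > 10 or < 1/10), minimum sample size, batch size, and simulated data are all illustrative assumptions:

    ```r
    # Collect data in batches and stop once the Bayes factor crosses a threshold.
    library(BayesFactor)

    set.seed(7)
    n_max <- 200
    x_pool <- rnorm(n_max, 0.4, 1)   # hypothetical treatment group
    y_pool <- rnorm(n_max, 0.0, 1)   # hypothetical control group

    for (n in seq(20, n_max, by = 5)) {
      bf10 <- extractBF(ttestBF(x = x_pool[1:n], y = y_pool[1:n]))$bf
      if (bf10 > 10 || bf10 < 1/10) break   # strong evidence either way: stop
    }
    c(n_per_group = n, BF10 = bf10)
    ```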

  • The Status, Approach and Challenges of Artificial Intelligence-Empowered Psychological Research

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology; Psychology >> Other Disciplines of Psychology. Submitted: 2022-09-19

    Abstract: Human beings have entered the era of artificial intelligence (AI), and innovative data collection and processing methods are urgently needed to carry out increasingly complex psychological research. AI and related technologies can help collect more ecologically valid, dynamic, diverse, and accurate data and analyze massive, multi-modal data, making up for the deficiencies of traditional methods. Incorporating AI is therefore a major direction for the future development of psychological research. In the meantime, it is also important not to rely too heavily on an AI-based, data-driven approach: the integration of top-down, theory-driven and bottom-up, data-driven approaches is crucial in intelligent psychological research.

  • Exploring longitudinal relations: Longitudinal models with a cross-lagged structure

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2022-08-08

    Abstract: Longitudinal models with a cross-lagged structure play an important role in revealing longitudinal relations between variables and lay a foundation for testing causal relationships. Under certain conditions, the cross-lagged panel model can be transformed into models of other forms, so how to choose an appropriate model is an important issue. This article reviews these models, compares them in terms of model structure, presupposed trajectories, and requirements on the number of time points, and finally uses an empirical example to illustrate how to choose an appropriate model. The results show that different models may yield very different conclusions about the relations between variables; in practice, researchers should be mindful of model selection and model comparison.
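
    A minimal lavaan sketch of the basic two-wave cross-lagged structure that these models share is given below; the variable names and simulated data are hypothetical, and the models discussed in the article (e.g., random-intercept variants) extend this skeleton:

    ```r
    library(lavaan)

    set.seed(3)
    n <- 300
    x1 <- rnorm(n); y1 <- 0.3 * x1 + rnorm(n)
    x2 <- 0.5 * x1 + 0.2 * y1 + rnorm(n)
    y2 <- 0.5 * y1 + 0.1 * x1 + rnorm(n)
    dat <- data.frame(x1, y1, x2, y2)

    clpm <- '
      x2 ~ a1*x1 + c1*y1   # autoregressive (a1) and cross-lagged (c1) paths
      y2 ~ a2*y1 + c2*x1
      x1 ~~ y1             # wave-1 covariance
      x2 ~~ y2             # wave-2 residual covariance
    '
    fit <- sem(clpm, data = dat)
    summary(fit, standardized = TRUE)
    ```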

  • A standardized checklist for reporting meta-analysis in the open science era

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2022-07-30

    Abstract: Meta-analysis is a crucial tool for accumulating evidence in basic and applied research. In the open science era, meta-analysis has become an important way to integrate open data from different sources. Meanwhile, because of the great researcher degrees of freedom introduced by the many steps of a meta-analysis and the many choices within each step, openness and transparency are crucial for reproducing meta-analytic results. To (1) understand the transparency and openness of meta-analysis reports published in Chinese journals and (2) improve the transparency and openness of future meta-analyses by Chinese researchers, we developed a Chinese-language checklist for meta-analysis, based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the principles of openness and transparency, and then surveyed the methods and results of 68 meta-analysis papers published in mainstream Chinese psychological journals over the last five years. Our results revealed that the openness and transparency of Chinese meta-analysis reports need to be improved, especially in the following aspects: the date/time and limitations of the literature search, the details of screening and data collection, the flow chart of article screening, the details of effect size transformation, and the evaluation of bias in individual studies. The checklist we present, which covers almost all aspects that an open meta-analysis should include, can be used as a guide for future meta-analyses.

  • Evaluation of predictors’ relative importance: Methods and applications

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2022-07-26

    Abstract: Evaluating predictors' relative importance has become increasingly important in the context of the explosion of high-dimensional data in psychological research. The key to relative importance analysis is choosing appropriate measures and inference approaches. Dominance analysis and relative weights are the recommended importance measures, among others. Bootstrap sampling is often used to infer the importance of a single variable or the difference between the importance of two variables. For three or more variables, Bayesian tests have recently been developed to evaluate their importance orderings. Beyond linear regression models, relative importance analysis has been extended to logistic regression models, structural equation models, and multilevel models; however, these extensions consider only continuous predictors. Although relative importance analysis has been widely used in psychological studies, researchers may select and interpret importance measures incorrectly. Therefore, a real-data example is used to illustrate how relative importance can be properly evaluated.
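
    A hedged illustration with the relaimpo R package: the "lmg" metric averages R^2 contributions over all predictor orderings (a Shapley-style decomposition closely related to dominance analysis), and bootstrap resampling provides the inference discussed above. The data are simulated for the example:

    ```r
    library(relaimpo)

    set.seed(1)
    n <- 300
    x1 <- rnorm(n); x2 <- 0.5 * x1 + rnorm(n); x3 <- rnorm(n)
    y  <- 0.4 * x1 + 0.3 * x2 + 0.1 * x3 + rnorm(n)

    fit <- lm(y ~ x1 + x2 + x3)
    calc.relimp(fit, type = "lmg", rela = TRUE)   # importance shares summing to 1

    # Bootstrap inference for the importance estimates
    boot <- boot.relimp(fit, type = "lmg", b = 500)
    booteval.relimp(boot)
    ```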

  • Q-matrix validation in cognitive diagnostic models: The role of the complete information matrix

    Subjects: Psychology >> Statistics in Psychology; Psychology >> Psychological Measurement. Submitted: 2022-07-15

    Abstract:

    A Q-matrix, which defines the relations between latent attributes and items, is a central building block of cognitive diagnostic models (CDMs). In practice, a Q-matrix is usually specified subjectively by domain experts and might therefore contain misspecifications. A misspecified Q-matrix can cause several serious problems, such as inaccurate model parameters and erroneous attribute-profile classifications. Several Q-matrix validation methods have been developed in the literature, such as the G-DINA discrimination index (GDI), the Wald test based on an incomplete information matrix (Wald-IC), and the Hull method. Although these methods have shown promising results for the Q-matrix recovery rate (QRR) and true positive rate (TPR), a common drawback is that they obtain poor results for the true negative rate (TNR). Notably, the poor TNR performance of the Wald-IC method might be caused by an incorrect computation of the information matrix.

    A new Q-matrix validation method is proposed in this paper, which constructs a Wald test with a complete empirical cross-product information matrix (XPD). A simulation study was conducted to evaluate the performance of the Wald-XPD method and compare it with the GDI, Wald-IC, and Hull methods. Five factors that may influence Q-matrix validation were manipulated. Attribute patterns were generated following either a uniform distribution or a higher-order distribution. The misspecification rate was set to two levels: QM = 0.15 and QM = 0.30. Two sample sizes were used: 500 and 1000. Three levels of item quality (IQ) were defined: high IQ, P_j(0) ~ U(0, 0.2) and P_j(1) ~ U(0.8, 1); medium IQ, P_j(0) ~ U(0.1, 0.3) and P_j(1) ~ U(0.7, 0.9); and low IQ, P_j(0) ~ U(0.2, 0.4) and P_j(1) ~ U(0.6, 0.8). The number of attributes was fixed at K = 4. Two test lengths were considered, J = 16 and J = 32, corresponding to item-to-attribute ratios J/K = 4 and J/K = 8.

    The simulation results showed the following.

    (1) The Wald-XPD method always provided the best results, or was close to the best-performing method, across the different factor levels, especially in terms of the TNR. The HullP and Wald-IC methods produced larger values of QRR and TPR but smaller values of TNR. A similar pattern was observed between HullP and HullR, with HullP performing better than HullR. Among the Q-matrix validation methods considered in this study, the GDI method was the worst performer.

    (2) The comparison of the HullP, Wald-IC, and Wald-XPD methods suggested that the Wald-XPD method is preferable for Q-matrix validation. Even though the HullP and Wald-IC methods can provide higher TPR values when conditions are particularly unfavorable (e.g., low item quality, short test length, and small sample size), they obtain very low TNR values. The practical application of the Wald-XPD method was illustrated using real data.

    In conclusion, the Wald-XPD method has excellent power to detect and correct misspecified q-entries. In addition, it is a generic method that can serve as an important complement to domain experts' judgment and could reduce their workload.
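
    For intuition about the baseline GDI method discussed above (not the proposed Wald-XPD test), a self-contained sketch follows: the GDI of a candidate q-vector is the posterior-weighted variance of the item success probabilities across the latent classes that the q-vector distinguishes:

    ```r
    # GDI of a candidate q-vector: posterior-weighted variance of the item
    # success probabilities over the latent classes implied by that q-vector.
    gdi <- function(p_class, w_class) {
      p_bar <- sum(w_class * p_class)          # weighted mean success probability
      sum(w_class * (p_class - p_bar)^2)
    }

    # Toy example: a one-attribute q-vector (2 classes) vs. two attributes (4)
    gdi(p_class = c(0.2, 0.8),           w_class = c(0.5, 0.5))
    gdi(p_class = c(0.2, 0.2, 0.2, 0.8), w_class = rep(0.25, 4))
    ```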

  • Multiverse-style analysis: Introduction and application

    Subjects: Psychology >> Statistics in Psychology. Submitted: 2022-07-09

    Abstract:

    Selective analysis and selective reporting are among the main triggers of the replicability crisis in psychological science. In recent years, researchers have proposed a new method, multiverse-style analysis, which enumerates multiple data-analytic decisions to reduce subjective selectiveness and arbitrariness, and assesses robustness to increase the reliability of results. This manuscript introduces multiverse-style analysis and its steps, using the example of exploring the relationship between smartphone use and smartphone stress. The method has been applied in fields such as psychology and cognitive neuroscience. Future research should continue to develop and improve the statistical inference of multiverse-style analysis so that it can be applied to more types of data and broader research fields.
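
    A minimal sketch of the mechanics: enumerate every combination of analytic decisions, fit one model per "universe", and inspect the distribution of the effect of interest. The decisions and variable names (phone_use, stress, age, is_outlier, data frame dat) are hypothetical:

    ```r
    # Hypothetical data for the smartphone-use example
    set.seed(4)
    n <- 400
    dat <- data.frame(phone_use = rexp(n, 1 / 120), age = rnorm(n, 30, 8))
    dat$stress <- 0.002 * dat$phone_use + rnorm(n)
    dat$is_outlier <- dat$phone_use > quantile(dat$phone_use, 0.95)

    # One row per "universe" of analytic decisions
    universes <- expand.grid(exclude_outliers = c(TRUE, FALSE),
                             log_transform    = c(TRUE, FALSE),
                             adjust_age       = c(TRUE, FALSE))

    results <- apply(universes, 1, function(u) {
      d <- dat
      if (u["exclude_outliers"]) d <- d[!d$is_outlier, ]
      if (u["log_transform"])    d$phone_use <- log(d$phone_use + 1)
      f <- if (u["adjust_age"]) stress ~ phone_use + age else stress ~ phone_use
      coef(summary(lm(f, data = d)))["phone_use", "Estimate"]
    })

    cbind(universes, effect = results)   # the effect across all universes
    ```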

  • Research on Person-fit in Cognitive Diagnostic Assessment

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology; Psychology >> Educational Psychology. Submitted: 2022-05-12

    Abstract:

    Cognitive diagnostic assessment (CDA) has been widely used in educational assessment. It can provide guidance for further study and teaching by analyzing whether test-takers have mastered specific knowledge points or skills.

    In psychometrics, statistical methods for assessing the fit of an examinee's item responses to a postulated psychometric model are called person-fit statistics. Person-fit analysis helps verify individual diagnostic results and is mainly used to distinguish aberrant examinees from normal ones. Aberrant response patterns include "sleeping" behavior, fatigue, cheating, creative responding, random guessing, and cheating with randomness, all of which can bias the estimation of an examinee's ability. Person-fit analysis helps researchers identify aberrant response patterns more accurately, so that aberrant respondents can be removed and the validity of the test improved. In the past, most person-fit research was carried out under the item response theory (IRT) framework, while only a few published papers have dealt with person fit under the CDM framework. This study attempts to fill that gap by introducing new methods: a new person-fit index, R, is proposed.

    To verify the validity of the newly developed index, this study explores the Type I error rate and statistical power of the R index under different test lengths, levels of item discrimination, and types of respondent misfit, and compares it with the existing RCI and lz methods. The Type I error rate was defined as the proportion of 1,000 normal response patterns generated from the DINA model that a person-fit statistic flagged as aberrant. The controlled factors were as follows: the number of examinees was fixed at 1,000, the DINA model was used as the cognitive diagnostic model, there were six attributes, and the Q-matrix was fixed. Finally, to illustrate the practical value of the person-fit index, the R index was applied to the empirical fraction-subtraction data.

    The results show that the Type I error rate of the R index is reasonable and stable at 0.05. As for statistical power, as item discrimination improves, the power of each index for the various types of aberrant examinees increases. As the number of items increases, power mostly shows an upward trend. Across the types of aberrant examinees, the R index performs best for random guessing and cheating with randomness, while the lz index performs better for fatigue, sleeping, and creative responding. In the empirical study, the detection rate of aberrant examinees was 4.29%.

    As item discrimination and the number of items increase, the power of the R index improves, and the R index is most robust when item discrimination is low. The R index has high power for aberrant behaviors such as creative responding, random guessing, and cheating with randomness.
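
    For reference, a self-contained R sketch of the classic lz person-fit statistic that the R index is compared against is shown below (written here for dichotomous responses under an IRT model; the item parameters and the aberrant pattern are illustrative):

    ```r
    # lz: standardized log-likelihood of a response pattern; u = responses,
    # p = model-implied success probabilities for one examinee.
    lz_statistic <- function(u, p) {
      l0   <- sum(u * log(p) + (1 - u) * log(1 - p))      # observed log-likelihood
      e_l0 <- sum(p * log(p) + (1 - p) * log(1 - p))      # expected value
      v_l0 <- sum(p * (1 - p) * log(p / (1 - p))^2)       # variance
      (l0 - e_l0) / sqrt(v_l0)                            # approx. N(0, 1)
    }

    # Illustrative pattern: unexpected successes on the two hardest items
    set.seed(5)
    p <- plogis(1.2 * (0 - seq(-2, 2, length.out = 20)))  # P(correct) at theta = 0
    u <- as.numeric(runif(20) < p); u[19:20] <- 1
    lz_statistic(u, p)   # large negative values flag aberrant patterns
    ```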

    "

  • Detection of aberrant response patterns using a residual-based statistic in testing with polytomous items

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology. Submitted: 2022-04-06

    Abstract: This paper proposes a person-fit statistic, R, for polytomously scored items and examines its performance in detecting six common aberrant response patterns (cheating, guessing, random responding, carelessness, creative responding, and mixed aberrance), comparing it with the standardized log-likelihood statistic lzp. The results show that: (1) when the proportion of aberrant responses is low and the aberrance type is cheating or guessing, the detection rate of R is significantly higher than that of lzp; (2) as test length and the degree of aberrance increase, the detection rates of both statistics rise; and (3) under some conditions, R and lzp perform similarly. An empirical data analysis further demonstrates how to use the R statistic, and the results also indicate that it has good application prospects.