  • A Profile-Perspective on Daily-life Multi-Situational Individual Differences Assessment

    Subjects: Psychology >> Psychological Measurement submitted time 2024-08-08

    Abstract: As psychometric theories and methods evolve, the situation-dependence of individual differences is receiving increasing attention. To measure individual differences comprehensively and accurately, and thereby promote the optimal development of individuals and society, researchers have in recent years increasingly measured individual states across a wide range of real-life daily situations and built computational models of individual differences. Compared with traditional laboratory settings, this approach aims to provide a more comprehensive and objective measurement of individual differences. Technological advances such as intelligent sensors and wearable devices have made measuring individual differences in daily life more convenient and efficient. This has driven new progress in daily multi-situational research on individual differences, covering subjective reports, behavioral performance, and physiological responses, and has led to an individual-centered profile perspective for analyzing the high-dimensional data that daily multi-situational studies produce. Future research should integrate daily multi-situational measurement with profile-perspective analysis methods to deepen understanding of the mechanisms underlying individual differences and to advance individual-differences theory.
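    The abstract does not specify which profile-analysis model is used. As a minimal sketch of the person-centered idea only, the classic decomposition of a score profile into elevation (mean level), scatter (spread), and shape (standardized pattern) can be computed for one person's scores across daily situations; the situations and scores below are hypothetical.

```python
import statistics

def profile_components(scores):
    """Split one person's scores across situations into elevation, scatter, and shape.

    scores: a person's state ratings, one per daily situation (hypothetical input).
    """
    elevation = statistics.fmean(scores)            # overall level across situations
    scatter = statistics.pstdev(scores)             # how much the person varies across situations
    if scatter == 0:
        shape = [0.0] * len(scores)                 # a flat profile has no shape
    else:
        shape = [(s - elevation) / scatter for s in scores]  # standardized pattern
    return elevation, scatter, shape

# Two hypothetical people rated in four situations (e.g. work, home, commute, social)
person_a = [4.0, 2.0, 3.0, 5.0]
person_b = [3.0, 1.0, 2.0, 4.0]                     # same pattern as person_a, one point lower
print(profile_components(person_a))
print(profile_components(person_b))
```

    Separating the components makes it possible to ask whether two people differ in level, in variability, or in which situations raise or lower their states.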

  • Development of the Childbearing Motivation Questionnaire and Test of its Reliability and Validity

    Subjects: Psychology >> Psychological Measurement submitted time 2024-07-24

    Abstract: Fertility motivation refers to the reasons why individuals do or do not want children. To date, China lacks a well-developed scale for assessing fertility motivation. Based on a systematic review of fertility motivation measurement tools in China and abroad, this study combined qualitative and quantitative research methods to develop fertility motivation scales. Stratified sampling was used to survey 2,000 people of childbearing age online to test the reliability and validity of the new scale. Because positive and negative emotions involve different brain pathways, positive and negative childbearing motivations are scored separately. Factor analysis found that each scale has two dimensions: the positive childbearing motivation scale comprises “pursuit and desire” and “custom and norm,” and the negative scale comprises “emotional and social” and “somatic and physical.” Tests of internal consistency, construct validity, criterion validity, and content validity suggest that the scale has satisfactory psychometric properties and is suitable for measuring childbearing motivation in the reproductive-age population in China.
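    The abstract does not name the internal-consistency coefficient; Cronbach's alpha is assumed here because it is the most common choice. Since positive and negative motivations are scored separately, alpha would be computed per subscale. A minimal sketch with hypothetical 5-point responses:

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix (list of rows)."""
    k = len(responses[0])                                   # number of items
    items = list(zip(*responses))                           # transpose to item columns
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 5 respondents x 3 items from one subscale (e.g. "pursuit and desire")
subscale = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 3]]
print(round(cronbach_alpha(subscale), 3))
```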

  • From behavior domain to behavior attribute: Issues and suggestions in measuring pro-environmental behavior

    Subjects: Psychology >> Social Psychology; Psychology >> Psychological Measurement submitted time 2024-07-22

    Abstract: Existing research has developed many tools for measuring pro-environmental behavior, including scales, individual behavioral paradigms, and group games. Most of these tools are organized by behavioral domain, the most frequently covered being conservation, transportation, waste disposal, consumption, and social citizenship behaviors (e.g., donation). However, current measurements of pro-environmental behavior suffer from low standardization and limited generalizability of results. These problems stem mainly from the prevailing reliance on measurement paradigms grounded in behavioral domains, which neglect behavioral attributes, the defining and distinguishing characteristics of behavior. Future research could address these problems by developing standardized measurement tools based on behavioral attributes and validating them against a variety of real-life behaviors as criteria.

  • Application of machine learning methods in test security

    Subjects: Psychology >> Psychological Measurement submitted time 2024-07-19

    Abstract: Post hoc detection of test-security violations has traditionally relied on statistical indices, but emerging machine learning methods offer better detection performance. To advance the field of test security, we review the research literature and classify the methods into three major categories: supervised, unsupervised, and semi-supervised learning. Each major category is further divided into three subcategories: ensemble learning, deep learning, and transfer learning. The review describes the distinctive attributes of the various machine learning methods, offers practical recommendations for data acquisition and processing, and outlines strategies for selecting input features. Finally, it identifies directions for future research, including machine learning-based person-fit research, test security research using multimodal data, test security research employing generative adversarial networks, and the interpretability of results.
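    One future direction named above is machine learning-based person-fit research. As a minimal sketch of the kind of traditional statistic such work builds on, or could use as an input feature, the classic standardized log-likelihood person-fit statistic l_z can be computed from an examinee's 0/1 responses and model-implied success probabilities. The probabilities below are hypothetical, and this is not one of the reviewed ML methods.

```python
import math

def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z.

    responses: 0/1 item scores for one examinee
    probs: model-implied probabilities of a correct response (e.g. from an IRT model)
    Large negative values flag aberrant, possibly compromised, response patterns.
    """
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p) for u, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)

# Hypothetical probabilities, ordered from easy to hard items
probs = [0.9, 0.8, 0.7, 0.4, 0.2]
print(lz_statistic([1, 1, 1, 0, 0], probs))   # pattern consistent with the model
print(lz_statistic([0, 0, 0, 1, 1], probs))   # fails easy items, passes hard ones
```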

  • Design of the polytomous simplest complete Q matrix based on the reachability matrix

    Subjects: Psychology >> Psychological Measurement submitted time 2024-06-14

    Abstract: The identifiability of cognitive diagnosis models depends heavily on the completeness of the Q matrix. However, existing test designs focus primarily on dichotomously scored items and neglect polytomous cognitive diagnostic test design, a limitation that poses a significant obstacle to the advancement of cognitive diagnosis. To bridge this gap, this paper introduces designs for constructing polytomous structured and unstructured simplest complete Q matrices (SSCQM/USCQM). The proposed approach takes the ideal response patterns (IRPs) of all knowledge states (KSs) under the reachability matrix as its objects and seeks to minimize the number of columns selected from the reachability matrix while keeping a one-to-one correspondence between the set of KSs and the set of IRPs, which ensures the completeness of the SSCQM. We then derived a polytomous USCQM from the relationship between the SSCQM and the corresponding sub-matrix of the identity matrix, while ensuring that each row contains at least one 1. Interestingly, the construction process revealed that there are more USCQMs than SSCQMs. This approach expands the possibilities for polytomous cognitive diagnostic test design.
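    The two ingredients described above, the reachability matrix and the one-to-one check between knowledge states and ideal response patterns, can be sketched as follows. This is a simplified illustration, not the paper's construction: it uses a dichotomous conjunctive (DINA-type) rule to form ideal responses, whereas the paper's polytomous scoring rule is not reproduced, and the attribute hierarchy in the example is hypothetical.

```python
from itertools import product

def reachability(adj):
    """Reachability matrix via Warshall's algorithm.
    r[i][j] = 1 if attribute i is a (direct or indirect) prerequisite of j; r[i][i] = 1."""
    k = len(adj)
    r = [[1 if i == j else adj[i][j] for j in range(k)] for i in range(k)]
    for m in range(k):
        for i in range(k):
            for j in range(k):
                if r[i][m] and r[m][j]:
                    r[i][j] = 1
    return r

def permissible_states(r):
    """Knowledge states consistent with the hierarchy: mastering j requires all prerequisites of j."""
    k = len(r)
    return [alpha for alpha in product((0, 1), repeat=k)
            if all(not alpha[j] or alpha[i] for i in range(k) for j in range(k) if r[i][j])]

def is_complete(q_vectors, states):
    """True if distinct knowledge states always yield distinct ideal response patterns.
    Simplifying assumption: an item is answered correctly iff all required attributes are mastered."""
    def irp(alpha):
        return tuple(int(all(alpha[a] for a in range(len(alpha)) if q[a])) for q in q_vectors)
    patterns = [irp(s) for s in states]
    return len(set(patterns)) == len(patterns)

# Hypothetical linear hierarchy A1 -> A2 -> A3
adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
r = reachability(adj)
states = permissible_states(r)
columns = [[r[i][j] for i in range(3)] for j in range(3)]  # each column of R used as an item q-vector
print(states)
print(is_complete(columns, states), is_complete(columns[2:], states))
```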
    This study focused on the design and evaluation of cognitive diagnostic tests built from polytomous SSCQMs and USCQMs. Two studies examined how the number of attributes, the attribute hierarchy, and item parameters affect classification precision under the SSCQM, the USCQM, and the reachability matrix. The first study varied attribute structures and item parameter values to assess their impact on accuracy, and the second examined the effects of attribute hierarchies and the number of attributes.
    Both simulation studies and empirical data were used to assess the robustness and efficacy of the two methods. First, the simulation results yielded several key findings. Increasing the number of SSCQMs or USCQMs in a test improved accuracy. In long tests, the USCQM achieved a higher pattern match ratio (PMR) and marginal match ratio (MMR) than the SSCQM and the reachability matrix, a trend that was particularly evident as item parameters or the number of attributes increased, or as the attribute hierarchy changed. Across these factors, however, the PMR and MMR of the three designs differed only minimally. In short tests with good item quality, by contrast, the SSCQM performed best. These results highlight the importance of considering test length and item quality when selecting a Q matrix type, and of understanding how the number of matrices, item parameters, and matrix type affect classification precision; such understanding is vital for designing cognitive diagnostic tests that assess individual knowledge states accurately. Second, in the empirical data, the SSCQM and the reachability matrix produced a high rate of identical classifications, with minimal differences in attribute mastery proportions.
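    PMR and MMR compare estimated with true attribute patterns in a simulation: PMR is the proportion of examinees whose whole pattern is recovered, and MMR is the proportion of individual attribute classifications that are correct. A minimal sketch with hypothetical patterns:

```python
def pattern_and_marginal_match(true_patterns, est_patterns):
    """Return (PMR, MMR) for lists of true and estimated 0/1 attribute patterns."""
    n = len(true_patterns)
    k = len(true_patterns[0])
    # PMR: whole pattern must match
    pmr = sum(tuple(t) == tuple(e) for t, e in zip(true_patterns, est_patterns)) / n
    # MMR: count matches attribute by attribute
    mmr = sum(ti == ei for t, e in zip(true_patterns, est_patterns)
              for ti, ei in zip(t, e)) / (n * k)
    return pmr, mmr

true = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
est = [(1, 0, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
print(pattern_and_marginal_match(true, est))
```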
     In summary, both the SSCQM and the USCQM demonstrate adequate performance when compared to other Q matrices under similar conditions. These findings emphasize the significance of prioritizing completeness in cognitive diagnostic testing. This research seeks to contribute to the advancement of cognitive diagnosis by addressing the limitations of existing test designs and introducing new techniques for constructing polytomous Q matrices. In addition, the findings presented in this paper offer valuable insights for researchers and practitioners seeking to design high-quality cognitive diagnostic tests that accurately assess individual knowledge states.

  • Key Action Encoding Incorporating Misconceptions and Its Application in Diagnostic Classification Analysis of Process Data 「open review」

    Subjects: Psychology >> Psychological Measurement submitted time 2024-04-27

    Abstract: Process data are the human-computer interaction data captured by computer-based learning and assessment systems, and they reflect participants’ problem-solving processes. Among the various types of process data, action sequences are a representative type that records participants’ step-by-step problem-solving. However, action sequences are not in a standardized format: their length varies across participants, which hinders the direct application of traditional psychometric models such as diagnostic classification models (DCMs). Extending psychometric models designed for standardized, structured data to process data therefore often requires key-action encoding, that is, determining whether each participant’s data contain essential problem-solving actions and encoding the result (e.g., “1” if the action is present and “0” if it is absent). Zhan and Qiao (2022) proposed a key-action encoding method that allows DCMs to be applied to process data to identify participants’ mastery of problem-solving skills. However, their approach overlooks the adverse impact of misconceptions on problem-solving. This study therefore introduces a key-action encoding approach that incorporates misconceptions and explores its utility in diagnostic classification analysis of process data. The new method integrates both problem-solving skills and misconceptions, extending Zhan and Qiao’s (2022) approach.
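    Key-action encoding turns a variable-length action log into a fixed-length 0/1 vector of phantom items. A minimal sketch, assuming each key action can be written as a predicate over the sequence; the action labels and rules below are hypothetical and are not the actual "Tickets" coding scheme:

```python
def encode_key_actions(sequence, rules):
    """Map one participant's action sequence to 0/1 scores on phantom items (key actions)."""
    return {name: int(rule(sequence)) for name, rule in rules.items()}

def contains(action):
    """Key action: `action` appears anywhere in the sequence."""
    return lambda seq: action in seq

def contains_in_order(first, second):
    """Key action: `first` occurs and `second` occurs at some later step."""
    def rule(seq):
        if first not in seq:
            return False
        return second in seq[seq.index(first) + 1:]
    return rule

# Hypothetical phantom items: two skill-related, one misconception-related
rules = {
    "S1_uses_filter": contains("set_filter"),
    "S2_compare_before_buy": contains_in_order("view_option_A", "buy"),
    "M1_buy_without_compare": lambda seq: "buy" in seq and "view_option_A" not in seq,
}
log = ["start", "set_filter", "view_option_A", "view_option_B", "buy"]
print(encode_key_actions(log, rules))
```

    Because every participant receives a score on every phantom item regardless of sequence length, the resulting matrix can be analyzed with a standard DCM.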
    An illustrative example compares the proposed encoding approach with Zhan and Qiao’s (2022) approach using a real-world interactive assessment item, “Tickets,” from PISA 2012. For the proposed approach, eight attributes (four problem-solving skills and four misconceptions) and 28 phantom items (i.e., key actions) were defined based on the item’s scoring rule and assessment framework. Zhan and Qiao’s approach, in contrast, defined four attributes (problem-solving skills) and 10 phantom items. Four DCMs (the DINA, DINO, ACDM, and GDINA models) were fitted to the data. Relative model-data fit was compared using AIC, BIC, CAIC, and SABIC, and a chi-square test was used to evaluate whether the fit of GDINA differed significantly from that of each constrained model. Absolute fit was assessed with the SRMSR. Item quality was evaluated with the item differentiation index (IDI), and classification reliability with the classification accuracy index.
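    The relative-fit indices listed above are all penalized versions of the deviance (-2 log-likelihood), and the comparison of a constrained model with GDINA can be done with a likelihood-ratio statistic. A minimal sketch using the standard formulas; the log-likelihoods and parameter counts in the example are hypothetical:

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Relative-fit indices from a maximized log-likelihood (smaller is better)."""
    deviance = -2 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }

def likelihood_ratio(loglik_full, p_full, loglik_reduced, p_reduced):
    """LR statistic and degrees of freedom for a constrained model nested in GDINA;
    compare the statistic with a chi-square critical value on that many df."""
    return 2 * (loglik_full - loglik_reduced), p_full - p_reduced

# Hypothetical results for GDINA (full) and DINA (constrained) on N = 500 participants
print(information_criteria(-4800.0, 120, 500))
print(likelihood_ratio(-4800.0, 120, -4850.0, 80))
```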
    The findings reveal that: (1) considering both problem-solving skills and misconceptions enables more nuanced classification of participants, helps identify the specific factors behind problem-solving success and failure, and supports targeted remedial suggestions for personalized instruction; (2) including misconceptions slightly improves diagnostic classification reliability; and (3) participants’ possession of misconceptions correlates moderately to highly and negatively with their raw scores, suggesting that misconceptions reduce students’ overall problem-solving performance.
    In summary, this study proposes a key-action encoding approach that incorporates misconceptions and explores its application in diagnostic classification analysis of process data, specifically action sequences. The approach helps researchers pinpoint the specific factors that influence problem-solving outcomes and provides methodological support for targeted interventions. To improve participants’ problem-solving performance, addressing the adverse effects of misconceptions deserves attention alongside improving their skills.

    Peer Review Status: Dispute NO Commenting
  • Development and Validation of the Susceptibility to PUA Personality Traits Scale and the Characteristics Manifestation Scale of PUA Relationships

    Subjects: Psychology >> Psychological Measurement submitted time 2024-03-25

    Abstract: Objective: To explore the relationship between personal characteristics and the likelihood of experiencing PUA in the Chinese cultural context; to develop two instruments suited to this context, a scale of the personality traits that make people susceptible to PUA and a scale of the basic characteristics of PUA relationships; and to test their reliability and validity. Methods: The initial questionnaires were formed through literature review, theoretical model construction, and a questionnaire survey. The PUA personality traits scale was administered to 1,188 adults, and the PUA relationship characteristics scale to 1,188 adults who had been or were currently in an intimate relationship. Item analysis and exploratory factor analysis were conducted on the trial questionnaires, and confirmatory factor analysis and reliability tests were conducted on both questionnaires. Results: The Susceptibility to PUA Personality Traits Scale contains 4 dimensions and a total of 20 items, and its factor structure fits well (RMSEA = 0.060, CFI = 0.937, IFI = 0.937, TLI = 0.924, SRMR = 0.042). The PUA relationship characteristics scale contains 6 dimensions and a total of 29 items (RMSEA = 0.053, CFI = 0.925, TLI = 0.919, GFI = 0.913, SRMR = 0.059). Internal consistency coefficients for the total scale and each dimension ranged from 0.779 to 0.909 for Scale 1 and from 0.897 to 0.970 for Scale 2. Conclusion: Both scales show good reliability and validity and can serve as measurement tools for studying personal characteristics and the likelihood of experiencing PUA in the Chinese cultural context.
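    The CFA fit indices reported above are functions of the model chi-square and, for CFI and TLI, the chi-square of a baseline (null) model. A minimal sketch of the standard formulas; the abstract reports only the indices, so the chi-square values in the example are hypothetical, and some software uses N rather than N - 1 in the RMSEA denominator:

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI and TLI from model (m) and baseline (b) chi-square statistics."""
    d_m = max(chi2_m - df_m, 0.0)                     # model non-centrality estimate
    d_b = max(chi2_b - df_b, d_m)                     # baseline non-centrality (at least d_m)
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    return rmsea, cfi, tli

# Hypothetical chi-squares for a 20-item, 4-factor model with N = 1,188
print(fit_indices(chi2_m=850.0, df_m=164, chi2_b=9000.0, df_b=190, n=1188))
```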

  • Core Items Selection and Psychometric Properties of the Adult Attention-Deficit Hyperactivity Disorder Self-Report Scale-Chinese Short Version (ASRS-CSV)

    Subjects: Psychology >> Psychological Measurement; Psychology >> Clinical and Counseling Psychology; Medicine, Pharmacy >> Clinical Medicine submitted time 2024-03-12

    Abstract: Objective: This study aimed to develop and validate the Chinese Short Version of the Adult ADHD Self-Report Scale (ASRS-CSV), addressing the need for culturally appropriate diagnostic tools for Attention-Deficit Hyperactivity Disorder (ADHD) in the Chinese adult population.
    Methods: Utilizing a combination of intergroup difference analysis, factor analysis, and network analysis, we identified core ADHD symptoms pertinent to the Chinese cultural context. The study involved two samples: a vocational and technical school sample (N=1144) and an internet sample (N=1654), comprising adults aged 16-25 years. Reliability, validity, and diagnostic efficacy of the ASRS-CSV were assessed through psychometric testing.
    Results: The ASRS-CSV demonstrated high internal consistency (Cronbach’s alpha > 0.9) and robust convergent validity (AVE > 0.7). The scale’s diagnostic cutoff points were optimized, revealing high sensitivity and specificity for ADHD screening. Cross-cultural analysis highlighted differences in core ADHD symptoms between Chinese and Western populations, underscoring the scale’s cultural sensitivity.
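    The abstract does not say how the cutoff was optimized. One common criterion, assumed here, is Youden's J (sensitivity + specificity - 1), maximized over candidate cutoffs. A minimal sketch with hypothetical scores and diagnoses; both groups must be present:

```python
def best_cutoff(scores, has_adhd):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.
    A score >= cutoff counts as a positive screen; on ties the lowest cutoff is kept."""
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, has_adhd) if s >= c and y)
        fn = sum(1 for s, y in zip(scores, has_adhd) if s < c and y)
        tn = sum(1 for s, y in zip(scores, has_adhd) if s < c and not y)
        fp = sum(1 for s, y in zip(scores, has_adhd) if s >= c and not y)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best

scores = [10, 12, 15, 18, 20, 22, 25, 28]
diagnosed = [False, False, False, True, False, True, True, True]
print(best_cutoff(scores, diagnosed))
```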
    Conclusion: The ASRS-CSV is a reliable, valid, and efficient tool for screening ADHD in Chinese adults, reflecting the socio-cultural nuances of ADHD symptomatology. Its development marks a significant advancement in the field of psychiatry, offering a tailored approach for ADHD assessment in China and contributing to the global discourse on cross-cultural psychiatric diagnosis.

  • Statistical power analysis of event-related potential studies: methods and influencing factors

    Subjects: Psychology >> Experimental Psychology; Psychology >> Psychological Measurement submitted time 2024-03-04

    Abstract: Statistical power is a key indicator of the robustness and replicability of research results. However, the calculation and reporting of statistical power in event-related potential (ERP) studies still lack standardization and completeness. By summarizing the factors that influence statistical power in ERP studies, the methods for calculating it, and application examples, this paper aims to give researchers a reference for calculating and reporting statistical power when designing or preregistering ERP study protocols at various stages.
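    Two influencing factors that are specific to ERP designs, the number of trials per condition and the number of participants, can be seen in a simple analytic approximation. This is a minimal sketch under stated assumptions, not the paper's method: a within-subject amplitude difference, independent trial-level noise, and a normal approximation to the t distribution (which slightly overstates power in small samples). All parameter values are hypothetical.

```python
from statistics import NormalDist

def erp_power(effect_uv, sd_between_uv, sd_trial_uv, n_trials, n_subjects, alpha=0.05):
    """Approximate two-sided power of a paired test on an ERP amplitude difference.

    Each participant's difference score averages n_trials trials per condition, so its
    variance is sd_between^2 + 2 * sd_trial^2 / n_trials (independent trials assumed).
    """
    sd_diff = (sd_between_uv ** 2 + 2 * sd_trial_uv ** 2 / n_trials) ** 0.5
    dz = effect_uv / sd_diff                        # standardized within-subject effect
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = dz * n_subjects ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical: 1 uV effect, 1.5 uV between-subject SD, 8 uV single-trial noise
for trials in (30, 60):
    print(trials, round(erp_power(1.0, 1.5, 8.0, trials, 30), 3))
```

    The trial term shows why adding trials helps less once between-subject variability dominates, which is one reason ERP power depends on both trial and sample size.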

  • Exploration of Computerized Adaptive Item Bank Development for Emotional Stability Based on ChatGPT

    Subjects: Psychology >> Psychological Measurement submitted time 2024-02-01

    Abstract: Building a large, high-quality item bank through traditional item development requires extensive manpower and resources, which has constrained the development and application of computerized adaptive testing (CAT). Automatic item generation based on the latest natural language processing technology, however, holds promise for addressing this challenge. With advances in generative pre-trained models based on the Transformer architecture, it has become feasible to generate items tailored to specific measurement objectives, especially in non-cognitive measurement. This study used ChatGPT to generate a large number of Chinese-language personality items measuring emotional stability and built a computerized adaptive item bank from them.
    We used ChatGPT based on GPT-4 Turbo to generate 114 items measuring emotional stability. After expert review, 75 items were retained to form the GPT item bank, and 42 widely used items were selected to form the classic item bank. All items were administered, yielding 479 valid participants. Sample data from two separately administered measures, the CBF-PI-B and the BFI-2, were also used for cross-sample reliability comparisons. Item bank construction procedures, including unidimensionality testing, IRT model selection, item analysis, and item bank quality analysis, together with simulated computerized adaptive testing, were used to assess the quality and CAT performance of the item banks.
    All items in both the classic and the GPT item banks passed the unidimensionality test, showed no differential item functioning, and had good discrimination parameters and a reasonable difficulty distribution. Both item banks provided high test information and marginal reliability, with low measurement error, across most trait levels. The combined item bank formed from all items was also of good quality. CAT simulations showed that all three item banks achieved high validity with fewer items than traditional tests at the same level of precision. At the same test length, the GPT item bank showed higher reliability and was stable across samples. The CAT performance of the GPT item bank even exceeded that of the classic item bank, and the combined item bank performed slightly better than the GPT item bank.
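    A simulated CAT of the kind described above repeatedly selects the most informative remaining item, scores the response, and stops at a precision or length limit. A minimal sketch under loud assumptions: it uses a dichotomous 2PL model, maximum Fisher information selection, EAP scoring with a standard normal prior, and hypothetical stopping rules and item parameters. The study's own IRT model was chosen by model comparison and is not reproduced here.

```python
import math
import random

def p2pl(theta, a, b):
    """2PL endorsement probability (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses, grid):
    """EAP estimate and posterior SD under a standard normal prior on a grid."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for (a, b), u in responses:
            p = p2pl(t, a, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(bank, true_theta, max_items=20, se_stop=0.3, seed=1):
    """One simulated examinee; stops when SE <= se_stop or max_items are given (hypothetical rules)."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.1 * i for i in range(81)]
    used, responses = set(), []
    theta, se = 0.0, 1.0
    while len(responses) < min(max_items, len(bank)) and se > se_stop:
        j = max((i for i in range(len(bank)) if i not in used),
                key=lambda i: item_information(theta, *bank[i]))
        used.add(j)
        u = rng.random() < p2pl(true_theta, *bank[j])   # simulated response
        responses.append((bank[j], u))
        theta, se = eap(responses, grid)
    return theta, se, len(responses)

bank_rng = random.Random(0)
bank = [(bank_rng.uniform(0.8, 2.0), bank_rng.uniform(-2.5, 2.5)) for _ in range(100)]
theta_hat, se, n_used = simulate_cat(bank, true_theta=0.5)
print(f"theta_hat={theta_hat:.2f}, SE={se:.2f}, items used={n_used}")
```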
    This study explores building a computerized adaptive item bank with the latest version of ChatGPT, validating the feasibility of this user-friendly item generation tool. Comparison with previous research reconfirms the high quality of items generated by GPT-4. The study shows the considerable potential of large language models in item development, particularly for creating large-scale item banks, and points to a shift in the responsibilities of psychologists in future item development.

  • Psychometric Properties of Multidimensional State Anxiety Scale for College Students (MSAS-CS): Based on Factor Analysis and Network Analysis

    Subjects: Psychology >> Psychological Measurement; Psychology >> Clinical and Counseling Psychology submitted time 2024-01-14

    Abstract: Based on the State-Trait Anxiety Theory and the Psychopathological Network Theory, we developed the Multidimensional Anxiety Experience Scale for college students. This study conducted item analysis, factor analysis, network analysis, validity and reliability testing, and gender invariance testing. The results indicate that: (1) the Multidimensional Anxiety Experience Scale for college students consists of 27 items in seven dimensions: Social Communication Anxiety (SCA), Learning Anxiety (LA), Family Relationship Anxiety (FRA), Future Anxiety (FA), Gender Norms Anxiety (GNA), Appearance Anxiety (AA), and Economic Anxiety (EA); and (2) the scale has a reasonable factor and network structure, good validity and reliability, and gender invariance, and can therefore effectively measure state anxiety in Chinese college students.
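    In psychometric network analysis, edges are usually partial correlations between items, each controlling for all other items. The abstract does not state the estimator; published networks often use regularized estimation (e.g., the graphical lasso), so the unregularized version below is a simplified sketch on hypothetical simulated data:

```python
import numpy as np

def partial_correlation_network(data):
    """Edge weights = partial correlations between item pairs, controlling for all other items.
    data: n_respondents x n_items array."""
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 0.0)                     # no self-loops
    return pcor

# Hypothetical chain x1 -> x2 -> x3: x1 and x3 correlate only through x2
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x2 + rng.normal(size=n)
edges = partial_correlation_network(np.column_stack([x1, x2, x3]))
print(np.round(edges, 2))
```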

  • Automated Scoring of Open-ended Situational Judgment Tests

    Subjects: Psychology >> Psychological Measurement submitted time 2023-12-21

    Abstract: Situational judgment tests (SJTs) have gained popularity for their distinctive content and high face validity. However, traditional SJT formats, particularly those using multiple-choice (MC) options, have been criticized for their susceptibility to test-taking strategies. Open-ended, constructed-response (CR) formats offer a promising way to address this issue, but their wide adoption is hindered mainly by the cost of manual scoring. In response, we propose an open-ended SJT with a written constructed-response format for assessing teacher competency. This study built a scoring framework that uses natural language processing (NLP) to score response texts automatically and then rigorously evaluated the system's validity. We constructed a teacher competency model with four dimensions (student-oriented, problem-solving, emotional intelligence, and achievement motivation) and developed an open-ended SJT to gauge teachers' ability to handle typical teaching dilemmas. Responses were collected from 627 primary and secondary school teachers, and 6,000 response texts from 300 participants were scored manually against predefined criteria. To speed up scoring, supervised learning was used to classify responses at both the document and the sentence level. Several deep learning models, including the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), C-LSTM, RNN+attention, and LSTM+attention, were implemented and compared to assess the agreement between human and machine scoring, and the validity of automatic scoring was verified.
        The open-ended SJT showed a Cronbach's alpha of 0.91 and good fit in a confirmatory factor analysis conducted in Mplus. Criterion-related validity analyses showed significant correlations between test results and several educational variables, including instructional design, classroom evaluation, homework design, job satisfaction, and teaching philosophy. Among the machine scoring models, the CNN performed best, with scoring accuracy of 70% to 88% and high agreement with expert scores (r = 0.95, QWK = 0.82). Correlations between human and machine ratings on the four dimensions (student-oriented, problem-solving, emotional intelligence, and achievement motivation) were approximately 0.9. The model also predicted accurately when applied to new text datasets, evidence of its robust generalization.
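    Quadratic weighted kappa (QWK), reported above for human-machine agreement, penalizes disagreements by the squared distance between score categories and corrects for chance agreement. A minimal sketch of the standard formula, assuming integer scores coded 0 to n_levels - 1; the scores in the example are hypothetical:

```python
def quadratic_weighted_kappa(human, machine, n_levels):
    """QWK between two equal-length lists of integer scores in 0..n_levels-1."""
    n = len(human)
    observed = [[0] * n_levels for _ in range(n_levels)]
    for h, m in zip(human, machine):
        observed[h][m] += 1
    hist_h = [sum(row) for row in observed]                                  # human marginals
    hist_m = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2                          # quadratic weight
            num += w * observed[i][j]
            den += w * hist_h[i] * hist_m[j] / n                            # chance-expected count
    return 1.0 - num / den

human = [0, 1, 2, 2, 1, 0, 2, 1]
machine = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(quadratic_weighted_kappa(human, machine, 3), 3))
```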
        This study examined automated scoring for open-ended situational judgment tests using rigorous psychometric methods. To establish validity, it focused on one specific application: the evaluation of teacher competency traits. Fine-grained scoring guidelines were formulated, and state-of-the-art NLP techniques were used for text feature recognition and classification. The main findings are: (1) open-ended SJTs can establish precise scoring criteria grounded in key behavioral response elements; (2) sentence-level text classification outperforms document-level classification, with CNNs classifying responses especially accurately; and (3) the scoring model performs consistently well and agrees closely with human scoring, suggesting that it could partially replace manual scoring.
     

  • Estimating test reliability of intensive longitudinal studies: Perspectives on multilevel structure and dynamic nature

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology submitted time 2023-11-28

    Abstract: With the widespread use of intensive longitudinal studies in psychology and other social sciences, the reliability estimation of tests used in these studies has received increasing attention. Earlier reliability estimation methods, drawn from cross-sectional studies or based on generalizability theory, have many limitations and are not suited to intensive longitudinal studies. Given the two main characteristics of intensive longitudinal data, multilevel structure and dynamic nature, test reliability in intensive longitudinal studies can be estimated with multilevel confirmatory factor analysis, dynamic factor analysis, or dynamic structural equation modeling. The main features and applicable contexts of these three methods are demonstrated with empirical data. Future research could explore reliability estimation based on other models and should pay more attention to testing and reporting test reliability in intensive longitudinal studies.
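    For the multilevel CFA approach, one common way to express reliability separately at each level is an omega coefficient built from that level's loadings and residual variances. A minimal sketch, assuming a one-factor model at each level with the factor variance fixed (1 by default); the estimates below are hypothetical, and the DFA and DSEM approaches are not covered:

```python
def omega(loadings, residual_variances, factor_variance=1.0):
    """Omega for one level of a multilevel CFA:
    (sum of loadings)^2 * factor variance / (that quantity + sum of residual variances)."""
    common = sum(loadings) ** 2 * factor_variance
    return common / (common + sum(residual_variances))

# Hypothetical MCFA estimates for a 4-item daily mood scale
omega_within = omega([0.6, 0.5, 0.7, 0.4], [0.5, 0.6, 0.4, 0.7])        # occasion-level reliability
omega_between = omega([0.8, 0.7, 0.9, 0.6], [0.05, 0.08, 0.04, 0.10])  # person-level reliability
print(round(omega_within, 3), round(omega_between, 3))
```

    Reporting both values reflects the multilevel point above: a scale can reliably separate people (between level) while tracking day-to-day fluctuation less reliably (within level).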

  • Development of Online Calibration Method Based on SCAD Penalty and EM Perspective in CD-CAT: a study based on the G-DINA model

    Subjects: Psychology >> Psychological Measurement submitted time 2023-11-22

    Abstract: Cognitive diagnostic computerized adaptive testing (CD-CAT) provides a timely, accurate, and detailed diagnosis of an examinee’s strengths and weaknesses in the measured content, which can guide further study or remediation planning and thus meets the practical need for efficient and detailed test results. Successful implementation of CD-CAT depends on an item bank, but maintaining the bank is very challenging. Online calibration is a popular psychometric choice for item bank maintenance. Currently, research on online calibration methods for CD-CAT that calibrate the Q-matrix and item parameters simultaneously remains very limited, and existing methods are mostly based on the deterministic input, noisy “and” gate (DINA) model. Compared with the DINA model, the generalized DINA (G-DINA) model is more widely applied because it is less restrictive and fits a wide range of test data in psychological and educational assessment. An online calibration method that jointly calibrates the Q-matrix and item parameters for less constrained models such as G-DINA would therefore be of clear value.
    In the current study, a new online calibration method suitable for the G-DINA model, SCADOCM, was proposed. SCADOCM is built on the smoothly clipped absolute deviation (SCAD) penalty and marginal maximum likelihood estimation via the EM algorithm (MMLE/EM). For a new item j, a SCAD-penalized log-likelihood is formulated from the examinees’ responses to the item and their marginal attribute mastery probabilities, and the item’s q-vector is estimated with a SCAD-based q-vector estimator. The EM algorithm is then used to estimate the parameters of item j from the posterior distributions of the examinees’ attribute patterns, their responses to item j, and the estimated q-vector.
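    The abstract does not give SCADOCM's full objective, so this sketch shows only the SCAD penalty itself, in its standard form with a = 3.7 as commonly recommended, plus the univariate SCAD thresholding rule. The thresholding rule illustrates the property that presumably makes SCAD useful for q-vector estimation: small effects are set exactly to zero, while large effects are left unshrunk.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic transition, constant for large |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5 * (z - theta)^2 + scad_penalty(theta) (orthonormal-design case; needs a > 2)."""
    sign = 1.0 if z >= 0 else -1.0
    t = abs(z)
    if t <= 2 * lam:
        return sign * max(t - lam, 0.0)             # soft-thresholding zone
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z                                         # large effects are not shrunk

for z in (0.3, 1.5, 3.0, 5.0):
    print(z, round(scad_threshold(z, lam=1.0), 3))
```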
    To examine the performance of SCADOCM and compare it with the SIE method, two simulation studies were conducted. Study 1 used a simulated item bank, and Study 2 used a real item bank (the Internet addiction item bank; Shi, 2017). Four factors were manipulated: calibration sample size (n_j = 50, 100, 500, 1000, or 2000), attribute pattern distribution (uniform, higher-order, or normal), item quality (U(0.05, 0.15) vs. U(0.1, 0.3)), and online calibration method (SCADOCM vs. SIE). The results showed that: (1) SCADOCM achieved satisfactory calibration accuracy and efficiency and outperformed the SIE method, whereas the traditional SIE method is not applicable to the G-DINA model, with low Q-matrix estimation accuracy under all conditions; (2) under most conditions, the item calibration accuracy of both SCADOCM and SIE increased with calibration sample size and item quality, and was higher under the uniform and higher-order distributions than under the normal distribution; and (3) the calibration efficiency of both methods decreased as the calibration sample grew; SCADOCM’s efficiency was little affected by item quality or attribute pattern distribution, and SIE’s was little affected by item quality, although SIE calibrated slightly more slowly under the normal distribution than under the uniform or higher-order distributions.
    In summary, this study demonstrated that SCADOCM achieves higher item calibration accuracy and efficiency than the SIE method, and that the traditional SIE method is not suitable for the G-DINA model. Overall, the study provides an efficient and accurate item calibration method for CD-CAT and supports the wider practical application of CD-CAT.

  • Cognitive Diagnostic Assessment Based on Signal Detection Theory: Modeling and Application

    Subjects: Psychology >> Psychological Measurement submitted time 2023-11-13

    Abstract: Cognitive diagnostic assessment (CDA), as its name suggests, aims to diagnose which skills or attributes examinees have or have not mastered. This technique provides more useful feedback to examinees than the single overall score obtained from classical test theory or item response theory. Multiple-choice (MC) items are among the most popular item types in CDA because they offer high test reliability, are easy to review, and can be scored quickly and objectively. Several cognitive diagnostic models (CDMs) have been developed to analyze MC data by incorporating the potential diagnostic information contained in the distractors.
    However, responding to an MC item can be viewed as a process of extracting a signal (the correct option) from noise (the distractors). Examinees are assumed to perceive a certain plausibility for each option and to choose the option they find most plausible. Meanwhile, an examinee responding to an item is in one of two states: knowing or not knowing the item. Thus, signal detection theory can be integrated into a CDM to handle MC data in CDA. This paper proposes a cognitive diagnostic model based on signal detection theory (SDT-CDM), which has several advantages over traditional CDMs. First, it does not require coding a q-vector for each option. Second, it provides discrimination and difficulty parameters that traditional CDMs cannot. Third, its plausibility parameters directly express the relative differences among options, giving a more comprehensive characterization of item quality.
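    The abstract does not give the SDT-CDM's equations, so the following is only a generic Gaussian signal detection choice rule, not the SDT-CDM itself. Each option's perceived plausibility is a normal draw with unit variance; the keyed option (signal) has mean d and the distractors (noise) have mean 0, and the examinee picks the most plausible option. The values d = 2.0 for the "knows" state and d = 0.0 for the "does not know" state are hypothetical, and unlike the SDT-CDM's option-specific plausibility parameters, all distractors here are equally plausible.

```python
import random


def prob_correct(d: float, n_options: int = 4, n_sims: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(choosing the keyed option) under a Gaussian SDT choice rule."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Perceived plausibility of the keyed option (signal)
        keyed = rng.gauss(d, 1.0)
        # Most plausible distractor (noise)
        best_distractor = max(rng.gauss(0.0, 1.0) for _ in range(n_options - 1))
        # The examinee chooses whichever option looks most plausible
        hits += keyed > best_distractor
    return hits / n_sims


# Hypothetical separations for the two latent states
p_knows = prob_correct(d=2.0)
p_does_not_know = prob_correct(d=0.0)
print(round(p_knows, 3), round(p_does_not_know, 3))
```

    With d = 0 the estimate should sit near chance (0.25 for four options), and it rises toward 1 as the keyed option separates from the distractors.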
    The results of two simulation studies showed that (1) the marginal maximum likelihood estimation approach via the expectation-maximization (MMLE/EM) algorithm effectively estimated the parameters of the SDT-CDM; (2) the SDT-CDM achieved high classification accuracy and parameter estimation precision, and provided option-level information for diagnosing item quality; (3) the number of attributes, item quality, and sample size affected the performance of the SDT-CDM, but the overall results were promising; and (4) compared with the nominal response diagnostic model (NRDM), the SDT-CDM classified examinees more accurately under all data conditions.
    Further, an empirical study on the TIMSS 2011 mathematics assessment was conducted using both the SDT-CDM and the NRDM to examine the ecological validity of the new model. The SDT-CDM fit better and had fewer model parameters than the NRDM. The difficulty parameters of the SDT-CDM were significantly correlated with those of the two- and three-parameter logistic models, and the same was true of its discrimination parameters. In contrast, the correlations between the discrimination parameters of the NRDM and those of the two- and three-parameter logistic models were low and nonsignificant. In addition, the SDT-CDM showed higher classification accuracy and classification consistency than the NRDM. Taken together, these results indicate that the SDT-CDM merits wider use.

  • Reliability and validity of the Chinese version of the mobile Agnew Relationship Measure (mARM-C)

    Subjects: Psychology >> Applied Psychology Subjects: Psychology >> Psychological Measurement submitted time 2023-05-27

    Abstract: To assess the reliability and validity of the Chinese version of the mobile Agnew Relationship Measure (mARM-C), 574 university students who had recently used meditation apps were recruited to complete the mARM-C and criterion measures. Two weeks later, a subset of 102 of these participants was retested. Exploratory factor analysis and network analysis indicated that the mARM-C comprises 19 items across five factors. Confirmatory factor analysis showed that the five-factor model fit well, and the questionnaire showed satisfactory criterion-related, convergent, and discriminant validity as well as good internal consistency reliability, meeting accepted psychometric standards. These results indicate that the mARM-C is a reliable and valid instrument for measuring the digital therapeutic alliance between users and programs in internet-based self-help interventions.
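    The abstract reports good internal consistency without naming the coefficient. Cronbach's alpha is the most common choice, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), and a minimal sketch follows. It assumes a complete respondents × items score matrix with no missing data; for a five-factor instrument it would normally be computed per subscale. The function name and demo data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list) -> float:
    """Cronbach's alpha for a respondents x items score matrix (no missing data)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of respondents' total scores
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


demo = [[1, 2], [2, 1], [3, 3]]  # hypothetical 3 respondents x 2 items
print(round(cronbach_alpha(demo), 3))  # 0.667
```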

  • On the reliability of point estimation of model parameter: taking the CDMs as an example

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology submitted time 2023-05-11

    Abstract: Cognitive diagnostic models (CDMs) are psychometric models that have received increasing attention in psychology, education, the social and biological sciences, and many other disciplines. It has been argued that an inappropriate convergence criterion for the MLE-EM algorithm (maximum likelihood estimation via expectation maximization) can produce unpredictably distorted parameter estimates and thus unstable and misleading conclusions from fitted CDMs. Although several convergence criteria have been developed, how to specify an appropriate convergence criterion for fitted CDMs remains an open question.
    This study proposes a comprehensive method for assessing convergence. To minimize the influence of the parameter estimation framework, it also introduces a new framework, mCDM, which adopts a multiple-starting-values strategy. A simulation study under various conditions was conducted to examine the performance of convergence criteria for MLE-EM in CDMs. Five convergence assessment methods were examined: the maximum absolute change in model parameters, the maximum absolute change in item endorsement probabilities and structural parameters, the absolute change in log-likelihood, the relative log-likelihood, and the comprehensive method. The data-generating models were the saturated CDM and the hierarchical CDM. The number of items was J = 16 or 32, the sample size was 500, 1000, or 4000, and the convergence tolerance was 10⁻⁴, 10⁻⁶, or 10⁻⁸. The simulated response data were fitted with the saturated CDM using both mCDM and the R package GDINA, with the maximum number of iterations set to 50000.
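    Three of the listed criteria are simple change statistics computed between successive EM iterations, sketched below. The function names are hypothetical, "relative" is taken here as |ΔLL| / |LL_old| (definitions vary across software), and the paper's comprehensive method is not sketched because the abstract does not define it. Applying `max_abs_param_change` to a vector of item endorsement probabilities followed by attribute-class proportions corresponds to the second criterion in the list.

```python
def convergence_metrics(old_params, new_params, old_loglik, new_loglik):
    """Change statistics between two successive EM iterations (hypothetical helper)."""
    if len(old_params) != len(new_params):
        raise ValueError("parameter vectors must have the same length")
    abs_ll = abs(new_loglik - old_loglik)
    return {
        # Maximum absolute change in any model parameter
        "max_abs_param_change": max(abs(n - o) for o, n in zip(old_params, new_params)),
        # Absolute change in log-likelihood
        "abs_loglik_change": abs_ll,
        # One common definition of relative log-likelihood change
        "rel_loglik_change": abs_ll / abs(old_loglik),
    }


def has_converged(metrics, tol=1e-8, criterion="max_abs_param_change"):
    """Declare convergence when the chosen statistic falls below the tolerance."""
    return metrics[criterion] < tol


m = convergence_metrics([0.20, 0.80], [0.2001, 0.80], -1000.0, -999.9)
print(has_converged(m, tol=1e-8))  # False: the first parameter still moved by about 1e-4
```

    Note that the same iteration can pass one criterion and fail another (here the relative log-likelihood change is 10⁻⁴ while the absolute change is 0.1), which is why the choice of criterion and tolerance matters.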
    Simulation results suggest that:
    (1) The saturated CDM converged under all conditions. However, the actual number of iterations exceeded 30000 under some conditions, implying that if the predefined maximum number of iterations is below 30000, the MLE-EM algorithm may stop prematurely.
    (2) The model parameter estimation framework affected the performance of the convergence criteria. The performance of the convergence criteria under the mCDM framework was comparable or superior to that of the GDINA framework.
    (3) Among the tolerance values considered, 10⁻⁸ consistently performed best, yielding the highest log-likelihood values, and 10⁻⁴ performed worst. Compared with the other convergence assessment methods, the comprehensive method generally performed best, especially under the mCDM framework. The maximum absolute change in model parameters performed similarly to the comprehensive method, but its good performance was not guaranteed. By contrast, the relative log-likelihood performed worst under both the mCDM and GDINA frameworks.
    The simulation results indicate that the most appropriate convergence criterion for MLE-EM in CDMs is the comprehensive method with a tolerance of 10⁻⁸ under the mCDM framework. A real data analysis also demonstrated the good performance of the proposed comprehensive method and the mCDM framework.
     

  • The Measurement and Influence of Colleges’ Academic Involution

    Subjects: Psychology >> Social Psychology Subjects: Psychology >> Psychological Measurement submitted time 2023-05-04

    Abstract: Academic involution may harm the cultivation and development of college students, but no reliable tool has been available to measure it. This paper developed the Colleges’ Academic Involution Scale (CAIS) and examined its reliability and validity across three studies. Study 1 generated a 31-item pool based on a literature review, everyday cases, and interviews, and screened the items using a sample of 338 undergraduates. Study 2, using a large sample (N = 3000) and an independent sample (N = 571), confirmed a 16-item final version of the CAIS with three dimensions: unwilling hard work, excessive competition, and surface learning. In the 3000-undergraduate sample, more than 60% of students were involved in academic involution. Moreover, individuals with high CAIS scores showed stronger zero-sum beliefs, higher trait anxiety, lower life satisfaction, and poorer sleep quality, but not greater creative potential. Study 3 showed that the test-retest reliability of the final scale reached 0.83 in a new sample (N = 99). The CAIS can serve as a reliable and valid tool for future research on the harms and causes of academic involution and on ways to mitigate it.
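    A test-retest coefficient such as the reported 0.83 is most often the Pearson correlation between total scores at the two occasions; the abstract does not say which coefficient was used, and an intraclass correlation is a common alternative. A minimal sketch of the Pearson version, with a hypothetical function name and toy data:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., time 1 and time 2 totals)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

    Pearson's r ignores a uniform shift between occasions (everyone scoring higher at retest), which is one reason some authors prefer an absolute-agreement ICC for test-retest reliability.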

     

  • Test mode effect: Sources, detection, and applications

    Subjects: Psychology >> Psychological Measurement submitted time 2023-04-22

    Abstract: The test mode effect (TME) refers to differences in test functioning caused by administering the same test in different modes. Because TME affects test fairness, selection criteria, and test equating, accurately detecting and interpreting it is of great importance. This paper systematically reviews the sources of TME, its detection (including experimental designs and detection methods), and research findings, providing a comprehensive overview of TME research methodology. Important future directions include further interpretation of TME models, extending TME research to additional test modes, and applying TME findings to large-scale educational assessment programs in China.
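    As one simple first-pass illustration of detecting a mode effect at the total-score level, assume a random-groups design in which comparable examinees take the same test on paper or on computer; a standardized mean difference with a pooled standard deviation then summarizes the mode gap. This is not claimed to be one of the methods the review covers, and item-level analyses (for example, differential item functioning across modes) would usually be needed as well. Function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mode_effect_size(paper, computer):
    """Standardized mean difference (computer minus paper) using the pooled SD."""
    n1, n2 = len(paper), len(computer)
    pooled_sd = sqrt(((n1 - 1) * variance(paper) + (n2 - 1) * variance(computer)) / (n1 + n2 - 2))
    return (mean(computer) - mean(paper)) / pooled_sd


print(mode_effect_size([10, 12, 14], [11, 13, 15]))  # 0.5
```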

  • CCTE-A database of Chinese COVID-19 Terms

    Subjects: Psychology >> Cognitive Psychology Subjects: Psychology >> Experimental Psychology Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology Subjects: Psychology >> Other Disciplines of Psychology Subjects: Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics Subjects: Other Disciplines >> Synthetic discipline submitted time 2023-02-08

    Abstract: Objective: To establish a multi-dimensional, standardized lexical database of COVID-19-related terms and words that can facilitate COVID-19-related research in fields such as psychology, psychiatry, and neuroscience. Methods: The database was built following established methods for emotional lexical databases in China and abroad. To test its validity, words from the database served as experimental materials in a dot-detection task measuring attentional bias in participants suspected of having COVID-19 phobia. Results: The database includes 196 COVID-19-related words and 99 neutral words, which were classified and rated on six dimensions to form a standardized database of Chinese COVID-19-related terms. The words showed good reliability and internal consistency. Validity was tested with the dot-detection task: participants with and without COVID-19 fear showed a significant attentional bias toward COVID-19-related words. Limitations: The initial sample size is small, and applications of the database need further development. Conclusions: The database of Chinese COVID-19 terms has good reliability, internal consistency, and validity, and can serve as material for future COVID-19-related research.
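    The abstract does not describe how attentional bias was scored in the dot-detection task. A common index in dot-probe designs is mean reaction time on incongruent trials (probe appears where the neutral word was) minus mean reaction time on congruent trials (probe replaces the COVID-19-related word); positive values indicate attention drawn toward the threat-related words. The sketch assumes a hypothetical trial layout of `(rt_ms, congruent)` pairs and omits the error-trial removal and RT trimming that real analyses apply.

```python
from statistics import mean


def attentional_bias_score(trials):
    """Mean incongruent RT minus mean congruent RT, in milliseconds.

    trials: iterable of (rt_ms, congruent) pairs, where congruent=True means the
    probe replaced the COVID-19-related word (hypothetical layout).
    """
    congruent = [rt for rt, is_congruent in trials if is_congruent]
    incongruent = [rt for rt, is_congruent in trials if not is_congruent]
    return mean(incongruent) - mean(congruent)


demo_trials = [(500, True), (520, True), (560, False), (540, False)]
print(attentional_bias_score(demo_trials))  # 40
```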