How to measure statistical learning ability: evidence from test reliability

Authors: Wenbo Yu¹, Hetong Qi², Tianlin Wang², Dandan Liang¹,³
Affiliations:

1. School of Chinese Language and Culture, Nanjing Normal University

2. School of Education, University at Albany, State University of New York

3. Interdisciplinary Research Center for Linguistic Science, University of Science and Technology of China
Corresponding author: Dandan Liang (梁丹丹), Email: ldd233@163.com
Submitted: 2024-08-28 19:21:40

Abstract: Statistical learning (SL) is regarded as a fundamental learning mechanism in cognition, through which individuals extract statistical regularities from visual and verbal input during information processing. Learners' use of SL has been shown to affect several aspects of language development, including phonological, lexical, and syntactic development, in infants, school-aged children, and adult second language learners. In a typical verbal SL task, participants are first exposed to a nonsensical artificial language (or, in visual SL tasks, a visual sequence) for 5 to 10 minutes and then complete a two-alternative forced-choice (2AFC) task. Accuracy on each trial is coded dichotomously (0 = incorrect, 1 = correct) and aggregated across participants to obtain the group's mean accuracy. If this mean exceeds chance level, learning is assumed to have occurred. This research approach is referred to as the inter-group-differences perspective.
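The group-level logic just described (0/1 trial coding, averaging, comparison with chance) can be sketched as below. This is a minimal illustration, not the authors' analysis: the abstract does not name the statistical test, so a one-sample t-test against 0.5 is assumed here, and the function names and data are hypothetical.

```python
import math
from statistics import mean, stdev

def participant_accuracy(trials):
    """Proportion of correct 2AFC trials for one participant (trials coded 0/1)."""
    return sum(trials) / len(trials)

def group_vs_chance(accuracies, chance=0.5):
    """Group mean accuracy plus a one-sample t statistic against chance.

    Assumption: a one-sample t-test; returns (mean, t, df) and leaves the
    p-value lookup to a statistics package.
    """
    n = len(accuracies)
    m = mean(accuracies)
    se = stdev(accuracies) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return m, (m - chance) / se, n - 1

# Hypothetical data: four participants, eight 2AFC trials each.
trials = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
]
m, t, df = group_vs_chance([participant_accuracy(p) for p in trials])
print(f"mean = {m:.3f}, t({df}) = {t:.2f}")  # mean = 0.750, t(3) = 4.90
```

This index answers whether the group learned; it says nothing about how consistently the task ranks individual participants, which is the individual-differences question the study turns to.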
Recent studies have used 2AFC scores as an index of participants' SL ability and have used performance on these tasks to predict learners' language development and other higher-order cognitive skills. However, this index derives from the inter-group-differences perspective: it is suitable for judging whether a tested group shows a statistical learning effect, but not for measuring the relationship between individuals' SL ability and their other cognitive abilities. From the individual-differences perspective, some researchers have criticized the low reliability of SL tasks and argued that task results are psychometrically unsatisfactory. In the current study, we aimed to propose a modified, more comprehensive SL task. Two aspects of the traditional task were modified: first, we constructed learning materials with mixed-length target words; second, in addition to the 2AFC task, we used a familiarity rating task to measure learning outcomes. Both changes were intended to increase the variability of test scores and thereby improve the task's reliability. Finally, because some studies have argued that visual SL tasks are free from the influence of linguistic experience and therefore show better reliability, we also compared reliability between the visual and verbal modalities.
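The abstract does not describe how the mixed-length materials were built. The sketch below shows one common way SL familiarization streams are constructed, under stated assumptions: a hypothetical lexicon of two bisyllabic and two trisyllabic nonsense words is concatenated into a continuous stream, and syllable-to-syllable transitional probabilities (TPs) are then computed. The lexicon, function names, and stream length are all illustrative, not the study's materials.

```python
import random
from collections import Counter

# Hypothetical mixed-length lexicon (not the study's materials):
# two bisyllabic and two trisyllabic nonsense words, all syllables distinct.
LEXICON = [("pa", "bi"), ("tu", "go", "la"), ("ki", "do"), ("mo", "ne", "su")]

def make_stream(lexicon, n_words, seed=0):
    """Concatenate randomly chosen words into one continuous syllable stream,
    never presenting the same word twice in a row."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in lexicon if w != prev])
        stream.extend(word)
        prev = word
    return stream

def transitional_probabilities(stream):
    """TP(x -> y) = count of pair xy / count of x as the first member of a pair."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(x, y): c / firsts[x] for (x, y), c in pairs.items()}

tp = transitional_probabilities(make_stream(LEXICON, 300))
print(tp[("tu", "go")], tp[("pa", "bi")])  # 1.0 1.0 (within-word)
print(round(tp[("bi", "tu")], 2))  # across-word TP: below 1.0 (about 1/3 in expectation)
```

Within-word TPs are 1.0 and TPs across word boundaries are lower. With mixed word lengths, boundaries do not fall at a fixed interval in the stream, so only these TP dips mark them.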
A total of 143 participants took part in the experiment: 38 in the artificial language A condition, 36 in the artificial language B condition, 35 in the visual image A condition, and 34 in the visual image B condition. Two types of reliability, Cronbach's alpha coefficient and split-half reliability, were computed with the reliability function in R. The results fall into three parts. First, both reliability indices in the current study were higher than those reported in previous studies, indicating that learning materials constructed from mixed-length nonsense words offer some advantage in reliability. Second, both Cronbach's alpha and split-half reliability were higher for statistical learning tasks in the visual modality than in the auditory (verbal) modality, consistent with the view of Siegelman (2018a). Third, in the visual modality, the forced-choice task was more reliable than the familiarity rating task, suggesting that results obtained from the 2AFC task are more stable and consistent across participants. Additionally, scores on the 2AFC and familiarity rating tasks correlated with each other in the verbal modality but not in the visual modality.
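The two reliability indices reported above can be illustrated with their textbook formulas. This is a sketch, not a reproduction of the R reliability function the authors used: Cronbach's alpha is computed as k/(k − 1) · (1 − Σ item variances / total-score variance), and split-half reliability uses an odd-even item split stepped up with the Spearman-Brown formula 2r/(1 + r). The odd-even split is an assumption; software that averages over many random splits or uses other conventions can give different values. The function names and data matrix are hypothetical.

```python
import math
from statistics import mean, pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items score matrix (list of rows)."""
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half(items):
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in items]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)

# Hypothetical 0/1 accuracy matrix: five participants x four 2AFC items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 3))  # 0.8
print(round(split_half(data), 3))      # 0.88
```

The alpha formula makes the link to score variability explicit: for fixed item variances, alpha rises as the variance of total scores grows, that is, as items covary more strongly across participants.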
The current study explored how to measure SL ability. The results underscore the importance of using mixed-length learning materials and support using visual stimuli, together with the traditional forced-choice task at test, to assess statistical learning ability. Future studies should not only focus on designing brief SL tasks that meet psychometric standards for children and for populations with language disorders, but also rethink the cognitive mechanisms underlying different SL tasks.

Keywords: statistical learning; two-alternative forced-choice task; familiarity rating task; reliability

From: Wenbo Yu
Subject: Psychology >> Cognitive Psychology
Contribution: Submitted to a journal
Cite as: ChinaXiv:202408.00250 (or this version ChinaXiv:202408.00250V1)
DOI:10.12074/202408.00250
CSTR:32003.36.ChinaXiv.202408.00250
TXID: a0586891-8151-4057-8dca-2eb07e1a8773
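Recommended citation: Yu, W., Qi, H., Wang, T., & Liang, D. How to measure statistical learning ability? A perspective based on test reliability. ChinaXiv (Chinese Academy of Sciences preprint platform). DOI:10.12074/202408.00250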

Version History

[V3]	2025-01-02 16:10:25	ChinaXiv:202408.00250v3
[V2]	2024-12-29 20:02:52	ChinaXiv:202408.00250v2
[V1]	2024-08-28 19:21:40	ChinaXiv:202408.00250V1
