Subjects: Psychology >> Social Psychology; Psychology >> Cognitive Psychology; Psychology >> Psychological Measurement; Computer Science >> Natural Language Understanding and Machine Translation
Submitted time: 2023-01-30
Abstract: As a basic technique in natural language processing (NLP), word embedding represents a word with a low-dimensional, dense, and continuous numeric vector (i.e., a word vector). Word embeddings can be obtained by training neural network algorithms to predict a word from its surrounding words or vice versa (Word2Vec and FastText), or from words’ probabilities of co-occurrence (GloVe), in large-scale text corpora. The values of a word vector’s dimensions thus capture the pattern of how a word is predicted in context, thereby encoding its semantic information. Therefore, word embeddings can be utilized for semantic analyses of text. In recent years, word embeddings have been rapidly adopted to study human psychology, including human semantic processing, cognitive judgment, individual divergent thinking (creativity), group-level social cognition, sociocultural change, and so forth. We have developed the R package “PsychWordVec” to help researchers utilize and analyze word embeddings within a tidy framework. Future research using word embeddings should (1) distinguish between implicit and explicit components of social cognition, (2) train fine-grained word vectors in terms of time and region to facilitate cross-temporal and cross-cultural research, and (3) deepen and expand the application of contextualized word embeddings and large pre-trained language models such as GPT and BERT.
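The following is a minimal Python sketch, not the PsychWordVec R package itself, illustrating the kind of semantic analysis the abstract describes: cosine similarity between pre-trained word vectors as a proxy for semantic relatedness. It assumes the gensim library and its downloadable "glove-wiki-gigaword-50" GloVe vectors; both the model choice and the example words are purely illustrative.

import gensim.downloader as api

# Load a small set of pre-trained GloVe vectors (illustrative model choice;
# any pre-trained KeyedVectors would serve the same purpose).
wv = api.load("glove-wiki-gigaword-50")

# Cosine similarity between word vectors approximates semantic relatedness.
print(wv.similarity("doctor", "nurse"))    # semantically related -> higher
print(wv.similarity("doctor", "banana"))   # unrelated -> lower

# Nearest neighbours in the embedding space share meaning or usage contexts.
print(wv.most_similar("psychology", topn=5))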
Subjects: Psychology >> Cognitive Psychology
Submitted time: 2023-01-18
Abstract: How semantics are represented in the human brain is a central issue in cognitive neuroscience. Previous studies have typically addressed this issue by artificially manipulating the properties of stimuli or task demands. Although this experimental approach has brought valuable insights into the neurobiology of language, it may still fail to characterize semantic information at high resolution and has difficulty quantifying contextual information and high-level concepts. Recently developed natural language processing (NLP) techniques provide tools to represent discrete semantics in the form of vectors, enabling the automatic extraction of word semantics and even of contextual and syntactic information. Recent studies have applied NLP techniques to model the semantics of stimuli and have mapped the resulting semantic vectors onto brain activity through representational similarity analysis or linear regression. A consistent finding is that semantic information is represented by a widely distributed network across the frontal, temporal, and occipital cortices. Future studies may adopt multi-modal neural networks and knowledge graphs to extract richer semantic information, apply NLP models to automatically assess the language ability of special groups, and improve the interpretability of deep neural network models with neurocognitive findings.
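To make the model-to-brain mapping concrete, here is a minimal Python sketch of representational similarity analysis (RSA) under assumed inputs: semantic_vecs stands in for NLP-derived stimulus vectors and brain_patterns for voxel response patterns to the same stimuli. The variable names and the randomly simulated toy data are hypothetical; the analysis steps follow the standard RSA recipe of comparing model and neural dissimilarity matrices.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli, n_dims, n_voxels = 40, 300, 500
semantic_vecs = rng.normal(size=(n_stimuli, n_dims))     # e.g., GloVe vectors for each stimulus
brain_patterns = rng.normal(size=(n_stimuli, n_voxels))  # e.g., fMRI response patterns

# Representational dissimilarity matrices (condensed form): pairwise
# distances between stimuli in model space and in neural space.
model_rdm = pdist(semantic_vecs, metric="correlation")
neural_rdm = pdist(brain_patterns, metric="correlation")

# The Spearman correlation between the two RDMs quantifies how well the
# NLP-derived semantic geometry matches the neural representational geometry.
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"model-brain RSA: rho = {rho:.3f}, p = {p:.3f}")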
Peer Review Status: Awaiting Review