Medical Named Entity Recognition from Un-labelled Medical Records based on Pre-trained Language Models and Domain Dictionary (Postprint)

Authors: Chaojie Wen¹, Tao Chen¹, Xudong Jia¹, Jiang Zhu¹
Affiliation:

1. Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
Corresponding author: Tao Chen
Submitted: 2022-11-27 19:12:05

Abstract: Medical named entity recognition (NER) is the task of recognizing named entities, such as diseases, drugs, surgeries, anatomical parts, and examinations, in medical texts. Conventional medical NER methods do not make full use of the un-labelled text contained in medical documents. To address this issue, we propose a medical NER approach based on pre-trained language models and a domain dictionary. First, we construct a medical entity dictionary by extracting entities from labelled medical texts and collecting entities from other resources, such as the Yidu-N4K data set. Second, we use this dictionary to train domain-specific pre-trained language models on un-labelled medical texts. Third, we apply a pseudo labelling mechanism to the un-labelled medical texts to annotate them automatically and create pseudo labels. Fourth, we fine-tune the pre-trained language models with a BiLSTM-CRF sequence tagging model. Experiments on un-labelled medical texts extracted from Chinese electronic medical records show that the proposed approach achieves strict and relaxed F1 scores of 88.7% and 95.3%, respectively.
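The abstract does not describe how the dictionary is applied or how strict and relaxed F1 are computed. As a rough illustration only, the sketch below shows one common way to do each: dictionary-based pseudo labelling via forward maximum matching that emits character-level BIO tags, and span-level strict versus relaxed F1. The function names, the BIO scheme, the toy dictionary, and the definition of a relaxed match (same entity type, at least one overlapping character) are all assumptions, not the authors' method. The abstract also does not say whether the paper's pseudo labels come from dictionary matching or from a trained tagger.

```python
from typing import Dict, List, Tuple

# Hypothetical span representation: (start, end_exclusive, entity_type).
Span = Tuple[int, int, str]


def dictionary_pseudo_label(text: str, dictionary: Dict[str, str], max_len: int = 10) -> List[str]:
    """Assumed scheme: tag each character with BIO labels by forward maximum matching."""
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        match_end = None
        # Try the longest candidate first, so "急性胃炎" (acute gastritis) beats "胃炎" (gastritis).
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match_end = j
                break
        if match_end is None:
            i += 1
            continue
        etype = dictionary[text[i:match_end]]
        tags[i] = f"B-{etype}"
        for k in range(i + 1, match_end):
            tags[k] = f"I-{etype}"
        i = match_end  # skip past the matched entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert BIO tags to spans; a stray I- tag is leniently treated as a span start."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                start, etype = i, tag[2:]
    return spans


def f1_scores(gold: List[Span], pred: List[Span]) -> Tuple[float, float]:
    """Return (strict F1, relaxed F1) over entity spans."""
    def f1(precision: float, recall: float) -> float:
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    def overlaps(a: Span, b: Span) -> bool:
        # Relaxed match (assumed): same type and at least one shared character.
        return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]

    gold_set, pred_set = set(gold), set(pred)
    exact = len(gold_set & pred_set)  # strict: identical boundaries and type
    strict_p = exact / len(pred_set) if pred_set else 0.0
    strict_r = exact / len(gold_set) if gold_set else 0.0

    hit_p = sum(any(overlaps(ps, gs) for gs in gold_set) for ps in pred_set)
    hit_r = sum(any(overlaps(gs, ps) for ps in pred_set) for gs in gold_set)
    relaxed_p = hit_p / len(pred_set) if pred_set else 0.0
    relaxed_r = hit_r / len(gold_set) if gold_set else 0.0
    return f1(strict_p, strict_r), f1(relaxed_p, relaxed_r)


if __name__ == "__main__":
    # Toy dictionary and sentence ("patient has acute gastritis, takes omeprazole orally").
    toy_dict = {"胃炎": "DISEASE", "急性胃炎": "DISEASE", "奥美拉唑": "DRUG"}
    tags = dictionary_pseudo_label("患者急性胃炎，口服奥美拉唑", toy_dict)
    reference = bio_to_spans(tags)  # [(2, 6, 'DISEASE'), (9, 13, 'DRUG')]
    predicted = [(4, 6, "DISEASE"), (9, 13, "DRUG")]  # boundary error on the disease
    print(f1_scores(reference, predicted))  # (0.5, 1.0)
```

The toy run shows why the two metrics can differ widely: a prediction with a wrong left boundary ("胃炎" instead of "急性胃炎") fails the strict check but still counts under the assumed relaxed, overlap-based check.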

Keywords: Medical named entity recognition; Pre-trained language model; Domain dictionary; Pseudo labelling; Un-labelled medical data

Category: Computer Science >> Integrated Theory of Computer Science
Cite as: ChinaXiv:202211.00388 (or this version, ChinaXiv:202211.00388V1)
DOI:10.1162/dint_a_00105
CSTR:32003.36.ChinaXiv.202211.00388.V1
Kechuang Chain TXID: ae15088d-cc39-4e7e-9dfb-e99a68f967b0
Recommended citation: Chaojie Wen, Tao Chen, Xudong Jia, Jiang Zhu. Medical Named Entity Recognition from Un-labelled Medical Records based on Pre-trained Language Models and Domain Dictionary. ChinaXiv (Chinese Academy of Sciences preprint platform). [DOI:10.1162/dint_a_00105]

Version history

[V1]

2022-11-27 19:12:05

ChinaXiv:202211.00388V1
