ChinaXiv.org: Chinese Academy of Sciences Scientific Paper Preprint Platform

1. ChinaXiv:202307.00628

Healthcare Data Mining: Word Segmentation and Named Entity Recognition in Chinese Electronic Medical Record

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-07-26
Cooperative journal: 《图书情报工作》 (Library and Information Service)

Authors: Wang Ruojia, Cho Sang, Wang Jimin

Abstract: [Purpose/significance] Healthcare big data is an important basic strategic resource in China. Word segmentation and entity recognition for Chinese electronic medical records (EMRs) help extract important information from large volumes of unstructured text. [Method/process] This study first builds a Chinese medical thesaurus from authoritative medical subject headings, official standards, and health website data; it then compares four segmentation methods on a manually segmented and annotated corpus; finally, it uses a CRF model to identify five entity types: disease, symptom, test, drug, and treatment. [Result/conclusion] The results show that (i) the AC automaton model achieves the best F-measure in EMR word segmentation, at 82%; and (ii) medical entities are harder to identify in traditional Chinese medicine records than in Western medical records. In addition, the "Test" and "Disease" entities obtain better F-measures, while the F-measure for the "Symptom" entity is less satisfactory.
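
The abstract reports segmentation quality as an F-measure but does not spell out how it is computed. A minimal sketch of the common word-level definition follows, assuming a predicted word counts as correct only when its character span exactly matches a word in the manual segmentation. The function names and the example sentence are illustrative assumptions, not taken from the paper.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def segmentation_prf(gold_words, pred_words):
    """Return (precision, recall, F-measure) for one segmented sentence.

    Assumption: a predicted word is correct only if both of its
    boundaries match a word in the manual (gold) segmentation.
    """
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: the segmenter wrongly splits "高血压" (hypertension).
gold = ["患者", "有", "高血压", "病史"]
pred = ["患者", "有", "高", "血压", "病史"]
precision, recall, f_measure = segmentation_prf(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

In this example three of the five predicted words match the four gold words, so a single over-split term lowers both precision and recall. This is one reason a domain thesaurus can matter for EMR segmentation.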
