ChinaXiv.org: Chinese Academy of Sciences Scientific Paper Preprint Platform

4 results in total.

1. ChinaXiv:201711.01199

Research on CRFs-Based Extraction of Chinese Patent Terms in the Metallurgy Domain

Subjects: Library Science, Information Science >> Information Science. Submitted: 2017-10-11. Cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Wang Miping (王密平), Wang Hao (王昊), Deng Sanhong (邓三鸿), Wu Zhixiang (吴志祥)

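Abstract: [Objective] To explore the optimal conditions for a model that extracts Chinese patent terms in the metallurgy domain, so that such terms can be extracted effectively. [Methods] Using a still-incomplete core corpus and requiring no manual annotation, conditional random fields (CRFs) are used to build a character-role-labeling model for recognizing Chinese patent terms in the metallurgy domain. The model construction process is described in detail, with a focus on comparing how the various CRFs factors (feature combinations, character window size, etc.) affect recognition performance. [Results] Experiments show that the combination of character sequence, level feature, domain feature, and temperature feature, with a character window size of 3, c = 1, and f = 1, achieves a precision of 94.26%, a recall of 94.37%, and an F1 value of 94.5%. [Limitations] The core dictionary is incomplete, so some words are labeled inaccurately; the approach is not compared in detail with other methods, and the reliability of CRFs is not explained in detail. [Conclusions] With an appropriate combination of roles, features, and feature templates, CRFs can recognize Chinese patent terms in the metallurgy domain well.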

Hits 6675 Downloads 1800 Comment 0
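
As a rough illustration of the character-window idea in the abstract above, the following Python sketch builds context features for each character from its neighbours. The function names, the feature keys, the `<PAD>` marker, and the reading of a window of 3 as one character on each side are all assumptions made for illustration; the paper's actual feature templates are not given in this listing.

```python
def window_features(chars, i, window=3):
    """Return context features for chars[i] from a centred window.

    Assumption: a window of 3 means the character itself plus one
    neighbour on each side; positions outside the text use "<PAD>".
    """
    half = window // 2
    feats = {}
    for offset in range(-half, half + 1):
        j = i + offset
        feats[f"char[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def sentence_features(text, window=3):
    """Build one feature dict per character of the text."""
    chars = list(text)
    return [window_features(chars, i, window) for i in range(len(chars))]
```

Each resulting dict could then be passed, together with a per-character role label, to a CRF toolkit of the reader's choice.
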
2. ChinaXiv:202209.00004

Research on the construction of event recognition model in historical books based on text generation technology

Subjects: Library Science, Information Science >> Automation method and equipment in intelligence process. Submitted: 2022-08-31

Wang, Yanying; Wang, Hao; Zhu, Hui; Li, Xiaomin

Abstract: [Objective] To construct an event recognition model for historical books, this paper compares the performance of the sequence labeling method with that of the text generation method for event recognition in ancient historical books. [Methods] "Three Kingdoms" is selected as the original corpus. To compare the two methods on the "Three Kingdoms" event data set, the sequence labeling experiment uses BMES annotation and builds the BBCN-SG model, and the text generation experiment builds the T5-SG model. RoBERTa-SG and NEZHA-SG models are also built for comparative experiments on generative models. Combining the three text generation models with the idea of Stacking ensemble learning, the Stacking-TRN-SG model is constructed. [Results] For event recognition in ancient historical books, the text generation method performs significantly better than the sequence labeling method. Among the text generation models, performance ranks RoBERTa-SG > T5-SG > NEZHA-SG. Stacking ensemble learning greatly improves the recognition performance of the generation models. [Limitations] Computational resources were limited, and the Stacking-TRN-SG model has not yet been applied to other ancient historical corpora. [Conclusions] The Stacking-TRN-SG model constructed in this paper preliminarily realizes automatic event recognition in ancient historical books.

Peer Review Status: Awaiting Review

Hits 3312 Downloads 765 Comment 0
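
The BMES annotation mentioned in the abstract above is a common character-level tagging scheme: B marks the beginning of a span, M its middle, E its end, and S a single-character span. A minimal sketch follows; the function name `bmes_tags`, the `O` tag for characters outside any span, and the example spans are illustrative assumptions, not the paper's data or code.

```python
def bmes_tags(length, spans):
    """Tag positions 0..length-1 with BMES labels.

    spans: list of (start, end) pairs, end exclusive, non-overlapping.
    Characters outside every span get "O" (an assumption; schemes vary).
    """
    tags = ["O"] * length
    for start, end in spans:
        if not 0 <= start < end <= length:
            raise ValueError(f"invalid span: {(start, end)}")
        if end - start == 1:
            tags[start] = "S"
        else:
            tags[start] = "B"
            for k in range(start + 1, end - 1):
                tags[k] = "M"
            tags[end - 1] = "E"
    return tags
```

For example, `bmes_tags(6, [(0, 3), (4, 5)])` returns `['B', 'M', 'E', 'O', 'S', 'O']`.
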
3. ChinaXiv:201901.00050

Extraction of International Cooperation Elements in Diplomacy Based on Neural Networks and Domain Knowledge

Subjects: Computer Science >> Integration Theory of Computer Science. Submitted: 2019-01-03. Cooperative journal: 《计算机应用研究》 (Application Research of Computers)

Zhang Zijing (张子靖), Wan Changxuan (万常选), Liu Dexi (刘德喜), Liu Yu (刘玉), Liu Xiping (刘喜平), Jiang Tengjiao (江腾蛟)

Abstract: To obtain valuable information about bilateral cooperation in real time, it is essential to extract international cooperation elements from Web diplomacy news efficiently. This paper frames international cooperation element extraction as a problem similar to named entity recognition. First, it defines what international cooperation elements are. Second, it extracts rules that encode domain knowledge. Third, it proposes a framework for extracting international cooperation elements from diplomatic news texts that combines neural networks with domain knowledge. Finally, the method is compared, on the same corpus, with the neural network method and with its own rule combination. The experimental results show that the proposed method achieves better results.

Hits 1898 Downloads 943 Comment 0
4. ChinaXiv:202308.00275

Research on Feature Extraction Scheme of Chinese-character Granularity in Sequence Labeling Model——A Case Study About Clinical Named Entity Recognition of CCKS2017: Task2

Subjects: Library Science, Information Science >> Library Science. Submitted: 2023-08-26. Cooperative journal: 《图书情报工作》 (Library and Information Service)

Sun An, Yu Yingxiang, Luo Yonggang, Wang Qi

Abstract: [Purpose/significance] Based on the characteristics of Chinese language expression, this paper proposes a method for extracting word features at character granularity using word segmentation tags. The method effectively improves the F₁ value of Chinese clinical named entity recognition (CNER) and can be used in other Chinese sequence labeling models. [Method/process] This paper chose three kinds of Chinese word features (part-of-speech tagging, keyword weight, and dependency parsing) to construct clinical case training text for a sequence labeling model at Chinese-character granularity; the corpus source is CCKS2017: Task2. Then, under different feature combinations, it used the CRF algorithm to evaluate Method 1 and Method 2, two ways of extracting word features at character granularity. [Result/conclusion] Compared with Method 1, Method 2 improved CNER performance for all four combinations of word features, and the F₁ value increased by an average of 0.23% in the 4-fold cross-validation test. The experiment shows that, given mature Chinese word segmentation technology, Method 2 obtains better word feature representations than Method 1 and improves the performance of sequence labeling models at Chinese-character granularity.

Hits 1163 Downloads 444 Comment 0
