Subjects: Library Science,Information Science >> Information Science submitted time 2017-10-11 Cooperative journals: 《数据分析与知识发现》
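Abstract: [Objective] To explore the optimal conditions for a Chinese patent term extraction model in the metallurgy domain, so that metallurgical patent terms can be extracted effectively. [Methods] Using a core corpus that is still incomplete, and without manual annotation, conditional random fields (CRFs) are used to build a character-role-labeling model for recognizing Chinese patent terms in the metallurgy domain. The construction of the model is described in detail, with a focus on comparing how the individual CRF factors (feature combination, character window size, and so on) affect recognition performance. [Results] The experiments show that the combination of character sequence, level feature, domain feature, and temperature feature, with a character window of 3, c = 1, and f = 1, reaches a precision of 94.26%, a recall of 94.37%, and an F1 value of 94.5%. [Limitations] The core dictionary is incomplete, so some words are labeled inaccurately; the approach is not compared in detail with other methods, and the reliability of CRFs is not examined in detail. [Conclusions] With an appropriate combination of roles, features, and feature templates, CRFs can recognize Chinese patent terms in the metallurgy domain well.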
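As an illustration of the character-window setting mentioned in the abstract above, the following is a minimal sketch of how per-character context features might be built for a CRF. It assumes that a window of 3 means the previous, current, and next character; the function names, feature keys, and the padding symbol are hypothetical and are not taken from the paper.

```python
def char_window_features(chars, index, window=3):
    """Build a feature dict for chars[index] from a centered window.

    Assumption: window=3 covers the previous, current, and next
    character. Positions outside the sentence use a padding symbol.
    """
    half = window // 2
    features = {}
    for offset in range(-half, half + 1):
        pos = index + offset
        inside = 0 <= pos < len(chars)
        features[f"char[{offset}]"] = chars[pos] if inside else "<PAD>"
    return features


def sentence_features(sentence, window=3):
    """Return one feature dict per character of the sentence."""
    chars = list(sentence)
    return [char_window_features(chars, i, window) for i in range(len(chars))]


# Hypothetical example sentence; each character gets its own feature dict.
feats = sentence_features("高炉炼铁")
```

Domain-specific features such as the level, domain, and temperature features named in the abstract would be added to the same dictionaries; their definitions are not given in the abstract, so they are left out here.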
Subjects: Library Science,Information Science >> Automation method and equipment in intelligence process submitted time 2022-08-31
Abstract: [Objective] To construct an event recognition model for historical books, the performance of the sequence labeling method for event recognition in ancient historical books is compared with that of the text generation method. [Methods] "Three Kingdoms" is selected as the original corpus. To compare the two methods on the "Three Kingdoms" event data set, the sequence labeling experiment used BMES annotation and built the BBCN-SG model, and the text generation experiment built the T5-SG model. RoBERTa-SG and NEZHA-SG models were also built for comparative experiments on generative models. Combining the three text generation models with the idea of Stacking ensemble learning, the Stacking-TRN-SG model is constructed. [Results] For event recognition in ancient historical books, the text generation method performs significantly better than the sequence labeling method. Among the text generation models, performance ranks RoBERTa-SG > T5-SG > NEZHA-SG. Stacking ensemble learning greatly improves the recognition performance of the generation models. [Limitations] Computational resources were limited, and the Stacking-TRN-SG model has not yet been applied to other ancient historical corpora. [Conclusions] The Stacking-TRN-SG model constructed in this paper preliminarily realizes automatic event recognition in ancient historical books.
Peer Review Status:Awaiting Review
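The BMES annotation used in the sequence labeling experiment above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the span format (start, end-exclusive), the "O" tag for characters outside any event span, and the example sentence are assumptions made for this sketch.

```python
def bmes_tags(length):
    """Return BMES tags for a span of the given length.

    B = begin, M = middle, E = end, S = single-character span.
    """
    if length <= 0:
        raise ValueError("span length must be positive")
    if length == 1:
        return ["S"]
    return ["B"] + ["M"] * (length - 2) + ["E"]


def label_sentence(sentence, spans):
    """Tag each character; characters outside every span get 'O'.

    spans is a list of (start, end) pairs with end exclusive
    (a hypothetical format chosen for this sketch).
    """
    tags = ["O"] * len(sentence)
    for start, end in spans:
        tags[start:end] = bmes_tags(end - start)
    return list(zip(sentence, tags))


# Hypothetical sentence with one annotated span covering characters 2..4.
labeled = label_sentence("曹操攻徐州", [(2, 5)])
```

A sequence labeling model is then trained to predict one such tag per character, whereas a text generation model would produce the event text directly.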
Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2019-01-03 Cooperative journals: 《计算机应用研究》
Abstract: To obtain valuable information on bilateral cooperation in real time, it is essential to extract international cooperation elements from Web diplomacy news efficiently. This paper abstracts international cooperation element extraction into a problem similar to named entity recognition. First, it defines the connotations of international cooperation elements. Second, it extracts rules that contain domain knowledge. Then, it proposes a framework for extracting international cooperation elements from diplomatic news texts that combines neural networks with domain knowledge. Finally, the method is compared on the same corpus with the neural network method and with the rule combination on its own. The experimental results show that the proposed method achieves better results than the compared methods.
Subjects: Library Science,Information Science >> Library Science submitted time 2023-08-26 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Based on the characteristics of Chinese language expression, this paper proposes a method for extracting word features at character granularity using word segmentation tags. The method effectively improves the F1 value of Chinese clinical named entity recognition (CNER) and can be applied to other Chinese sequence labeling models. [Method/process] Three kinds of Chinese word features (part-of-speech tagging, keyword weight, and dependency parsing) are selected to construct clinical case training text for a character-granularity sequence labeling model; the corpus source is CCKS2017:Task2. Then, under different feature combination modes, the CRF algorithm is used to evaluate Method 1 and Method 2, two ways of extracting word features at character granularity. [Result/conclusion] Compared with Method 1, Method 2 improves CNER performance for the four different combinations of word features, with the F1 value increasing by an average of 0.23% in the 4-fold cross-validation test. The experiment shows that, given mature Chinese word segmentation technology, Method 2 obtains better word feature representations than Method 1 and improves the performance of character-granularity sequence labeling models for Chinese.
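The abstract above does not define Method 1 and Method 2, so the sketch below only illustrates two plausible ways of carrying a word-level feature down to character granularity: copying the feature onto every character, and prefixing it with a word segmentation position tag. The function names, the B/I/E/S tag set, and the example words with their part-of-speech labels are all assumptions for illustration, and no mapping to the paper's Method 1 or Method 2 is claimed.

```python
def expand_plain(words):
    """Copy each word's feature onto every character of that word."""
    return [(ch, feat) for word, feat in words for ch in word]


def position_tag(i, n):
    """Segmentation tag for character i of an n-character word."""
    if n == 1:
        return "S"  # single-character word
    if i == 0:
        return "B"  # first character
    if i == n - 1:
        return "E"  # last character
    return "I"      # interior character


def expand_with_seg_tag(words):
    """Prefix each character's word feature with its segmentation tag."""
    return [
        (ch, f"{position_tag(i, len(word))}-{feat}")
        for word, feat in words
        for i, ch in enumerate(word)
    ]


# Hypothetical segmented input: (word, part-of-speech) pairs.
words = [("患者", "n"), ("头痛", "v"), ("3", "m"), ("天", "q")]
plain = expand_plain(words)
tagged = expand_with_seg_tag(words)
```

In the tagged variant, each character keeps the information about where it sits inside its word, which the plain copy discards.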