Subjects: Library Science,Information Science >> Library Science submitted time 2023-08-27 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Data science is emerging as a new interdisciplinary field that combines many disciplines. Extracting entity knowledge from data science recruitment announcements not only helps in understanding the development of data science from a market perspective, but also helps improve the content of data science teaching. [Method/process] Based on recruitment announcements collected from a recruitment website, and using the data collection, annotation, and organization methods of information science, a data science corpus was constructed and the corresponding entities were extracted from it. [Result/conclusion] On the existing annotated data science corpus of 11,000 recruitment announcements, this paper compared the entity extraction performance of the Bi-LSTM-CRF, CRF, and Bi-LSTM models, determined the final model for automatic extraction of data science recruitment entities, designed a platform for automatic extraction of these entities, and built a network of data science recruitment entities.
Subjects: Library Science,Information Science >> Library Science submitted time 2023-08-26 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] An abstract concisely states the research purpose, research methods, and final conclusions of a paper, so it has high value and significance for exploration. [Method/process] In this paper, four models (long short-term memory, support vector machine, LSTM-CRF and CNN-CRF) were selected and applied to 3,672 journal articles from the CNKI database. [Result/conclusion] The long short-term memory network model reached a highest F value of 69.15%, the LSTM-CRF neural network model a highest F value of 88.76%, and the RNN-CRF model a highest F value of 89.10%. The support vector machine classifier reached a highest macro F value of 72.04%. The experimental results are a useful reference for selecting experimental models for the functional structure of academic dissertations in the field of library and information science.
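The macro F value reported for the classifier is not defined in the abstract. A minimal sketch, assuming it is the usual macro-averaged F1 (per-label F1, then an unweighted mean over labels), is given below. The function name `macro_f1` and the labels `purpose`, `method`, and `result` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per label, then take the unweighted mean.

    Assumes `gold` and `pred` are equal-length sequences of label strings.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sentence-level labels, for illustration only.
gold = ["purpose", "method", "result", "result"]
pred = ["purpose", "method", "method", "result"]
score = macro_f1(gold, pred)
```

Because every label contributes equally to the mean, a rare label with poor F1 lowers the macro value as much as a frequent one would.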
Subjects: Library Science,Information Science >> Information Science submitted time 2017-11-08 Cooperative journals: 《数据分析与知识发现》
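Abstract: [Objective] Chinese organization names have complex structures and contain many rare words, which makes them hard to recognize. Recognizing them correctly is important for downstream tasks in information science such as information extraction, information retrieval, knowledge mining, and the research evaluation of institutions. [Methods] Building on the recurrent neural network (RNN) method from deep learning and taking into account the characteristics of Chinese characters and words, this paper redefines the input and output of organization name tagging and proposes a character-level recurrent network tagging model. [Results] With the word-level recurrent neural network method as the baseline, the proposed character-level model clearly improves precision, recall, and F value in Chinese organization name recognition, with the F value rising by 1.54%. The improvement is more pronounced when rare words are included, with the F value rising by 11.05%. [Limitations] A greedy strategy is used directly in decoding, which is prone to local optima; modeling with a conditional random field might yield globally optimal results. [Conclusions] The proposed method has a simple architecture, can use character-level features for modeling, and achieves better results than using word features alone.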
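The limitation noted above, greedy decoding versus a globally optimal label sequence, can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a B/I/O label scheme, made-up per-character scores, and made-up transition scores, and uses Viterbi decoding, the standard decoder for a linear-chain conditional random field. All names (`LABELS`, `greedy_decode`, `viterbi_decode`) are hypothetical.

```python
LABELS = ["B", "I", "O"]  # assumed tagging scheme: Begin / Inside / Outside


def greedy_decode(emissions):
    """Pick the best-scoring label at each position independently."""
    return [max(LABELS, key=lambda y: scores[y]) for scores in emissions]


def viterbi_decode(emissions, transitions):
    """Return the label sequence maximizing emission plus transition scores."""
    # best[y]: score of the best path that ends in label y at the current position
    best = dict(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up scores for three consecutive characters (illustration only).
emissions = [
    {"B": 2.0, "I": 0.0, "O": 1.5},
    {"B": 0.0, "I": 1.0, "O": 1.2},
    {"B": 0.0, "I": 1.5, "O": 1.0},
]
# Made-up transition scores: reward staying inside a name, penalize O -> I.
transitions = {p: {y: 0.0 for y in LABELS} for p in LABELS}
transitions["B"]["I"] = 0.5
transitions["I"]["I"] = 0.5
transitions["O"]["I"] = -10.0

greedy_path = greedy_decode(emissions)
viterbi_path = viterbi_decode(emissions, transitions)
```

With these scores the greedy decoder outputs an `I` directly after an `O`, which is not a well-formed name span, while the Viterbi decoder trades a slightly lower score at the second character for a consistent sequence overall.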