Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] This paper explores the construction of a big-data knowledge resource system for literature and information services that supports intelligent knowledge services across multiple domains. [Method/process] Guided by AI application requirements and industry experience, and after reviewing the problems of the existing resource system, the paper expands the resource system along multiple levels and dimensions, builds a reliable data processing pipeline and computing platform to support efficient data collection and processing, and develops intelligent data governance tools to govern knowledge resources effectively and ensure the supply of high-quality data. [Result/conclusion] A knowledge resource system covering multiple types and disciplines of sci-tech literature has taken initial shape. A highly automated data collection and governance process has been built, multiple data quality controls have been implemented, and hundreds of millions of high-quality data records have been accumulated. The system currently provides data support for multiple knowledge services.
Subjects: Library Science,Information Science >> Information Science submitted time 2023-03-13
Abstract: ChatGPT is a dialogue system developed by OpenAI. It appears as a chatbot, but in essence it uses Artificial Intelligence Generated Content (AIGC) technology to produce answers. Its key foundation is the Generative Pre-trained Transformer, and its core technology is InstructGPT. Compared with earlier similar products, its main characteristic is a significant decrease in fabricated facts and toxic content. We present a systematic investigation of the technical structure, relevant research and practice, and application scenarios of ChatGPT. Based on this investigation, we analyze the lessons of the rapid development of AI technology and the influence of ChatGPT on scientific research and on library and information services. Based on these influences, we present eight suggestions for the library and information service field. Overall, the field should find its distinctive value orientation in the AI era, not only by maintaining the conventional scientific research paradigm but also by exploring new technologies to boost scientific research.
Peer Review Status:Awaiting Review
Subjects: Library Science,Information Science >> Information Retrieval submitted time 2023-02-09
Abstract:
[Background and purpose] Author recognition is moving towards the use of multilevel features. Compared with stylistic features, topic features remain little used in author recognition research and applications, especially for Chinese social media texts. Moreover, research on topic features has focused on innovations in extraction techniques and methods rather than on the topics identified or on how topic features are applied. This study therefore examines the use of topic features in author recognition for Chinese social media texts, and develops strategies to identify and screen the core topics among the topic features, so as to improve the effectiveness of topic features in author recognition. [Methods] The study first uses the LDA topic model to extract the academic topics and social topics of the candidate authors, then uses Word2vec to develop a merge-and-screen strategy that identifies and represents the core topics, and finally uses N-gram features and similarity calculation to perform author recognition. [Results] The experimental results show that topic features had a positive effect on author recognition in the corpus of this study, and that the proposed strategies for identifying and applying core topic features further improved the use of topic features.
Peer Review Status:Awaiting Review
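The final step of the author-recognition pipeline described above (N-gram features plus similarity calculation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes character N-grams and cosine similarity, neither of which the abstract specifies, and it leaves out the LDA and Word2vec topic steps. The names char_ngrams, cosine, and attribute are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Count overlapping character n-grams (works on unsegmented Chinese text)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def attribute(text, profiles, n=2):
    """Return the candidate author whose profile is most similar to the text.

    profiles maps an author name to a Counter built with char_ngrams
    using the same n (an assumption of this sketch).
    """
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))
```

In use, each candidate author's known posts would be concatenated and passed through char_ngrams to build a profile, and attribute would then be called on a post of unknown authorship.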
Subjects: Library Science,Information Science >> Information Science submitted time 2017-11-08 Cooperative journals: 《数据分析与知识发现》
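Abstract: [Objective] To improve the service functions and effectiveness of sci-tech literature retrieval systems through semantic recognition and knowledge relation computation, so that they present richer knowledge-oriented semantic information and expose more knowledge points and knowledge relations to users. [Methods] Data mining and relation computation tools are applied to deeply identify and extract semantic knowledge from sci-tech literature. Semantic relations are analyzed, computed, and constructed, the resulting semantic knowledge and relations are organized into a multidimensional semantic index tree, and a new data organization and presentation model is designed. [Results] A demonstration system for semantically enriched retrieval was developed, which fully reveals semantic information during sci-tech literature retrieval and enriches the retrieval experience. [Limitations] The experimental dataset is not large enough, and comparisons with applications in other domains are lacking. [Conclusions] The proposed model gives users more knowledge-level association, disclosure, and navigation, and improves the retrieval experience. The paper also analyzes the shortcomings of the model and explores ways to improve it.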
Subjects: Library Science,Information Science >> Information Science submitted time 2017-10-11 Cooperative journals: 《数据分析与知识发现》
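Abstract: [Objective] To develop a parsing and indexing system for WARC files of archived web information, and to fully exploit the value of archived sci-tech website resources. [Context] The WARC file format is widely used in web resource harvesting and archiving. As web information diversifies, existing WARC indexing tools increasingly fail to meet users' varied query needs. [Methods] WARC files are parsed with a modular design. Commonly used indexing tools are analyzed and compared, and the Solr platform is chosen to develop a full-text indexing system. [Results] Content-based search and access services for WARC files are implemented, and facets such as subject category, resource type, and archive time are added to the WARC index, revealing WARC file content along multiple dimensions. [Conclusions] The system provides users with rich archived data from sci-tech websites and improves the efficiency of search and access.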
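The parse-then-index flow in the WARC abstract above can be illustrated with a small sketch. It is not the authors' system: it assumes an uncompressed WARC stream, reads only the record headers and the content block, and builds a plain dictionary instead of calling Solr. The function names and the document field names (subject, resource_type, archive_time, and so on) are hypothetical stand-ins for the facets the abstract mentions.

```python
def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length gives the size of the content block in bytes.
        payload = stream.read(int(headers.get("Content-Length", "0")))
        yield headers, payload


def to_index_doc(headers, payload, subject, resource_type):
    """Build a flat document with facet fields, ready to hand to an indexer."""
    return {
        "id": headers.get("WARC-Record-ID"),
        "url": headers.get("WARC-Target-URI"),
        "archive_time": headers.get("WARC-Date"),
        "subject": subject,              # assumed to come from a separate classifier
        "resource_type": resource_type,  # assumed to be derived from the content type
        "content": payload.decode("utf-8", errors="replace"),
    }
```

Keeping the parser separate from the document builder mirrors the modular design the abstract describes: the same records could feed a different indexer, or gain new facet fields, without changes to the parsing step.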
Subjects: Library Science,Information Science >> Philology submitted time 2017-08-21
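Abstract: [Purpose/significance] This paper aims to improve the service functions and effectiveness of sci-tech literature retrieval systems by applying data mining, semantic recognition, and knowledge relation computation, so that they present richer knowledge-oriented semantic information and expose more knowledge points and knowledge relations to users. [Method/process] The paper applies the data mining and relation computation tools semrap and clausIE to identify and extract semantic objects from sci-tech literature, analyzes, computes, and constructs semantic relations, organizes the resulting semantic objects and relations into a multidimensional semantic index tree, and designs a new data organization and presentation model. [Result/conclusion] A demonstration system for semantically enriched retrieval was developed. It fully reveals semantic information within a sci-tech literature retrieval system and gives users more navigation, association, discovery, and disclosure at the level of knowledge content. The paper also analyzes the strengths and weaknesses of the proposed model.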
Peer Review Status:Awaiting Review
Subjects: Library Science,Information Science >> Information Science submitted time 2016-06-13
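Abstract: This paper explores practical applications of identifying the carrier types of rich documents and extracting their metadata. Using the open-source tools PDFBox and Tika, the author extracted metadata and body text from different types of rich documents with good practical results, providing researchers with a large volume of intelligence resources of academic value. However, because of the limitations of the open-source tools and the particular structure of rich documents, the accuracy of the extracted metadata and body text is still imperfect. The author will study and improve this in follow-up work.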
Peer Review Status:Awaiting Review
Subjects: Library Science,Information Science >> Library Science submitted time 2016-05-05
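Abstract: [Objective] To survey and summarize typical semantic retrieval systems for sci-tech literature. [Coverage] Web of Knowledge and Google Scholar were used to retrieve literature related to semantic search, together with reference documents and research reports on semantic retrieval systems. [Methods] According to the depth of semantic text processing, the systems are grouped into retrieval systems based on semantic query expansion, retrieval systems centered on concepts or entities, retrieval systems centered on relations, and retrieval systems oriented to knowledge discovery. [Results] A basic framework for sci-tech literature semantic retrieval systems is proposed, and the functional characteristics of such systems are summarized. [Limitations] Performance evaluation of the semantic retrieval systems is lacking. [Conclusions] The survey provides a useful reference for building semantic retrieval systems for sci-tech literature.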
Peer Review Status:Awaiting Review
Subjects: Library Science,Information Science >> Library Science submitted time 2016-03-10
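Abstract: This article analyzes, at multiple levels and from multiple angles, the page structure and page layout of mainstream resource discovery platforms in the library field (Elsevier, Springer, 中国知网 CNKI) and of commercial resource discovery systems (Primo, Summon, EDS). It also surveys the distinctive resources and functions of several other platforms (such as Willy, the British Library, the National Library of the Netherlands, the Library of Congress, the US NSDL, OCLC, and PubMed). Drawing on the best features of these platforms, it then improves the integrated resource discovery service system built by our center, with a focus on improving the user experience.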
Peer Review Status:Awaiting Review
Subjects: Library Science,Information Science >> Library Science submitted time 2016-02-02
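Abstract: [Objective] To build a web archiving system for major international research institutions. [Methods] The harvesting and archiving framework is extended on the basis of IIPC open-source software: a three-layer extension strategy is adopted on the harvesting side, management functions such as automatic upload and reporting are added to the harvesting client, a WARC file content parsing module is developed, and Solr is used for indexing. [Results] The three-layer extension is implemented on the harvesting side, the added client functions increase the automation of the archiving workflow, and the added WARC content parsing extracts more information, extending the indexing and retrieval services. [Limitations] The system has not been validated with large-scale harvesting and archiving. [Conclusions] The extended harvesting and archiving framework is, at this initial stage, distributed, scalable, and fully automated.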
Peer Review Status:Awaiting Review