• Extraction Techniques for Entities and Their Attributes

    Subjects: Computer Science >> Computer Application Technology submitted time 2017-03-09

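    Abstract: Information extraction is one of the core technologies in current search engine and natural language processing research. It matches patterns against text to obtain the entities the text contains, together with their attributes and relations. This paper gives a brief introduction to the extraction of entities and their attributes, covering rule-based and statistics-based extraction techniques, and describes several representative systems, such as IE2, GATE, and SystemT, along with their underlying principles. It closes with a brief account of our own work in this area.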
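The rule-based side of entity and attribute extraction can be illustrated with a minimal Python sketch. It does not reproduce IE2, GATE, or SystemT; the pattern `BORN_RULE`, the function `extract_birth_years`, and the sample sentences are all hypothetical, chosen only to show how a hand-written rule maps matched text to an entity with an attribute.

```python
import re

# Hypothetical rule (not from the paper): "<Name> was born in <year>"
# yields a person entity with a birth_year attribute.
BORN_RULE = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<year>\d{4})"
)


def extract_birth_years(text):
    """Return one entity/attribute/value record per match of BORN_RULE."""
    return [
        {
            "entity": match.group("name"),
            "attribute": "birth_year",
            "value": int(match.group("year")),
        }
        for match in BORN_RULE.finditer(text)
    ]


# Illustrative input only.
records = extract_birth_years(
    "Alan Turing was born in 1912. Grace Hopper was born in 1906."
)
```

A statistics-based extractor would replace the hand-written pattern with a model trained on labeled text, while the output records could keep the same shape.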

  • Research on Ontology Building Methods of Chinese Ancient Books

    Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》

    Abstract: [Purpose/significance] Building a semantic ontology of Chinese ancient books is very helpful for text mining and text analysis of Chinese history. However, ancient and modern Chinese differ substantially in syntactic structure, which makes ontology building for Chinese ancient books difficult. [Method/process] This paper focuses on ontology building methods for Chinese ancient books based on natural language processing (NLP) techniques. We designed the ontology model on the basis of CIDOC CRM, an international standard for describing cultural heritage. We then propose a solution for extracting ontology instances automatically: a hybrid of rule-based extraction and CRF recognition that exploits the syntactic structure of Chinese ancient books. Finally, we ran an experiment on one Chinese ancient book, Zuo Zhuan. [Result/conclusion] The experimental results show that our method improves the extraction precision of ontology instances, which can make ontology construction from Chinese ancient books more efficient. The rule-based method achieved an F-score of 93% in testing, and the CRF method achieved an F-score of 82.51% with the best feature template. The results also show that the position and part-of-speech features of words are important for improving the extraction of ontology instances.
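For readers unfamiliar with the metric reported above, the following Python sketch shows how precision, recall, and F-score are commonly computed for a set of extracted instances. The abstract does not state which F-score variant or matching criterion the authors used, so the balanced F1 with exact matching is an assumption here, and the function names and sample labels are hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the balanced F1 (assumed variant)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def evaluate(predicted, gold):
    """Compare extracted instances with gold instances by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Placeholder instance labels, not data from the paper.
gold_instances = {"inst_a", "inst_b", "inst_c", "inst_d"}
predicted_instances = {"inst_a", "inst_b", "inst_x"}
precision, recall, f1 = evaluate(predicted_instances, gold_instances)
```

With these placeholder sets, two of three predictions are correct and two of four gold instances are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.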