Research on a Management System for Scientific and Technical Literature Data Based on the Life Cycle Model

    Subjects: Other Disciplines >> Synthetic discipline · Submitted: 2023-03-31 · Cooperative journal: Journal of Library and Information Science in Agriculture (《农业图书情报学报》)

    Abstract: [Purpose/Significance] Scientific and technical (S&T) literature data resources are characterized by wide coverage, large volume, diverse types, rapid updates and strong timeliness. To improve the effectiveness and security of S&T literature data management, this paper studies an S&T literature data management system based on the data life cycle model. [Method/Process] The paper explores management modes for S&T literature, constructs a life cycle system for S&T literature that follows the data management process, and describes the data management tools and methods used in each of seven stages: data creation, data storage, data pre-processing, data computation, data service, data archiving and data destruction. In the creation stage, specific data access forms are defined for different sources and data types, and customized data creation tools are built so that data are received completely. In the storage stage, a unified literature metadata storage system is developed after analyzing the characteristics and shortcomings of each data type, so that S&T literature data can be described and organized more effectively. In the pre-processing stage, a set of tools performs format normalization, parsing, conversion, structuring and related operations on each data type. The computation stage mainly covers data enrichment, entity and relation extraction, and knowledge graph construction. In the service stage, data are provided through a unified service interface. In the archiving stage, data are archived and preserved. In the destruction stage, data that are no longer needed are securely destroyed. [Results/Conclusions] Life-cycle-based management of S&T literature was first put into practice on the core Web of Science BP dataset and then examined across the seven stages of creation, storage, pre-processing, computation, service, archiving and destruction.
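To make the seven-stage flow described above concrete, the following Python sketch models a literature record that moves through the stages in order. It is a minimal illustration, not the paper's implementation: the names `Stage`, `LiteratureRecord` and `advance` are hypothetical, and the sketch assumes a strictly linear life cycle in which each record passes through every stage exactly once.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    """The seven life cycle stages named in the abstract, in processing order."""

    CREATION = 1
    STORAGE = 2
    PREPROCESSING = 3
    COMPUTATION = 4
    SERVICE = 5
    ARCHIVING = 6
    DESTRUCTION = 7


@dataclass
class LiteratureRecord:
    """Hypothetical record tracking where one document sits in the life cycle."""

    record_id: str
    stage: Stage = Stage.CREATION
    history: List[Stage] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage; assumes a strictly linear life cycle."""
        if self.stage is Stage.DESTRUCTION:
            # Destroyed data must not re-enter the life cycle.
            raise ValueError(f"{self.record_id} has already been destroyed")
        self.history.append(self.stage)
        self.stage = Stage(self.stage.value + 1)
        return self.stage


# Walk a hypothetical record from creation through to destruction.
rec = LiteratureRecord("doc-001")
while rec.stage is not Stage.DESTRUCTION:
    rec.advance()
print([s.name for s in rec.history])  # the six stages visited before DESTRUCTION
```

Keeping the stage order in a single ordered `Enum` means the sequence is defined in one place; a real system would more likely persist the current stage and its history alongside the record's metadata, and might allow non-linear paths such as serving archived data again.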
Finally, following the DAMA data quality assessment principles, the effect of data management was comprehensively evaluated along six dimensions: completeness, uniqueness, timeliness, validity, accuracy and consistency. Receiving completeness was 100% and non-null completeness was 59.75%; uniqueness reached 99.23%; timeliness was controllable; validity satisfied the defined constraints; accuracy reached 100%; and consistency reached 90%. The approach largely ensures that data can be effectively managed and used at every life cycle stage, and the management model was verified to be effective and to deliver satisfactory service.
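Several of the quality dimensions listed above can be expressed as simple ratios. The sketch below shows one plausible way to compute completeness (split into receiving and non-null completeness, as in the abstract), uniqueness, validity and consistency over a list of metadata records. The paper does not give its exact formulas, so these definitions are assumptions (for example, non-null completeness as the share of filled cells over a chosen set of fields, and consistency as agreement with a reference copy), and all function names, field names and sample records are hypothetical. Accuracy is omitted because it needs a verified ground truth, and timeliness because it depends on update timestamps not modeled here.

```python
from typing import Callable, Iterable, Mapping, Sequence

Record = Mapping[str, object]


def receiving_completeness(received: int, expected: int) -> float:
    """Share of expected records that actually arrived (1.0 if none were expected)."""
    return received / expected if expected else 1.0


def non_null_completeness(records: Sequence[Record], fields: Iterable[str]) -> float:
    """Share of (record, field) cells that are neither missing, None nor empty."""
    field_list = list(fields)
    total = len(records) * len(field_list)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in field_list if r.get(f) not in (None, ""))
    return filled / total


def uniqueness(records: Sequence[Record], key: str) -> float:
    """Distinct values of `key` divided by the number of records."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def validity(records: Sequence[Record], rule: Callable[[Record], bool]) -> float:
    """Share of records that satisfy a constraint rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def consistency(
    records: Sequence[Record], key: str, field: str, reference: Mapping[str, object]
) -> float:
    """Share of records whose `field` matches a reference copy looked up by `key`."""
    comparable = [r for r in records if r.get(key) in reference]
    if not comparable:
        return 1.0
    agree = sum(1 for r in comparable if r.get(field) == reference[r.get(key)])
    return agree / len(comparable)


# Invented toy records for illustration only; not data from the paper.
docs = [
    {"id": "d1", "title": "A", "doi": "10.1/a", "year": 2020},
    {"id": "d2", "title": "B", "doi": None, "year": 2021},
    {"id": "d2", "title": "b", "doi": "", "year": 1800},
]

report = {
    "receiving_completeness": receiving_completeness(len(docs), 3),
    "non_null_completeness": non_null_completeness(docs, ["title", "doi", "year"]),
    "uniqueness": uniqueness(docs, "id"),
    "validity": validity(
        docs, lambda r: isinstance(r.get("year"), int) and 1900 <= r["year"] <= 2023
    ),
    "consistency": consistency(docs, "id", "title", {"d1": "A", "d2": "B"}),
}
for name, value in report.items():
    print(f"{name}: {value:.2%}")
```

In the toy data, the duplicated `d2` identifier lowers uniqueness, the two empty DOIs lower non-null completeness, the year 1800 fails the (assumed) validity rule, and the lowercase title disagrees with the reference copy.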