The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital Mobilisation of Natural History Collections (Postprint)

Authors: Alex Hardisty¹, Paul Brack², Carole Goble², Laurence Livermore³, Ben Scott³, Quentin Groom⁴, Stuart Owen², Stian Soiland-Reyes²,⁵
Affiliations:

1. School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, UK

2. The Department of Computer Science, The University of Manchester, Manchester M13 9PL, UK

3. The Natural History Museum, London SW7 5BD, UK

4. Meise Botanic Garden, 1860 Meise, Belgium

5. Informatics Institute, Faculty of Science, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
Corresponding author: Alex Hardisty, email: hardistyar@gmail.com
Submitted: 2022-11-28 20:32:50

Abstract: A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable, with institutional digitisation tending to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are still typically transcribed manually, at high cost and with low throughput, which limits how much of this work many collection-holding institutions can undertake at current funding levels. We show how computer vision, optical character recognition, handwriting recognition, named entity recognition and language translation technologies can be implemented as canonical workflow component libraries with findable, accessible, interoperable, and reusable (FAIR) characteristics. These libraries are being developed in a cloud-based workflow platform, the ‘Specimen Data Refinery’ (SDR), founded on the Galaxy workflow engine, Common Workflow Language, Research Object Crates (RO-Crate) and WorkflowHub technologies. The SDR can be applied to specimen labels and other artefacts, offering the prospect of greatly accelerated and more accurate data capture in computable form. Two kinds of FAIR Digital Object (FDO) are created by packaging the outputs of SDR workflows and workflow components as digital objects with metadata, a persistent identifier, and a specific type definition. The first kind of FDO is the computable Digital Specimen (DS) object, which can be consumed and produced by workflows and other applications. A single DS is the input data structure submitted to a workflow; each workflow component modifies it in turn, producing a refined DS at the end. The Specimen Data Refinery provides a library of such components that can be used individually or in series. To work together, each library component describes the fields it requires from the DS and the fields it will in turn populate or enrich.
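The component contract described above (each step declares the DS fields it requires and the fields it populates, and a single DS is passed through the steps in series) can be illustrated with a short Python sketch. This is a hypothetical illustration, not the SDR's actual API: the `Component` class, the `check_chain` and `refine` functions, the field names (`label_image_uri`, `label_text`, `scientific_name`) and the stub OCR and NER steps are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Set

# Hypothetical flat representation of a Digital Specimen: field name -> value.
DigitalSpecimen = Dict[str, Any]


@dataclass(frozen=True)
class Component:
    """A hypothetical workflow step that declares the DS fields it reads and writes."""
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]
    run: Callable[[DigitalSpecimen], Dict[str, Any]]


def check_chain(components: List[Component], initial_fields: Set[str]) -> None:
    """Verify, before running anything, that each step's inputs are supplied upstream."""
    available = set(initial_fields)
    for c in components:
        missing = c.requires - available
        if missing:
            raise ValueError(f"{c.name} is missing fields: {sorted(missing)}")
        available |= c.provides


def refine(ds: DigitalSpecimen, components: List[Component]) -> DigitalSpecimen:
    """Pass a copy of the DS through each component in turn and return the refined DS."""
    check_chain(components, set(ds))
    refined = dict(ds)  # leave the submitted DS untouched
    for c in components:
        updates = c.run(refined)
        undeclared = set(updates) - c.provides
        if undeclared:
            raise ValueError(f"{c.name} wrote undeclared fields: {sorted(undeclared)}")
        refined.update(updates)
    return refined


# Stub components standing in for real OCR and named entity recognition services.
ocr = Component("ocr", frozenset({"label_image_uri"}), frozenset({"label_text"}),
                lambda ds: {"label_text": "Quercus robur L. Meise 1923"})
ner = Component("ner", frozenset({"label_text"}), frozenset({"scientific_name"}),
                lambda ds: {"scientific_name": " ".join(ds["label_text"].split()[:2])})

ds = {"id": "hypothetical-ds-001", "label_image_uri": "https://example.org/label.jpg"}
result = refine(ds, [ocr, ner])
print(result["scientific_name"])  # Quercus robur
```

Because the declarations are checked before any step runs, an ordering such as `[ner, ocr]` is rejected up front rather than failing partway through a workflow. This is one way the declared requirements could make components safely composable.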
The second kind of FDO, the RO-Crate, gathers and archives the diverse set of digital and real-world resources, configurations, and actions (the provenance) that contribute to a unit of research work, allowing that work to be faithfully recorded and reproduced. Here we describe the Specimen Data Refinery and its motivating requirements, focusing on what is essential in creating canonical workflow component libraries and on the SDR's conformance with the requirements of an emerging FDO Core Specification being developed by the FDO Forum.
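To make the RO-Crate idea concrete, the following Python sketch builds a minimal RO-Crate 1.1 metadata document (the JSON-LD content of `ro-crate-metadata.json`) that records one workflow run as a `CreateAction`. The action links the workflow (`instrument`), its input (`object`) and its output (`result`). This is an assumption-laden illustration, not the SDR's actual crate layout: the function `build_run_crate` and the file names `workflow.ga`, `specimen.json` and `refined_specimen.json` are hypothetical. Properties that RO-Crate recommends for a published crate, such as `license` and `datePublished`, are omitted for brevity.

```python
import json
from typing import Any, Dict, List


def build_run_crate(workflow: str, inputs: List[str], outputs: List[str]) -> Dict[str, Any]:
    """Return a minimal, hypothetical RO-Crate 1.1 metadata document for one workflow run."""
    data_files = [{"@id": p, "@type": "File"} for p in inputs + outputs]
    graph: List[Dict[str, Any]] = [
        # Metadata descriptor: identifies this JSON-LD file and the root dataset it describes.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root data entity: lists every file in the crate and points at the run.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Hypothetical SDR workflow run",
            "mainEntity": {"@id": workflow},
            "hasPart": [{"@id": p} for p in [workflow] + inputs + outputs],
            "mentions": {"@id": "#run-1"},
        },
        # The workflow definition itself.
        {"@id": workflow, "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
        *data_files,
        # Provenance: which workflow consumed which inputs and produced which outputs.
        {
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": workflow},
            "object": [{"@id": p} for p in inputs],
            "result": [{"@id": p} for p in outputs],
        },
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_run_crate("workflow.ga", ["specimen.json"], ["refined_specimen.json"])
print(json.dumps(crate, indent=2))
```

Recording the run as an action that links workflow, inputs and outputs is what allows a refined Digital Specimen to be traced back to the exact workflow and input that produced it.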

Keywords: Digital Specimen; Workflow; FAIR Digital Object; RO-Crate

Journal: Data Intelligence
Category: Computer Science >> Integration Theory of Computer Science
Cite as: ChinaXiv:202211.00437 (or this version, ChinaXiv:202211.00437V1)
DOI:10.1162/dint_a_00134
CSTR:32003.36.ChinaXiv.202211.00437.V1
Recommended citation: Alex Hardisty, Paul Brack, Carole Goble, Laurence Livermore, Ben Scott, Quentin Groom, Stuart Owen, Stian Soiland-Reyes. (2022). The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital Mobilisation of Natural History Collections. Data Intelligence. doi:10.1162/dint_a_00134

Version history

[V1] 2022-11-28 20:32:50, ChinaXiv:202211.00437V1

Related papers

1. A Conversation with ChatGPT: The Media and Communications Industry in the Age of AI	2023-10-25
2. A Conversation with ChatGPT: Digital Government Transformation in the Age of AI	2023-10-23
3. A Conversation with ChatGPT: Scientific Research in the Age of AI	2023-09-22
4. From "Anthropomorphic Attribution" to "Alliance Building": The Influence of Human-Chatbot Relationships on Engagement	2023-04-03
5. An Improved YOLOv5-Based Method for UAV Object Detection	2023-03-23
6. Paving the Way to Open Data	2022-11-29
7. Playing Well on the Data FAIRground: Initiatives and Infrastructure in Research Data Management	2022-11-29
8. Knowledge Representation and Reasoning for Complex Time Expression in Clinical Text	2022-11-28
9. A Workflow Demonstrator for Processing Catalysis Research Data	2022-11-28
10. A Semantic Approach to Workflow Management and Reuse for Research Problem Solving	2022-11-28

