ChinaXiv.org, Chinese Academy of Sciences Preprint Platform for Scientific Papers

3 records found

Selected filter: Alex Hardisty

1. ChinaXiv:202211.00437

The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital Mobilisation of Natural History Collections

Category: Computer Science >> Integrated Theory of Computer Science | Submitted: 2022-11-28 | Partner journal: Data Intelligence (English edition)

Alex Hardisty, Paul Brack, Carole Goble, Laurence Livermore, Ben Scott, Quentin Groom, Stuart Owen, Stian Soiland-Reyes

Abstract: A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable, with institutional digitisation tending to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are still traditionally transcribed by hand, at high cost and low throughput, which makes the task a constraint for many collection-holding institutions at current funding levels. We show how computer vision, optical character recognition, handwriting recognition, named entity recognition and language translation technologies can be implemented in canonical workflow component libraries with findable, accessible, interoperable, and reusable (FAIR) characteristics. These libraries are being developed in a cloud-based workflow platform, the Specimen Data Refinery (SDR), founded on the Galaxy workflow engine, Common Workflow Language, Research Object Crates (RO-Crate) and WorkflowHub technologies. The SDR can be applied to specimen labels and other artefacts, offering the prospect of greatly accelerated and more accurate data capture in computable form. Two kinds of FAIR Digital Objects (FDO) are created by packaging the outputs of SDR workflows and workflow components as digital objects with metadata, a persistent identifier, and a specific type definition. The first kind of FDO is the computable Digital Specimen (DS) object, which can be consumed and produced by workflows and other applications. A single DS is the input data structure submitted to a workflow; each workflow component modifies it in turn to produce a refined DS at the end. The Specimen Data Refinery provides a library of such components that can be used individually or in series. To work together, each library component describes the fields it requires from the DS and the fields it will in turn populate or enrich.
The second kind of FDO, RO-Crates, gather and archive the diverse set of digital and real-world resources, configurations, and actions (the provenance) contributing to a unit of research work, allowing that work to be faithfully recorded and reproduced. Here we describe the Specimen Data Refinery and its motivating requirements, focusing on what is essential in creating canonical workflow component libraries and on the SDR's conformance with the requirements of an emerging FDO Core Specification being developed by the FDO Forum.
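
The component contract described in this abstract (a single DS passed through components in series, each declaring the DS fields it requires and the fields it populates) can be sketched as follows. This is a hypothetical Python illustration under stated assumptions, not the SDR's actual implementation: the DS is reduced to a flat dictionary, and the Component class, the refine and check_chain helpers, the field names, and the stub OCR and named-entity-recognition components are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

# Assumption: a Digital Specimen (DS) is modelled as a flat mapping of
# field names to string values. Real DS objects are richer FDOs.
DigitalSpecimen = Dict[str, str]


@dataclass
class Component:
    """Hypothetical workflow component with a declared field contract."""
    name: str
    requires: Set[str]   # fields that must already exist in the DS
    populates: Set[str]  # fields this component adds or enriches
    run: Callable[[DigitalSpecimen], DigitalSpecimen]

    def apply(self, ds: DigitalSpecimen) -> DigitalSpecimen:
        missing = self.requires - set(ds)
        if missing:
            raise ValueError(f"{self.name}: missing fields {sorted(missing)}")
        produced = self.run(ds)
        undeclared = set(produced) - self.populates
        if undeclared:
            raise ValueError(f"{self.name}: undeclared fields {sorted(undeclared)}")
        # Return a refined copy; the input DS is left unchanged.
        return {**ds, **produced}


def check_chain(components: List[Component], initial: Iterable[str]) -> Set[str]:
    """Verify, before running, that each component's inputs will be available."""
    available = set(initial)
    for c in components:
        missing = c.requires - available
        if missing:
            raise ValueError(f"{c.name} cannot run: {sorted(missing)} not yet populated")
        available |= c.populates
    return available


def refine(ds: DigitalSpecimen, components: List[Component]) -> DigitalSpecimen:
    """Pass one DS through the components in series."""
    check_chain(components, ds)
    for c in components:
        ds = c.apply(ds)
    return ds


# Hypothetical stub components standing in for OCR and named entity recognition.
ocr = Component(
    name="ocr",
    requires={"label_image"},
    populates={"label_text"},
    run=lambda ds: {"label_text": "Quercus robur L. Kew 1921"},  # fixed stub output
)
ner = Component(
    name="ner",
    requires={"label_text"},
    populates={"scientific_name", "locality"},
    # Naive token split in place of a real NER model.
    run=lambda ds: {
        "scientific_name": " ".join(ds["label_text"].split()[:2]),
        "locality": ds["label_text"].split()[3],
    },
)

if __name__ == "__main__":
    refined = refine({"label_image": "specimen-0001.jpg"}, [ocr, ner])
    print(refined["scientific_name"], "|", refined["locality"])  # Quercus robur | Kew
```

Checking the whole chain before running any component is one way such declared contracts can pay off: an ordering such as [ner, ocr] is rejected up front, rather than failing partway through a long workflow.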

Views: 1046 | Downloads: 271 | Comments: 0

2. ChinaXiv:202211.00439

Canonical Workflows to Make Data FAIR

Category: Computer Science >> Integrated Theory of Computer Science | Submitted: 2022-11-28 | Partner journal: Data Intelligence (English edition)

Peter Wittenburg, Alex Hardisty, Yann Le Franc, Amirpasha Mozaffari, Limor Peer, Nikolay A. Skvortsov, Zhiming Zhao, Alessandro Spinuso

Abstract: The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative proposes a large-scale introduction of self-documenting workflow scripts to automate recurring processes or fragments of them. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding load onto the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over roughly the past year, highlights several projects that hold promise for the CWFR approach, including Galaxy, Jupyter Notebook, and RO-Crate, and concludes with an assessment of the state of the field and the challenges ahead.

Views: 598 | Downloads: 142 | Comments: 0

3. ChinaXiv:202211.00449

Editors’ Note: Special Issue on Canonical Workflow Frameworks for Research

Category: Computer Science >> Integrated Theory of Computer Science | Submitted: 2022-11-28 | Partner journal: Data Intelligence (English edition)

Peter Wittenburg, Alex Hardisty, Amirpasha Mozaffari, Limor Peer, Nikolay Skvortsov, Alessandro Spinuso, Zhiming Zhao

Abstract: This special issue is devoted to Canonical Workflow Frameworks for Research (CWFR). A workflow refers to a sequence of activities, which may be more or less computer-based, used with regularity in the research process. CWFR aim to identify common patterns in such scientifically motivated workflows and to offer libraries of components based on FAIR Digital Objects as the integrative standard. Such CWFR components can be reused independently of particular technologies, benefiting researchers in their daily work by making recurring activities more efficient through automated workflow methods that immediately create FAIR-compliant data without adding to their burden.

Views: 402 | Downloads: 133 | Comments: 0