Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2022-11-28; Cooperative journal: Data Intelligence (English edition)
Abstract: The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative proposes a large-scale introduction of self-documenting workflow scripts that automate recurring processes or fragments of them. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding to the workload of the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over roughly the past year, highlights several projects that hold promise for the CWFR approach, including Galaxy, Jupyter Notebook and RO-Crate, and concludes with an assessment of the state of the field and the challenges ahead.
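To make the idea of a "self-documenting workflow script" concrete, here is a minimal Python sketch. It assumes that a plain decorator is an acceptable stand-in for CWFR tooling. Each wrapped step appends a record to an in-memory log containing the step name, its start time, and SHA-256 digests of its inputs and output. The names `self_documenting`, `PROVENANCE_LOG`, `normalise` and `mean` are hypothetical. In a CWFR setting such records would be registered as FAIR Digital Objects rather than kept in a Python list.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory log; a CWFR deployment would instead register
# each record as a FAIR Digital Object.
PROVENANCE_LOG: list[dict] = []


def _digest(value) -> str:
    """Return a SHA-256 hex digest of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def self_documenting(step):
    """Wrap a workflow step so every call appends a record of what it did."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = step(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "started": started,
            "inputs": _digest([args, kwargs]),
            "output": _digest(result),
        })
        return result
    return wrapper


@self_documenting
def normalise(values):
    """Example recurring step: scale values into [0, 1] (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


@self_documenting
def mean(values):
    """Example recurring step: arithmetic mean."""
    return sum(values) / len(values)


result = mean(normalise([2.0, 4.0, 6.0]))
```

The researcher only adds the decorator. Record-keeping then happens as a side effect of running each step, which is the kind of zero-extra-effort documentation the abstract argues for.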
Abstract: Modern interactive tools for data analysis and visualisation are designed to expose their functionality as a service through the Web. In this paper we present a Web API (SWIRRL) that allows Virtual Research Environments (VREs) to integrate such tools easily into their websites and make them available to their users. On behalf of its clients, the API handles the underlying complexity of allocating and managing resources on a target cloud platform. By combining storage and containerised services that offer analysis notebooks and other visualisation software, the API creates dedicated working sessions on demand, which can be accessed collaboratively. Because the API supports workflow execution, SWIRRL workspaces can be populated automatically with data of interest collected from external data providers. The system tracks updates and changes affecting both the data and the tools by adopting versioning and standard provenance technologies. Users are given interactive controls for traceability and recovery actions, including the ability to create executable snapshots of their environments. SWIRRL is being built in cooperation with two research infrastructures, one in solid earth science and one in climate data modelling. We report on these adoptions and their use cases.
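The traceability and recovery features described above can be pictured with a small, hypothetical model. Every change to a workspace is stored as a new version, and a label can be attached to a version as a snapshot. A restore rolls back to that snapshot and records the rollback itself as another version. The `Workspace` class and its methods are illustrative assumptions, not SWIRRL's API, which the abstract does not specify. SWIRRL relies on versioning and standard provenance technologies on a cloud platform rather than on in-memory copies.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical working session with versioned state (not SWIRRL's API)."""
    session_id: str
    files: dict[str, bytes] = field(default_factory=dict)
    history: list[dict[str, bytes]] = field(default_factory=list)
    snapshots: dict[str, int] = field(default_factory=dict)

    def _commit(self) -> int:
        """Store a copy of the current files as a new version and return its index."""
        self.history.append(copy.deepcopy(self.files))
        return len(self.history) - 1

    def stage(self, name: str, content: bytes) -> int:
        """Add or overwrite a file (e.g. data fetched by a workflow), then commit."""
        self.files[name] = content
        return self._commit()

    def snapshot(self, label: str) -> int:
        """Attach a label to the latest version so it can be restored later."""
        if not self.history:
            raise ValueError("nothing to snapshot yet")
        self.snapshots[label] = len(self.history) - 1
        return self.snapshots[label]

    def restore(self, label: str) -> int:
        """Roll back to a labelled version; the rollback is itself a new version."""
        self.files = copy.deepcopy(self.history[self.snapshots[label]])
        return self._commit()


ws = Workspace("demo-session")
ws.stage("stations.csv", b"id,lat,lon\n")
ws.snapshot("before-cleaning")
ws.stage("stations.csv", b"accidentally overwritten")
ws.restore("before-cleaning")
```

Recording the rollback as a new version, rather than discarding later history, keeps a complete record of what happened. This fits the traceability goal stated in the abstract.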
Abstract: We present s-ProvFlow, a set of configurable Web services and interactive tools for managing and exploiting records that track data lineage during workflow runs. It supports detailed analysis of individual executions. It also helps users manage complex tasks by exposing the relationships among the data, people, equipment and workflow runs that must work together productively. Its logical model extends the PROV standard to record parallel data-streaming applications precisely. Its metadata handling encourages users to capture the application context by specifying how application attributes, often drawn from standard vocabularies, should be added. These metadata records improve productivity immediately, because the interactive tools can use them for selection and bulk operations. As users reap these benefits, they quickly appreciate the power of the encoded semantics, and this improves the quality of provenance for both users and management. Better provenance in turn facilitates the analysis of collections of runs, enabling users to manage results and validate procedures. It also fosters the reuse of data and methods and supports diagnostic investigations and optimisations. We present s-ProvFlow’s use by scientists, research engineers and managers within the DARE hyper-platform as they create, validate and use their data-driven scientific workflows.
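The lineage records managed by s-ProvFlow build on W3C PROV concepts: entities, activities and agents, linked by relations such as `used`, `wasGeneratedBy` and `wasAssociatedWith`. The sketch below stores these relations in plain dictionaries with made-up identifiers. It walks the relations backwards from a single output to recover every upstream entity, activity and responsible agent. It does not model s-ProvFlow's streaming extensions to PROV or its storage back end.

```python
from collections import deque

# Minimal PROV-style relations (W3C PROV-DM terms); all identifiers are made up.
used = {  # activity -> entities it used
    "run1:preprocess": ["raw:seismogram"],
    "run1:xcorr": ["run1:filtered", "raw:station_list"],
}
was_generated_by = {  # entity -> activity that generated it
    "run1:filtered": "run1:preprocess",
    "run1:crosscorr": "run1:xcorr",
}
was_associated_with = {  # activity -> responsible agent
    "run1:preprocess": "agent:alice",
    "run1:xcorr": "agent:alice",
}


def lineage(entity: str) -> dict[str, set[str]]:
    """Breadth-first walk of generation and usage edges back from an entity."""
    entities, activities, agents = set(), set(), set()
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        activity = was_generated_by.get(current)
        if activity is None or activity in activities:
            continue  # a primary input, or an activity already expanded
        activities.add(activity)
        if activity in was_associated_with:
            agents.add(was_associated_with[activity])
        for source in used.get(activity, []):
            if source not in entities:
                entities.add(source)
                queue.append(source)
    return {"entities": entities, "activities": activities, "agents": agents}


result = lineage("run1:crosscorr")
```

Queries of this kind make it possible to validate a result or diagnose a faulty run, because they show which inputs, steps and people contributed to it.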
Abstract: This special issue is on Canonical Workflow Frameworks for Research (CWFR). A workflow is a sequence of activities, which may be more or less computer-based, that is used regularly in the research process. CWFR aim to identify common patterns in such scientifically motivated workflows and to offer libraries of components that use FAIR Digital Objects as the integrative standard. Such CWFR components can be reused independently of particular technologies. They benefit researchers in their daily work by making recurring activities more efficient through automated workflow methods that create FAIR-compliant data immediately, without adding to the researchers' burden.
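As a rough illustration of components "based on FAIR Digital Objects", the sketch below wraps each piece of data in a simplified digital object with an identifier, a type, metadata and bytes. It then chains technology-independent components over these objects, so every output carries the identifier and checksum of the object it was derived from. The `DigitalObject` class, `mint`, `run_workflow` and the `hypothetical/` identifier prefix are assumptions made for illustration. Real FAIR Digital Objects use registered persistent identifiers and typed metadata schemas.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class DigitalObject:
    """Simplified stand-in for a FAIR Digital Object (not a full FDO model)."""
    pid: str
    kind: str  # e.g. a MIME type; real FDOs reference a registered type
    data: bytes
    metadata: dict = field(default_factory=dict)

    @property
    def checksum(self) -> str:
        return hashlib.sha256(self.data).hexdigest()


def mint(kind: str, data: bytes, **metadata) -> DigitalObject:
    """Create an object with a hypothetical, locally generated identifier."""
    return DigitalObject(f"hypothetical/{uuid.uuid4()}", kind, data, metadata)


# A component maps the bytes of one object to the bytes of a new one.
Component = Callable[[bytes], bytes]


def run_workflow(source: DigitalObject,
                 steps: list[tuple[str, Component]]) -> list[DigitalObject]:
    """Apply (output kind, component) steps in order, linking each output to its input."""
    chain = [source]
    for out_kind, component in steps:
        previous = chain[-1]
        chain.append(mint(out_kind, component(previous.data),
                          derived_from=previous.pid,
                          component=component.__name__,
                          input_checksum=previous.checksum))
    return chain


def drop_blank_lines(data: bytes) -> bytes:
    return b"\n".join(line for line in data.splitlines() if line.strip())


def to_upper(data: bytes) -> bytes:
    return data.upper()


raw = mint("text/csv", b"a,b\n\n1,2\n", creator="example")
chain = run_workflow(raw, [("text/csv", drop_blank_lines), ("text/csv", to_upper)])
```

Components only see bytes in and bytes out, so the same runner can chain steps written for different tools. The runner, not the researcher, attaches the provenance metadata.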