ChinaXiv.org: Chinese Academy of Sciences preprint platform for scientific papers

3 results in total

Selected filter: Zhiming Zhao

1. ChinaXiv:202211.00430

Scaling Notebooks as Re-configurable Cloud Workflows

Category: Computer Science >> Integration Theory of Computer Science; Submitted: 2022-11-28; Partner journal: Data Intelligence (English)

Yuandou Wang, Spiros Koulouzis, Riccardo Bianchi, Na Li, Yifang Shi, Joris Timmermans, W. Daniel Kissling, Zhiming Zhao

Abstract: Literate computing environments such as Jupyter (i.e., Jupyter Notebooks, JupyterLab, and JupyterHub) have been widely used in scientific studies; they allow users to interactively develop scientific code, test algorithms, and describe the scientific narratives of the experiments in an integrated document. To scale up scientific analyses, many implemented Jupyter environment architectures encapsulate whole Jupyter notebooks as reproducible units and autoscale them on dedicated remote infrastructures (e.g., high-performance computing and cloud computing environments). The existing solutions are still limited in many ways: 1) the workflow (or pipeline) is implicit in a notebook, and although some steps could be reused by different code and executed in parallel, the tight cell structure forces all steps in the notebook to be executed sequentially and leaves little flexibility for reusing the core code fragments; and 2) performance bottlenecks arise when handling extensive input data and complex computation, which calls for better parallelism and scalability. In this work, we focus on how to manage the workflow in a notebook seamlessly. We 1) encapsulate the reusable cells as RESTful services and containerize them as portable components, 2) provide a composition tool for describing the workflow logic of those reusable components, and 3) automate the execution on remote cloud infrastructure. Empirically, we validate the solution's usability via a use case from the Ecology and Earth Science domain, illustrating the processing of massive Light Detection and Ranging (LiDAR) data. The demonstration and analysis show that our method is feasible but needs further improvement, especially in integrating distributed workflow scheduling, automatic deployment, and execution, before it can develop into a mature approach.
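
The abstract above describes a workflow that is implicit in notebook cells, where some steps could run in parallel. The following Python sketch illustrates that idea in miniature. It is not the authors' implementation: the step functions, the `WORKFLOW` dictionary, the `run_workflow` helper, and the toy data are all hypothetical, and local functions in a thread pool stand in for the containerized RESTful services and cloud infrastructure described in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical "cells": each plain function stands in for one reusable notebook cell.
def load_tiles():
    # Toy stand-in for reading LiDAR tiles; the values are made up.
    return [[1.0, 2.0, 3.0], [4.0, 5.0]]

def mean_height(tiles):
    values = [v for tile in tiles for v in tile]
    return sum(values) / len(values)

def max_height(tiles):
    return max(v for tile in tiles for v in tile)

def report(mean, peak):
    return {"mean": mean, "max": peak}

# Explicit workflow description: step name -> (function, upstream step names).
# "mean" and "max" both depend only on "load", so they can run concurrently.
WORKFLOW = {
    "load": (load_tiles, []),
    "mean": (mean_height, ["load"]),
    "max": (max_height, ["load"]),
    "report": (report, ["mean", "max"]),
}

def run_workflow(workflow, max_workers=4):
    """Run steps in dependency order, submitting each batch of ready steps together."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in workflow.items()})
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # All steps whose upstream results are available are submitted at once.
            futures = {
                name: pool.submit(
                    workflow[name][0], *(results[dep] for dep in workflow[name][1])
                )
                for name in sorter.get_ready()
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results

if __name__ == "__main__":
    print(run_workflow(WORKFLOW)["report"])
```

Making the dependencies explicit is what allows independent steps to be scheduled together; a notebook that runs its cells top to bottom has no such information to exploit.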

Views: 1769; Downloads: 285; Comments: 0
2. ChinaXiv:202211.00439

Canonical Workflows to Make Data FAIR

Category: Computer Science >> Integration Theory of Computer Science; Submitted: 2022-11-28; Partner journal: Data Intelligence (English)

Peter Wittenburg, Alex Hardisty, Yann Le Franc, Amirpasha Mozaffari, Limor Peer, Nikolay A. Skvortsov, Zhiming Zhao, Alessandro Spinuso

Abstract: The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative suggests a large-scale introduction of self-documenting workflow scripts to automate recurring processes or fragments thereof. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding load on the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over roughly the last year, highlights several projects that hold promise for the CWFR approach, including Galaxy, Jupyter Notebook, and RO-Crate, and concludes with an assessment of the state of the field and the challenges ahead.

Views: 636; Downloads: 154; Comments: 0
3. ChinaXiv:202211.00449

Editors’ Note: Special Issue on Canonical Workflow Frameworks for Research

Category: Computer Science >> Integration Theory of Computer Science; Submitted: 2022-11-28; Partner journal: Data Intelligence (English)

Peter Wittenburg, Alex Hardisty, Amirpasha Mozaffari, Limor Peer, Nikolay Skvortsov, Alessandro Spinuso, Zhiming Zhao

Abstract: This special issue is on Canonical Workflow Frameworks for Research (CWFR). A workflow refers to a sequence of activities, which may be more or less computer-based, used with regularity in the research process. CWFR aim to identify common patterns in such scientifically motivated workflows and to offer libraries of components based on FAIR Digital Objects as the integrative standard. Such CWFR components can be reused independently of particular technologies, benefitting researchers in their daily work by making recurring activities more efficient through automated workflow methods that immediately create FAIR-compliant data without adding burden.

Views: 467; Downloads: 158; Comments: 0
