Open Science and Data Science

Author: Peter Wittenburg ¹
Affiliation:

1. Max Planck Computing and Data Facility, Gießenbachstraße 2, 85748 Garching, Germany
Corresponding author: Peter Wittenburg, Email: peter.wittenburg@mpcdf.mpg.de
Submitted: 2022-11-27 14:46:16

Abstract: Data Science (DS), as defined by Jim Gray, is an emerging paradigm in all research areas that helps to find non-obvious patterns of relevance in large distributed data collections. “Open Science by Design” (OSD), i.e., making artefacts such as data, metadata, models, and algorithms available and re-usable to peers and beyond as early as possible, is a prerequisite for a flourishing DS landscape. However, a few major aspects can be identified that hamper a fast transition: (1) The classical “Open Science by Publication” (OSP) is no longer sufficient, since it serves different functions, leads to unacceptable delays, and is associated with high curation costs. Changing data lab practices towards OSD requires more fundamental changes than OSP does. (2) The classical publication-oriented models for metrics, mainly informed by citations, will no longer work, since the roles of contributors are more difficult to assess and will often change; other ways of assigning incentives and recognition need to be found. (3) The huge investments in developing DS skills and capacities by some global companies and strong countries are leading to imbalances and to fears among different stakeholders, which hamper the acceptance of Open Science (OS). (4) Finally, OSD will depend on the availability of a global infrastructure fostering an integrated and interoperable data domain (“one data-domain”, as George Strawn calls it), which is not yet visible due to disagreements about the key technological pillars. OS is therefore a necessity for DS, but implementing it will take much more time than we may have expected.

Keywords: Open Science by Design; Open Science by Publication; Data Science; Data infrastructure; Digital Objects; FAIR

Journal: DATA INTELLIGENCE
Subject: Computer Science >> Integrated Theory of Computer Science
Cite as: ChinaXiv:202211.00407 (or this version, ChinaXiv:202211.00407V1)
DOI:10.1162/dint_a_00082
CSTR:32003.36.ChinaXiv.202211.00407.V1
Recommended citation: Wittenburg, Peter. (2022). Open Science and Data Science. DATA INTELLIGENCE. doi:10.1162/dint_a_00082

Version history: [V1] 2022-11-27 14:46:16, ChinaXiv:202211.00407V1

