ChinaXiv.org, the Chinese Academy of Sciences Preprint Platform for Scientific Papers

13 resources in total

1. ChinaXiv:202211.00411

Open Science—A Question of Trust

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-27 | Partner journal: Data Intelligence

Jonathan Clark

Abstract: Collaboration and the sharing of knowledge are at the heart of Open Science (OS). However, we need to know that the knowledge we find and share is really what it purports to be, and we need to know that the authors we hope to collaborate with are really the people they claim to be. In this paper, the author argues that a prerequisite for OS is trust and that persistent identifiers help to build that trust. The persistent identifier systems must themselves be trustworthy, and they must be able to connect the user or their machine to the information they need now and into the future. Infrastructure is rather like plumbing: it goes unnoticed and unappreciated until it fails. This paper puts infrastructure for persistent identifiers in the spotlight as a core component of OS.

Views: 1058 | Downloads: 346 | Comments: 0
2. ChinaXiv:202211.00434

Galaxy: A Decade of Realising CWFR Concepts

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-28 | Partner journal: Data Intelligence

Beatriz Serrano-Solano, Anne Fouilloux, Ignacio Eguinoa, Matúš Kalas, Björn Grüning, Frederik Coppens

Abstract: Despite recent encouragement to follow the FAIR principles, day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.

Views: 1569 | Downloads: 359 | Comments: 0
3. ChinaXiv:202211.00409

On the Complexities of Federating Research Data Infrastructures

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-27 | Partner journal: Data Intelligence

Atif Latif, Fidan Limani, Klaus Tochtermann

Abstract: Federated Research Data Infrastructures aim to provide seamless access to research data along with services that help researchers perform their data management tasks. During our research on Open Science (OS), we have built cross-disciplinary federated infrastructures for different types of (open) digital resources: Open Data (OD), Open Educational Resources (OER), and open access documents. In each case, our approach targeted only the resource metadata. Based on this experience, we identified some challenges that we had to overcome again and again: lack of (i) harvesters, (ii) common metadata models and (iii) metadata mapping tools. In this paper, we report on the challenges we faced in the federated infrastructure projects we were involved with. We structure the report based on the three challenges listed above.

Views: 668 | Downloads: 274 | Comments: 0
4. ChinaXiv:202211.00170

How to (Easily) Extend the FAIRness of Existing Repositories

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-16 | Partner journal: Data Intelligence

Mark Hahnel, Dan Valen

Abstract: Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for economic efficiency and research speed. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter what discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.

Views: 1202 | Downloads: 385 | Comments: 0
5. ChinaXiv:202211.00194

Unique, Persistent, Resolvable: Identifiers as the Foundation of FAIR

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-16 | Partner journal: Data Intelligence

Nick Juty, Sarala M. Wimalaratne, Stian Soiland-Reyes, John Kunze, Carole A. Goble, Tim Clark

Abstract: The FAIR principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web, and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use and orchestration with other system elements to achieve FAIRness for digital research objects.

Views: 848 | Downloads: 298 | Comments: 0
6. ChinaXiv:201605.00265

An Ethernet interface solution of space science experiment payloads

Category: Geosciences >> Space Physics | Submitted: 2016-05-04

Tianxiang Jia, Wenbo Dong

Abstract: To dynamically monitor and manage the working state of space science experiment payloads, large volumes of science data must be transmitted in real time. To meet this requirement, the paper proposes a design scheme for a main information network based on Ethernet. A TMS320F2812 microprocessor and a KSZ8851 Ethernet interface chip are used to build the Ethernet communication module of the payloads. The paper describes the hardware realization of the scheme with these two chips, as well as the software, which is composed of the transceiver register drivers, the definition of the telemetry data format and the packing of data over UDP/IP. UDP offers a fast and simple way of transmitting data to the receiver. The CCSDS packet format, a standard protocol in space data transfer, is adopted inside the UDP data packets. The experimental results show that the communication module can successfully transmit data between the science experiment payloads and a laptop (or data management unit) with good real-time performance in a simple point-to-point setting. © 2014 IEEE.

Peer review status: Pending review

Views: 1691 | Downloads: 920 | Comments: 0
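
The packing step described in the abstract above (CCSDS packets carried inside UDP datagrams) can be illustrated with a minimal host-side sketch. It is not the authors' firmware, which runs on the TMS320F2812; it only assumes the standard 6-byte CCSDS Space Packet primary header layout. The function names, the APID value and the host and port used in the comments are hypothetical.

```python
import socket
import struct


def build_ccsds_packet(apid: int, seq_count: int, payload: bytes) -> bytes:
    """Prefix payload with a 6-byte CCSDS Space Packet primary header."""
    if not 1 <= len(payload) <= 65536:
        raise ValueError("data field must hold 1..65536 bytes")
    if not 0 <= apid < 2**11:
        raise ValueError("APID is an 11-bit field")
    version = 0        # 3 bits, packet version number
    pkt_type = 0       # 1 bit, 0 = telemetry
    sec_hdr_flag = 0   # 1 bit, no secondary header in this sketch
    seq_flags = 0b11   # 2 bits, unsegmented user data
    word1 = (version << 13) | (pkt_type << 12) | (sec_hdr_flag << 11) | apid
    word2 = (seq_flags << 14) | (seq_count & 0x3FFF)  # 14-bit sequence count
    word3 = len(payload) - 1  # length field = data bytes minus one
    # Three big-endian 16-bit words form the primary header.
    return struct.pack(">HHH", word1, word2, word3) + payload


def send_packet(packet: bytes, host: str, port: int) -> None:
    """Send one CCSDS packet as a single UDP datagram (hypothetical endpoint)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


# Example use (assumed values):
#   pkt = build_ccsds_packet(apid=0x123, seq_count=1, payload=b"\x01\x02\x03\x04")
#   send_packet(pkt, "127.0.0.1", 50000)
```

Because UDP gives no delivery guarantee, the 14-bit sequence count in the header is what lets a receiver notice lost or reordered packets in a setup like this.
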
7. ChinaXiv:202211.00341

FAIR Science for Social Machines: Let’s Share Metadata Knowlets in the Internet of FAIR Data and Services

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-25 | Partner journal: Data Intelligence

Barend Mons

Abstract: In a world awash with fragmented data and tools, the notion of Open Science has been gaining a lot of momentum, but it has simultaneously caused a great deal of anxiety. Some of the anxiety may be related to crumbling kingdoms, but there are also very legitimate concerns, especially about the relative role of machines and algorithms as compared to humans and the combination of both (i.e., social machines). There are also grave concerns about the connotations of the term open, about unwanted side effects, and about the scalability of the approaches advocated by early adopters of new methodological developments. Many of these concerns are associated with mind-machine interaction and the critical role that computers are now playing in our day-to-day scientific practice. Here we address a number of these concerns and provide some possible solutions. FAIR (machine-actionable) data and services are obviously at the core of Open Science (or rather FAIR science). The scalable and transparent routing of data, tools and compute (to run the tools on) is a key central feature of the envisioned Internet of FAIR Data and Services (IFDS). The European Commission in its Declaration on the European Open Science Cloud, the G7, and the USA data commons have all identified the need to ensure a solid and sustainable infrastructure for Open Science. Here we first define the term FAIR science as opposed to Open Science. In FAIR science, data and the associated tools are all Findable, Accessible under well-defined conditions, Interoperable and Reusable, but not necessarily open (that is, without restrictions) and certainly not always gratis. The ambiguous term open has already caused considerable confusion and also opt-out reactions from researchers and other data-intensive professionals who cannot make their data open for very good reasons, such as patient privacy or national security.
Although Open Science is a definition for a way of working rather than an explicit request for all data to be available in full Open Access, the connotation of openness of the data involved in Open Science is very strong. In FAIR science, data and the associated services to run all processes in the data stewardship cycle, from design of experiment to capture, curation, processing, linking and analytics, all have minimally FAIR metadata, which specify the conditions under which the actual underlying research objects are reusable, first for machines and then also for humans. This effectively means that properly conducted Open Science is part of FAIR science. However, FAIR science can also be done with partly closed, sensitive and proprietary data. As has been emphasized before, FAIR is not identical to open. In FAIR/Open Science, data should be as open as possible and as closed as necessary. Where data are generated using public funding, the default will usually be that the accessibility of the FAIR data resulting from the study will be as high as possible, and that more restrictive access and licensing policies on these data will have to be explicitly justified and described. In all cases, however, even if the reuse is restricted, data and related services should be findable for their major users, machines, which will also make them much easier to find for human users. With a tendency to make good data stewardship the norm, a very significant new market for distributed data analytics and learning is opening, and a plethora of tools and reusable data objects are being developed and released. These all need FAIR metadata to be routed to each other and to be effective.

Views: 1172 | Downloads: 446 | Comments: 0
8. ChinaXiv:202211.00403

Not Ready for Convergence in Data Infrastructures

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-27 | Partner journal: Data Intelligence

Keith Jeffery, Peter Wittenburg, Larry Lannom, George Strawn, Claudia Biniossek, Dirk Betz, Christophe Blanchi

Abstract: Much research is dependent on Information and Communication Technologies (ICT). Researchers in different research domains have set up their own ICT systems (data labs) to support their research, from data collection (observation, experiment, simulation) through analysis (analytics, visualisation) to publication. However, too frequently the Digital Objects (DOs) upon which the research results are based are not curated and are thus available neither for reproduction of the research nor for use in other (e.g., multidisciplinary) research. The key to curation is rich metadata recording not only a description of the DO and the conditions of its use but also the provenance, that is, the trail of actions performed on the DO along the research workflow. There are increasing real-world requirements for multidisciplinary research. With DOs in domain-specific ICT systems (silos), commonly with inadequate metadata, such research is hindered. Despite wide agreement on principles for achieving FAIR (findable, accessible, interoperable, and reusable) utilization of research data, current practices fall short. FAIR DOs offer a way forward. The paradoxes, barriers and possible solutions are examined. The key is persuading the researcher to adopt best practices, which implies decreasing the cost (easy-to-use autonomic tools) and increasing the benefit (incentives such as acknowledgement and citation) while maintaining researcher independence and flexibility.

Views: 1184 | Downloads: 380 | Comments: 0
9. ChinaXiv:202211.00192

The A of FAIR - As Open as Possible, as Closed as Necessary

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-16 | Partner journal: Data Intelligence

Annalisa Landi, Mark Thompson, Viviana Giannuzzi, Fedele Bonifazi, Ignasi Labastida, Luiz Olavo Bonino da Silva Santos, Marco Roos

Abstract: In order to provide responsible access to health data by reconciling the benefits of data sharing with privacy rights and ethical and regulatory requirements, Findable, Accessible, Interoperable and Reusable (FAIR) metadata should be developed. According to the H2020 Program Guidelines on FAIR Data, data should be as open as possible and as closed as necessary: open in order to foster reusability and accelerate research, but at the same time closed to safeguard the privacy of the subjects. Additional provisions on the protection of natural persons with regard to the processing of personal data have been endorsed by the European General Data Protection Regulation (GDPR), Reg (EU) 2016/679, that came into force in May 2018. This work aims to solve accessibility problems related to the protection of personal data in the digital era and to achieve responsible access to and responsible use of health data. We strongly suggest associating each data set with FAIR metadata describing both the type of data collected and the accessibility conditions by considering data protection obligations and ethical and regulatory requirements. Finally, an existing FAIR infrastructure component has been used as an example to explain how FAIR metadata could facilitate data sharing while ensuring protection of individuals.

Views: 1028 | Downloads: 396 | Comments: 0
10. ChinaXiv:202211.00402

EU-Citizen.Science: A Platform for Mainstreaming Citizen Science and Open Science in Europe

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-27 | Partner journal: Data Intelligence

Katherin Wagenknecht, Tim Woods, Francisco García Sanz, Margaret Gold, Anne Bowser, Simone Rüfenacht, Luigi Ceccaroni, Jaume Piera

Abstract: Citizen Science (CS) is a prominent field of application for Open Science (OS), and the two have strong synergies, such as: advocating for the data and metadata generated through science to be made publicly available [1]; supporting more equitable collaboration between different types of scientists and citizens; and facilitating knowledge transfer to a wider range of audiences [2]. While primarily targeted at CS, the EU-Citizen.Science platform can also support OS. One of its key functions is to act as a knowledge hub to aggregate, disseminate and promote experience and know-how; for example, by profiling CS projects and collecting tools, resources and training materials relevant to both fields. To do this, the platform has developed an information architecture that incorporates the public participation in scientific research (PPSR) Common Conceptual Model. This model consists of the Project Metadata Model, the Dataset Metadata Model and the Observation Data Model, which were specifically developed for CS initiatives. By implementing these, the platform will strengthen the interoperating arrangements that exist between other, similar platforms (e.g., BioCollect and SciStarter) to ensure that CS and OS continue to grow globally in terms of participants, impact and fields of application.

Views: 1351 | Downloads: 392 | Comments: 0
11. ChinaXiv:202211.00211

DAMS: A Distributed Analytics Metadata Schema

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-18 | Partner journal: Data Intelligence

Sascha Welten, Laurenz Neumann, Yeliz Ucer Yediel, Luiz Olavo Bonino da Silva Santos, Stefan Decker, Oya Beyan

Abstract: In recent years, implementations enabling Distributed Analytics (DA) have gained considerable attention due to their ability to perform complex analysis tasks on decentralised data by bringing the analysis to the data. These concepts propose privacy-enhancing alternatives to data centralisation approaches, which have restricted applicability in the case of sensitive data due to ethical, legal or social aspects. Nevertheless, the inherent problem of DA-enabling architectures is the black-box-like behaviour of the highly distributed components, which originates from the lack of semantically enriched descriptions, particularly the absence of basic metadata for data sets or analysis tasks. To address these problems, we propose a metadata schema for DA infrastructures, which provides a vocabulary to enrich the involved entities with descriptive semantics. We initially perform a requirement analysis with domain experts to reveal the necessary metadata items, which form the foundation of our schema. Afterwards, we transform the obtained domain expert knowledge into user stories and derive the most significant semantic content. In the final step, we enable machine-readability via RDF(S) and SHACL serialisations. We deploy our schema in a proof-of-concept monitoring dashboard to validate its contribution to the transparency of DA architectures. Additionally, we evaluate the schema's compliance with the FAIR principles. The evaluation shows that the schema succeeds in increasing transparency while being compliant with most of the FAIR principles. Because a common metadata model is critical for enhancing the compatibility between multiple DA infrastructures, our work lowers data access and analysis barriers. It represents an initial and infrastructure-independent foundation for the FAIRification of DA and the underlying scientific data management.

Views: 1360 | Downloads: 425 | Comments: 0
12. ChinaXiv:202211.00203

Implementation of the FAIR Data Principles for Exploratory Biomarker Data from Clinical Trials

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-18 | Partner journal: Data Intelligence

Alexander Arefolov, Laura Adam, Shoshana Brown, Yelena Budovskaya, Cong Chen, Diya Das, Chen Farhy, Rebecca Ferguson, Hongmei Huang, Kimberly Kanigel, Christina Lu, Oksana Polesskaya, Tracy Staton, Rajeev Tajhya, Maryann Whitley, Jee-Yeon Wong, Xiangpei Zeng, Mark McCreary

Abstract: The FAIR data guiding principles have recently been developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase in data volume and complexity. The FAIR data principles have been formulated on a general level, and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.

Views: 1307 | Downloads: 339 | Comments: 0
13. ChinaXiv:202211.00216

The Semantic Data Dictionary - An Approach for Describing and Annotating Data

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-18 | Partner journal: Data Intelligence

Sabbir M. Rashid, James P. McCusker, Paulo Pinheiro, Marcello P. Bax, Henrique Santos, Jeanette A. Stingone, Amar K. Das, Deborah L. McGuinness

Abstract: It is common practice for data providers to include text descriptions for each column when publishing data sets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be and has been used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.

Views: 1125 | Downloads: 388 | Comments: 0