分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: The FAIR guiding principles aim to enhance the Findability, Accessibility, Interoperability and Reusability of digital resources such as data, for both humans and machines. The process of making data FAIR (FAIRification) can be described in multiple steps. In this paper, we describe a generic step-by-step FAIRification workflow to be performed in a multidisciplinary team guided by FAIR data stewards. The FAIRification workflow should be applicable to any type of data and has been developed and used for Bring Your Own Data (BYOD) workshops, as well as for the FAIRification of e.g., rare diseases resources. The steps are: 1) identify the FAIRification objective, 2) analyze data, 3) analyze metadata, 4) define semantic model for data (4a) and metadata (4b), 5) make data (5a) and metadata (5b) linkable, 6) host FAIR data, and 7) assess FAIR data. For each step we describe how the data are processed, what expertise is required, which procedures and tools can be used, and which FAIR principles they relate to.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for economical and research speed efficiency. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter what discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: The FAIR principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web, and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use and orchestration with other system elements to achieve FAIRness for digital research objects.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-27 合作期刊: 《数据智能(英文)》
摘要: The introduction of a new technology or innovation is often accompanied by ups and downs in its fortunes. Gartner Inc. defined a so-called hype cycle to describe a general pattern that many innovations experience: technology trigger, peak of inflated expectations, trough of disillusionment, slope of enlightenment, and plateau of productivity. This article will compare the ongoing introduction of Open Science (OS) with the hype cycle model and speculate on the relevance of that model to OS. Lest the title of this article mislead the reader, be assured that the author believes that OS should happen and that it will happen. However, I also believe that the path to OS will be longer than many of us had hoped. I will give a brief history of the todays semi-open science, define what I mean by OS, define the hype cycle and where OS is now on that cycle, and finally speculate what it will take to traverse the cycle and rise to its plateau of productivity (as described by Gartner).
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: One of the key goals of the FAIR guiding principles is defined by its final principle to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent machine readable metadata to describe their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail should be provided in the provenance metadata or figuring out what common shared vocabularies should be used. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata on over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular associated metadata, provide an important pathway for enabling FAIR data reuse.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: Research data currently face a huge increase of data objects with an increasing variety of types (data types, formats) and variety of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-18 合作期刊: 《数据智能(英文)》
摘要: Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements, and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge becomes to use what we have learned to present researchers with relevant and easy options that help them to share and make an impact with new research data.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-18 合作期刊: 《数据智能(英文)》
摘要: The FAIR principles were received with broad acceptance in several scientific communities. However, there is still some degree of uncertainty on how they should be implemented. Several self-report questionnaires have been proposed to assess the implementation of the FAIR principles. Moreover, the FAIRmetrics group released 14, general-purpose maturity for representing FAIRness. Initially, these metrics were conducted as open-answer questionnaires. Recently, these metrics have been implemented into a software that can automatically harvest metadata from metadata providers and generate a principle-specific FAIRness evaluation. With so many different approaches for FAIRness evaluations, we believe that further clarification on their limitations and advantages, as well as on their interpretation and interplay should be considered.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-18 合作期刊: 《数据智能(英文)》
摘要: This article investigates expansion of the Internet of FAIR Data and Services (IFDS) to Africa, through the three GO FAIR pillars: GO CHANGE, GO BUILD and GO TRAIN. Introduction of the IFDS in Africa has a focus on digital health. Two examples of introducing FAIR are compared: a regional initiative for digital health by governments in the East Africa Community (EAC) and an initiative by a local health provider (Solidarmed) in collaboration with Great Zimbabwe University in Zimbabwe. The obstacles to introducing FAIR are identified as underrepresentation of data from Africa in IFDS at this moment, the lack of explicit recognition of situational context of research in FAIR at present and the lack of acceptability of FAIR as a foreign and European invention which affects acceptance. It is envisaged that FAIR has an important contribution to solve fragmentation in digital health in Africa, and that any obstacles concerning African participation, context relevance and acceptance of IFDS need to be removed. This will require involvement of African researchers and ICT-developers so that it is driven by local ownership. Assessment of ecological validity in FAIR principles would ensure that the context specificity of research is reflected in the FAIR principles. This will help enhance the acceptance of the FAIR Guidelines in Africa and will help strengthen digital health research and services.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: Thousands of community-developed (meta)data guidelines, models, ontologies, schemas and formats have been created and implemented by several thousand data repositories and knowledge-bases, across all disciplines. These resources are necessary to meet government, funder and publisher expectations of greater transparency and access to and preservation of data related to research publications. This obligates researchers to ensure their data is FAIR, share their data using the appropriate standards, store their data in sustainable and community-adopted repositories, and to conform to funder and publisher data policies. FAIR data sharing also plays a key role in enabling researchers to evaluate, re-analyse and reproduce each others work. We can map the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. In this paper, we show how the work of the GO-FAIR FAIR Standards, Repositories and Policies (StRePo) Implementation Network serves as a central integration and cross-fertilisation point for the reuse of FAIR standards, repositories and data policies in general. Pivotal to this effort, the FAIRsharing, an endorsed flagship resource of the Research Data Alliance that maps the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. Lastly, we highlight a number of activities around FAIR tools, services and educational efforts to raise awareness and encourage participation.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: Since their publication in 2016 we have seen a rapid adoption of the FAIR principles in many scientific disciplines where the inherent value of research data and, therefore, the importance of good data management and data stewardship, is recognized. This has led to many communities asking What is FAIR? and How FAIR are we currently?, questions which were addressed respectively by a publication revisiting the principles and the emergence of FAIR metrics. However, early adopters of the FAIR principles have already run into the next question: How can we become (more) FAIR? This question is more difficult to answer, as the principles do not prescribe any specific standard or implementation. Moreover, there does not yet exist a mature ecosystem of tools, platforms and standards to support human and machine agents to manage, produce, publish and consume FAIR data in a user-friendly and efficient (i.e., easy) way. In this paper we will show, however, that there are already many emerging examples of FAIR tools under development. This paper puts forward the position that we are likely already in a creolization phase where FAIR tools and technologies are merging and combining, before converging in a subsequent phase to solutions that make FAIR feasible in daily practice.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-18 合作期刊: 《数据智能(英文)》
摘要: This article explores the global implementation of the FAIR Guiding Principles for scientific management and data stewardship, which provide that data should be findable, accessible, interoperable and reusable. The implementation of these principles is designed to lead to the stewardship of data as FAIR digital objects and the establishment of the Internet of FAIR Data and Services (IFDS). If implementation reaches a tipping point, IFDS has the potential to revolutionize how data is managed by making machine and human readable data discoverable for reuse. Accordingly, this article examines the expansion of the implementation of FAIR Guiding Principles, especially how and in which geographies (locations) and areas (topic domains) implementation is taking place. A literature review of academic articles published between 2016 and 2019 on the use of FAIR Guiding Principles is presented. The investigation also includes an analysis of the domains
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-28 合作期刊: 《数据智能(英文)》
摘要: The investigation proposes the application of an ontological semantic approach to describing workflow control patterns, research workflow step patterns, and the meaning of the workflows in terms of domain knowledge. The approach can provide wide opportunities for semantic refinement, reuse, and composition of workflows. Automatic reasoning allows verifying those compositions and implementations and provides machine-actionable workflow manipulation and problem-solving using workflows. The described approach can take into account the implementation of workflows in different workflow management systems, the organization of workflows collections in data infrastructures and the search for them, the semantic approach to the selection of workflows and resources in the research domain, the creation of research step patterns and their implementation reusing fragments of existing workflows, the possibility of automation of problem#2; solving based on the reuse of workflows. The application of the approach to CWFR conceptions is proposed.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-18 合作期刊: 《数据智能(英文)》
摘要: The FAIR data guiding principles have been recently developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase of data volume and complexity. The FAIR data principles have been formulated on a general level and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-16 合作期刊: 《数据智能(英文)》
摘要: The last letter of the FAIR acronym stands for Reusability. Data and metadata should be made available with a clear and accessible usage license. But, what are the choices? How can researchers share data and allow reusability? Are all the licenses available for sharing content suitable for data? Data can be covered by different layers of copyright protection making the relationship between data and copyright particularly complex. Some research data can be considered as a work and therefore covered by full copyright while other data can be in the public domain due to their lack of originality. Moreover, a collection of data can be protected by special rights in Europe to acknowledge the investment in time and money in obtaining, presenting, arranging or verifying the data. The need of using a license when sharing data comes from the fact that, under current copyright laws, when rights exist, the absence of any legal notice must be understood as the default all rights reserved regime. Unless an exception applies, the authorisation of right holders is necessary for reuse. Right holders could use any text to state the reusability of data but it is advisable to use some of the existing licenses, and especially the ones that are suitable for data and databases. We hope that with this paper we can bring some clarity in relation to the rights involved when sharing research data