  • Open Science—A Question of Trust

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)

    Abstract: Collaboration and the sharing of knowledge is at the heart of Open Science (OS). However, we need to know that the knowledge we find and share is really what it purports to be; and we need to know that the authors we hope to collaborate with are really the people they claim to be. In this paper, the author argues that a prerequisite for OS is trust and that persistent identifiers help to build that trust. The persistent identifier systems must themselves be trustworthy and they must be able to connect the user or their machine to the information they need now and into the future. Infrastructure is rather like plumbing: It goes unnoticed and unappreciated until it fails. This paper puts infrastructure for persistent identifiers in the spotlight as a core component of OS.

  • Galaxy: A Decade of Realising CWFR Concepts

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)

    Abstract: Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.

  • On the Complexities of Federating Research Data Infrastructures

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)

    Abstract: Federated Research Data Infrastructures aim to provide seamless access to research data along with services to assist researchers in performing their data management tasks. During our research on Open Science (OS), we have built cross-disciplinary federated infrastructures for different types of (open) digital resources: Open Data (OD), Open Educational Resources (OER), and open access documents. In each case, our approach targeted only the resource metadata. Based on this experience, we identified some challenges that we had to overcome again and again: lack of (i) harvesters, (ii) common metadata models and (iii) metadata mapping tools. In this paper, we report on the challenges we faced in the federated infrastructure projects we were involved with. We structure the report based on the three challenges listed above.

  • How to (Easily) Extend the FAIRness of Existing Repositories

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence (English)

    Abstract: Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for economical and research speed efficiency. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter what discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.

  • Unique, Persistent, Resolvable: Identifiers as the Foundation of FAIR

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence (English)

    Abstract: The FAIR principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web, and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use and orchestration with other system elements to achieve FAIRness for digital research objects.

  • An Ethernet interface solution of space science experiment payloads

    Category: Earth Science >> Space Physics Submitted: 2016-05-04

    Abstract: In order to dynamically monitor and manage the working state of space science experiment payloads, large volumes of science data must be transmitted in real time. To meet this requirement, the paper proposes a design scheme for the main information network based on Ethernet. A TMS320F2812 microprocessor and the KSZ8851 Ethernet interface chip are used to build the Ethernet communication module of the payloads. The paper describes the hardware realization of the scheme with these two chips, as well as the software, which is composed of the transceiver register drivers, the definition of the telemetry data format, and the data packing over the UDP/IP protocol. UDP allows the fastest and simplest way of transmitting data to the receiver. The CCSDS packet format, a standard protocol in space data transmission, is adopted in the UDP data packets. The experimental results show that the communication module can successfully transmit data between the science experiment payloads and a laptop (or data management unit) with good real-time performance in a simple point-to-point setting. ©2014 IEEE.

  • FAIR Science for Social Machines: Let’s Share Metadata Knowlets in the Internet of FAIR Data and Services

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence (English)

    Abstract: In a world awash with fragmented data and tools, the notion of Open Science has been gaining a lot of momentum, but simultaneously, it caused a great deal of anxiety. Some of the anxiety may be related to crumbling kingdoms, but there are also very legitimate concerns, especially about the relative role of machines and algorithms as compared to humans and the combination of both (i.e., social machines). There are also grave concerns about the connotations of the term open, but also regarding the unwanted side effects as well as the scalability of the approaches advocated by early adopters of new methodological developments. Many of these concerns are associated with mind-machine interaction and the critical role that computers are now playing in our day-to-day scientific practice. Here we address a number of these concerns and provide some possible solutions. FAIR (machine-actionable) data and services are obviously at the core of Open Science (or rather FAIR science). The scalable and transparent routing of data, tools and compute (to run the tools on) is a key central feature of the envisioned Internet of FAIR Data and Services (IFDS). The European Commission in its Declaration on the European Open Science Cloud, the G7, and the USA data commons have all identified the need to ensure a solid and sustainable infrastructure for Open Science. Here we first define the term FAIR science as opposed to Open Science. In FAIR science, data and the associated tools are all Findable, Accessible under well defined conditions, Interoperable and Reusable, but not necessarily open; without restrictions and certainly not always gratis. The ambiguous term open has already caused considerable confusion and also opt-out reactions from researchers and other data-intensive professionals who cannot make their data open for very good reasons, such as patient privacy or national security. Although Open Science is a definition for a way of working rather than explicitly requesting all data to be available in full Open Access, the connotation of openness of the data involved in Open Science is very strong. In FAIR science, data and the associated services to run all processes in the data stewardship cycle from design of experiment to capture to curation, processing, linking and analytics all have minimally FAIR metadata, which specify the conditions under which the actual underlying research objects are reusable, first for machines and then also for humans. This effectively means that properly conducted Open Science is part of FAIR science. However, FAIR science can also be done with partly closed, sensitive and proprietary data. As has been emphasized before, FAIR is not identical to open. In FAIR/Open Science, data should be as open as possible and as closed as necessary. Where data are generated using public funding, the default will usually be that for the FAIR data resulting from the study the accessibility will be as high as possible, and that more restrictive access and licensing policies on these data will have to be explicitly justified and described. In all cases, however, even if the reuse is restricted, data and related services should be findable for their major users, machines, which will make them also much better findable for human users. With a tendency to make good data stewardship the norm, a very significant new market for distributed data analytics and learning is opening and a plethora of tools and reusable data objects are being developed and released. These all need FAIR metadata to be routed to each other and to be effective.

  • Not Ready for Convergence in Data Infrastructures

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)

    Abstract: Much research is dependent on Information and Communication Technologies (ICT). Researchers in different research domains have set up their own ICT systems (data labs) to support their research, from data collection (observation, experiment, simulation) through analysis (analytics, visualisation) to publication. However, too frequently the Digital Objects (DOs) upon which the research results are based are not curated and thus neither available for reproduction of the research nor utilization for other (e.g., multidisciplinary) research purposes. The key to curation is rich metadata recording not only a description of the DO and the conditions of its use but also the provenance, i.e., the trail of actions performed on the DO along the research workflow. There are increasing real-world requirements for multidisciplinary research. With DOs in domain-specific ICT systems (silos), commonly with inadequate metadata, such research is hindered. Despite wide agreement on principles for achieving FAIR (findable, accessible, interoperable, and reusable) utilization of research data, current practices fall short. FAIR DOs offer a way forward. The paradoxes, barriers and possible solutions are examined. The key is persuading the researcher to adopt best practices, which implies decreasing the cost (easy-to-use autonomic tools) and increasing the benefit (incentives such as acknowledgement and citation) while maintaining researcher independence and flexibility.

  • The A of FAIR - As Open as Possible, as Closed as Necessary

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence (English)

    Abstract: In order to provide responsible access to health data by reconciling benefits of data sharing with privacy rights and ethical and regulatory requirements, Findable, Accessible, Interoperable and Reusable (FAIR) metadata should be developed. According to the H2020 Program Guidelines on FAIR Data, data should be as open as possible and as closed as necessary, open in order to foster the reusability and to accelerate research, but at the same time they should be closed to safeguard the privacy of the subjects. Additional provisions on the protection of natural persons with regard to the processing of personal data have been endorsed by the European General Data Protection Regulation (GDPR), Reg (EU) 2016/679, that came into force in May 2018. This work aims to solve accessibility problems related to the protection of personal data in the digital era and to achieve a responsible access to and responsible use of health data. We strongly suggest associating each data set with FAIR metadata describing both the type of data collected and the accessibility conditions by considering data protection obligations and ethical and regulatory requirements. Finally, an existing FAIR infrastructure component has been used as an example to explain how FAIR metadata could facilitate data sharing while ensuring protection of individuals.

  • EU-Citizen.Science: A Platform for Mainstreaming Citizen Science and Open Science in Europe

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)

    Abstract: Citizen Science (CS) is a prominent field of application for Open Science (OS), and the two have strong synergies, such as: advocating for the data and metadata generated through science to be made publicly available [1]; supporting more equitable collaboration between different types of scientists and citizens; and facilitating knowledge transfer to a wider range of audiences [2]. While primarily targeted at CS, the EU-Citizen.Science platform can also support OS. One of its key functions is to act as a knowledge hub to aggregate, disseminate and promote experience and know-how; for example, by profiling CS projects and collecting tools, resources and training materials relevant to both fields. To do this, the platform has developed an information architecture that incorporates the public participation in scientific research (PPSR) Common Conceptual Model. This model consists of the Project Metadata Model, the Dataset Metadata Model and the Observation Data Model, which were specifically developed for CS initiatives. By implementing these, the platform will strengthen the interoperating arrangements that exist between other, similar platforms (e.g., BioCollect and SciStarter) to ensure that CS and OS continue to grow globally in terms of participants, impact and fields of application.

  • DAMS: A Distributed Analytics Metadata Schema

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence (English)

    Abstract: In recent years, implementations enabling Distributed Analytics (DA) have gained considerable attention due to their ability to perform complex analysis tasks on decentralised data by bringing the analysis to the data. These concepts propose privacy-enhancing alternatives to data centralisation approaches, which have restricted applicability in case of sensitive data due to ethical, legal or social aspects. Nevertheless, the immanent problem of DA-enabling architectures is the black-box-like behaviour of the highly distributed components originating from the lack of semantically enriched descriptions, particularly the absence of basic metadata for data sets or analysis tasks. To approach the mentioned problems, we propose a metadata schema for DA infrastructures, which provides a vocabulary to enrich the involved entities with descriptive semantics. We initially perform a requirement analysis with domain experts to reveal necessary metadata items, which represents the foundation of our schema. Afterwards, we transform the obtained domain expert knowledge into user stories and derive the most significant semantic content. In the final step, we enable machine-readability via RDF(S) and SHACL serialisations. We deploy our schema in a proof-of-concept monitoring dashboard to validate its contribution to the transparency of DA architectures. Additionally, we evaluate the schema's compliance with the FAIR principles. The evaluation shows that the schema succeeds in increasing transparency while being compliant with most of the FAIR principles. Because a common metadata model is critical for enhancing the compatibility between multiple DA infrastructures, our work lowers data access and analysis barriers. It represents an initial and infrastructure-independent foundation for the FAIRification of DA and the underlying scientific data management.

  • Implementation of the FAIR Data Principles for Exploratory Biomarker Data from Clinical Trials

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence (English)

    Abstract: The FAIR data guiding principles have been recently developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase of data volume and complexity. The FAIR data principles have been formulated on a general level and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.

  • The Semantic Data Dictionary - An Approach for Describing and Annotating Data

    Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence (English)

    Abstract: It is common practice for data providers to include text descriptions for each column when publishing data sets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can and has been used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.