Category: Physics >> Nuclear Physics Submitted: 2025-03-02
Abstract: Photoneutron data are increasingly used in basic research in nuclear physics and in applications of nuclear technology. China has long encountered a bottleneck in the independent measurement of photoneutron cross sections due to the lack of dedicated gamma sources. The Shanghai Laser Electron Gamma Source (SLEGS), based on laser Compton scattering (LCS), provides energy-tunable and quasi-monoenergetic gamma beams, opening up a new avenue for high-precision photonuclear research. This paper focuses on the Flat-Efficiency Detector (FED) array of SLEGS and its application in photoneutron cross section measurements. The systematic uncertainty of the FED was verified to be 3.02% through calibration with a 252Cf neutron source. Additionally, it employs 197Au and 159Tb as case studies to demonstrate the format and processing methods of raw photoneutron data. The results confirm the application potential of SLEGS in the measurement of photoneutron cross sections, with SLEGS capable of supporting the independent acquisition of photoneutron data in China.
Category: Physics >> Nuclear Physics Submitted: 2025-01-08
Abstract: Photonuclear data are increasingly used in basic research in nuclear physics and in applications of nuclear technology. The generation of photonuclear data depends on advanced gamma source devices. SLEGS is a new Laser Compton Scattering (LCS) gamma source at the Shanghai Synchrotron Radiation Facility (SSRF). It is a crucial beamline for photonuclear reaction cross section measurement and related dataset generation in China. Photonuclear data, including photoneutron, photo-proton, photo-alpha and photo-fission data, as well as inelastically scattered photon data (usually known as nuclear resonance fluorescence (NRF)), are useful in nuclear physics, nuclear astrophysics, polarization physics, and other related fields. SLEGS, with its monochromatic characteristics and Laser Compton Slant Scattering (LCSS) mode, offers unique features and methodologies for the measurement and analysis of photonuclear data. This article thoroughly explains the systematic uncertainties of the Flat-Efficiency Detector (FED) system. Additionally, it employs ^{197}Au and ^{159}Tb as case studies to demonstrate the format and processing methods of raw photoneutron data. The content is aimed at supporting the reuse of the data and of the analysis methods.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: The FAIR data guiding principles have been recently developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase of data volume and complexity. The FAIR data principles have been formulated on a general level and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-29 Partner journal: Data Intelligence
Abstract: It is easy to argue that open data is critical to enabling faster and more effective research discovery. In this article, we describe the approach we have taken at Wiley to support open data and to start enabling more data to be FAIR data (Findable, Accessible, Interoperable and Reusable) with the implementation of four data policies: Encourages; Expects; Mandates; and Mandates and Peer Reviews Data. We describe the rationale for these policies and their levels of adoption so far. In the coming months we plan to measure and monitor the implementation of these policies via the publication of data availability statements and data citations. With this information, we'll be able to celebrate adoption of data-sharing practices by the research communities we work with and serve, and we hope to showcase researchers from those communities leading in open research.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-29 Partner journal: Data Intelligence
Abstract: Over the past five years, Elsevier has focused on implementing FAIR and best practices in data management, from data preservation through reuse. In this paper we describe a series of efforts undertaken in this time to support proper data management practices. In particular, we discuss our journal data policies and their implementation, the current status and future goals for the research data management platform Mendeley Data, and clear and persistent linkages to individual data sets stored on external data repositories from corresponding published papers through partnership with Scholix. Early analysis of our data policies implementation confirms significant disparities at the subject level regarding data sharing practices, with most uptake within disciplines of the Physical Sciences. Future directions at Elsevier include implementing better discoverability of linked data within an article and incorporating research data usage metrics.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: One of the key goals of the FAIR guiding principles is defined by its final principle: to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent machine-readable metadata to describe their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail should be provided in the provenance metadata or figuring out what common shared vocabularies should be used. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata on over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular the associated metadata, provide an important pathway for enabling FAIR data reuse.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence
Abstract: Data-intensive science is a reality in large scientific organizations such as the Max Planck Society, but due to the inefficiency of our data practices when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since, according to surveys, about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with the billions of smart devices producing continuous streams of data: our methods do not scale. Therefore experts worldwide are looking for strategies and methods that have potential for the future. The first steps have been made, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience, we can claim that there are trustworthy PID systems already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID to store important relationships, such as pointers to the locations where the bit sequences or different metadata can be accessed, we come close to defining Digital Objects (DOs), which could indeed indicate a solution to some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata and other state information, we could also define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories using different data models and data organizations. We could also associate a type with each DO and a set of operations allowed to work on its content, which would pave the way to automatic processing; this has been identified as the major step toward scalability in data science and data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
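The Digital Object idea sketched in the abstract above — a PID that resolves to a typed record holding metadata and pointers to the bit sequences — can be illustrated with a toy model. All class, field, and identifier names below are assumptions for illustration only, not a real DO API or an actual handle:

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    pid: str                               # persistent identifier
    do_type: str                           # type governs allowed operations
    metadata: dict                         # descriptive metadata (MD)
    bit_locations: list = field(default_factory=list)  # where the bits live

class PidRegistry:
    """Toy in-memory stand-in for a trustworthy PID resolution system."""
    def __init__(self):
        self._records = {}

    def register(self, obj: DigitalObject):
        self._records[obj.pid] = obj

    def resolve(self, pid: str) -> DigitalObject:
        # resolution maps the PID to the DO record, which in turn points
        # at the bit sequences and metadata
        return self._records[pid]

registry = PidRegistry()
registry.register(DigitalObject(
    pid="21.T11998/0000-001A-3905-F",      # handle-style PID, made up
    do_type="time-series",
    metadata={"creator": "MPG", "license": "CC-BY-4.0"},
    bit_locations=["https://repo.example.org/objects/3905F"],
))

obj = registry.resolve("21.T11998/0000-001A-3905-F")
print(obj.do_type, obj.bit_locations[0])
```

Associating a type (`do_type`) with each object is what would let a Digital Object Access Protocol dispatch the right set of operations automatically, as the abstract argues.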
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: Thousands of community-developed (meta)data guidelines, models, ontologies, schemas and formats have been created and implemented by several thousand data repositories and knowledge bases, across all disciplines. These resources are necessary to meet government, funder and publisher expectations of greater transparency, access to and preservation of data related to research publications. This obligates researchers to ensure their data is FAIR, to share their data using the appropriate standards, to store their data in sustainable and community-adopted repositories, and to conform to funder and publisher data policies. FAIR data sharing also plays a key role in enabling researchers to evaluate, re-analyse and reproduce each other's work. We can map the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. In this paper, we show how the work of the GO-FAIR FAIR Standards, Repositories and Policies (StRePo) Implementation Network serves as a central integration and cross-fertilisation point for the reuse of FAIR standards, repositories and data policies in general. Pivotal to this effort is FAIRsharing, an endorsed flagship resource of the Research Data Alliance that maps this landscape of relationships. Lastly, we highlight a number of activities around FAIR tools, services and educational efforts to raise awareness and encourage participation.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: Research data currently face a huge increase in data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-Based Data Access. Instead of structuring the integration layer as a collection of relational tables, the VKG paradigm replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge. We explain the main notions of this paradigm, its tooling ecosystem and significant use cases in a wide range of applications. Finally, we discuss future research directions.
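The "kept virtual" aspect of the VKG paradigm can be shown with a minimal sketch: a mapping declares how rows of a relational table yield graph triples, and queries over the graph are answered on demand without ever materializing it. The table, mapping, and predicate names below are all invented for illustration; real VKG systems use standardized mapping languages rather than Python dictionaries:

```python
# Stand-in for a relational table in the source database.
PERSON_ROWS = [
    {"id": 1, "name": "Ada", "dept": "math"},
    {"id": 2, "name": "Grace", "dept": "cs"},
]

# Mapping: column -> predicate. Applied lazily, so the graph stays virtual.
MAPPING = {"name": "ex:hasName", "dept": "ex:inDept"}

def virtual_triples(rows, mapping):
    """Generate triples on the fly instead of storing a graph."""
    for row in rows:
        subject = f"ex:person/{row['id']}"
        for col, pred in mapping.items():
            yield (subject, pred, row[col])

# Query the virtual graph: who is in the cs department?
cs_people = [s for s, p, o in virtual_triples(PERSON_ROWS, MAPPING)
             if p == "ex:inDept" and o == "cs"]
print(cs_people)
```

The design point is that the graph view adds flexibility and domain vocabulary on top of the tables while the data itself remains in, and is queried from, the relational source.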
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for economy and research speed efficiency. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter what discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: The FAIR principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use and orchestration with other system elements to achieve FAIRness for digital research objects.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence
Abstract: As the world population continues to increase, world food production is not keeping up. This means that to continue to feed the world, we will need to optimize the production and utilization of food around the globe. Optimization of a process on a global scale requires massive data. Agriculture is no exception, but it also brings its own unique issues, based on how widely dispersed agricultural data are and on the wide variety of data that is relevant to optimization of food production and supply. This suggests that we need a global data ecosystem for agriculture and nutrition. Such an ecosystem already exists to some extent, made up of data sets, metadata sets and even search engines that help to locate and utilize data sets. A key concept behind this is sustainability: how do we sustain our data sets, so that we can sustain our production and distribution of food? In order to make this vision a reality, we need to navigate the challenges for sustainable data management on a global scale. Starting from the current state of practice, how do we move forward to a practice in which we make use of global data to have an impact on world hunger? In particular, how do we find, collect and manage the data? How can this be effectively deployed to improve practice in the field? And how can we make sure that these practices are leading to the global goals of improving production, distribution and sustainability of the global food supply? These questions cannot be answered yet, but they are the focus of ongoing and future research to be published in this journal and elsewhere.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: Personalized search is a promising way to improve the quality of Web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for reasons such as privacy protection and information security. This leads to a serious lack of accessible public data sets in this field. The few publicly available data sets have not become widely used in academia because of the complexity of the processing required to study personalized search methods. The lack of data sets, together with the difficulties of data processing, has brought obstacles to fair comparison and evaluation of personalized search models. In this paper, we constructed a large-scale data set, AOL4PS, to evaluate personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the challenges of processing time and storage space demands brought by massive data volumes, we optimized the process of data set construction and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the results demonstrate that AOL4PS can measure the effect of personalized search models.
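The abstract above mentions an improved BM25 algorithm without specifying the improvement; for context, the classic BM25 ranking function that such work builds on can be sketched as follows, using the textbook formula with the usual k1/b defaults (the documents and query here are made up for illustration):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N     # average document length
    df = Counter()                            # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # idf with the +1 smoothing that keeps scores non-negative
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # term frequency saturation and length normalization
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores

docs = [["personalized", "search", "query", "log"],
        ["image", "search"],
        ["query", "log", "analysis", "of", "search", "sessions"]]
print(bm25_scores(["personalized", "search"], docs))
```

The rarer term "personalized" dominates the score via its higher idf, so the first document ranks highest for this query.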
Category: Physics >> Nuclear Physics Submitted: 2016-09-13
Abstract: The Dark Matter Particle Explorer (DAMPE) is an upcoming scientific satellite mission for high-energy gamma-ray, electron and cosmic-ray detection. The silicon tracker (STK) is a sub-detector of the DAMPE payload with an excellent position resolution (readout pitch of 242 μm), which measures the incident direction of particles as well as their charge. The STK consists of 12 layers of Silicon Micro-strip Detector (SMD), equivalent to a total silicon area of 6.5 m². The STK has 73,728 readout channels in total, which produce a huge amount of raw data to be handled. In this paper, we focus on the on-board data compression algorithm and procedure in the STK, which was initially verified by cosmic-ray measurements.
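The abstract does not spell out the STK compression scheme, but a common on-board reduction for silicon micro-strip readout is pedestal subtraction followed by zero suppression: keep only strips whose signal exceeds a noise threshold, along with their addresses. The toy sketch below illustrates that generic idea under those assumptions, not DAMPE's actual firmware, and the ADC values are invented:

```python
def zero_suppress(raw, pedestals, threshold):
    """Return (strip_index, signal) pairs for strips above threshold."""
    kept = []
    for i, (adc, ped) in enumerate(zip(raw, pedestals)):
        signal = adc - ped          # subtract the per-strip baseline
        if signal > threshold:      # drop strips consistent with noise
            kept.append((i, signal))
    return kept

raw = [101, 99, 100, 180, 175, 102, 98]   # ADC counts for 7 strips
pedestals = [100] * 7                     # per-strip baseline
hits = zero_suppress(raw, pedestals, threshold=20)
print(hits)                               # only the two struck strips survive
```

With tens of thousands of channels but only a handful of struck strips per event, storing sparse (address, signal) pairs instead of the full readout is what makes the on-board data volume manageable.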
Category: Physics >> Nuclear Physics Submitted: 2025-04-10
Abstract: This article introduces the methodologies and instrumentation for data measurement and propagation at the Back-n white neutron facility of the China Spallation Neutron Source (CSNS). The Back-n facility employs backscattering techniques to generate a broad spectrum of white neutrons. Equipped with advanced detectors such as the Light Particle Detector Array (LPDA) and the Fission Ionization Chamber Detector (FIXM), the facility achieves high-precision data acquisition through a general-purpose electronics system. Data are managed and stored in a hierarchical system supported by the National High Energy Physics Science Data Center (NHEPDC), ensuring long-term preservation and efficient access. The data from Back-n experiments significantly contribute to nuclear physics, reactor design, astrophysics, and medical physics, enhancing the understanding of nuclear processes and supporting interdisciplinary research.
Category: Physics >> Nuclear Physics Submitted: 2025-02-24
Abstract: This article introduces the methodologies and instrumentation for data measurement and propagation at the Back-n white neutron facility of the China Spallation Neutron Source (CSNS). The Back-n facility employs backscattering techniques to generate a broad spectrum of white neutrons, which are essential for precise measurements of neutron-induced reactions. Equipped with advanced detectors such as the Light Particle Detector Array (LPDA) and the Fission Ionization Chamber Detector (FIXM), the facility achieves high-precision data acquisition through a general-purpose electronics system. Data are managed and stored in a hierarchical system supported by the National High Energy Physics Science Data Center (NHEPDC), ensuring long-term preservation and efficient access. The data from Back-n experiments significantly contribute to nuclear physics, reactor design, astrophysics, and medical physics, enhancing the understanding of nuclear processes and supporting interdisciplinary research.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: Research Data Management (RDM) has become increasingly important for more and more academic institutions. Using the Peking University Open Research Data Repository (PKU-ORDR) project as an example, this paper reviews a library-based, university-wide open research data repository project and the implementation of related RDM services, including project kickoff, needs assessment, partnership establishment, software investigation and selection, software customization, as well as data curation services and training. Through this review, issues revealed during the stages of the implementation process are also discussed and addressed, such as awareness of research data, demands from data providers and users, data policies and requirements of the home institution, requirements from funding agencies and publishers, collaboration between administrative units and libraries, and concerns from data providers and users. The significance of the study is that it provides an example of creating an Open Data repository and RDM services for other Chinese academic libraries planning to implement RDM services for their home institutions. The authors have also observed that, since PKU-ORDR and the RDM services were implemented in 2015, the Peking University Library (PKUL) has helped numerous researchers across the entire research life cycle, enhanced Open Science (OS) practices on campus, and influenced the national OS movement in China through various national events and activities hosted by the PKUL.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: It is common practice for data providers to include text descriptions for each column when publishing data sets, in the form of data dictionaries. While these documents are useful in helping an end user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be, and has been, used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.
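The core idea described above — machine-readable column annotations that tie each column of a data set to concepts and units from shared vocabularies — can be sketched in miniature. The column codes follow the NHANES naming style mentioned in the abstract, but this particular mapping, its structure, and the chosen ontology URIs are illustrative assumptions, not the actual Semantic Data Dictionary specification:

```python
# Hypothetical machine-readable dictionary: column -> semantic annotation.
SDD = {
    "BMXHT": {   # NHANES-style column code for standing height
        "attribute": "http://semanticscience.org/resource/Height",
        "unit": "http://purl.obolibrary.org/obo/UO_0000015",  # centimeter
        "label": "Standing Height",
    },
    "BMXWT": {   # NHANES-style column code for body weight
        "attribute": "http://semanticscience.org/resource/Weight",
        "unit": "http://purl.obolibrary.org/obo/UO_0000009",  # kilogram
        "label": "Weight",
    },
}

def annotate(row: dict) -> list:
    """Attach semantic metadata to each raw (column, value) pair."""
    return [
        {"value": v, **SDD[col]} for col, v in row.items() if col in SDD
    ]

# Columns without an entry (like the record id SEQN) pass through unannotated.
annotated = annotate({"BMXHT": 172.4, "BMXWT": 70.1, "SEQN": 1})
print(len(annotated))
```

Because the annotations are data rather than free text, downstream tools can harmonize columns from different data sets that map to the same concept and unit, which is the interoperability gain the abstract claims.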
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: With the rise of linked data and knowledge graphs, the need becomes compelling to find suitable solutions to increase the coverage and correctness of data sets, to add missing knowledge and to identify and remove errors. Several approaches, mostly relying on machine learning and natural language processing techniques, have been proposed to address this refinement goal; they usually need a partial gold standard, i.e., some ground truth to train automatic models. Gold standards are manually constructed, either by involving domain experts or by adopting crowdsourcing and human computation solutions. In this paper, we present an open source software framework to build Games with a Purpose for linked data refinement, i.e., Web applications to crowdsource partial ground truth, motivating user participation through fun incentives. We detail the impact of this new resource by explaining the specific data linking purposes supported by the framework (creation, ranking and validation of links) and by defining the respective crowdsourcing tasks to achieve those goals. We also introduce our approach for incremental truth inference over the contributions provided by players of Games with a Purpose (also abbreviated as GWAP): we motivate the need for such a method with the specificity of GWAP vs. traditional crowdsourcing; we explain and formalize the proposed process, explain its positive consequences, and illustrate the results of an experimental comparison with state-of-the-art approaches. To show this resource's versatility, we describe a set of diverse applications that we built on top of it; to demonstrate its reusability and extensibility potential, we provide references to detailed documentation, including an entire tutorial which in a few hours guides new adopters to customize and adapt the framework to a new use case.