Category: Physics >> Nuclear Physics Submitted: 2025-03-02
Abstract: Photoneutron data are increasingly used in basic research in nuclear physics and in applications of nuclear technology. China has long encountered a bottleneck in the independent measurement of photoneutron cross sections due to the lack of dedicated gamma sources. The Shanghai Laser Electron Gamma Source (SLEGS), based on laser Compton scattering (LCS), provides energy-tunable and quasi-monoenergetic gamma beams, opening up a new avenue for high-precision photonuclear research. This paper focuses on the Flat-Efficiency Detector (FED) array of SLEGS and its application in photoneutron cross section measurements. The systematic uncertainty of the FED was verified to be 3.02% through calibration with a 252Cf neutron source. Additionally, it employs 197Au and 159Tb as case studies to demonstrate the format and processing methods of raw photoneutron data. The results confirm the application potential of SLEGS in the measurement of photoneutron cross sections, with SLEGS capable of supporting the independent acquisition of photoneutron data in China.
Category: Physics >> Nuclear Physics Submitted: 2025-01-08
Abstract: Photonuclear data are increasingly used in basic research in nuclear physics and in applications of nuclear technology. The generation of photonuclear data depends on advanced gamma source devices. SLEGS is a new Laser Compton Scattering (LCS) gamma source at the Shanghai Synchrotron Radiation Facility (SSRF). It is a crucial beamline for photonuclear reaction cross section measurement and related dataset generation in China. Photonuclear data, including photoneutron, photo-proton, photo-alpha and photo-fission data, as well as inelastically scattered photon data (usually known as nuclear resonance fluorescence (NRF)), are useful in nuclear physics, nuclear astrophysics, polarization physics, and other related fields. SLEGS, with its monochromatic characteristics and Laser Compton Slant Scattering (LCSS) mode, offers unique features and methodologies for the measurement and analysis of photonuclear data. This article thoroughly explains the systematic uncertainties of the Flat-Efficiency Detector (FED) system. Additionally, it employs ^{197}Au and ^{159}Tb as case studies to demonstrate the format and processing methods of raw photoneutron data. The content is aimed at supporting the reuse of the data and of the analysis methods.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: The FAIR data guiding principles have been recently developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase of data volume and complexity. The FAIR data principles have been formulated on a general level and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-29 Partner journal: Data Intelligence
Abstract: It is easy to argue that open data is critical to enabling faster and more effective research discovery. In this article, we describe the approach we have taken at Wiley to support open data and to start enabling more data to be FAIR data (Findable, Accessible, Interoperable and Reusable) with the implementation of four data policies: Encourages; Expects; Mandates; and Mandates and Peer Reviews Data. We describe the rationale for these policies and their levels of adoption so far. In the coming months we plan to measure and monitor the implementation of these policies via the publication of data availability statements and data citations. With this information, we'll be able to celebrate adoption of data-sharing practices by the research communities we work with and serve, and we hope to showcase researchers from those communities leading in open research.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-29 Partner journal: Data Intelligence
Abstract: Over the past five years, Elsevier has focused on implementing FAIR and best practices in data management, from data preservation through reuse. In this paper we describe a series of efforts undertaken in this time to support proper data management practices. In particular, we discuss our journal data policies and their implementation, the current status and future goals for the research data management platform Mendeley Data, and clear and persistent linkages to individual data sets stored on external data repositories from corresponding published papers through partnership with Scholix. Early analysis of our data policies implementation confirms significant disparities at the subject level regarding data sharing practices, with most uptake within disciplines of the Physical Sciences. Future directions at Elsevier include implementing better discoverability of linked data within an article and incorporating research data usage metrics.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: One of the key goals of the FAIR guiding principles is defined by its final principle: to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent machine-readable metadata to describe their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail should be provided in the provenance metadata or figuring out what common shared vocabularies should be used. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata on over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular the associated metadata, provide an important pathway for enabling FAIR data reuse.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence
Abstract: Data-intensive science is a reality in large scientific organizations such as the Max Planck Society, but due to the inefficiency of our data practices when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since, according to surveys, about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with the billions of smart devices producing continuous streams of data: our methods do not scale. Therefore experts worldwide are looking for strategies and methods that have potential for the future. The first steps have been made, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience, we can claim that there are trustworthy PID systems already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID to store important relationships, such as pointers to the locations where the bit sequences or different metadata can be accessed, we come close to defining Digital Objects (DOs), which could indeed indicate a solution to some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata and other state information, we could also define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories using different data models and data organizations. We could also associate a type with each DO and a set of operations allowed to work on its content, which would pave the way to automatic processing; this has been identified as the major step toward scalability in data science and data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
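The Digital Object idea sketched in the abstract above — a PID that resolves to a typed record holding metadata and pointers to the bit sequences — can be illustrated with a toy model. All class, field, and identifier names below are assumptions for illustration only, not a real DO API or an actual handle:

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    pid: str                               # persistent identifier
    do_type: str                           # type governs allowed operations
    metadata: dict                         # descriptive metadata (MD)
    bit_locations: list = field(default_factory=list)  # where the bits live

class PidRegistry:
    """Toy in-memory stand-in for a trustworthy PID resolution system."""
    def __init__(self):
        self._records = {}

    def register(self, obj: DigitalObject):
        self._records[obj.pid] = obj

    def resolve(self, pid: str) -> DigitalObject:
        # resolution maps the PID to the DO record, which in turn points
        # at the bit sequences and metadata
        return self._records[pid]

registry = PidRegistry()
registry.register(DigitalObject(
    pid="21.T11998/0000-001A-3905-F",      # handle-style PID, made up
    do_type="time-series",
    metadata={"creator": "MPG", "license": "CC-BY-4.0"},
    bit_locations=["https://repo.example.org/objects/3905F"],
))

obj = registry.resolve("21.T11998/0000-001A-3905-F")
print(obj.do_type, obj.bit_locations[0])
```

Associating a type (`do_type`) with each object is what would let a Digital Object Access Protocol dispatch the right set of operations automatically, as the abstract argues.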
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: Thousands of community-developed (meta)data guidelines, models, ontologies, schemas and formats have been created and implemented by several thousand data repositories and knowledge bases, across all disciplines. These resources are necessary to meet government, funder and publisher expectations of greater transparency, access to and preservation of data related to research publications. This obligates researchers to ensure their data is FAIR, to share their data using the appropriate standards, to store their data in sustainable and community-adopted repositories, and to conform to funder and publisher data policies. FAIR data sharing also plays a key role in enabling researchers to evaluate, re-analyse and reproduce each other's work. We can map the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. In this paper, we show how the work of the GO-FAIR FAIR Standards, Repositories and Policies (StRePo) Implementation Network serves as a central integration and cross-fertilisation point for the reuse of FAIR standards, repositories and data policies in general. Pivotal to this effort is FAIRsharing, an endorsed flagship resource of the Research Data Alliance that maps this landscape of relationships. Lastly, we highlight a number of activities around FAIR tools, services and educational efforts to raise awareness and encourage participation.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: Research data currently face a huge increase in data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-Based Data Access. Instead of structuring the integration layer as a collection of relational tables, the VKG paradigm replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge. We explain the main notions of this paradigm, its tooling ecosystem and significant use cases in a wide range of applications. Finally, we discuss future research directions.
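The "kept virtual" aspect of the VKG paradigm can be shown with a minimal sketch: a mapping declares how rows of a relational table yield graph triples, and queries over the graph are answered on demand without ever materializing it. The table, mapping, and predicate names below are all invented for illustration; real VKG systems use standardized mapping languages rather than Python dictionaries:

```python
# Stand-in for a relational table in the source database.
PERSON_ROWS = [
    {"id": 1, "name": "Ada", "dept": "math"},
    {"id": 2, "name": "Grace", "dept": "cs"},
]

# Mapping: column -> predicate. Applied lazily, so the graph stays virtual.
MAPPING = {"name": "ex:hasName", "dept": "ex:inDept"}

def virtual_triples(rows, mapping):
    """Generate triples on the fly instead of storing a graph."""
    for row in rows:
        subject = f"ex:person/{row['id']}"
        for col, pred in mapping.items():
            yield (subject, pred, row[col])

# Query the virtual graph: who is in the cs department?
cs_people = [s for s, p, o in virtual_triples(PERSON_ROWS, MAPPING)
             if p == "ex:inDept" and o == "cs"]
print(cs_people)
```

The design point is that the graph view adds flexibility and domain vocabulary on top of the tables while the data itself remains in, and is queried from, the relational source.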
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for economy and research speed efficiency. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter what discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-16 Partner journal: Data Intelligence
Abstract: The FAIR principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use and orchestration with other system elements to achieve FAIRness for digital research objects.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence
Abstract: As the world population continues to increase, world food production is not keeping up. This means that to continue to feed the world, we will need to optimize the production and utilization of food around the globe. Optimization of a process on a global scale requires massive data. Agriculture is no exception, but it also brings its own unique issues, based on how widely dispersed agricultural data are and on the wide variety of data that is relevant to optimization of food production and supply. This suggests that we need a global data ecosystem for agriculture and nutrition. Such an ecosystem already exists to some extent, made up of data sets, metadata sets and even search engines that help to locate and utilize data sets. A key concept behind this is sustainability: how do we sustain our data sets, so that we can sustain our production and distribution of food? In order to make this vision a reality, we need to navigate the challenges for sustainable data management on a global scale. Starting from the current state of practice, how do we move forward to a practice in which we make use of global data to have an impact on world hunger? In particular, how do we find, collect and manage the data? How can this be effectively deployed to improve practice in the field? And how can we make sure that these practices are leading to the global goals of improving production, distribution and sustainability of the global food supply? These questions cannot be answered yet, but they are the focus of ongoing and future research to be published in this journal and elsewhere.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: Personalized search is a promising way to improve the quality of Web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for reasons such as privacy protection and information security. This leads to a serious lack of accessible public data sets in this field. The few publicly available data sets have not become widely used in academia because of the complexity of the processing required to study personalized search methods. The lack of data sets, together with the difficulties of data processing, has brought obstacles to fair comparison and evaluation of personalized search models. In this paper, we constructed a large-scale data set, AOL4PS, to evaluate personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the challenges of processing time and storage space demands brought by massive data volumes, we optimized the process of data set construction and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the results demonstrate that AOL4PS can measure the effect of personalized search models.
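The abstract above mentions an improved BM25 algorithm without specifying the improvement; for context, the classic BM25 ranking function that such work builds on can be sketched as follows, using the textbook formula with the usual k1/b defaults (the documents and query here are made up for illustration):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N     # average document length
    df = Counter()                            # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # idf with the +1 smoothing that keeps scores non-negative
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # term frequency saturation and length normalization
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores

docs = [["personalized", "search", "query", "log"],
        ["image", "search"],
        ["query", "log", "analysis", "of", "search", "sessions"]]
print(bm25_scores(["personalized", "search"], docs))
```

The rarer term "personalized" dominates the score via its higher idf, so the first document ranks highest for this query.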
Category: Physics >> Nuclear Physics Submitted: 2016-09-13
Abstract: The Dark Matter Particle Explorer (DAMPE) is an upcoming scientific satellite mission for high-energy gamma-ray, electron and cosmic-ray detection. The silicon tracker (STK) is a sub-detector of the DAMPE payload with an excellent position resolution (readout pitch of 242 μm), which measures the incident direction of particles as well as their charge. The STK consists of 12 layers of Silicon Micro-strip Detector (SMD), equivalent to a total silicon area of 6.5 m². The STK has 73,728 readout channels in total, which produce a huge amount of raw data to be handled. In this paper, we focus on the on-board data compression algorithm and procedure in the STK, which was initially verified by cosmic-ray measurements.
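The abstract does not spell out the STK compression scheme, but a common on-board reduction for silicon micro-strip readout is pedestal subtraction followed by zero suppression: keep only strips whose signal exceeds a noise threshold, along with their addresses. The toy sketch below illustrates that generic idea under those assumptions, not DAMPE's actual firmware, and the ADC values are invented:

```python
def zero_suppress(raw, pedestals, threshold):
    """Return (strip_index, signal) pairs for strips above threshold."""
    kept = []
    for i, (adc, ped) in enumerate(zip(raw, pedestals)):
        signal = adc - ped          # subtract the per-strip baseline
        if signal > threshold:      # drop strips consistent with noise
            kept.append((i, signal))
    return kept

raw = [101, 99, 100, 180, 175, 102, 98]   # ADC counts for 7 strips
pedestals = [100] * 7                     # per-strip baseline
hits = zero_suppress(raw, pedestals, threshold=20)
print(hits)                               # only the two struck strips survive
```

With tens of thousands of channels but only a handful of struck strips per event, storing sparse (address, signal) pairs instead of the full readout is what makes the on-board data volume manageable.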
Category: Physics >> Nuclear Physics Submitted: 2025-04-10
Abstract: This article introduces the methodologies and instrumentation for data measurement and propagation at the Back-n white neutron facility of the China Spallation Neutron Source (CSNS). The Back-n facility employs backscattering techniques to generate a broad spectrum of white neutrons. Equipped with advanced detectors such as the Light Particle Detector Array (LPDA) and the Fission Ionization Chamber Detector (FIXM), the facility achieves high-precision data acquisition through a general-purpose electronics system. Data are managed and stored in a hierarchical system supported by the National High Energy Physics Science Data Center (NHEPDC), ensuring long-term preservation and efficient access. The data from Back-n experiments significantly contribute to nuclear physics, reactor design, astrophysics, and medical physics, enhancing the understanding of nuclear processes and supporting interdisciplinary research.
Category: Physics >> Nuclear Physics Submitted: 2025-02-24
Abstract: This article introduces the methodologies and instrumentation for data measurement and propagation at the Back-n white neutron facility of the China Spallation Neutron Source (CSNS). The Back-n facility employs backscattering techniques to generate a broad spectrum of white neutrons, which are essential for precise measurements of neutron-induced reactions. Equipped with advanced detectors such as the Light Particle Detector Array (LPDA) and the Fission Ionization Chamber Detector (FIXM), the facility achieves high-precision data acquisition through a general-purpose electronics system. Data are managed and stored in a hierarchical system supported by the National High Energy Physics Science Data Center (NHEPDC), ensuring long-term preservation and efficient access. The data from Back-n experiments significantly contribute to nuclear physics, reactor design, astrophysics, and medical physics, enhancing the understanding of nuclear processes and supporting interdisciplinary research.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: Research Data Management (RDM) has become increasingly important for more and more academic institutions. Using the Peking University Open Research Data Repository (PKU-ORDR) project as an example, this paper reviews a library-based, university-wide open research data repository project and the implementation of related RDM services, including project kickoff, needs assessment, partnership establishment, software investigation and selection, software customization, as well as data curation services and training. Through this review, issues revealed during the stages of the implementation process are also discussed and addressed, such as awareness of research data, demands from data providers and users, data policies and requirements of the home institution, requirements from funding agencies and publishers, collaboration between administrative units and libraries, and concerns from data providers and users. The significance of the study is that it provides an example of creating an Open Data repository and RDM services for other Chinese academic libraries planning to implement RDM services for their home institutions. The authors have also observed that, since PKU-ORDR and the RDM services were implemented in 2015, the Peking University Library (PKUL) has helped numerous researchers across the entire research life cycle, enhanced Open Science (OS) practices on campus, and influenced the national OS movement in China through various national events and activities hosted by the PKUL.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: It is common practice for data providers to include text descriptions for each column when publishing data sets, in the form of data dictionaries. While these documents are useful in helping an end user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be, and has been, used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.
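The core idea described above — machine-readable column annotations that tie each column of a data set to concepts and units from shared vocabularies — can be sketched in miniature. The column codes follow the NHANES naming style mentioned in the abstract, but this particular mapping, its structure, and the chosen ontology URIs are illustrative assumptions, not the actual Semantic Data Dictionary specification:

```python
# Hypothetical machine-readable dictionary: column -> semantic annotation.
SDD = {
    "BMXHT": {   # NHANES-style column code for standing height
        "attribute": "http://semanticscience.org/resource/Height",
        "unit": "http://purl.obolibrary.org/obo/UO_0000015",  # centimeter
        "label": "Standing Height",
    },
    "BMXWT": {   # NHANES-style column code for body weight
        "attribute": "http://semanticscience.org/resource/Weight",
        "unit": "http://purl.obolibrary.org/obo/UO_0000009",  # kilogram
        "label": "Weight",
    },
}

def annotate(row: dict) -> list:
    """Attach semantic metadata to each raw (column, value) pair."""
    return [
        {"value": v, **SDD[col]} for col, v in row.items() if col in SDD
    ]

# Columns without an entry (like the record id SEQN) pass through unannotated.
annotated = annotate({"BMXHT": 172.4, "BMXWT": 70.1, "SEQN": 1})
print(len(annotated))
```

Because the annotations are data rather than free text, downstream tools can harmonize columns from different data sets that map to the same concept and unit, which is the interoperability gain the abstract claims.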
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: With the rise of linked data and knowledge graphs, the need becomes compelling to find suitable solutions to increase the coverage and correctness of data sets, to add missing knowledge and to identify and remove errors. Several approaches, mostly relying on machine learning and natural language processing techniques, have been proposed to address this refinement goal; they usually need a partial gold standard, i.e., some ground truth to train automatic models. Gold standards are manually constructed, either by involving domain experts or by adopting crowdsourcing and human computation solutions. In this paper, we present an open source software framework to build Games with a Purpose for linked data refinement, i.e., Web applications to crowdsource partial ground truth, motivating user participation through fun incentives. We detail the impact of this new resource by explaining the specific data linking purposes supported by the framework (creation, ranking and validation of links) and by defining the respective crowdsourcing tasks to achieve those goals. We also introduce our approach for incremental truth inference over the contributions provided by players of Games with a Purpose (also abbreviated as GWAP): we motivate the need for such a method with the specificity of GWAP vs. traditional crowdsourcing; we explain and formalize the proposed process, explain its positive consequences, and illustrate the results of an experimental comparison with state-of-the-art approaches. To show this resource's versatility, we describe a set of diverse applications that we built on top of it; to demonstrate its reusability and extensibility potential, we provide references to detailed documentation, including an entire tutorial which in a few hours guides new adopters to customize and adapt the framework to a new use case.