Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence (English)
Abstract: The early concept of the knowledge graph originates from the idea of the Semantic Web, which aims to use structured graphs to model the knowledge of the world and record the relationships that exist between things. Currently, publishing knowledge bases as open data on the Web has gained significant attention. In China, the Chinese Information Processing Society of China (CIPS) launched OpenKG in 2015 to foster the development of Chinese open knowledge graphs. Unlike existing open knowledge-based programs, OpenKG chain is envisioned as a blockchain-based open knowledge infrastructure. This article introduces the first attempt at sharing knowledge graphs on OpenKG chain, a blockchain-based trust network. We have completed the test of the underlying blockchain platform, and the on-chain test of OpenKG's data set and tool set sharing as well as fine-grained knowledge crowdsourcing at the triple level. We have also proposed two novel definitions, K-Point and OpenKG Token, which can be regarded as measurements of knowledge value and user value. In two months of testing on the blockchain, 1,033 knowledge contributors have been involved, and the cumulative number of on-chain recordings triggered by real knowledge consumers has reached 550,000, with an average daily peak value of more than 10,000. For the first time, we have tested and realized on-chain sharing of knowledge at the entity/triple granularity level. At present, all operations on the data sets and tool sets at OpenKG.CN, as well as on the triples at OpenBase, are recorded on the chain, and the corresponding value is generated and assigned in a trusted mode. Through this effort, OpenKG chain looks forward to providing a more credible and traceable knowledge-sharing platform for the knowledge graph community.
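The idea of recording triple-level operations so that they remain traceable can be illustrated with a hash-linked log. This is a minimal sketch and not the OpenKG chain implementation: the `Record` schema and the helpers `append_record` and `verify_chain` are hypothetical names introduced here, and no consensus, K-Point, or OpenKG Token logic is modelled.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder hash preceding the first record


@dataclass(frozen=True)
class Record:
    """One logged operation on a triple (hypothetical schema)."""
    index: int
    user: str
    operation: str   # e.g. "add" or "download"
    triple: tuple    # (head, relation, tail)
    prev_hash: str   # digest of the previous record

    def digest(self) -> str:
        payload = json.dumps(
            [self.index, self.user, self.operation, list(self.triple), self.prev_hash],
            ensure_ascii=False,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, user, operation, triple):
    """Return a new chain with one more record linked to the last one."""
    prev_hash = chain[-1].digest() if chain else GENESIS
    return chain + [Record(len(chain), user, operation, tuple(triple), prev_hash)]


def verify_chain(chain):
    """Check that every record points at the digest of its predecessor."""
    expected = GENESIS
    for position, record in enumerate(chain):
        if record.index != position or record.prev_hash != expected:
            return False
        expected = record.digest()
    return True
```

Because each record stores the digest of its predecessor, altering an earlier record invalidates every later link, which is the property that makes such a log traceable.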
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)
Abstract: Entity Linking (EL) aims to automatically link the mentions in unstructured documents to the corresponding entities in a knowledge base (KB), and the task has recently been dominated by global models. Although many global EL methods attempt to model the topical coherence among all linked entities, most of them fail to exploit the correlations among the manifold knowledge that is helpful for linking, such as the semantics of mentions and their candidates, the neighborhood information of candidate entities in the KB, and the fine-grained type information of entities. As we show in the paper, interactions among these types of information are very useful for better characterizing the topic features of entities and for more accurately estimating the topical coherence among all the referred entities within the same document. In this paper, we present a novel HEterogeneous Graph-based Entity Linker (HEGEL) for global entity linking, which builds an informative heterogeneous graph for every document to collect various linking clues. HEGEL then uses a novel heterogeneous graph neural network (HGNN) to integrate the different types of manifold information and model the interactions among them. Experiments on the standard benchmark datasets demonstrate that HEGEL captures global coherence well and outperforms the prior state-of-the-art EL methods.
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)
Abstract: Existing visual scene understanding methods mainly focus on identifying coarse-grained concepts about visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the Web (e.g., news-reading and e-shopping) require accurately recognizing much finer-grained concepts as entities and properly linking them to a knowledge graph (KG), which can take their performance to the next level. In light of this, we identify a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. Then, we design a deep modal-attention neural network-based learning-to-rank method which aggregates all features and maps visual objects to the entities in the KG. Extensive experimental results on the newly constructed dataset show that our proposed method is effective, as it significantly improves accuracy from 66.46% to 83.16% compared with baselines.
Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-03-26
Abstract: In 1975, P. Erdős proposed the problem of determining the maximum number $f(n)$ of edges in a graph on $n$ vertices in which any two cycles are of different lengths. In this paper, it is proved that $$f(n) \geq n+32t-1$$ for $t=27720r+169 \ (r \geq 1)$ and $n \geq \frac{6911}{16}t^{2}+\frac{514441}{8}t-\frac{3309665}{16}$. Consequently, $\liminf_{n \to \infty} \frac{f(n)-n}{\sqrt{n}} \geq \sqrt{2+\frac{2562}{6911}}.$
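The passage from the edge bound to the constant in the liminf can be checked at leading order. This is only a sketch of the arithmetic, not the paper's proof: it evaluates the ratio at the smallest admissible $n$ for each $t$, written $n_t$ here (a name introduced for this illustration), and lets $t \to \infty$.

$$
\frac{f(n_t)-n_t}{\sqrt{n_t}} \;\geq\; \frac{32t-1}{\sqrt{\frac{6911}{16}t^{2}+O(t)}} \;\longrightarrow\; \frac{32}{\sqrt{6911/16}} = \frac{128}{\sqrt{6911}} = \sqrt{\frac{16384}{6911}} = \sqrt{2+\frac{2562}{6911}},
$$

since $16384 = 2 \cdot 6911 + 2562$. Values of $n$ between consecutive thresholds do not lower the limit, because $f(n)-n$ is nondecreasing in $n$ (a pendant vertex adds one edge and no cycle) and the ratio of consecutive admissible values of $t$ tends to 1.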
Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-03-26
Abstract: In 1975, P. Erdős proposed the problem of determining the maximum number $f(n)$ of edges in a graph with $n$ vertices in which any two cycles are of different lengths. In this paper, it is proved that $$f(n) \geq n+\frac{107}{3}t+\frac{7}{3}$$ for $t=1260r+169 \ (r \geq 1)$ and $n \geq \frac{2119}{4}t^{2}+87978t+\frac{15957}{4}$. Consequently, $\liminf_{n \to \infty} \frac{f(n)-n}{\sqrt{n}} \geq \sqrt{2+\frac{7654}{19071}},$ which is better than the previous bounds $\sqrt{2}$ (Y. Shi, Discrete Math. 71 (1988), 57-71) and $\sqrt{2.4}$ (C. Lai, Australas. J. Combin. 27 (2003), 101-105). The conjecture $\lim_{n \rightarrow \infty} \frac{f(n)-n}{\sqrt{n}} = \sqrt{2.4}$ is not true.
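The constant in the liminf and its comparison with the earlier bounds can be checked with exact rational arithmetic. The sketch below uses only the leading coefficients quoted in the abstract; the variable names are introduced here for illustration, and the lower-order terms are assumed to vanish in the limit.

```python
from fractions import Fraction

# Leading coefficients taken from the abstract:
#   f(n) - n >= (107/3) t + 7/3    and    n >= (2119/4) t^2 + lower-order terms
edge_coeff = Fraction(107, 3)
vertex_coeff = Fraction(2119, 4)

# Square of the limiting ratio (edge_coeff * t) / sqrt(vertex_coeff * t^2).
limit_squared = edge_coeff ** 2 / vertex_coeff

# The squares of the earlier bounds sqrt(2) and sqrt(2.4).
previous_squared = [Fraction(2), Fraction(12, 5)]

if __name__ == "__main__":
    print("limit squared:", limit_squared)
    print("equals 2 + 7654/19071:", limit_squared == 2 + Fraction(7654, 19071))
    print("exceeds earlier bounds:", all(limit_squared > p for p in previous_squared))
```

Comparing squares avoids floating-point square roots, so the strict inequality against $2.4$ is established exactly.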
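Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-03-26
Abstract: Let f(n) be the maximum possible number of edges in a graph on n vertices that contains no two cycles of the same length. The problem of determining f(n) was posed by Erdős in 1975. This paper gives a lower bound on f(n).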
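The defining property, that no two cycles have the same length, can be tested by brute force on small graphs. The following is a minimal sketch for illustration only; the function names are introduced here, the enumeration takes exponential time, and it plays no part in the lower-bound constructions.

```python
from collections import Counter


def cycle_length_counts(n, edges):
    """Count the simple cycles of each length in an undirected simple graph.

    Vertices are 0..n-1 and edges is an iterable of (u, v) pairs.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    found = Counter()

    def extend(start, current, visited):
        for nxt in adj[current]:
            if nxt == start and len(visited) >= 3:
                found[len(visited)] += 1  # the path closes into a cycle
            elif nxt > start and nxt not in visited:
                visited.add(nxt)
                extend(start, nxt, visited)
                visited.remove(nxt)

    # Root every cycle at its smallest vertex so only one start vertex finds it.
    for start in range(n):
        extend(start, start, {start})

    # Each cycle is traversed once in each direction, so halve the counts.
    return {length: count // 2 for length, count in found.items()}


def has_distinct_cycle_lengths(n, edges):
    """True if no two cycles of the graph have the same length."""
    return all(count == 1 for count in cycle_length_counts(n, edges).values())
```

For example, a triangle and a 4-cycle sharing one vertex pass the test with $n + 1$ edges, while the complete graph on four vertices fails because it contains four triangles.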
Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-02-18
Abstract: In 1975, P. Erdős proposed the problem of determining the maximum number $f(n)$ of edges in a graph on $n$ vertices in which any two cycles are of different lengths. Let $f^{\ast}(n)$ be the maximum number of edges in a simple graph on $n$ vertices in which any two cycles are of different lengths. Let $M_n$ be the set of simple graphs on $n$ vertices with $f^{\ast}(n)$ edges in which any two cycles are of different lengths. Let $mc(n)$ be the maximum cycle length over all $G \in M_n$. In this paper, it is proved that for $n$ sufficiently large, $mc(n)\leq \frac{15}{16}n$. We make the following conjecture: $$\lim_{n \rightarrow \infty} {mc(n)\over n}= 0.$$
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-29 Partner journal: Data Intelligence (English)
Abstract: Knowledge graphs (KGs) have played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce the solution for building a large-scale multi-source knowledge graph from scratch at Sogou Inc., including its architecture, technical implementation and applications. Unlike previous works that build knowledge graphs with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the Sogou knowledge graph, whose data are collected from 136 different websites and constantly updated, consists of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering, and a knowledge-based dialogue system. These applications have been used in Web search products to help users acquire information more efficiently.
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)
Abstract: Multi-modal entity linking plays a crucial role in a wide range of knowledge-based modal-fusion tasks, e.g., multi-modal retrieval and multi-modal event extraction. We introduce the new ZEro-shot Multi-modal Entity Linking (ZEMEL) task. Its format is similar to multi-modal entity linking, but multi-modal mentions are linked to unseen entities in the knowledge graph, and the purpose of the zero-shot setting is to realize robust linking in highly specialized domains. At the same time, the inference efficiency of existing models is low when there are many candidate entities. On this account, we propose a novel model that leverages visual-linguistic representation through a co-attentional mechanism to deal with the ZEMEL task, considering the trade-off between the performance and the efficiency of the model. We also build a dataset named ZEMELD for the new task, which contains multi-modal data resources collected from Wikipedia, and we annotate the entities as ground truth. Extensive experimental results on the dataset show that our proposed model is effective, as it significantly improves precision from 68.93% to 82.62% compared with baselines in the ZEMEL task.
Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-03-27
Abstract: The set of all non-increasing sequences of nonnegative integers $\pi = (d(v_1), d(v_2), \ldots, d(v_n))$ is denoted by $NS_n$. A sequence $\pi \in NS_n$ is said to be graphic if it is the degree sequence of a simple graph $G$ on $n$ vertices, and such a graph $G$ is called a realization of $\pi$. The set of all graphic sequences in $NS_n$ is denoted by $GS_n$. A graphic sequence $\pi$ is potentially $H$-graphical if there is a realization of $\pi$ containing $H$ as a subgraph, while $\pi$ is forcibly $H$-graphical if every realization of $\pi$ contains $H$ as a subgraph. Let $K_k$ denote a complete graph on $k$ vertices. Let $K_m - H$ be the graph obtained from $K_m$ by removing the edge set $E(H)$ of the graph $H$ ($H$ is a subgraph of $K_m$). This paper briefly summarizes some recent results on potentially $K_m - G$-graphic sequences and gives a useful classification for determining $\sigma(H, n)$.
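Whether a given sequence is graphic can be decided without constructing a realization. The sketch below uses the Erdős–Gallai theorem, a standard characterization that is swapped in here for illustration and is not one of the results the paper surveys; the function name `is_graphic` is introduced here.

```python
def is_graphic(degrees):
    """Erdős–Gallai test: is `degrees` the degree sequence of a simple graph?

    The sequence is graphic iff its sum is even and, for every k,
    the k largest degrees d_1 >= ... >= d_k satisfy
        d_1 + ... + d_k <= k(k - 1) + sum_{i > k} min(d_i, k).
    """
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        left = sum(d[:k])
        right = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if left > right:
            return False
    return True
```

For instance, `(3, 3, 3, 3)` is graphic (it is realized by $K_4$), whereas `(3, 3, 1, 1)` has an even sum but fails the inequality at $k = 2$.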
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)
Abstract: The paper gives a brief introduction to the workflow management platform Flowable and how it is used for textual-data management. The platform is relatively new, with its first release on 13 October 2016. Despite its short time on the market, it has quickly gained attention, with 4.6 thousand stars on GitHub at the time of writing. The focus of our project is to build a platform for text analysis on a large scale by including many different text resources. Currently, we have successfully connected to four different text resources and obtained more than one million works. Some resources are dynamic, which means that they might add more data or modify their current data. Therefore, it is necessary to keep the data on our side, both the metadata and the raw data, up to date with the resources. In addition, to comply with the FAIR principles, each work is assigned a persistent identifier (PID) and indexed for searching purposes. In the last step, we perform some standard analyses on the data to enhance our search engine and to generate a knowledge graph. End-users can use our platform to search our data or access the knowledge graph. Furthermore, they can submit code for their own analyses to the system. The code is executed on a High-Performance Cluster (HPC), and users receive the results later. In this case, Flowable can take advantage of PIDs for digital object identification and management to facilitate communication with the HPC system. The whole process can be expressed as a workflow. A workflow, including error handling and notification, has been created and deployed. Workflow execution can be triggered manually or after predefined time intervals. According to our evaluation, the Flowable platform proves to be powerful and flexible. Further usage of the platform is already planned or implemented for many of our projects.
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)
Abstract: Nowadays, with an increasing number of open knowledge graphs (KGs) being published on the Web, users depend on open data portals and search engines to find KGs. However, existing systems provide search services and present results using only metadata, ignoring the contents of KGs, i.e., triples. This makes comprehension and relevance judgement difficult for users. To overcome the limitation of metadata, in this paper we propose a content-based search engine for open KGs named CKGSE. Our system provides keyword search, KG snippet generation, KG profiling and browsing, all based on the KGs' detailed, informative contents rather than their brief, limited metadata. To evaluate its usability, we implement a prototype with Chinese KGs crawled from OpenKG.CN and report some preliminary results and findings.
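The difference between metadata-only and content-based search can be made concrete with an inverted index built over the triples themselves. This is a toy sketch under stated assumptions, not CKGSE's indexing or ranking method: `build_index` and `search` are hypothetical names, tokens are split on whitespace, and a KG's score is simply the number of its triples matching each query token.

```python
from collections import defaultdict


def build_index(kgs):
    """Map each token to {kg_name: number of triples containing that token}.

    `kgs` maps a KG name to a list of (subject, predicate, object) strings.
    """
    index = defaultdict(lambda: defaultdict(int))
    for name, triples in kgs.items():
        for triple in triples:
            # Count a token once per triple, however often it occurs in it.
            tokens = {tok.lower() for term in triple for tok in term.split()}
            for tok in tokens:
                index[tok][name] += 1
    return index


def search(index, query):
    """Return KG names ranked by total matching-triple count, then by name."""
    scores = defaultdict(int)
    for tok in query.lower().split():
        for name, count in index.get(tok, {}).items():
            scores[name] += count
    return sorted(scores, key=lambda name: (-scores[name], name))
```

A query term that appears only inside triples, and never in a KG's title or description, still retrieves that KG, which is the behaviour that metadata-only search cannot provide.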
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)
Abstract: Knowledge bases play an important role in machine understanding and have been widely used in various applications, such as search engines, recommendation systems and question answering. However, most knowledge bases are incomplete, which can cause many downstream applications to perform poorly because they cannot find the corresponding facts in the knowledge bases. In this paper, we propose an extraction and verification framework to enrich knowledge bases. Specifically, based on the existing knowledge base, we first extract new facts from the description texts of entities. However, not all newly formed facts can be added directly to the knowledge base, because the extraction may introduce errors. We therefore propose a novel crowdsourcing-based verification step to verify the candidate facts. Finally, we apply this framework to the existing knowledge base CN-DBpedia and construct a new version, CN-DBpedia2, which additionally contains the high-confidence facts extracted from the description texts of entities.
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)
Abstract: Knowledge graph (KG) completion aims at filling in the missing facts in a KG, where a fact is typically represented as a triple in the form of (head, relation, tail). Traditional KG completion methods require two-thirds of a triple to be provided (e.g., head and relation) in order to predict the remaining element. In this paper, we propose a new method that extends multi-layer recurrent neural networks (RNNs) to model triples in a KG as sequences. It obtains state-of-the-art performance on the common entity prediction task, i.e., predicting the tail (or the head) given the head (or the tail) and the relation, on two benchmark data sets. Furthermore, the deep sequential characteristic of our method enables it to predict the relations given the head (or the tail) only, and even to predict whole triples. Our experiments on these two new KG completion tasks demonstrate that our method achieves superior performance compared with several alternative methods.
Category: Computer Science >> Natural Language Understanding and Machine Translation Submitted: 2022-03-04
Abstract: The rise and application of neural networks have successfully promoted research on pattern recognition and data mining. In recent years, graph neural networks have attracted more and more attention, with applications in text classification, sequence annotation, neural machine translation, relation extraction, image classification and other fields. This review mainly integrates the existing research on semi-supervised or unsupervised graph neural networks. The surveyed work is classified along three aspects: research questions, research methods, and measures. The main research problems are the low-dimensional representation of nodes in graphs and the over-smoothing problem in the process of message passing. The research methods mainly focus on graph embedding algorithms, such as those based on probabilistic graphs and those based on deep learning. The measures mainly focus on the accuracy and efficiency of algorithms and models. Finally, this paper puts forward feasible future research directions as a reference for readers.
Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-02-13
Abstract: Let $K_k$, $C_k$, $T_k$, and $P_{k}$ denote a complete graph on $k$ vertices, a cycle on $k$ vertices, a tree on $k+1$ vertices, and a path on $k+1$ vertices, respectively. Let $K_{m}-H$ be the graph obtained from $K_{m}$ by removing the edge set $E(H)$ of the graph $H$ ($H$ is a subgraph of $K_{m}$). A sequence $S$ is potentially $K_{m}-H$-graphical if it has a realization containing $K_{m}-H$ as a subgraph. Let $\sigma(K_{m}-H, n)$ denote the smallest degree sum such that every $n$-term graphical sequence $S$ with $\sigma(S)\geq \sigma(K_{m}-H, n)$ is potentially $K_{m}-H$-graphical. In this paper, we determine the values of $\sigma (K_{r+1}-H, n)$ for $n\geq 4r+10, r\geq 3, r+1 \geq k \geq 4$, where $H$ is a graph on $k$ vertices which contains a tree on $4$ vertices but does not contain a cycle on $3$ vertices. We also determine the values of $\sigma (K_{r+1}-P_2, n)$ for $n\geq 4r+8, r\geq 3$.
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)
Abstract: With the technological development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, research on knowledge graphs has been in full swing in recent years. To better promote the development of knowledge graphs, especially in the Chinese language and in the financial industry, we built a high-quality data set, named financial research report knowledge graph (FR2KG), and organized an evaluation task on the automated construction of financial knowledge graphs at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples covering 10 entity types, 19 relationship types, and 6 attributes. Participants are required to develop a constructor that automatically constructs a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs, and introduce the methods used by the winners and the results of this evaluation.
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence (English)
Abstract: Because the large-scale spread of COVID-19 has had a significant impact on human health and on the economy, developing effective antiviral drugs for COVID-19 is vital to saving human lives. Various biomedical associations, e.g., drug-virus and viral protein-host protein interactions, can be used for building biomedical knowledge graphs. Based on these sources, large-scale knowledge reasoning algorithms can be used to predict new links between antiviral drugs and viruses. To utilize the various heterogeneous biomedical associations, we propose a fusion strategy that integrates the results of two tensor decomposition-based models (i.e., CP-N3 and ComplEx-N3). Extensive experiments indicate that our method achieves high performance (MRR=0.2328). The mean reciprocal rank (MRR) is increased by 3.3% compared with CP-N3 and by 3.5% compared with ComplEx-N3. We also explored the relationship between performance and relationship types, which indicated that there is a negative correlation (PCC=0.446, P-value=2.26e-194) between the performance of triples predicted by our method and edge betweenness.
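The mean reciprocal rank (MRR) reported above averages $1/\text{rank}$ of the correct answer over all test queries. The sketch below shows the standard computation; the function names and the pessimistic tie handling are choices made here for illustration and are not taken from the paper's evaluation code.

```python
def rank_of_target(scores, target):
    """1-based rank of `target` among scored candidates (higher score is better).

    Ties are resolved pessimistically: candidates tied with the target
    are counted as ranked ahead of it.
    """
    target_score = scores[target]
    ahead = sum(
        1 for candidate, score in scores.items()
        if candidate != target and score >= target_score
    )
    return 1 + ahead


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; ranks are positive integers."""
    if not ranks:
        raise ValueError("at least one rank is required")
    return sum(1.0 / rank for rank in ranks) / len(ranks)
```

An MRR of 1 means that the correct entity is always ranked first, so a value such as 0.2328 corresponds to the correct answer appearing, on a harmonic average, around the fourth position.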
Category: Computer Science >> Integrated Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence (English)
Abstract: In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-based Data Access. Instead of structuring the integration layer as a collection of relational tables, the VKG paradigm replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge. We explain the main notions of this paradigm, its tooling ecosystem, and significant use cases in a wide range of applications. Finally, we discuss future research directions.
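The sense in which the graph is "kept virtual" can be illustrated by mappings that turn relational rows into triples only when they are requested. This is a toy sketch under stated assumptions: the `employee` table, the `ex:` identifiers, and `virtual_triples` are hypothetical, and actual VKG systems typically rewrite graph queries into SQL rather than enumerating rows as done here.

```python
import sqlite3

# Each mapping: (SQL query, subject template, predicate, object template).
# Table, column, and "ex:" names are made up for this illustration.
MAPPINGS = [
    ("SELECT id, name FROM employee", "ex:employee/{0}", "ex:name", "{1}"),
    ("SELECT id, dept FROM employee", "ex:employee/{0}", "ex:worksIn", "ex:dept/{1}"),
]


def virtual_triples(conn, mappings=MAPPINGS):
    """Yield triples computed from table rows at call time.

    Nothing is stored as a graph: the data stay in the relational tables
    and the triples exist only while the generator is being consumed.
    """
    for sql, subject, predicate, obj in mappings:
        for row in conn.execute(sql):
            yield subject.format(*row), predicate, obj.format(*row)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
    conn.execute("INSERT INTO employee VALUES (1, 'Ada', 'research')")
    for triple in virtual_triples(conn):
        print(triple)
```

Because the triples are derived on demand, an update to the underlying table is visible the next time the graph is read, with no separate copy to refresh.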
Category: Mathematics >> Discrete Mathematics and Combinatorics Submitted: 2024-03-27
Abstract: Let $f(n)$ be the maximum number of edges in a graph on $n$ vertices in which no two cycles have the same length. Erdős raised the problem of determining $f(n)$. Erdős conjectured that there exists a positive constant $c$ such that $ex(n, C_{2k}) \geq cn^{1+1/k}$. Hajós conjectured that every simple even graph on $n$ vertices can be decomposed into at most $n/2$ cycles. We present these problems and the conjectures related to them, and we summarize the known results. We do not think Hajós' conjecture is true.