ChinaXiv.org: Chinese Academy of Sciences preprint platform for scientific papers (中国科学院科技论文预发布平台)

3 results in total.

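Search condition (institution): Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology (桂林电子科技大学广西可信软件重点实验室)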

1. ChinaXiv:202210.00200

Research on Stemming and Relevance Ranking Optimization for Retrieval Services (面向检索服务的词干提取与相关排序优化研究)

Subjects: Information Science and Systems Science >> Systematic Application of Information Technology; Submitted: 2022-10-26; Cooperative journal: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)

Authors: 朱艳, 张敬伟, 杨青, 胡晓丽, 单美静

Abstract: The rise of a new generation of information technology and the rapid development of the internet industry have led to explosive growth in the amount of data. To let billions of users obtain useful information from massive data quickly, improving the retrieval quality and query efficiency of search engines is of great significance, but it also poses challenges. On the one hand, user queries are becoming increasingly complex, and morphological variation in vocabulary diversifies search terms, while existing stemming algorithms generally suffer from under-stemming and unsatisfactory stemming accuracy. On the other hand, retrieving documents that satisfy a user query from massive data is very time-consuming, and existing methods that partition documents across multiple servers to reduce query latency often suffer from tail latency. To address these problems, in the text preprocessing stage, the word-form normalization algorithm APS (Advanced Porter Stemmer) is designed, in which the rule functions are recoded and feature word extraction is optimized. In the relevance ranking stage, the anytime ranking algorithm SAR (SAAT Anytime Ranking) is designed based on the Score-at-a-Time (SAAT) query processing strategy; it can terminate query processing early once a given time budget is exhausted or a specified number of inverted segments has been processed, and thus controls query latency effectively. Experiments on multiple real datasets verify the effectiveness of the APS algorithm in improving stemming accuracy and of the SAR algorithm in controlling query latency.
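
The early-termination idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SAR algorithm, whose details the abstract does not give. It assumes a generic Score-at-a-Time traversal in which each inverted segment is a hypothetical `(impact, doc_ids)` pair, segments are processed in decreasing impact order, and processing stops once a segment budget or a time budget is reached. The function name `anytime_saat` and its parameters are illustrative only.

```python
import heapq
import time
from collections import defaultdict


def anytime_saat(segments, k=3, max_segments=None, time_budget_s=None):
    """Score-at-a-Time traversal with early termination (illustrative sketch).

    segments: list of (impact, doc_ids) pairs gathered from the posting
    lists of all query terms. Each segment holds the documents that share
    one quantized impact score for one term (an assumed layout).
    Returns the top-k (doc_id, score) pairs and the number of segments used.
    """
    start = time.perf_counter()
    scores = defaultdict(int)
    processed = 0
    # Highest-impact segments first, so stopping early loses the least score.
    for impact, doc_ids in sorted(segments, key=lambda s: s[0], reverse=True):
        if max_segments is not None and processed >= max_segments:
            break
        if time_budget_s is not None and time.perf_counter() - start >= time_budget_s:
            break
        for doc in doc_ids:
            scores[doc] += impact
        processed += 1
    # Rank by score; ties go to the smaller document id.
    top = heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], -kv[0]))
    return top, processed


segments = [(3, [1, 4]), (9, [2]), (5, [2, 4]), (1, [1, 2, 3])]
full, _ = anytime_saat(segments, k=2)
early, used = anytime_saat(segments, k=2, max_segments=2)
print(full, early, used)
```

In this toy input the two-segment budget returns the same top-2 documents as the full traversal, with lower accumulated scores. That is the trade-off an anytime strategy relies on: the highest-impact segments tend to settle the ranking first.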

Hits: 3230; Downloads: 415
2. ChinaXiv:201806.00128

Distant Supervision Relation Extraction Based on GRU and the Attention Mechanism (基于GRU和注意力机制的远程监督关系抽取)

Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-06-19; Cooperative journal: Application Research of Computers (《计算机应用研究》)

Authors: 黄兆玮, 常亮, 宾辰忠, 孙彦鹏, 孙磊

Abstract: With the development of deep learning, more and more deep learning models have been applied to relation extraction, but traditional deep learning models cannot handle long-distance dependencies. At the same time, distant supervision inevitably generates wrong labels. To address these two problems, this work proposes a distant supervision relation extraction method based on the GRU (Gated Recurrent Unit) and the attention mechanism. First, a GRU neural network is adopted to extract text features and capture long-distance dependencies. Second, a sentence-level attention mechanism is constructed over entity pairs to reduce the weight of noisy sentences. Finally, on a real dataset, precision and recall are computed and a precision-recall (PR) curve is drawn to show that the proposed method achieves a significant improvement over several existing methods.
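
The sentence-level attention step can be sketched in Python under stated assumptions. The abstract does not give the scoring function, so this sketch assumes the common form in which each sentence vector in an entity-pair bag is scored by its dot product with a relation query vector, and the scores are normalized with a softmax. The sentence vectors would come from the GRU encoder in the paper; here they are made-up numbers, and `sentence_attention` and `relation_query` are hypothetical names.

```python
import math


def sentence_attention(sentence_vecs, relation_query):
    """Return (weights, bag_vector) for one entity-pair bag (illustrative sketch)."""
    # Score each sentence by its dot product with the relation query vector.
    scores = [sum(x * q for x, q in zip(vec, relation_query)) for vec in sentence_vecs]
    # Numerically stable softmax over the sentences in the bag.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag representation: attention-weighted sum of the sentence vectors.
    dim = len(sentence_vecs[0])
    bag = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)]
    return weights, bag


# Two sentences that agree with the relation query and one noisy sentence.
bag_sentences = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
weights, bag = sentence_attention(bag_sentences, relation_query=[2.0, 0.0])
print(weights, bag)
```

The noisy third sentence receives a weight close to zero, so it contributes little to the bag representation. This is the sense in which attention reduces the influence of wrongly labeled sentences.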

Hits: 1773; Downloads: 1056
3. ChinaXiv:201805.00271

A Data Partitioning Algorithm Based on Capacity and Load for Distributed Intrusion Detection (分布式入侵检测中基于能力与负载的数据分割算法)

Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-05-20; Cooperative journal: Application Research of Computers (《计算机应用研究》)

Authors: 张润莲, 李豪, 叶志博, 赵新红

Abstract: To address the efficiency and detection-rate problems of parallel detection over massive data in distributed intrusion detection for high-speed networks, this paper proposes a data partitioning algorithm based on capacity and workload. The algorithm evaluates the data processing capacity and workload of each node from the collected performance indicators and running status of the data analysis nodes that process data in parallel in the cluster. Based on each node's capacity and load adjustment factor, it dynamically partitions data among the data analysis nodes, taking into account each node's weight for detecting and analyzing data in the cluster, so that the granularity of the data assigned to a node matches that node's capacity and real-time load. Test results show that the proposed algorithm reduces detection time and improves the efficiency and detection rate of parallel data processing.
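
The weighting idea can be illustrated with a small Python sketch. The abstract does not state how capacity and the load adjustment factor are combined, so the formula used here, weight = capacity × (1 − load), is an assumption made only for illustration, as are the function name `partition_sizes` and the node names.

```python
def partition_sizes(total_records, nodes):
    """Split total_records across nodes in proportion to capacity * (1 - load).

    nodes: dict mapping node name -> (capacity, load), with load in [0, 1].
    The weighting formula is an assumed stand-in for the paper's
    capacity and load adjustment factor.
    """
    weights = {name: cap * (1.0 - load) for name, (cap, load) in nodes.items()}
    total_w = sum(weights.values())
    if total_w <= 0:
        raise ValueError("no node has spare capacity")
    shares = {name: int(total_records * w / total_w) for name, w in weights.items()}
    # Hand the records lost to rounding down to the heaviest-weighted nodes.
    leftover = total_records - sum(shares.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


nodes = {"n1": (4.0, 0.5), "n2": (2.0, 0.0), "n3": (4.0, 0.75)}
shares = partition_sizes(1000, nodes)
print(shares)
```

Here `n1` and `n3` have the same capacity, but `n3` is more heavily loaded and so receives half as much data, while the idle but smaller `n2` receives as much as `n1`.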

Hits: 1088; Downloads: 605
