
1. chinaXiv:201910.00076 [pdf]

Masked Sentence Model based on BERT for Move Recognition in Medical Scientific Abstracts

Yu, Gaihong; Zhang, Zhixiong; Liu, Huan; Ding, Liangping
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

Purpose: Move recognition in scientific abstracts is an NLP task that classifies the sentences of an abstract into different types of language units. To improve move-recognition performance, we propose a novel model that outperforms the BERT-Base method. Design: Prevalent BERT-based sentence-classification models often classify sentences without considering their context. In this paper, inspired by BERT's Masked Language Model (MLM), we propose a novel model called the Masked Sentence Model, which integrates both the content and the contextual information of sentences for move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps, and we then compare our model with HSLN-RNN, BERT-Base and SciBERT on the same dataset. Findings: Our model's F1 score exceeds that of BERT-Base and SciBERT by 4.96% and 4.34% respectively, demonstrating the feasibility and effectiveness of the model; its result comes closest to the current state-of-the-art result of HSLN-RNN. Research limitations: The sequential features of move labels are not considered, which may be one reason why HSLN-RNN performs better. Our model is also restricted to biomedical English literature, because it is fine-tuned on a dataset drawn from PubMed, a typical biomedical database. Practical implications: The proposed model is simpler and better at identifying move structure in scientific abstracts, and is useful for text-classification experiments that need to capture the contextual features of sentences. Originality: The study proposes a Masked Sentence Model based on BERT that accounts for the contextual features of the sentences in an abstract in a new way, and the performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural networks.
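As a hedged sketch of the idea (the abstract does not spell out the input construction, so the mask token and the concatenation scheme below are assumptions), the masked-sentence input might be built like this:

```python
def build_masked_input(sentences, target_idx, mask_token="[MASK]"):
    # Replace the target sentence with a mask token so the model sees the
    # abstract's context and the target sentence as a separate pair of inputs.
    context = [mask_token if i == target_idx else s
               for i, s in enumerate(sentences)]
    return " ".join(context), sentences[target_idx]

abstract = ["We study move recognition.",
            "A masked sentence model is proposed.",
            "Results improve over BERT-Base."]
ctx, tgt = build_masked_input(abstract, 1)
# ctx: "We study move recognition. [MASK] Results improve over BERT-Base."
# tgt: "A masked sentence model is proposed."
```

The pair could then be fed to a BERT-style encoder as two segments, so classification of the target sentence is conditioned on its surrounding abstract.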

submitted time 2019-10-29

2. chinaXiv:201910.00073 [pdf]

Smart Traditional Chinese Medicine: An Intelligent Prescription-Generation Model for Lung Cancer

阮春阳
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

Building on prior work in traditional Chinese medicine (TCM) knowledge mining, this paper surveys and analyzes clinical TCM prescription data for lung cancer. Tailored to the characteristics of these data, we construct a deep learning model that mines patterns such as the hidden relationships between symptoms and herbs in prescriptions, validating the model's accuracy with physicians throughout the process. The model ultimately generates prescriptions automatically with high clinical validity, assisting physicians in diagnosis, improving clinical efficiency, and promoting innovation in clinical practice.

submitted time 2019-10-15

3. chinaXiv:201905.00012 [pdf]

Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model

Na Pang; Li Qian; Weimin Lyu; Jin-Dong Yang
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

Abstract. Computational chemistry has developed rapidly in recent years thanks to breakthroughs in AI. Progress in natural language processing now lets researchers extract fine-grained knowledge from publications to stimulate further development in computational chemistry. Whereas existing work and corpora for chemical entity extraction have been restricted to the biomedicine and life-science fields rather than chemistry itself, we build a new corpus in the chemical-bond field annotated for seven types of entities: compound, solvent, method, bond, reaction, pKa and pKa value. This paper presents a novel BERT-CRF model that builds scientific chemical data chains by extracting these seven chemical entity types and their relations from publications, and we propose a joint model to extract the entities and relations simultaneously. Experimental results on our Chemical Special Corpus demonstrate that we achieve state-of-the-art and competitive NER performance.
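The CRF layer on top of BERT chooses the globally best tag sequence instead of tagging each token independently; the core of that choice is Viterbi decoding. A minimal, framework-free sketch of that decoding step (toy scores, not the paper's trained model):

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag path.

    emissions: per-token list of per-tag scores (from the encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Recover the best path by walking the backpointers in reverse.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return path[::-1]

# Two tags (0 = O, 1 = ENTITY), three tokens, no transition preference:
path = viterbi_decode([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
                      [[0.0, 0.0], [0.0, 0.0]])
```

With non-zero transition scores, the same routine would penalize tag sequences that are locally plausible but globally inconsistent, which is what the CRF adds over a plain softmax head.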

submitted time 2019-05-12

4. chinaXiv:201902.00062 [pdf]

Multimedia Short Text Classification via Deep RNN-CNN Cascade

陶爱山
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

Abstract—With the rapid development of mobile technologies, social networking applications such as Twitter, Weibo and WeChat have become ubiquitous in everyday life. These social networks generate a deluge of data consisting not only of plain text but also of images, videos and audio. As a consequence, traditional approaches that classify short texts by counting only keywords have become inadequate. In this paper, we propose a multimedia short-text classification approach based on a deep RNN (recurrent neural network) and CNN (convolutional neural network) cascade. We first employ an LSTM (long short-term memory) network to convert the information in an image into text. A convolutional neural network then classifies the multimedia text, taking into account both the text generated from the image and the text contained in the original message. Experiments on the MSCOCO dataset show that the proposed method significantly outperforms traditional methods.
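The cascade reduces to a simple composition: caption the image with one network, then classify the concatenated text with another. A hedged sketch with stand-in models (`caption_model` and `text_classifier` below are toy stubs, not the trained LSTM and CNN of the paper):

```python
def classify_multimedia(text, image, caption_model, text_classifier):
    # Stage 1 (the RNN in the paper): turn the image into descriptive text.
    caption = caption_model(image)
    # Stage 2 (the CNN in the paper): classify the original message text
    # concatenated with the generated caption.
    return text_classifier(text + " " + caption)

# Toy stand-ins so the cascade is runnable end to end:
caption_model = lambda image: "a dog playing on the grass"
text_classifier = lambda t: "pets" if "dog" in t else "other"
label = classify_multimedia("look at this!", object(), caption_model,
                            text_classifier)
```

The design point is that the second stage never sees pixels, only text, so any text classifier can be reused unchanged for multimedia input.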

submitted time 2019-02-22

5. chinaXiv:201809.00191 [pdf]

A Text Classification Method Based on Cost-Sensitive Ensembles of Extreme Learning Machines

李明; 肖培伦; 张矩; 顾心盟
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

The weighted extreme learning machine (ELM) assigns different weights to samples of different classes, improving classification accuracy to some extent, but it considers only the differences between classes while ignoring sample noise and the differences among samples within the same class. This paper proposes an ELM ensemble method based on text-category information entropy. Using Adaboost.M1 as the algorithmic framework, the method derives the category information entropy of a text from its intra-class and inter-class distribution entropies, constructs a cost-sensitive matrix from this entropy, and integrates the resulting cost-sensitive ELM into the Adaboost.M1 framework. Experimental results show that, compared with other types of extreme learning machines, the method achieves better accuracy and generalization.
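The key quantity is the text-category information entropy, combined from the intra-class and inter-class distribution entropies. A minimal sketch (the equal-weight combination via `alpha` is an assumption; the paper's exact formula may differ):

```python
import math

def entropy(dist):
    # Shannon entropy of a discrete distribution; zero-probability
    # outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in dist if p > 0)

def category_information_entropy(intra_dist, inter_dist, alpha=0.5):
    # Hypothetical combination of the two entropies described in the
    # abstract; this value would then populate the cost-sensitive matrix
    # fed to the Adaboost.M1 ensemble.
    return alpha * entropy(intra_dist) + (1 - alpha) * entropy(inter_dist)
```

A text spread evenly inside its class but concentrated across classes gets a different score than one with the opposite profile, which is the distinction the cost matrix is meant to encode.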

submitted time 2018-09-27

6. chinaXiv:201710.00001 [pdf]

Network of Recurrent Neural Networks

Wang, Chao-Ming
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

We describe a class of systems-theory-based neural networks called the "Network of Recurrent neural networks" (NOR), which introduces a new structural level to RNN-related models. In NOR, RNNs are viewed as high-level neurons and are used to build high-level layers. More specifically, we propose several methodologies for designing different NOR topologies according to the theory of system evolution. We then conduct experiments on three different tasks to evaluate our implementations. Experimental results show that our models outperform a simple RNN remarkably under the same number of parameters, and sometimes achieve even better results than GRU and LSTM.
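The central idea, an RNN used as a "high-level neuron" inside a layer, can be sketched with a toy scalar RNN (a deliberate simplification; the paper's topologies and vector-valued cells are richer than this):

```python
import math

class SimpleRNN:
    """Toy scalar RNN cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    def __init__(self, w_in, w_rec):
        self.w_in, self.w_rec = w_in, w_rec

    def run(self, xs):
        h = 0.0
        for x in xs:
            h = math.tanh(self.w_in * x + self.w_rec * h)
        return h

class NORLayer:
    """A high-level layer whose 'neurons' are themselves RNNs: each member
    RNN reads the whole input sequence and emits one feature."""
    def __init__(self, rnns):
        self.rnns = rnns

    def run(self, xs):
        return [rnn.run(xs) for rnn in self.rnns]

layer = NORLayer([SimpleRNN(1.0, 0.0), SimpleRNN(0.5, 0.5)])
features = layer.run([1.0, -1.0])
```

Stacking several such layers, with one layer's feature vector feeding the next, is the extra structural level the abstract refers to.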

submitted time 2017-10-02

7. chinaXiv:201703.00230 [pdf]

Tibetan Word Segmentation and Its Application in Tibetan-Chinese Machine Translation

孙萌; 华却才让; 姜文斌; 吕雅娟; 刘群
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

This paper proposes a discriminative-model-based approach to Tibetan word segmentation and studies its application in Tibetan-Chinese machine translation. Exploiting the word-formation characteristics of Tibetan, we significantly improve segmentation quality through three modules: minimal-granularity segmentation, perceptron decoding, and reranking of the segmentation results. On this basis, we further propose a word-lattice-based approach to Tibetan-Chinese machine translation that mitigates the propagation of segmentation errors into translation and yields a clear improvement in translation quality.

submitted time 2017-03-10

8. chinaXiv:201703.00228 [pdf]

A Translation-Rule Selection Method for Morphologically Rich Languages

王志洋; 吕雅娟; 孙萌; 姜文斌; 刘群
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

Current machine translation models are designed for morphologically simple languages such as English and are ill-suited to morphologically rich languages such as Uyghur. In this paper, by treating the stems and affixes of a morphologically rich language separately, we propose a new translation-rule selection method for such languages. We use stems as the basic translation units to alleviate data sparsity, and each stem-level translation rule additionally carries an affix distribution. During translation, a more suitable rule is selected by computing the similarity between the affix distribution of the segment to be translated and that of each candidate rule. Translation experiments from three morphologically rich languages (Uyghur, Kazakh and Kyrgyz) into Chinese show that the method significantly improves translation quality.
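The rule-selection step amounts to comparing two affix distributions. A minimal sketch using cosine similarity (both the similarity measure and the dict-based representation are assumptions; the abstract does not commit to a specific form):

```python
import math

def cosine_similarity(p, q):
    # p, q: dicts mapping affix -> probability mass.
    dot = sum(p[a] * q.get(a, 0.0) for a in p)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def select_rule(segment_affixes, rules):
    # Choose the stem-level rule whose attached affix distribution best
    # matches that of the segment being translated.
    return max(rules,
               key=lambda r: cosine_similarity(segment_affixes, r["affixes"]))

# Hypothetical Uyghur-style example: plural "-lar" vs. accusative "-ni".
rules = [{"target": "books", "affixes": {"-lar": 1.0}},
         {"target": "book-ACC", "affixes": {"-ni": 1.0}}]
best = select_rule({"-ni": 0.8, "-lar": 0.2}, rules)
```

Because the comparison happens at rule-selection time, the stem-level rules themselves stay sparse-data-friendly while the affix information still influences the choice.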

submitted time 2017-03-10

9. chinaXiv:201703.00187 [pdf]

Progress in Minority-Language Machine Translation Research at the Institute of Computing Technology, Chinese Academy of Sciences

吕雅娟; 刘群; 姜文斌
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

This paper analyzes the background, current state, and development trends of machine translation research for minority languages, and reviews the progress of the Institute of Computing Technology, Chinese Academy of Sciences, in minority-language processing and machine translation. Topics include fundamental language-processing technologies for Uyghur, Mongolian and Tibetan; analysis and translation modeling for morphologically rich languages; knowledge acquisition and translation techniques for resource-poor languages; and the organization of the minority-language machine translation evaluation at the China Workshop on Machine Translation.

submitted time 2017-03-09

10. chinaXiv:201611.00727 [pdf]

A Fast Abbreviation-Extraction Algorithm Based on Reverse-Order Scanning and Co-occurrence Analysis

王敬东; 张智雄
Subjects: Computer Science >> Natural Language Understanding and Machine Translation

This paper surveys the forms that abbreviations take in scientific and technical resources and proposes a fast term-abbreviation extraction algorithm that combines fast reverse-order scanning with co-occurrence analysis. The algorithm first extracts abbreviations, candidate full forms, and contextual information from the resources. It then applies a heuristic fuzzy-matching procedure that scans each abbreviation and its candidate full forms from right to left, identifying regular abbreviations and their full forms without requiring every letter of the abbreviation to match. Finally, it applies co-occurrence analysis to the remaining irregular candidate abbreviations and full forms. Compared with previous algorithms, this algorithm achieves clear improvements in time complexity as well as in precision and recall.
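The heuristic fuzzy match can be sketched as a right-to-left scan of the abbreviation against the word initials of a candidate full form, with a tolerance threshold (the threshold value and the initial-letter matching rule below are assumptions, not the paper's exact heuristic):

```python
def reverse_scan_match(abbr, full_words, min_ratio=0.6):
    # Compare the abbreviation's letters, right to left, against the word
    # initials of the candidate full form; not every letter must match,
    # which is the "fuzzy" part of the heuristic.
    initials = [w[0].lower() for w in full_words]
    matched = 0
    j = len(initials) - 1
    for ch in reversed(abbr.lower()):
        if j >= 0 and ch == initials[j]:
            matched += 1
            j -= 1
    return matched / len(abbr) >= min_ratio
```

Candidates that fail this cheap scan would then fall through to the more expensive co-occurrence analysis described in the abstract.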

submitted time 2016-11-14
