by Olga Davydova
Because artificial neural networks can model nonlinear processes, they have become a popular and useful tool for many problems, such as classification, clustering, regression, pattern recognition, dimensionality reduction, structured prediction, machine translation, anomaly detection, decision making, visualization, and computer vision. This wide range of abilities makes it possible to use artificial neural networks in many areas. In this article, we discuss applications of artificial neural networks in Natural Language Processing (NLP) tasks.
NLP includes a wide set of syntax, semantics, discourse, and speech tasks. We describe the main tasks in which neural networks have demonstrated state-of-the-art performance.
1. Text Classification
Text classification is an essential part of many applications, such as web search, information filtering, language identification, readability assessment, and sentiment analysis. Neural networks are actively used for these tasks.
In Convolutional Neural Networks for Sentence Classification, Yoon Kim presents a series of experiments with Convolutional Neural Networks (CNNs) built on top of word2vec vectors. The model was tested on several benchmarks. In Movie Reviews (MR) and Customer Reviews (CR), the task was to detect positive or negative sentiment. The Stanford Sentiment Treebank (SST-1) has more classes to predict: very positive, positive, neutral, negative, and very negative. In the Subjectivity data set (Subj), sentences were classified as subjective or objective. In TREC, the goal was to classify a question into six question types (whether the question is about a person, a location, numeric information, etc.). The results of the numerous tests described in the paper show that, with little hyperparameter tuning, the model performs excellently, suggesting that the pre-trained vectors are universal feature extractors that can be used for various classification tasks [1].
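To make the architecture more concrete, here is a minimal pure-Python sketch of the forward pass of a Kim-style sentence CNN: each filter slides over windows of consecutive word vectors, a ReLU is applied, max-over-time pooling keeps the strongest activation per filter, and a softmax layer turns the pooled features into class probabilities. This is a simplification under stated assumptions, not the paper's implementation: random vectors stand in for word2vec embeddings, a single filter width is used, the sizes are toy values, and there is no training, dropout, or regularization.

```python
import math
import random

def conv_max_pool(sentence, filters, width):
    """Slide each filter over every window of `width` consecutive word vectors,
    apply ReLU, and keep the largest activation (max-over-time pooling)."""
    if len(sentence) < width:
        raise ValueError("sentence is shorter than the filter width; pad it first")
    features = []
    for weights, bias in filters:
        best = 0.0  # ReLU outputs are never negative
        for start in range(len(sentence) - width + 1):
            # Concatenate the word vectors in the current window.
            window = [x for vec in sentence[start:start + width] for x in vec]
            activation = max(0.0, sum(w * x for w, x in zip(weights, window)) + bias)
            best = max(best, activation)
        features.append(best)
    return features

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(sentence, filters, width, out_weights, out_bias):
    """Pooled convolutional features -> linear layer -> class probabilities."""
    features = conv_max_pool(sentence, filters, width)
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(scores)

# Toy sizes and random weights (assumptions for illustration, not the paper's values).
rng = random.Random(0)
dim, width, n_filters, n_classes = 4, 3, 5, 2
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]  # 6 "words"
filters = [([rng.uniform(-0.5, 0.5) for _ in range(dim * width)], 0.0)
           for _ in range(n_filters)]
out_weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_filters)]
               for _ in range(n_classes)]
out_bias = [0.0] * n_classes
probs = classify(sentence, filters, width, out_weights, out_bias)
print(probs)  # one probability per class
```

Because max-over-time pooling yields one number per filter regardless of sentence length, sentences of different lengths map to a fixed-size feature vector for the classifier.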
The article Text Understanding from Scratch by Xiang Zhang and Yann LeCun shows that it's possible to apply deep learning to text understanding, from character-level inputs all the way up to abstract text concepts, with the help of temporal Convolutional Networks (ConvNets). The authors assert that ConvNets can achieve excellent performance without any knowledge of words, phrases, sentences, or other syntactic or semantic structures of a human language [2]. To support this assertion, they conducted several experiments. The model was tested on the DBpedia ontology classification data set with 14 classes (company, educational institution, artist, athlete, office holder, means of transportation, building, natural place, village, animal, plant, album, film, written work). The results show high accuracy in both training (99.96%) and testing (98.40%), with some improvement from thesaurus augmentation. In addition, a sentiment analysis test was performed on the Amazon Review data set, for which the researchers constructed a sentiment polarity data set in which two of the labels count as negative and two as positive. The result was 97.57% training accuracy and 95.07% testing accuracy. The model was also tested on the Yahoo! Answers Comprehensive Questions and Answers data set with 10 classes (Society & Culture, Science & Mathematics, Health, Education & Reference, Computers & Internet, Sports, Business & Finance, Entertainment & Music, Family & Relationships, Politics & Government) and on AG's corpus, where the task was news categorization into four categories (World, Sports, Business, Sci/Tech). The results confirm that ConvNets require a large corpus to learn good text understanding from scratch.
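A character-level ConvNet needs its input text turned into a fixed-size numeric matrix. The sketch below shows one simple way to do this: each character becomes a one-hot vector over an alphabet, characters outside the alphabet become all-zero vectors, and the sequence is truncated or zero-padded to a fixed length. The alphabet, the length, and the function name `quantize` are illustrative assumptions, not the exact settings used in the paper.

```python
# Illustrative alphabet (an assumption, not the paper's exact character set).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\"-()"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def quantize(text, length):
    """One-hot encode each character of `text`.

    Characters outside ALPHABET become all-zero rows, and the result is
    truncated or zero-padded so it always has exactly `length` rows.
    """
    rows = []
    for ch in text.lower()[:length]:
        row = [0] * len(ALPHABET)
        if ch in INDEX:
            row[INDEX[ch]] = 1
        rows.append(row)
    while len(rows) < length:
        rows.append([0] * len(ALPHABET))  # padding
    return rows

matrix = quantize("Great product!", length=20)
print(len(matrix), len(matrix[0]))  # rows = fixed length, columns = alphabet size
```

The resulting matrix is what a temporal (1-D) convolution would slide over, so the network never sees word boundaries explicitly.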
Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao introduced recurrent convolutional neural networks for text classification without human-designed features in their paper Recurrent Convolutional Neural Networks for Text Classification [3]. The team tested their model on four data sets: 20Newsgroup (with four categories: computers, politics, recreation, and religion), Fudan Set (a Chinese document classification set of 20 classes, including art, education, and energy), ACL Anthology Network (with five languages: English, Japanese, German, Chinese, and French), and Sentiment Treebank (with Very Negative, Negative, Neutral, Positive, and Very Positive labels). The model was then compared with existing text classification methods such as Bag of Words, Bigrams + LR, SVM, LDA, Tree Kernels, RecursiveNN, and CNN. Neural network approaches outperformed traditional methods on all four data sets, and the proposed model outperformed both CNN and RecursiveNN.
2. Named Entity Recognition (NER)
The main task of named entity recognition (NER) is to classify named entities, such as Guido van Rossum, Microsoft, or London, into predefined categories like persons, organizations, locations, times, and dates. Many NER systems have already been created, and the best of them use neural networks.
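NER models usually predict one label per token, and those labels are then grouped into typed entities. The sketch below assumes the BIO scheme (B- begins an entity, I- continues it, O is outside any entity), which is one common convention and not necessarily the one used by any system discussed here; the sentence and tags are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, category) pairs."""
    spans = []
    current, category = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != category):
            # A new entity starts here (a stray I- tag is treated as a start).
            if current:
                spans.append((" ".join(current), category))
            current, category = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current entity
        else:  # "O": outside any entity
            if current:
                spans.append((" ".join(current), category))
            current, category = [], None
    if current:
        spans.append((" ".join(current), category))
    return spans

# Hypothetical example sentence with hand-written tags.
tokens = ["Alice", "Smith", "works", "at", "Microsoft", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('Alice Smith', 'PER'), ('Microsoft', 'ORG'), ('London', 'LOC')]
```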
In the paper Neural Architectures for Named Entity Recognition, two models for NER are proposed. The models rely on character-based word representations learned from a supervised corpus and unsupervised word representations learned from unannotated corpora [4]. Numerous tests were carried out on data sets such as CoNLL-2002 and CoNLL-2003 in English, Dutch, German, and Spanish. The authors concluded that, without requiring any language-specific knowledge or resources such as gazetteers, their models achieve state-of-the-art performance in NER.
3. Part-of-Speech Tagging
Part-of-speech (POS) tagging has many applications, including parsing, text-to-speech conversion, and information extraction. The work Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network presents a recurrent neural network with word embeddings for the POS tagging task [5]. The model was tested on the Wall Street Journal data from the Penn Treebank III data set and achieved 97.40% tagging accuracy.
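The key idea of a bidirectional tagger is that each word's label depends on both the words before it and the words after it. The following minimal sketch runs one recurrent pass left to right and another right to left, concatenates the two hidden states at each position, and picks the highest-scoring tag. For brevity it uses simple tanh RNN cells instead of the LSTM cells in the paper, random untrained weights, random vectors in place of word embeddings, and a tiny illustrative tag set.

```python
import math
import random

def rnn_pass(inputs, W, U):
    """Run a simple tanh RNN over `inputs`; return one hidden state per step."""
    hidden = [0.0] * len(U)
    states = []
    for x in inputs:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(W[j], x))
                            + sum(u * h for u, h in zip(U[j], hidden)))
                  for j in range(len(U))]
        states.append(hidden)
    return states

def bidirectional_tag(inputs, fwd, bwd, out_weights, tag_names):
    """Tag each position using concatenated forward and backward states."""
    forward = rnn_pass(inputs, *fwd)
    # Run the backward RNN on the reversed sequence, then restore the order.
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    tags = []
    for f, b in zip(forward, backward):
        features = f + b  # left context + right context
        scores = [sum(w * v for w, v in zip(row, features)) for row in out_weights]
        tags.append(tag_names[scores.index(max(scores))])
    return tags

rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

dim, hidden_size = 3, 4
tag_names = ["DT", "NN", "VB"]  # tiny illustrative tag set
fwd = (rand_matrix(hidden_size, dim), rand_matrix(hidden_size, hidden_size))
bwd = (rand_matrix(hidden_size, dim), rand_matrix(hidden_size, hidden_size))
out_weights = rand_matrix(len(tag_names), 2 * hidden_size)
sentence = rand_matrix(5, dim)  # five random stand-in "word embeddings"
tags = bidirectional_tag(sentence, fwd, bwd, out_weights, tag_names)
print(tags)  # one (untrained, hence arbitrary) tag per word
```

Tagging accuracy, as reported in the paper, is then simply the fraction of tokens whose predicted tag matches the gold tag.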
4. Semantic Parsing and Question Answering
Question answering systems automatically answer different types of questions asked in natural language, including definition questions, biographical questions, and multilingual questions. Neural networks make it possible to develop high-performing question answering systems.
In Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base, Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao describe a semantic parsing framework for question answering over a knowledge base. The authors say their method uses the knowledge base at an early stage to prune the search space, which simplifies the semantic matching problem [6]. It also applies an advanced entity linking system and a deep convolutional neural network model that matches questions to predicate sequences. The model was tested on the WebQuestions data set and substantially outperforms previous methods.
5. Paraphrase Detection
Paraphrase detection determines whether two sentences have the same meaning. This task is especially important for question answering systems since there are many ways to ask the same question.
Detecting Semantically Equivalent Questions in Online User Forums proposes a method for identifying semantically equivalent questions based on a convolutional neural network. The experiments use data from the Ask Ubuntu community question-and-answer (Q&A) site and from Meta Stack Exchange. The proposed CNN model achieves high accuracy, especially when the word embeddings are pre-trained on in-domain data. The authors compared their model's performance with Support Vector Machines and a duplicate detection approach, and showed that their CNN model outperforms these baselines by a large margin [7].
The study Paraphrase Detection Using Recursive Autoencoder presents a novel recursive autoencoder architecture. It learns phrasal representations using recursive neural networks; these representations are vectors in an n-dimensional semantic space where phrases with similar meanings lie close to each other [8]. The system was evaluated using the Microsoft Research Paraphrase Corpus and the English Gigaword Corpus. The model was compared with three baselines and outperformed all of them.
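Duplicate-question detection of this kind reduces to encoding each question as a vector and comparing the vectors. The minimal sketch below uses cosine similarity with a threshold. As a loudly stated simplification, the CNN encoder from the paper is replaced by averaging word vectors, and the embeddings, the `encode` and `is_duplicate` helpers, and the 0.9 threshold are all hypothetical values chosen for illustration.

```python
import math

def encode(question, embeddings):
    """Stand-in encoder: average the vectors of known words.

    The paper uses a CNN here; averaging is a simplifying assumption.
    """
    vectors = [embeddings[w] for w in question.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in question")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_duplicate(q1, q2, embeddings, threshold=0.9):
    """Treat two questions as equivalent if their vectors are close enough."""
    return cosine(encode(q1, embeddings), encode(q2, embeddings)) >= threshold

# Toy 3-dimensional embeddings (hypothetical values for illustration only).
embeddings = {
    "install": [0.9, 0.1, 0.0], "setup": [0.85, 0.15, 0.05],
    "ubuntu": [0.1, 0.9, 0.1], "wallpaper": [0.0, 0.1, 0.95],
    "change": [0.3, 0.2, 0.6], "how": [0.2, 0.2, 0.2], "to": [0.2, 0.2, 0.2],
}
print(is_duplicate("how to install ubuntu", "setup ubuntu", embeddings))
print(is_duplicate("how to install ubuntu", "how to change wallpaper", embeddings))
```

With these toy vectors, the first pair scores about 0.99 and is flagged as a duplicate, while the second scores about 0.66 and is not.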
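A recursive autoencoder builds a phrase vector by repeatedly merging two child vectors into a parent of the same size, and it is trained so that the children can be reconstructed from the parent. The sketch below shows that merge step in a common formulation (parent = tanh(W[c1; c2] + b), reconstruction = W_dec · parent + b_dec, squared reconstruction error). It is an illustrative assumption rather than the study's exact architecture: the tree is fixed left to right (recursive autoencoders can also choose which pairs to merge, for example greedily by reconstruction error), the weights are random, and no training step is shown.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def compose(c1, c2, W, b):
    """Encode two child vectors into one parent vector of the same size."""
    return [math.tanh(z + bi) for z, bi in zip(matvec(W, c1 + c2), b)]

def reconstruction_error(c1, c2, parent, W_dec, b_dec):
    """Squared error between the children and their reconstruction from parent."""
    recon = [z + bi for z, bi in zip(matvec(W_dec, parent), b_dec)]
    return sum((x - r) ** 2 for x, r in zip(c1 + c2, recon))

def encode_phrase(word_vectors, W, b, W_dec, b_dec):
    """Fold a phrase left to right into one vector, summing reconstruction errors."""
    phrase, total_error = word_vectors[0], 0.0
    for vec in word_vectors[1:]:
        parent = compose(phrase, vec, W, b)
        total_error += reconstruction_error(phrase, vec, parent, W_dec, b_dec)
        phrase = parent
    return phrase, total_error

rng = random.Random(0)
n = 4  # embedding size (illustrative)
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * n)] for _ in range(n)]
W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(2 * n)]
b, b_dec = [0.0] * n, [0.0] * (2 * n)
words = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(3)]  # a 3-word phrase
vector, error = encode_phrase(words, W, b, W_dec, b_dec)
print(vector, error)
```

Training would adjust W, b, W_dec, and b_dec to minimize the total reconstruction error; afterwards, phrase vectors for two sentences can be compared (for example with cosine similarity) to decide whether they are paraphrases.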
6. Language Generation and Multi-document Summarization
Natural language generation has many applications such as automated writing of reports, generating texts based on analysis of retail sales data, summarizing electronic medical records, producing textual weather forecasts from weather data, and even producing jokes.
In the paper Natural Language Generation, Paraphrasing and Summarization of User Reviews with Recurrent Neural Networks, the researchers describe a recurrent neural network (RNN) model capable of generating novel sentences and document summaries. The paper describes and evaluates a database of 820,000 consumer reviews in Russian. The design of the network lets users control the meaning of generated sentences: by choosing a vector of sentence-level features, it is possible to instruct the network, for example, "Say something good about a screen and sound quality in about ten words" [9]. This language generation ability allows the production of abstractive summaries of multiple user reviews, which are often of reasonable quality. Such summaries let users quickly obtain the information contained in a large cluster of documents.
7. Machine Translation
Machine translation software is used around the world despite its limitations; in some domains, translation quality is poor. To improve results, researchers try different techniques and models, including neural networks. The study Neural-based Machine Translation for Medical Text Domain examines the effects of different training methods on a Polish-English machine translation system for medical data. The European Medicines Agency parallel text corpus was used to train both neural and statistical translation systems. The study showed that the neural network requires fewer resources for training and maintenance. In addition, the neural network often substituted words with other words occurring in a similar context [10].
8. Speech Recognition
Speech recognition has many applications, such as home automation, mobile telephony, virtual assistants, hands-free computing, and video games. Neural networks are widely used in this area.
In Convolutional Neural Networks for Speech Recognition, the authors explain how to apply CNNs to speech recognition in a novel way, such that the CNN's structure directly accommodates some types of speech variability, such as varying speaking rate [11]. The approach was evaluated on TIMIT phone recognition and on a large-vocabulary voice search task.
9. Character Recognition
Character recognition systems also have numerous applications, such as recognizing characters on receipts, invoices, checks, and legal billing documents. The article Character Recognition Using Neural Network presents a method for recognizing handwritten characters with 85% accuracy [12].
10. Spell Checking
Most text editors let users check if their text contains spelling mistakes. Neural networks are now incorporated into many spell-checking tools.
In Personalized Spell Checking using Neural Networks, a new system for detecting misspelled words is proposed. The system is trained on observations of the specific corrections a typist makes [13]. It overcomes many of the shortcomings of traditional spell-checking methods.
Summary
In this article, we described Natural Language Processing problems that can be solved with neural networks. As we showed, neural networks have many applications, such as text classification, information extraction, semantic parsing, question answering, paraphrase detection, language generation, multi-document summarization, machine translation, and speech and character recognition. In many cases, neural network methods outperform other methods.
Resources
1. http://www.aclweb.org/anthology/D14-1181
2. https://arxiv.org/pdf/1502.01710.pdf
3. https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9745/9552
4. http://www.aclweb.org/anthology/N16-1030
5. https://arxiv.org/pdf/1510.06168.pdf
6. http://www.aclweb.org/anthology/P15-1128
7. https://www.aclweb.org/anthology/K15-1013
8. https://nlp.stanford.edu/courses/cs224n/2011/reports/ehhuang.pdf
9. http://www.meanotek.ru/files/TarasovDS(2)2015-Dialogue.pdf
10. http://www.sciencedirect.com/science/article/pii/S1877050915025910
11. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN_ASLPTrans2-14.pdf
12. http://www.ijettjournal.org/volume-4/issue-4/IJETT-V4I4P230.pdf
13. http://www.cs.umb.edu/~marc/pubs/garaas_xiao_pomplun_HCII2007.pdf