Question Answering System using NLP
Abstract
Question Answering (QA) systems are broadly useful, since many problems in natural language understanding can be framed as question answering. Consequently, the field is one of the most actively researched areas in computer science today. The last few years have seen considerable development and improvement in the state of the art, much of which can be credited to the rise of deep learning. This paper discusses various approaches, starting from basic NLP- and algorithm-based techniques and building towards recently proposed deep learning methods. Implementation details and algorithmic tweaks that produced better results are also discussed. The proposed models were evaluated on the twenty tasks of Facebook's bAbI dataset.
Introduction
Building a fully functional question answering system is a problem that has long been popular among researchers. Newer algorithms, especially those based on deep learning, have made solid progress on text and image classification, but these models have still failed to solve tasks that involve logical reasoning, of which the question answering problem is a prime example. Only recently, with the introduction of memory- and attention-based architectures, has there been some progress in this field.
Types of Question Answering
There are three major modern paradigms of question answering:
a) The goal of IR-based factoid question answering is to answer a user's question by finding short text segments on the Web or in some other collection of documents. In the question-processing phase, several pieces of information are extracted from the question: the answer type specifies the kind of entity the answer consists of (person, location, time, etc.), and the query specifies the keywords the IR system should use when searching for documents. A toy sketch of this phase appears after this list.
b) Knowledge-based question answering is the idea of answering a natural language question by mapping it to a query over a structured database. The logical form of the question is thus either already a query or can easily be converted into one. The database can be a full relational database, or a simpler structured store such as a set of RDF triples. Systems that map a text string to a logical form are called semantic parsers; for question answering they usually map either to some version of predicate calculus or to a query language such as SQL or SPARQL. A toy parser of this kind is sketched after this list.
c) Using multiple information sources: IBM's Watson [5,6], which won the Jeopardy! challenge in 2011, is an example of a system that relies on a wide variety of resources to answer questions. The first stage is question processing: the DeepQA system runs parsing, named entity tagging, and relation extraction on the question, then, like the text-based systems, extracts the focus and the answer type (also called the lexical answer type or LAT), and performs question classification and question sectioning. The question is classified by type as a definition question, multiple-choice, puzzle, or fill-in-the-blank. Next is the candidate answer generation stage, where the processed question is combined, according to the question type, with external documents and other knowledge sources to suggest many candidate answers; these can be extracted either from text documents or from structured knowledge bases. The candidates then pass through the candidate answer scoring stage, which uses many sources of evidence to score them, one of the most important being the lexical answer type. In the final answer merging and ranking step, candidate answers that are equivalent are merged first. Merging and ranking are actually run iteratively: the candidates are first ranked by the classifier, giving a rough initial value for each candidate answer; that value is used to decide which variant of a name to select as the merged answer; and the merged answers are then re-ranked. A toy skeleton of these stages is sketched below.
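To make the IR-based paradigm in item (a) concrete, the following is a minimal sketch of the question-processing phase. The answer-type rules and stopword list here are illustrative stand-ins for the statistical classifiers and linguistic processing a real system would use.

```python
# Minimal sketch of question processing for IR-based factoid QA.
# The rules and stopwords are illustrative, not from any real system.

STOPWORDS = {"who", "what", "where", "when", "is", "was", "the", "a", "an", "of"}

ANSWER_TYPE_RULES = [
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "TIME"),
    ("what", "ENTITY"),
]

def answer_type(question):
    """Guess the expected answer type from the question word."""
    first = question.lower().split()[0]
    for cue, atype in ANSWER_TYPE_RULES:
        if first == cue:
            return atype
    return "ENTITY"

def ir_query(question):
    """Keep the content words of the question as IR keywords."""
    tokens = question.lower().rstrip("?").split()
    return [t for t in tokens if t not in STOPWORDS]

q = "Who wrote the Turing test paper?"
print(answer_type(q))  # PERSON
print(ir_query(q))     # ['wrote', 'turing', 'test', 'paper']
```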
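The mapping in item (b) can be illustrated with a toy template-based semantic parser. Real semantic parsers learn this mapping from data; the single hand-written pattern and the DBpedia-style prefixes below are assumptions made for the example.

```python
# Toy template-based semantic parser: one hand-written pattern maps a
# natural-language question to a SPARQL query.

import re

def parse_to_sparql(question):
    m = re.match(r"[Ww]here was (.+) born\??$", question)
    if m:
        entity = m.group(1).replace(" ", "_")
        return ("SELECT ?place WHERE { "
                "res:" + entity + " dbo:birthPlace ?place . }")
    return None  # question not covered by any template

print(parse_to_sparql("Where was Alan Turing born?"))
# SELECT ?place WHERE { res:Alan_Turing dbo:birthPlace ?place . }
```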
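Finally, the DeepQA stages in item (c) can be summarized as a toy skeleton. Every function here is a trivial stand-in for a large Watson subsystem; nothing in this sketch reflects IBM's actual implementation.

```python
# Toy skeleton of the DeepQA pipeline stages, for illustration only.

def process_question(question):
    # Real DeepQA runs parsing, NER, relation extraction, focus/LAT
    # detection, and question classification here.
    return {"text": question, "lat": "person"}

def generate_candidates(parsed, sources):
    # Candidates come from text passages and structured knowledge bases.
    return [c for src in sources for c in src.get(parsed["lat"], [])]

def score_candidate(candidate, parsed):
    # Watson combines many evidence scorers; here, a dummy length score.
    return float(len(candidate))

def merge_and_rank(scored):
    # Equivalent answers are merged (case-insensitively here), then
    # ranked; DeepQA runs merging and ranking iteratively.
    best = {}
    for cand, score in scored:
        key = cand.lower()
        best[key] = max(best.get(key, 0.0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

sources = [{"person": ["Bram Stoker", "bram stoker"]}]
parsed = process_question("Who wrote Dracula?")
scored = [(c, score_candidate(c, parsed))
          for c in generate_candidates(parsed, sources)]
print(merge_and_rank(scored))  # [('bram stoker', 11.0)]
```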
The question answering system in this paper uses an LSTM model. Before describing it, we first review word embeddings.
Word embeddings
A word embedding is a learned representation for text in which words with similar meanings have similar representations. This approach to representing words and documents may be considered one of the key breakthroughs of deep learning on challenging natural language processing problems.
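A toy demonstration of the idea: words with similar meanings end up with nearby vectors under cosine similarity. The 3-dimensional vectors below are hand-made for illustration; real embeddings such as word2vec or GloVe are learned from large corpora and typically have hundreds of dimensions.

```python
# Hand-made toy embeddings: similar words get similar vectors.

import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["king"], emb["queen"]))  # high: similar meaning
print(cosine(emb["king"], emb["apple"]))  # low: unrelated words
```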
RNN
A recurrent neural network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. In traditional neural networks, all inputs and outputs are independent of each other; but in tasks such as predicting the next word of a sentence, the previous words are required, so there is a need to remember them. RNNs solve this with a hidden state, their defining feature, which carries information about the sequence from one step to the next.
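The recurrence can be written in a few lines of numpy. The weights below are random placeholders for learned parameters; the point is only that the hidden state h carries information from earlier steps into the current one.

```python
# Minimal numpy sketch of the RNN recurrence.

import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # input -> hidden
W_hh = rng.normal(size=(4, 4))   # hidden -> hidden (the "memory" path)
b_h  = np.zeros(4)

def rnn_step(x, h):
    return np.tanh(W_xh @ x + W_hh @ h + b_h)

h = np.zeros(4)                       # initial hidden state
for x in rng.normal(size=(5, 3)):     # a sequence of 5 input vectors
    h = rnn_step(x, h)                # h now summarizes the whole prefix
print(h)
```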
LSTM
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, and anomaly detection in network traffic or intrusion detection systems (IDSs).
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.
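A numpy sketch of a single LSTM step follows the description above. The weights are random placeholders for learned parameters, and stacking all four gate projections into one matrix is just one common convention.

```python
# One LSTM step: three gates (forget f, input i, output o) regulate
# the cell state c, as described above.

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))  # all gates stacked
b = np.zeros(4 * n_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    c = f * c + i * np.tanh(g)                     # gated cell-state update
    h = o * np.tanh(c)                             # gated output
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c)
print(h, c)
```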
LSTM networks are well suited to classifying, processing, and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing and exploding gradient problems that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTMs over plain RNNs, hidden Markov models, and other sequence learning methods in numerous applications.
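As a concrete sketch, the following shows one plausible LSTM baseline for bAbI-style QA in Keras: the story and question are embedded, each is encoded by an LSTM, and the concatenated encodings predict the answer word. The vocabulary size, sequence lengths, and layer sizes are illustrative assumptions, not the exact configuration evaluated in this paper.

```python
# One plausible LSTM baseline for bAbI-style QA (illustrative sizes).

from tensorflow.keras import layers, models

vocab_size, story_len, query_len = 50, 68, 4

story = layers.Input(shape=(story_len,))
query = layers.Input(shape=(query_len,))

embed = layers.Embedding(vocab_size, 64)      # shared word embeddings
s = layers.LSTM(64)(embed(story))             # encode the story
q = layers.LSTM(64)(embed(query))             # encode the question
merged = layers.concatenate([s, q])
answer = layers.Dense(vocab_size, activation="softmax")(merged)

model = models.Model([story, query], answer)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit([stories, queries], answers, epochs=40)  # after vectorizing bAbI
```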