Sentence Completion

A Deep Learning Text Generator

Technologies & Tools

Python
TensorFlow

About

This Sentence Completion AI Model is a deep learning model designed to generate meaningful and coherent sentences given the first 5 words. It has been trained on a single textual dataset, the novel The Cave Girl by Edgar Rice Burroughs, so the sentences it generates stay within the context of that novel.

Given more computational power, a model like this one could be trained on a larger and more general dataset, which would make its text generation more flexible.

The Mechanism (Layman)

During the training phase, the model is fed a large amount of text data from the novel. The novel is first split into its individual words while maintaining their original order. The model is then set to start learning from the first 5 words and work its way to the end of the novel. During this journey, the model takes 5 words and learns the word that comes immediately after them. The model then shifts forward by one word and learns the word that follows.

Learning Sequence

Step by step, the model learns what word comes next, given the prior 5 words.

As shown above, the model takes the five words in blue and learns what word comes after them. This set of words keeps shifting forward by 1 step, allowing the model to learn various combinations of words. In so doing, the model ends up with a general knowledge of what words come after each other.
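
A minimal sketch of this sliding window, written in Python, is shown below. The variable names and the short sample text are purely illustrative; the actual project uses the full text of the novel.

# Build (5 words -> next word) training pairs with a sliding window.
# 'novel_text' stands in for the full text of The Cave Girl.
novel_text = "I took a cup of coffee and sat down to read the novel"

words = novel_text.lower().split()   # split into individual words, keeping their order

seq_length = 5                       # the model always looks at 5 words at a time
inputs, targets = [], []
for i in range(len(words) - seq_length):
    inputs.append(words[i:i + seq_length])   # the 5 words shown to the model
    targets.append(words[i + seq_length])    # the word it must learn to predict

for x, y in zip(inputs, targets):
    print(x, "->", y)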

So, can you guess what word comes after the words: I took a cup of

Advanced Explanation

Sentence completion is the process of predicting the most probable words that follow a given sample text. It is commonly used in email, document writing, messaging, and keyboard apps, where it increases productivity: the user simply types a leading text and an accurate prediction of the next words is generated. This project explores the use of Long Short-Term Memory (LSTM) neural networks in text generation. The Sentence Completion model uses only the learned dataset for text generation and does not apply any further Natural Language Processing (NLP). Thus, the text is generated based purely on what the LSTM model has been taught, regardless of grammar.

This article covers four key topics that detail the development of the Sentence Completion model:

1. Understanding LSTMs
2. Formulating a training strategy
3. Conducting evaluation and fine-tuning
4. Drawing conclusive insights

Introduction to LSTMs

LSTMs, or Long Short-Term Memory neural networks, exhibit significant capabilities in learning patterns within sequential data. When trained on text data, an LSTM can understand the underlying structure of the text and make predictions accordingly. To illustrate this ability, the plan is to employ LSTMs in a project focused on sentence completion, using the novel "The Cave Girl" by Edgar Rice Burroughs as the training dataset. The LSTM model will be presented with incomplete sentences and tasked with predicting subsequent words until it encounters a sentence terminator, such as a period or question mark.
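
As an illustration of that prediction loop, the sketch below keeps asking a trained model for the next word until a terminator appears. The names used here (model, tokenizer, seq_length) and the assumption that the tokenizer keeps "." and "?" as tokens are illustrative; they are not the project's actual code.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def complete_sentence(model, tokenizer, seed_text, seq_length=5, max_words=30):
    """Predict one word at a time until a sentence terminator is produced."""
    index_to_word = {i: w for w, i in tokenizer.word_index.items()}
    sentence = seed_text
    for _ in range(max_words):
        # Encode the last `seq_length` words as integer ids
        encoded = tokenizer.texts_to_sequences([sentence])[0][-seq_length:]
        encoded = pad_sequences([encoded], maxlen=seq_length)
        # Pick the most probable next word
        next_id = int(np.argmax(model.predict(encoded, verbose=0)[0]))
        next_word = index_to_word.get(next_id, "")
        sentence += " " + next_word
        if next_word.endswith((".", "?")):   # stop at a sentence terminator
            break
    return sentence

# Example (assuming a trained model and fitted tokenizer exist):
# print(complete_sentence(model, tokenizer, "I took a cup of"))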

The natural thought process of humans is not built from scratch; rather, it is influenced by past experiences, which persist in memory and drive consistent behavior. This phenomenon enables us to predict words that others might utter based on context. For instance, when someone begins a sentence with "I am really," we can easily anticipate that they might follow with "sad," "excited," or "tired," depending on the situation—a funeral might elicit "sad," a party "excited," and someone returning home after a busy day "tired." Our minds inherently map experiences, situations, and environments to appropriate actions and responses, effectively learning patterns from the world around us.

While Artificial Neural Networks (ANNs) aim to mimic the brain's functionality and have shown impressive performance, standard ANNs lack the ability to persistently retain information and infer from past events to predict subsequent ones. However, Recurrent Neural Networks (RNNs), specifically LSTMs, overcome this limitation by incorporating loops that enable the retention of crucial information and historical context.

A fundamental advantage of employing RNNs over standard neural networks lies in the sharing of features across time. Unlike standard neural networks, which lack memory of previous inputs, RNNs can remember past inputs, making them well-suited for sequential data processing. The historical context plays a vital role in RNN computation.

RNNs are particularly tailored for handling sequential data and find extensive application in Natural Language Processing (NLP) tasks. Their capacity for maintaining internal memory facilitates efficient learning from sequential data in machine learning problems. In addition to NLP, RNNs are also commonly employed in time series predictions.

Recurrent Neural Network (RNN) Structure

An RNN unit, A, takes some input x_t and outputs a value h_t. The loops in the RNN can be viewed as a chain, where information from the previous unit is passed to the next.

Long Short-Term Memory networks (LSTMs) are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn! All recurrent neural networks have the form of a chain of repeating modules of neural networks. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
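
For comparison, a single step of that simple repeating module can be written in a few lines; the weight shapes below are illustrative placeholders rather than learned values.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: a single tanh layer combining the input and the previous state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Illustrative sizes: 8-dimensional input, 16-dimensional hidden state
x_t = np.random.randn(8)
h_prev = np.zeros(16)
W_x, W_h, b = np.random.randn(16, 8), np.random.randn(16, 16), np.zeros(16)
h_t = rnn_step(x_t, h_prev, W_x, W_h, b)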

The repeating module in a standard RNN contains a single layer

LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.
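
In practice these four layers are not built by hand; a framework such as TensorFlow/Keras provides them inside its LSTM layer. A minimal next-word model of the kind this project could use might look like the sketch below, where the vocabulary size, embedding size, and unit counts are assumed values rather than the project's actual configuration.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 5000   # assumed size of the novel's vocabulary
seq_length = 5      # the model reads 5 words at a time

model = Sequential([
    Embedding(vocab_size, 64),                # map word ids to dense vectors
    LSTM(128),                                # the gated recurrent layer described above
    Dense(vocab_size, activation="softmax"),  # probability of each word being the next one
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")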

The repeating module in an LSTM contains four interacting layers

A detailed view of an LSTM with various gates can be seen in the image below.

LSTMs can capture long-range dependencies, retaining a memory of previous inputs for extended durations. An LSTM cell contains three gates, through which all memory manipulations are carried out; these gates also control how gradients propagate through the recurrent network's memory. The three gates are described below, followed by a short code sketch of one cell step.

Forget Gate:

The forget gate removes information that is no longer useful from the cell state.

Input Gate:

The input gate adds new, useful information to the cell state.

Output Gate:

The output gate determines which information from the current cell state is passed on as the output (hidden state).
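
The sketch below computes one LSTM cell step with these three gates, following the standard LSTM formulation; the weight matrices are illustrative placeholders rather than learned values.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step; W, U, b hold the parameters of the
    forget (f), input (i), candidate (g), and output (o) transformations."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate: what to drop from the cell state
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate: how much new information to add
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate values for the cell state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate: what part of the state to expose
    c_t = f * c_prev + i * g                               # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                                 # new hidden state (the cell's output)
    return h_t, c_t

# Illustrative sizes: 8-dimensional input, 16-dimensional state
n_in, n_hid = 8, 16
W = {k: np.random.randn(n_hid, n_in) for k in "figo"}
U = {k: np.random.randn(n_hid, n_hid) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}
h, c = lstm_step(np.random.randn(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)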

Training Strategy

Still Documenting...

Fine-tuning and Evaluation

Still Documenting...

Conclusion

Still Documenting...