LSTM-Based Name Generator – First Dive into NLP

After an in-depth study of computer vision techniques, we continue our learning journey and begin to dive into the world of natural language processing (NLP). Artificial intelligence is not limited to vision, where computers learn to perceive data the way humans see it.

Artificial intelligence is meant to empower technology with the whole umbrella of human skills and tasks. One of these skills is language. Language is a powerful communication tool that makes the world run the way we know it; it has been the sum of human experience since the dawn of time. Without language, human beings could not convey their feelings, ideas, emotions, desires and beliefs. There could be no civilization, and perhaps no religion, without it.

This brings us to the world of natural language processing (NLP). Practically every Internet user interacts with an NLP program: search engines like Google and Bing use natural language processing to suggest possible search queries.

When users start typing search parameters, search engines try to fill in the gaps for them. Users can choose from pre-defined criteria or type their own query. The uses of NLP are not limited to search engines.

Voice-activated assistants like Siri and Alexa rely on NLP for language processing, and chatbots use it to provide more accurate answers to end-user inquiries. The technique can also extract vital information from unstructured data to build better data sets. For companies that adopt NLP, the benefits are clear.

Businesses face large volumes of unstructured, text-heavy data and need a way to process it quickly. Natural human language makes up a significant portion of the data generated on the Internet and stored in databases, and until recently organizations had no effective way to evaluate it. This is where natural language processing helps.

In this article, we follow the deep learning approach to solving NLP problems. We will implement a recurrent neural network (RNN), specifically a long short-term memory (LSTM) network, to build a name generator. The article follows this structure:


What is NLP?

NLP is an advanced form of linguistics that can be thought of as an extension of classical linguistics to computational linguistics.

Classical linguistics studied language in its entirety, including grammar, semantics and phonetics, and developed and tested the norms of language. Although formal approaches to syntax and semantics have progressed significantly, the most fascinating problems in natural language processing continue to defy tidy mathematical formalism.

The current study of linguistics using computer science methods is known as computational linguistics. Because the adoption of computational tools and thinking has dominated most areas of research, yesterday’s linguistics may be today’s computational linguistics.

In the 1990s, statistical techniques and statistical machine learning began to replace traditional top-down, rule-based approaches, thanks to their superior results, speed and robustness. The statistical approach to the study of natural language now dominates the discipline, and perhaps defines it.

To reflect the more engineering-oriented, empirical character of these statistical approaches, computational linguistics has come to be known as natural language processing, or NLP.

As machine learning practitioners who work with text data, we are interested in the tools and approaches of the natural language processing discipline. Deep learning methods show great potential for the hardest NLP problems.

Natural language processing allows computers to converse with humans in their own language and to handle other language-related tasks.

NLP allows computers to read text, hear speech, analyze it, gauge sentiment and identify which parts are significant. Machines can now interpret more language-based data than humans can, consistently and without fatigue or bias.

Automation will be critical to the rapid processing of text and audio data, given the vast volume of unstructured data generated daily, from medical records to social media.


What are RNNs?

A recurrent neural network (RNN) is a type of artificial neural network designed to operate on time series or sequence data. Regular feedforward neural networks are designed to handle data points that are independent of each other.

However, if we have data in a sequence where one data point depends on the previous data point, we need to modify the neural network to account for these dependencies.

RNNs include a concept of ‘memory’, which allows them to store the states or information of a previous input in order to construct the next output of the sequence. RNNs are basically developed to handle streams of data such as textual data.

In the case of sentences, a word's meaning depends on the context around it. This makes interpreting textual data extremely difficult, and practically impossible, with a generic feedforward artificial neural network. Here, the RNN memory concept mentioned above helps: it lets the network follow the context of the data stream.

An RNN has a remarkable ability to remember information. In other neural networks, the inputs are independent of each other.

In an RNN, however, the inputs are connected. Imagine you need to predict the next word in a phrase: the connection to all the previous words helps the network produce a better prediction.

During training, the RNN learns all of these relationships. To do this, it builds a network with loops in it, which allow it to store information.

Thanks to its loop structure, the neural network can take in an entire input sequence. This is easier to see in the unrolled version below.

LSTM network diagram
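
To make the loop concrete, here is a minimal NumPy sketch of a single recurrent layer unrolled over a toy sequence. The sizes, random weights and random inputs are assumptions made purely for illustration; they are not part of the article's model. The point is that the same weights are reused at every step, while the hidden state h carries information from the past forward.

import numpy as np

# Toy sizes, chosen only for illustration
input_size, hidden_size, seq_len = 3, 5, 4
rng = np.random.default_rng(42)

W_xh = rng.standard_normal((hidden_size, input_size))   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size))  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                               # the 'memory', empty at the start
sequence = rng.standard_normal((seq_len, input_size))   # random stand-in for real data

for x_t in sequence:
    # The same weights are reused at every step; h carries the past forward
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    print(h)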

There are four basic types of RNNs. These include:

  • One to one
  • One to many
  • Many to one
  • Many to many

For more information on these types, see the source linked at the end of this article.
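
As a quick illustration of the many-to-one pattern (the same pattern the name generator below uses), here is a minimal, hypothetical Keras sketch: a whole sequence goes in and a single class prediction comes out. All shapes are toy values chosen only for the example.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Many-to-one: a whole sequence goes in, a single prediction comes out
timesteps, features, num_classes = 10, 8, 5              # toy sizes for illustration

model = Sequential()
model.add(LSTM(16, input_shape=(timesteps, features)))   # reads the full sequence
model.add(Dense(num_classes, activation='softmax'))      # one prediction per sequence
model.compile(loss='categorical_crossentropy', optimizer='adam')

x = np.random.random((2, timesteps, features))           # a batch of 2 dummy sequences
print(model.predict(x, verbose=0).shape)                 # -> (2, 5)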


Introduction to the LSTM Algorithm

An RNN applies a function to the current data and transforms it completely in order to add new information.

As a result, all of the information is changed; there is no distinction between 'important' and 'less essential' information. LSTMs, on the other hand, make small modifications to the information using multiplications and additions. In LSTMs, information flows through a mechanism known as the cell state.

In this way, LSTMs can selectively remember or forget information. A given cell state carries pieces of information with different dependencies.


LSTM architecture

A typical LSTM network consists of a number of memory blocks known as cells (the rectangles in the image). Two states are passed on to the next cell: the cell state and the hidden state. The memory blocks are responsible for remembering things, and this memory is manipulated through three basic mechanisms called gates. Each of them is detailed below.

LSTM cell architecture

The forget gate is responsible for removing information from the cell state. By multiplying the cell state by a filter, information that the LSTM no longer needs to understand things, or that is of less value, is removed. This is essential for optimal LSTM network performance.

Here, x_t is the input at the current time step and h_{t-1} is the hidden state from the previous cell. These inputs are multiplied by weight matrices and a bias is added.

This value is then passed through the sigmoid function. The sigmoid function outputs a vector with values ranging from 0 to 1, one for each number in the cell state.

The sigmoid function is responsible for deciding which data should be kept and which should be discarded. When the forget gate outputs '0' for a specific value in the cell state, it means the forget gate wants the cell state to completely forget that piece of information.

A '1', on the other hand, indicates that the forget gate wants to keep that entire piece of information. The cell state is then multiplied by this vector output of the sigmoid function.
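
To make the forget gate concrete, the following is a rough NumPy sketch of the computation described above; it is not code from the article, and the dimensions, random weights and inputs are assumptions made only for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3              # toy sizes, assumed for the example
rng = np.random.default_rng(0)

x_t = rng.standard_normal(input_size)       # current input
h_prev = rng.standard_normal(hidden_size)   # hidden state from the previous cell
c_prev = rng.standard_normal(hidden_size)   # previous cell state

# Forget gate: f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f)
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# Entries of f_t near 0 erase the matching entry of the cell state,
# entries near 1 keep it
c_after_forget = f_t * c_prev
print(f_t, c_after_forget)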

The input gate is responsible for updating the cell state with new information. As you can see in the image above, adding information is a three-step process (a small sketch follows the list below):

  • A sigmoid function is used to decide which values should be added to the cell state. This is similar to the forget gate in that it acts as a filter over all the information from h_{t-1} and x_t.
  • A vector is created containing all the potential values that can be added to the cell state (as determined by h_{t-1} and x_t). The tanh function, which returns values ranging from -1 to +1, is used for this purpose.
  • The new information is added to the cell state by multiplying the value of the regulatory filter (the sigmoid gate) by the created vector (the tanh output) and adding the result to the cell state.
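
Here is the corresponding rough NumPy sketch of the input-gate step, continuing the hedged example above; again, the sizes, random weights and the bias-free gates are illustrative assumptions rather than the article's code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3                      # toy sizes, assumed for the example
rng = np.random.default_rng(1)

x_t = rng.standard_normal(input_size)               # current input
h_prev = rng.standard_normal(hidden_size)           # previous hidden state
c_after_forget = rng.standard_normal(hidden_size)   # cell state left over from the forget step
concat = np.concatenate([h_prev, x_t])

# Step 1: sigmoid filter decides which entries may be updated
W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
i_t = sigmoid(W_i @ concat)

# Step 2: tanh produces candidate values between -1 and +1
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))
c_candidate = np.tanh(W_c @ concat)

# Step 3: add the filtered candidates to the cell state
c_t = c_after_forget + i_t * c_candidate
print(c_t)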

The output gate's operation can again be broken down into three steps (sketched after the list below):

  • After applying the tanh function to the cell state, the values are scaled to the range -1 to +1, resulting in a vector.
  • Using the values of h_{t-1} and x_t, a filter is created that controls which values are taken from the vector produced in the previous step. The sigmoid function is used again in this filter.
  • The value of this regulatory filter is multiplied by the vector created in step 1, and the result is sent out as the output and as the hidden state of the next cell.
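
And the rough NumPy sketch of the output gate, again with toy sizes and random, bias-free weights assumed purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3               # toy sizes, assumed for the example
rng = np.random.default_rng(2)

x_t = rng.standard_normal(input_size)        # current input
h_prev = rng.standard_normal(hidden_size)    # previous hidden state
c_t = rng.standard_normal(hidden_size)       # updated cell state from the input-gate step

# Step 1: squash the cell state into the range -1 to +1
scaled_state = np.tanh(c_t)

# Step 2: sigmoid filter built from h_{t-1} and x_t
W_o = rng.standard_normal((hidden_size, hidden_size + input_size))
o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]))

# Step 3: the filtered, scaled state is the output and the next hidden state
h_t = o_t * scaled_state
print(h_t)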

Create a name generator with LSTM

Now that we have covered the basics of recurrent neural networks and the LSTM algorithm, we can move on to applying an LSTM to build a name generator.

We need TensorFlow as a prerequisite before proceeding to the application. We can use the pip command to install TensorFlow.

pip install --upgrade tensorflow

Now, we can start coding. As always, you can access the full code as a Jupyter Notebook on Google Colab.

import pandas as pd
import numpy as np
import tensorflow as tf
import time
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import RMSprop
import random
import os

We first import all the necessary modules. Pandas is a library for handling and manipulating data, NumPy is a library for numerical computing, and TensorFlow is a framework for developing machine learning and deep learning models.

step_length = 1 # The step length we take to get our samples from our corpus
epochs = 50 # Number of times we train on our full data
batch_size = 32 # Data samples in each training step
latent_dim = 64 # Size of our LSTM
dropout_rate = 0.2 # Regularization with dropout
model_path = os.path.realpath('./poke_gen_model.h5') # Location for the model
load_model = False # Enable loading model from disk
store_model = True # Store model to disk after training
verbosity = 1 # Print result for each epoch
gen_amount = 10 # How many names to generate

We then declare all the necessary variables we will use in the code.

input_path = os.path.realpath('names.txt')
input_names = []
print('Reading names from file:')
with open(input_path) as f:
    for name in f:
        name = name.rstrip()
        if len(input_names) < 10:
            print(name)
        input_names.append(name)
print('...')

Here, we read the names from the input file and print the first few of them as examples.

Reading names from file:
Abbas
Abbey
Abbott
Abdi
Abel
Abraham
Abrahams
Abrams
Ackary
Ackroyd
# Join all names into one long string
concat_names = '\n'.join(input_names).lower()

# Find all unique characters by using set()
chars = sorted(list(set(concat_names)))
num_chars = len(chars)

# Build translation dictionaries, 'a' -> 0, 0 -> 'a'
char2idx = dict((c, i) for i, c in enumerate(chars))
idx2char = dict((i, c) for i, c in enumerate(chars))

# Use longest name length as our sequence window
max_sequence_length = max([len(name) for name in input_names])

print('Total chars: {}'.format(num_chars))
print('Corpus length:', len(concat_names))
print('Number of names: ', len(input_names))
print('Longest name: ', max_sequence_length)

Now, we need to find the unique characters that make up all the names. This is done by joining all the names into one long string and turning its set of characters into a sorted list.

That list is then used to create dictionaries that map characters to indices and back. This encodes the categorical character data as numbers, which is necessary for training because the model only understands numbers.

sequences = []
next_chars = []

# Loop over our data and extract pairs of sequences and next chars
for i in range(0, len(concat_names) - max_sequence_length, step_length):
    sequences.append(concat_names[i: i + max_sequence_length])
    next_chars.append(concat_names[i + max_sequence_length])

num_sequences = len(sequences)

print('Number of sequences:', num_sequences)
print('First 10 sequences and next chars:')
for i in range(10):
    print('X=[{}] y=[{}]'.format(sequences[i], next_chars[i]).replace('\n', ' '))

Here, we loop over the data and extract pairs of sequences together with the character that follows each of them; the code prints the first 10 sequences and their next characters. These pairs provide the context in the data.

X = np.zeros((num_sequences, max_sequence_length, num_chars), dtype=bool)
Y = np.zeros((num_sequences, num_chars), dtype=bool)

for i, sequence in enumerate(sequences):
    for j, char in enumerate(sequence):
        X[i, j, char2idx[char]] = 1
        Y[i, char2idx[next_chars[i]]] = 1

print('X shape: {}'.format(X.shape))
print('Y shape: {}'.format(Y.shape))

After dividing the data into X (sequences) and Y (next characters), we check the shapes of X and Y. They will be used in training as the input and the corresponding target character.

model = Sequential()
model.add(LSTM(latent_dim,
               input_shape=(max_sequence_length, num_chars),
               recurrent_dropout=dropout_rate))
model.add(Dense(units=num_chars, activation='softmax'))

optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer)

model.summary()

We now define the LSTM model using Keras' predefined LSTM layer. Dense is an ordinary, fully connected neural network layer.

if load_model:
    model.load_weights(model_path)
else:
    start = time.time()
    print('Start training for {} epochs'.format(epochs))
    history = model.fit(X, Y, epochs=epochs, batch_size=batch_size, verbose=verbosity)
    end = time.time()
    print('Finished training - time elapsed:', (end - start)/60, 'min')
if store_model:
    print('Storing model at:', model_path)
    model.save(model_path)

Here, we start training the model by calling the fit() method. We pass it the number of epochs, which defines how many times the model runs the training process over the entire data set.

# Start sequence generation from end of the input sequence
sequence = concat_names[-(max_sequence_length - 1):] + '\n'

new_names = []
print('{} new names are being generated'.format(gen_amount))

while len(new_names) < gen_amount:
    # Vectorize sequence for prediction
    x = np.zeros((1, max_sequence_length, num_chars))
    for i, char in enumerate(sequence):
        x[0, i, char2idx[char]] = 1

    # Sample next char from predicted probabilities
    probs = model.predict(x, verbose=0)[0]
    probs /= probs.sum()
    next_idx = np.random.choice(len(probs), p=probs)
    next_char = idx2char[next_idx]
    sequence = sequence[1:] + next_char

    # New line means we have a new name
    if next_char == '\n':
        gen_name = [name for name in sequence.split('\n')][1]
        
        # Never start name with two identical chars, could probably also
        if len(gen_name) > 2 and gen_name[0] == gen_name[1]:
            gen_name = gen_name[1:]
        
        # Discard all names that are too short
        if len(gen_name) > 2:
            # Only allow new and unique names
            if gen_name not in input_names + new_names:
                new_names.append(gen_name.capitalize())
        
        if 0 == (len(new_names) % (gen_amount / 10)):
            print('Generated {}'.format(len(new_names)))

Now, we generate 10 new names from the learned weights; that is, the LSTM model we have just trained is used to create the names.

print_first_n = min(10, gen_amount)
print('First {} generated names:'.format(print_first_n))
for name in new_names[:print_first_n]:
    print(name)

Finally, we present the names created by the model.

First 10 generated names:
Zaoui
Palner
Palner
Pane
Panrett
Panm
Parner
Parrey
Parrett
Parrison

Summary

Natural language processing is one of the most essential parts of artificial intelligence driving the world's technological revolution. Recurrent neural networks are widely used in the field of NLP to interpret contextual data.

LSTM is a form of recurrent neural network designed for sequential data; its memory mechanism lets it examine the context as a whole to predict the next character or word. If you enjoyed this guide, take a look at our other studies:

Source
