Bag of Words: Working through some pytorch tutorials

18 Aug 2017 » deeplearning, pytorch

Some NLP with Pytorch

The PyTorch tutorial on NLP, which really introduces the features of PyTorch, is a great crash-course introduction to NLP. This exercise, for me, is more about getting comfortable with a new framework than anything else (have to jump on the PyTorch bandwagon with the release of v0.2).

The first section uses bag of words (BOW); check out the Wikipedia article for a decent and rather comprehensive introduction. In short, in its naive form, the frequency of each word is used as a feature for training a classifier.

The first step is creating a vocabulary, which is done by taking the union of the words in the data sets under consideration. In the PyTorch tutorial, data and test_data are combined and used to create an index of words, word_to_ix. The simple script provided does the job.

# assign each unique word in data + test_data its own integer index
word_to_ix = {}
for sent, _ in data + test_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

The Wikipedia example is rather clear, so I will reproduce it below:

  1. John likes to watch movies. Mary likes movies too.
  2. John also likes to watch football games.

These two texts describe the sample space, thus the vocabulary list constructed is as follows.

[
    "John",
    "likes",
    "to",
    "watch",
    "movies",
    "Mary",
    "too",
    "also",
    "football",
    "games"
]
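
As a quick cross-check, the same loop from above produces this vocabulary when fed the two Wikipedia sentences (the simple strip-the-period, split-on-whitespace tokenization here is my own shortcut, just for illustration):

docs = [
    "John likes to watch movies. Mary likes movies too.".replace(".", "").split(),
    "John also likes to watch football games.".replace(".", "").split(),
]

word_to_ix = {}
for sent in docs:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

print(word_to_ix)
# {'John': 0, 'likes': 1, 'to': 2, 'watch': 3, 'movies': 4,
#  'Mary': 5, 'too': 6, 'also': 7, 'football': 8, 'games': 9}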

We can get the bag-of-words representation by counting the frequency of each word in each text sample, thus we get:

  1. [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
  2. [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
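
Continuing from the snippet above (so docs and word_to_ix are already defined), backing those counts out takes only a few lines; this is just an illustration, not the tutorial's make_bow_vector:

def bow_counts(sentence, word_to_ix):
    # count how often each vocabulary word appears in the sentence
    vec = [0] * len(word_to_ix)
    for word in sentence:
        vec[word_to_ix[word]] += 1
    return vec

for doc in docs:
    print(bow_counts(doc, word_to_ix))
# [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
# [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]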

Training with BOW

I have left the original comments as they are helpful for a newcomer. The texts are converted to feature vectors of the appropriate type, since inputs to autograd.Variable() need to be torch tensors. The labels, in string form, are likewise converted to a torch.LongTensor, an integer tensor. The data set is trained on for 100 epochs, and the results show that the model has learned to classify between English and Spanish.
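
For context, the make_bow_vector and make_target helpers referenced below look roughly like this in the tutorial (reproduced from memory, so treat it as a sketch using the pre-0.4 tensor API):

import torch

def make_bow_vector(sentence, word_to_ix):
    # a 1 x vocab_size tensor of word counts
    vec = torch.zeros(len(word_to_ix))
    for word in sentence:
        vec[word_to_ix[word]] += 1
    return vec.view(1, -1)

def make_target(label, label_to_ix):
    # the label index wrapped in an integer tensor
    return torch.LongTensor([label_to_ix[label]])

With those in place, the training loop from the tutorial is: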

for epoch in range(100):
    for instance, label in data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Make our BOW vector and also we must wrap the target in a
        # Variable as an integer. For example, if the target is SPANISH, then
        # we wrap the integer 0. The loss function then knows that the 0th
        # element of the log probabilities is the log probability
        # corresponding to SPANISH
        bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))
        target = autograd.Variable(make_target(label, label_to_ix))

        # Step 3. Run our forward pass.
        log_probs = model(bow_vec)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()
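
For completeness, the pieces the loop relies on (model, loss_function, optimizer) are set up along these lines in the tutorial; this is a from-memory sketch rather than a verbatim copy, with the label count and vocabulary size coming from the data:

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class BoWClassifier(nn.Module):
    def __init__(self, num_labels, vocab_size):
        super(BoWClassifier, self).__init__()
        # a single affine map from word counts to label scores
        self.linear = nn.Linear(vocab_size, num_labels)

    def forward(self, bow_vec):
        # log probabilities over the labels
        return F.log_softmax(self.linear(bow_vec))

model = BoWClassifier(2, len(word_to_ix))       # 2 labels: SPANISH, ENGLISH
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)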

The tutorial tests on the word “creo”. Before training we find the log probabilities spanish: -0.1599 and english: -0.1411, while after 100 epochs of training on such a small data set we find spanish: 0.3315 and english: -0.6325.
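
If I remember the tutorial correctly, those before/after values come from printing the model parameters associated with creo, with something along the lines of:

# the column of the (labels x vocab_size) weight matrix for the word "creo"
print(next(model.parameters())[:, word_to_ix["creo"]])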

Recap

It's always fun trying to change the tutorial in some form to see how different tweaks impact the results. Since we started with such a small data set, a quick and easy adjustment is to add more examples (in particular, Spanish text using creo). Increasing the data set by a few samples quickly improves the results, with the log probabilities moving to spanish: 0.5855 and english: -0.7205. In this case, it's probably best to apply the softmax function to represent the log probabilities as a probability distribution between 0 and 1*. The probability of creo being Spanish moves from 72.39% to 78.68%. Not a bad improvement for a small boost to the sample set, but more impressive is that we were able to get such results with so small a data set to start with.

*To convert from log probs to a probability distribution between 0 and 1, simply exponentiate the log prob values using base e and normalize by the sum for each case.
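
As a sanity check on those percentages, the footnote's conversion is a few lines of plain Python (nothing PyTorch-specific):

import math

def to_probability(scores):
    # exponentiate the log-prob style scores and normalize, i.e. a softmax
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(to_probability([0.3315, -0.6325]))   # original data set -> ~[0.724, 0.276]
print(to_probability([0.5855, -0.7205]))   # extra samples     -> ~[0.787, 0.213]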

Reference

  1. http://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
  2. https://en.wikipedia.org/wiki/Bag-of-words_model
  3. https://en.wikipedia.org/wiki/Softmax_function
  4. http://www.spanishdict.com/translate/yo%20creo