AI Benchmarks and Exercises


The Copying Task

The Copying Task is a benchmark in NLP that assesses the ability of recurrent models (and other sequential models) to retain information over long sequences. Before transformers, RNNs such as LSTMs and GRUs suffered from the vanishing gradient problem, which makes them less effective as the length of the sequence increases.
The copying task can be used to illustrate a model's ability to handle long sequences and to see how well it copes as the sequences of characters grow longer.
In principle, we show the model a sequence of characters it has to remember, then a long run of blanks, and finally ask it to spit out the original sequence that we actually care about at the end.
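Schematically - with made-up letters and a made-up '#' delimiter purely for illustration, rather than the alphabet the paper actually uses - a single trial looks like:

  show the model:   C A F B D A C B A D
  blank steps:      _ _ _ _ _ ... _ _ _    (a long stretch of blanks)
  delimiter:        #
  model must emit:  C A F B D A C B A D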

Task Definition

In the paper they use symbols a_0 to a_9 to define the alphabet of characters that we care about: a_0 to a_7 are the data characters that have to be remembered, a_8 is the blank character, and a_9 is the delimiter.
We can also vary the number of blank steps we test against - in the paper they use T to represent this value. For example, T=100 means "define a sequence, run 100 blanks through the system, then have it recall the sequence from the beginning".
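One possible way to encode this - the integer ids here are an assumption for illustration, not something the paper prescribes - is:

  # Assumed integer encoding of the alphabet (not taken from the paper's code)
  DATA_SYMBOLS = list(range(8))   # a_0 .. a_7: the characters the model must memorise
  BLANK = 8                       # a_8: the blank / filler character
  DELIMITER = 9                   # a_9: signals "start recalling the sequence now"

  SEQ_LEN = 10                    # length of the sequence to memorise
  T = 100                         # number of blank steps the model must carry it across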
To set up the test we:
  1. Use the characters a_0 to a_7 to generate a random sequence of characters that needs to be memorised - this sequence is 10 characters long. We sample with replacement, which means characters are allowed to repeat.
  2. Generate T blank characters (a_8)
  3. Use the delimiter (a_9) to signal to the model that it should now reproduce the memorised sequence
  4. Have the model infer the sequence - since its length is known, we can halt after the correct number of characters has been output and check that it is correct.
Our sequence would look something like this.
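To make that concrete, here is a minimal, self-contained sketch of a generator for one input/target pair. It follows the four steps above using the integer encoding assumed earlier (ids 0-7 for data, 8 for blank, 9 for the delimiter); it is an illustration, not the paper's reference implementation.

  import numpy as np

  def make_copying_example(T=100, seq_len=10, n_symbols=8, rng=None):
      """Build one (input, target) pair for the copying task."""
      rng = rng or np.random.default_rng()
      blank, delimiter = n_symbols, n_symbols + 1   # ids 8 and 9 in the assumed encoding

      # 1. Sample the sequence to memorise, with replacement, from a_0 .. a_7.
      seq = rng.integers(0, n_symbols, size=seq_len)

      # 2. and 3. T blanks, then the delimiter, then seq_len blanks while the model answers.
      inputs = np.concatenate([
          seq,
          np.full(T, blank),
          [delimiter],
          np.full(seq_len, blank),
      ])

      # 4. The target is blank everywhere except the final seq_len steps,
      #    where the model must reproduce the memorised sequence.
      targets = np.concatenate([np.full(T + 1 + seq_len, blank), seq])
      return inputs, targets

  inputs, targets = make_copying_example(T=5, seq_len=4)
  print(inputs)    # e.g. [2 7 1 4 8 8 8 8 8 9 8 8 8 8]
  print(targets)   #      [8 8 8 8 8 8 8 8 8 8 2 7 1 4]

Increasing T stretches the gap the model has to bridge without changing anything else about the task, which is what makes it a clean probe of long-range memory.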

Scaling Characteristics


References

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. ‘Unitary Evolution Recurrent Neural Networks’. Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.