Attention Models

What are Attention Models?

Attention models, or attention mechanisms, are input processing techniques for neural networks that allows the network to focus on specific aspects of a complex input, one at a time until the entire dataset is categorized. The goal is to break down complicated tasks into smaller areas of attention that are processed sequentially. Similar to how the human mind solves a new problem by dividing it into simpler tasks and solving them one by one.


Attention models require continuous reinforcement or backpopagation training to be effective.

How do Attention Models work?

In broad strokes, attention is expressed as a function that maps a query and “s set” of key value pairs to an output. One in which the query, keys, values, and final output are all vectors. The output is then calculated as a weighted sum of the values, with the weight assigned to each value expressed by a compatibility function of the query with the corresponding key value.


In practice, attention allows neural networks to approximate the visual attention mechanism humans use. Like people processing a new scene, the model studies a certain point of an image with intense, “high resolution” focus, while perceiving the surrounding areas in “low resolution,” then adjusts the focal point as the network begins to understand the scene.

What Types of Problems Do Attention Models Solve?

While this is a powerful technique for improving computer vision, the most work so far with attention mechanisms has focused on Neural Machine Translation (NMT). Traditional automated translation systems rely on massive libraries of data labeled with complex functions mapping each word’s statistical properties. 

Using attention mechanisms in NMT is a much simpler approach. Here, the meaning of a sentence is mapped into a fixed-length vector, which then generates a translation based on that vector as a whole. The goal isn’t to translate the sentence word for word, but rather pay attention to the general, “high level” overall sentiment. Besides drastically improving accuracy, this attention-driven learning approach is much easier to construct and faster to train.