A Theano framework for building and training neural networks
We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano's symbolic computational graph, and providing an extensive set of utilities to assist training the networks, e.g. training algorithms, logging, monitoring, visualization, and serialization. Fuel provides a standard format for machine learning datasets. It allows the user to easily iterate over large datasets, performing many types of pre-processing on the fly.
Blocks and Fuel are being developed by the Montreal Institute for Learning Algorithms (MILA) at the University of Montreal. Their focus lies on quick prototyping of complex neural network models. The intended target audience is researchers who design and experiment with machine learning algorithms, especially deep learning algorithms.
Several other libraries built on top of Theano exist, including Pylearn2 and GroundHog (also developed by MILA), Lasagne, and Keras. Like its MILA-developed predecessors, Blocks maintains a focus on research and rapid prototyping. Blocks differentiates itself most notably from the above-mentioned toolkits in its relationship with Theano. Instead of introducing new abstract objects representing ‘models’ or ‘layers’, Blocks annotates the Theano computational graph, maintaining the flexibility of Theano while making large models manageable.
Data processing is an integral part of training neural networks, yet many of the aforementioned frameworks do not address it. Fuel aims to fill this gap. It provides tools to download datasets and to iterate over and preprocess them efficiently.
Both Blocks and Fuel were developed from the very beginning with a strong focus on software engineering best practices. The development teams strive for high test coverage, thorough documentation and carefully considered APIs.
Blocks comprises several components, which can be used independently from each other.
Theano is a popular choice for the implementation of neural networks (see e.g. Goodfellow et al. (2013b); Pascanu et al. (2013)). Blocks and many other libraries, such as Pylearn2 (Goodfellow et al., 2013a), build on Theano by providing reusable components that are common in neural networks, such as linear transformations followed by non-linear activations, or more complicated components such as LSTM units. In Blocks these components are referred to as bricks or “parametrized Theano operations”.
Bricks consist of a set of Theano shared variables, for example the weight matrix of a linear transformation or the filters of a convolutional layer. Bricks use these parameters to transform symbolic Theano variables.
Bricks can contain other bricks within them. This introduces a hierarchy on top of the flat computational graph defined by Theano, which makes it easier to address and configure complex models programmatically.
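The hierarchy can be illustrated with a toy sketch in plain Python. The `ToyBrick` class and its `walk` method are hypothetical stand-ins rather than the Blocks API, and nested lists take the place of Theano shared variables; the sketch only shows how nesting gives every parameter an addressable path.

```python
class ToyBrick:
    """Hypothetical stand-in for a brick: a named node owning parameters and children."""

    def __init__(self, name, parameters=None, children=None):
        self.name = name
        self.parameters = dict(parameters or {})
        self.children = list(children or [])

    def walk(self, prefix=""):
        """Yield (path, value) for every parameter in this brick and below it."""
        path = prefix + "/" + self.name
        for parameter_name, value in self.parameters.items():
            yield path + "." + parameter_name, value
        for child in self.children:
            yield from child.walk(path)


# A two-layer network expressed as a parent brick with two children.
mlp = ToyBrick("mlp", children=[
    ToyBrick("linear_0", {"W": [[0.0, 0.0]], "b": [0.0]}),
    ToyBrick("linear_1", {"W": [[0.0], [0.0]], "b": [0.0, 0.0]}),
])

paths = [path for path, _ in mlp.walk()]
```

Here `paths` lists one entry per parameter, such as `/mlp/linear_0.W`, which is the kind of address that makes programmatic configuration of a nested model practical.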
The parameters of bricks can be initialized using a variety of schemes that are popular in the neural network literature, such as sparse initialization, orthogonal initialization for recurrent weights, etc.
Blocks comes with a large number of ‘bricks’. Besides standard activations and transformations used in feedforward networks (maxout, convolutional layers, table lookups) these also include a variety of more advanced recurrent neural network components like LSTM, GRU, and support for attention mechanisms (for an overview of different kinds of network architectures, regularization methods, and optimization algorithms see Bengio et al. (2015)).
Large neural networks can often result in Theano computational graphs containing hundreds of variables and operations. Blocks does not attempt to abstract away this complex graph, but to make it manageable by annotating variables in the graph. Each input, output, and parameter of a brick is annotated as such. Variables can also be annotated with the role they play in a model, such as weights, biases, filters, etc.
A series of convenience tools were written that allow users to filter the symbolic computational graph based on these annotations, and apply transformations to the graph. Many regularization methods such as weight decay, weight noise, or dropout can thus be implemented in a generic, model-agnostic way. Furthermore, a query mechanism allows these methods to be applied at a fine granularity, for example “apply weight noise to all weights that belong to an LSTM unit whose parent is a brick with the name foo”.
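A query of this kind can be sketched without Theano. In the toy example below, `AnnotatedVariable` and `select` are hypothetical names and the annotation layout (a set of roles plus the chain of owning bricks) is an assumption made for illustration; it is not the Blocks filtering API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedVariable:
    """Hypothetical graph variable carrying role and ownership annotations."""
    name: str
    roles: frozenset
    bricks: tuple  # (kind, name) pairs from the root brick down to the owner


def select(variables, role, owner_kind, parent_name):
    """Return variables with `role` whose owner is an `owner_kind` brick under `parent_name`."""
    selected = []
    for variable in variables:
        if role not in variable.roles or len(variable.bricks) < 2:
            continue
        (_, parent), (kind, _) = variable.bricks[-2], variable.bricks[-1]
        if kind == owner_kind and parent == parent_name:
            selected.append(variable)
    return selected


variables = [
    AnnotatedVariable("W_lstm", frozenset({"weight"}), (("Sequence", "foo"), ("LSTM", "lstm"))),
    AnnotatedVariable("b_lstm", frozenset({"bias"}), (("Sequence", "foo"), ("LSTM", "lstm"))),
    AnnotatedVariable("W_other", frozenset({"weight"}), (("Sequence", "bar"), ("LSTM", "lstm"))),
    AnnotatedVariable("W_out", frozenset({"weight"}), (("Sequence", "foo"), ("Linear", "out"))),
]

# "All weights that belong to an LSTM unit whose parent is a brick named foo."
noisy = [v.name for v in select(variables, "weight", "LSTM", "foo")]
```

Because the selection depends only on annotations, the same call would work on any model that annotates its variables this way, which is what makes a regularizer model-agnostic.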
Experiment management is performed using a ‘main loop’, which combines a Theano graph with a training algorithm and a Fuel data stream. The main loop has a flexible extension interface, which is used to perform tasks such as monitoring on a validation set, serialization, learning rate scheduling, plotting, printing and saving logs, etc.
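The extension mechanism can be pictured as a set of callbacks invoked at fixed points of the training loop. The following is a minimal sketch under that assumption; the class names, the hook names `after_batch` and `after_epoch`, and the scalar toy model are all hypothetical and do not reproduce the Blocks interface.

```python
class ToyExtension:
    """Hypothetical hook interface; the loop calls these at fixed points."""

    def after_batch(self, loop):
        pass

    def after_epoch(self, loop):
        pass


class LearningRateDecay(ToyExtension):
    """Multiply the learning rate by a constant factor after every epoch."""

    def __init__(self, factor):
        self.factor = factor

    def after_epoch(self, loop):
        loop.learning_rate *= self.factor


class LossLogger(ToyExtension):
    """Record the mean squared error on held-out targets after every epoch."""

    def __init__(self, validation_targets):
        self.validation_targets = validation_targets
        self.history = []

    def after_epoch(self, loop):
        errors = [(loop.w - y) ** 2 for y in self.validation_targets]
        self.history.append(sum(errors) / len(errors))


class ToyMainLoop:
    """Fit a scalar w to the targets by minibatch gradient descent."""

    def __init__(self, batches, learning_rate, extensions):
        self.batches = batches
        self.learning_rate = learning_rate
        self.extensions = extensions
        self.w = 0.0

    def run(self, num_epochs):
        for _ in range(num_epochs):
            for batch in self.batches:
                # Gradient of the mean of (w - y)^2 over the batch.
                gradient = sum(2.0 * (self.w - y) for y in batch) / len(batch)
                self.w -= self.learning_rate * gradient
                for extension in self.extensions:
                    extension.after_batch(self)
            for extension in self.extensions:
                extension.after_epoch(self)


logger = LossLogger(validation_targets=[2.5, 3.5])
loop = ToyMainLoop(
    batches=[[2.0, 4.0], [3.0, 3.0]],
    learning_rate=0.1,
    extensions=[logger, LearningRateDecay(factor=0.5)],
)
loop.run(num_epochs=3)
```

Monitoring and learning rate scheduling live entirely in the extensions, so the loop itself stays unchanged when a new task such as checkpointing is added.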
Fuel’s goal is to provide a common interface to a variety of data formats and published datasets such as MNIST, CIFAR-10, ImageNet, etc. while making it easy for users to write an interface to new datasets.
Blocks relies on Fuel for its data interface, but Fuel can easily be used by other machine learning frameworks that interface with datasets.
Fuel allows for different ways of iterating over these datasets, such as sequential or shuffled minibatches, support for in-memory and out-of-core datasets, and resampling (cross validation, bootstrapping).
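The difference between two of these iteration schemes can be sketched as generators over example indices. The function names and signatures below are hypothetical and chosen for illustration only; they are not Fuel's classes.

```python
import random


def sequential_batches(num_examples, batch_size):
    """Yield index lists covering 0..num_examples-1 in their original order."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def shuffled_batches(num_examples, batch_size, seed):
    """Yield index lists covering every example once, in a seeded random order."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


in_order = list(sequential_batches(5, 2))
shuffled = list(shuffled_batches(5, 2, seed=1))
```

Both schemes visit every example exactly once per pass; only the order differs. Producing indices rather than data keeps the scheme independent of whether the dataset sits in memory or on disk.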
It also provides a variety of on-the-fly preprocessing methods such as random cropping of images, creating n-grams from text files, and the ability to implement many other methods easily. These preprocessing steps can be chained together to form more complex transformations of the input data.
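Chaining can be sketched with plain Python generators, each wrapping the stream produced by the previous one. The helper names below are hypothetical, and the sketch illustrates the idea of lazily composed transformations rather than Fuel's own transformer classes.

```python
def tokenize(lines):
    """Split each line of text into whitespace-separated tokens."""
    for line in lines:
        yield line.split()


def lowercase(stream):
    """Lowercase every token of every example in the stream."""
    for tokens in stream:
        yield [token.lower() for token in tokens]


def ngrams(stream, n):
    """Replace each token sequence by the list of its n-grams."""
    for tokens in stream:
        yield [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


lines = ["The quick brown fox", "Jumps over"]

# Each stage pulls from the one it wraps, so nothing is computed until iteration.
pipeline = ngrams(lowercase(tokenize(lines)), n=2)
bigrams = list(pipeline)
```

Since every stage is lazy, an example is processed only when the training loop asks for it, which is what "on the fly" amounts to here.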
To sidestep Python’s global interpreter lock (GIL) and keep data processing from slowing down training, Fuel can perform all operations in a separate process, transferring the processed data to the training process using TCP sockets.
Datasets are distributed in a wide range of formats. Fuel simplifies dataset storage by converting all built-in datasets to annotated HDF5 files (The HDF Group, 1997-2015). In addition to being an efficient format for large datasets that don’t fit into memory, HDF5 is easy to organize and document. All of the data is stored in a single HDF5 file, with the following metadata attached:
What are the data sources available (e.g. features, targets, etc.)?
How are these data sources officially split (e.g. training, validation, and test sets)?
Are some data sources unavailable for some splits (e.g. test set only offers unlabeled examples)?
What are the axes semantics for a given data source (e.g. batch, feature, width, height, channel, time, etc.)?
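The four questions above can be pictured as a small metadata record. The sketch below uses a plain dictionary with a hypothetical layout and illustrative example counts that are not taken from any real dataset; Fuel stores this information inside the HDF5 file itself, and the exact on-disk encoding is not reproduced here.

```python
# Hypothetical metadata record; ranges are (start, stop) row indices.
metadata = {
    "sources": ("features", "targets"),
    "splits": {
        "train": {"features": (0, 50000), "targets": (0, 50000)},
        "valid": {"features": (50000, 60000), "targets": (50000, 60000)},
        "test": {"features": (60000, 70000)},  # unlabeled: no targets
    },
    "axis_labels": {
        "features": ("batch", "channel", "height", "width"),
        "targets": ("batch", "index"),
    },
}


def available_sources(metadata, split):
    """Return the data sources that the given split actually provides."""
    return tuple(
        source for source in metadata["sources"]
        if source in metadata["splits"][split]
    )


def num_examples(metadata, split, source):
    """Return how many examples of `source` belong to `split`."""
    start, stop = metadata["splits"][split][source]
    return stop - start
```

With such a record, `available_sources(metadata, "test")` reports that only features are available for the test split, so a consumer can detect the missing labels before reading any data.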
Integrating user data into Fuel via HDF5 is straightforward, and simply requires the data to be written to an HDF5 file with metadata according to the specifications. Finally, while standardizing by convention on HDF5, the Fuel dataset API is independent of it; users are free to implement dataset objects employing other backends and use them with the rest of Fuel’s components.
Fuel offers built-in scripts that automate the tasks of downloading datasets (similar to e.g. skdata, https://jaberg.github.io/skdata/) and converting them to Fuel’s HDF5 specification.
The fuel-download script is used to download raw data files. Downloading the raw MNIST data files is as easy as typing fuel-download mnist. The fuel-convert script is used to convert raw data files into HDF5 format.
Reproducibility being an important feature of both Fuel and Blocks, the fuel-convert script automatically tags all files it creates with relevant module and interface versions and the exact command that was used to generate these files. Inspection of this metadata is done with the fuel-info script.
The training of large, deep neural networks can often take days or even weeks. Hence, regular checkpointing of training progress is important. Blocks aims to make the resumption of experiments entirely transparent, even across platforms, while ensuring the reproducibility of these experiments.
This goal is complicated by shortcomings in Python’s pickle serialization module, which is unable to serialize many of the iterators that Fuel depends on heavily in order to iterate over large datasets efficiently. To circumvent this, we reimplemented the itertools module from the Python standard library to be serializable (https://github.com/mila-udem/picklable-itertools).
As a result, Blocks experiments are able to be interrupted in the middle of a pass over the dataset, serialized, and resumed later, without affecting the final training results.
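The underlying problem and its remedy can be shown with the standard library alone. In the sketch below, `PicklableRange` is a hypothetical class written for illustration and is not taken from picklable-itertools; it shows the general idea of keeping the iteration state in ordinary attributes so that pickle can save and restore it.

```python
import pickle


def generator_range(stop):
    """A generator: convenient, but its paused state cannot be pickled."""
    for i in range(stop):
        yield i


class PicklableRange:
    """Iterator whose position is a plain attribute, so pickle can store it."""

    def __init__(self, stop):
        self.stop = stop
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= self.stop:
            raise StopIteration
        value = self.position
        self.position += 1
        return value


try:
    pickle.dumps(generator_range(5))
    generator_is_picklable = True
except (TypeError, pickle.PicklingError):
    generator_is_picklable = False

# Consume part of the iterator, checkpoint it, then resume from the copy.
iterator = PicklableRange(5)
consumed = [next(iterator), next(iterator)]
resumed = pickle.loads(pickle.dumps(iterator))
remaining = list(resumed)
```

The restored iterator continues from the element after the last one consumed, which is the property needed to interrupt and resume training in the middle of a pass over the data.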
Blocks and Fuel are well documented, with both API documentation and tutorials available online. Two active mailing lists (https://groups.google.com/d/forum/blocks-users and https://groups.google.com/d/forum/fuel-users) support users of the libraries. A separate repository (https://github.com/mila-udem/blocks-examples) is maintained for users to contribute non-trivial examples of the use of Blocks. Implementations of neural machine translation models (NMT, Bahdanau et al. (2015)) and the Deep Recurrent Attentive Writer (DRAW, Gregor et al. (2015)) model are publicly available examples of state-of-the-art models successfully implemented using Blocks.
The authors would like to acknowledge the support of the following agencies for research funding and computing support: NSERC, Calcul Québec, Compute Canada, the Canada Research Chairs and CIFAR. Bahdanau thanks Planet Intelligent Systems GmbH for their financial support. We would also like to thank the developers of Theano.