While deep neural networks are all the rage, the complexity of the major frameworks has been a barrier to their use for developers new to machine learning. There have been several proposals for improved and simplified high-level APIs for building neural network models, all of which tend to look similar from a distance but show differences on closer examination.
Keras is one of the leading high-level neural networks APIs. It is written in Python and supports multiple back-end neural network computation engines.
Keras and TensorFlow
Given that the TensorFlow project has adopted Keras as the high-level API for the upcoming TensorFlow 2.0 release, Keras looks to be a winner, if not necessarily the winner. In this article, we'll explore the principles and implementation of Keras, with an eye towards understanding why it’s an improvement over low-level deep learning APIs.
Even in TensorFlow 1.12, the official Get Started with TensorFlow tutorial uses the high-level Keras API embedded in TensorFlow, tf.keras. By contrast, the TensorFlow Core API requires working with TensorFlow computational graphs, tensors, operations, and sessions, some of which can be hard to understand when you're just beginning to work with TensorFlow. There are some advantages to using the low-level TensorFlow Core API, mostly when debugging, but fortunately you can mix the high-level and low-level TensorFlow APIs as needed.
Keras principles
Keras was created to be user friendly, modular, easy to extend, and to work with Python. The API was “designed for human beings, not machines,” and “follows best practices for reducing cognitive load.”
Neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that you can combine to create new models. New modules are simple to add, as new classes and functions. Models are defined in Python code, not separate model configuration files.
Why Keras?
The biggest reasons to use Keras stem from its guiding principles, primarily the one about being user friendly. Beyond ease of learning and ease of model building, Keras offers the advantages of broad adoption, support for a wide range of production deployment options, integration with at least five back-end engines (TensorFlow, CNTK, Theano, MXNet, and PlaidML), and strong support for multiple GPUs and distributed training. Plus, Keras is backed by Google, Microsoft, Amazon, Apple, Nvidia, Uber, and others.
Keras back ends
Keras proper does not do its own low-level operations, such as tensor products and convolutions; it relies on a back-end engine for that. Even though Keras supports multiple back-end engines, its primary (and default) back end is TensorFlow, and its primary supporter is Google. The Keras API comes packaged in TensorFlow as tf.keras, which, as mentioned earlier, will become the primary TensorFlow API as of TensorFlow 2.0.
To change back ends, edit your $HOME/.keras/keras.json file and specify a different back-end name, such as theano or cntk. Alternatively, you can override the configured back end by defining the environment variable KERAS_BACKEND, either in your shell or in your Python code by setting os.environ["KERAS_BACKEND"] before Keras is imported.
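As a rough sketch of the two mechanisms just described, the snippet below writes a configuration file and then overrides it with the environment variable. It uses only the standard library. The temporary directory is a stand-in for $HOME/.keras (an assumption made so the sketch does not touch a real configuration), and the field values other than the back-end name are the usual defaults rather than anything this article prescribes.

```python
import json
import os
import tempfile

# Option 1: the configuration file. A temporary directory stands in for
# $HOME/.keras here so the sketch does not overwrite a real configuration.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "keras.json")

config = {
    "backend": "theano",  # the back-end name Keras reads from the file
    "floatx": "float32",
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
}
with open(config_path, "w") as f:
    json.dump(config, f, indent=4)

# Option 2: the environment variable, which overrides the file.
# It has to be set before Keras is imported, because the back end
# is chosen at import time.
os.environ["KERAS_BACKEND"] = "tensorflow"

# Read the file back to confirm what was written.
with open(config_path) as f:
    configured_backend = json.load(f)["backend"]

effective_backend = os.environ.get("KERAS_BACKEND", configured_backend)
print("file says:", configured_backend, "| in effect:", effective_backend)
```

The variable effective_backend only mimics the precedence rule for illustration; Keras performs that lookup itself when it is imported.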
Keras models
The Model is the core Keras data structure. There are two main types of models available in Keras: the Sequential model, and the Model class used with the functional API.
Keras Sequential models
The Sequential model is a linear stack of layers, and the layers can be described very simply. Here is an example from the Keras documentation that uses model.add() to define two dense layers in a Sequential model:
import keras
from keras.models import Sequential
from keras.layers import Dense
#Create Sequential model with Dense layers, using the add method
model = Sequential()
#Dense implements the operation:
# output = activation(dot(input, kernel) + bias)
#Units are the dimensionality of the output space for the layer,
# which equals the number of hidden units
#Activation and loss functions may be specified by strings or classes
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
#The compile method configures the model’s learning process
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
#The fit method does the training in batches
# x_train and y_train are NumPy arrays, just like in the scikit-learn API.
model.fit(x_train, y_train, epochs=5, batch_size=32)
#The evaluate method calculates the losses and metrics
# for the trained model
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
#The predict method applies the trained model to inputs
# to generate outputs
classes = model.predict(x_test, batch_size=128)
The comments in the code above are worth reading. It’s also worth noting how little cruft there is in the actual code compared to, say, the low-level TensorFlow APIs. Each layer definition requires one line of code, the compilation (learning process definition) takes one line of code, and fitting (training), evaluating (calculating the losses and metrics), and predicting outputs from the trained model each take one line of code.
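To make the Dense comment above concrete, here is a minimal NumPy sketch of the forward pass through the same two layer shapes (100 inputs, 64 hidden units, 10 outputs). It is an illustration only, not how Keras implements the layer: the helper names dense, relu, and softmax are made up for this sketch, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Rectified linear unit: negative values become zero
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dense(inputs, kernel, bias, activation):
    # output = activation(dot(input, kernel) + bias)
    return activation(np.dot(inputs, kernel) + bias)

# One batch of 32 samples with 100 features each (matches input_dim=100)
batch = rng.normal(size=(32, 100))

# Randomly initialized weights for the two layers
kernel_1 = rng.normal(scale=0.1, size=(100, 64))
bias_1 = np.zeros(64)
kernel_2 = rng.normal(scale=0.1, size=(64, 10))
bias_2 = np.zeros(10)

hidden = dense(batch, kernel_1, bias_1, relu)     # shape (32, 64)
probs = dense(hidden, kernel_2, bias_2, softmax)  # shape (32, 10)

# Total number of weights the two layers hold
n_params = kernel_1.size + bias_1.size + kernel_2.size + bias_2.size
print(hidden.shape, probs.shape, n_params)
```

Each row of probs sums to 1, which is what lets the softmax output be read as class probabilities by the categorical cross-entropy loss.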
Keras functional API
The Keras Sequential model is simple but limited in model topology. The Keras functional API is useful for creating complex models, such as multi-input/multi-output models, directed acyclic graphs (DAGs), and models with shared layers.
The functional API uses the same layers as the Sequential model but provides more flexibility in putting them together. In the functional API you define the layers first, and then create the Model, compile it, and fit (train) it. Evaluation and prediction are essentially the same as in a Sequential model, so have been omitted in the sample code below.
from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels) # starts training
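The multi-input and shared-layer models mentioned above might look something like the following sketch, which assumes Keras is installed with a working back end. The layer sizes, the variable names, and the random training data are all invented for illustration; the point is only that one layer instance is called on two different inputs, so both branches use the same weights.

```python
import numpy as np
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Two separate inputs with the same shape
input_a = Input(shape=(16,))
input_b = Input(shape=(16,))

# One layer instance called twice: both branches share its weights
shared_encoder = Dense(8, activation='relu')
encoded_a = shared_encoder(input_a)
encoded_b = shared_encoder(input_b)

# Merge the two encodings and predict a single probability
merged = concatenate([encoded_a, encoded_b])
prediction = Dense(1, activation='sigmoid')(merged)

model = Model(inputs=[input_a, input_b], outputs=prediction)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Random stand-in data, only to show the calling convention
data_a = np.random.random((64, 16)).astype('float32')
data_b = np.random.random((64, 16)).astype('float32')
labels = np.random.randint(2, size=(64, 1)).astype('float32')

# A multi-input model takes a list of arrays, one per input
model.fit([data_a, data_b], labels, epochs=1, batch_size=16, verbose=0)
outputs = model.predict([data_a, data_b], verbose=0)
```

A Sequential model cannot express this topology, because the graph forks at the inputs and rejoins at the concatenation.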
Keras layers
In the previous examples we only used Dense layers. Keras has a wide selection of predefined layer types, and also supports writing your own layers.
Core layers include Dense (dot product plus bias), Activation (transfer function or neuron shape), Dropout (randomly set a fraction of input units to 0 at each training update to avoid overfitting), Lambda (wrap an arbitrary expression as a Layer object), and several others. Convolution layers (the use of a filter to create a feature map) run from 1D to 3D and include the most common variants, such as cropping and transposed convolution layers for each dimensionality. 2D convolution, which was inspired by the functionality of the visual cortex, is commonly used for image recognition.
Pooling (downscaling) layers run from 1D to 3D and include the most common variants, such as max and average pooling. Locally connected layers act like convolution layers, except that the weights are unshared. Recurrent layers include simple (fully connected recurrence), gated, LSTM, and others; these are useful for language processing, among other applications. Noise layers help to avoid overfitting.
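As a rough illustration of what two of these layer types compute, the NumPy sketch below implements non-overlapping 2D max pooling and dropout by hand. The function names max_pool_2d and dropout are hypothetical helpers, not Keras APIs, and the sketch handles a single 2D array rather than the batched, multi-channel tensors that the Keras layers accept.

```python
import numpy as np

def max_pool_2d(image, pool=2):
    # Downscale by keeping the maximum of each pool x pool block.
    # Rows and columns that do not fill a whole block are dropped.
    h, w = image.shape
    h_out, w_out = h // pool, w // pool
    trimmed = image[:h_out * pool, :w_out * pool]
    blocks = trimmed.reshape(h_out, pool, w_out, pool)
    return blocks.max(axis=(1, 3))

def dropout(inputs, rate, rng):
    # Zero out roughly `rate` of the units, then scale the survivors
    # by 1 / (1 - rate) so the expected sum stays the same.
    keep_mask = rng.random(inputs.shape) >= rate
    return inputs * keep_mask / (1.0 - rate)

image = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2d(image)  # 4x4 input becomes 2x2 output

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=rng)

print(pooled)
print("fraction zeroed:", np.mean(dropped == 0.0))
```

Dropout is only applied during training; at inference time the layer passes its inputs through unchanged, which is why the surviving units are scaled up during training.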
Keras datasets
Keras supplies seven of the common deep learning sample datasets via the keras.datasets module: CIFAR10 and CIFAR100 small color images, IMDB movie reviews, Reuters newswire topics, MNIST handwritten digits, Fashion-MNIST clothing images, and Boston housing prices.
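These datasets arrive as NumPy arrays that usually need a little preparation before they fit a model like the ones shown earlier. The sketch below shows one plausible way to do that for MNIST-shaped data: flatten each 28x28 image into 784 features and one-hot encode the labels for a categorical cross-entropy loss. To avoid a download, it uses a small random stand-in with the same shape and dtype as MNIST; the real loading call appears only in a comment.

```python
import numpy as np

# With Keras installed, the real data would come from:
#   from keras.datasets import mnist
#   (x_train, y_train), (x_test, y_test) = mnist.load_data()
# Here a random stand-in with the same layout is used instead:
# uint8 images of 28x28 pixels and integer class labels from 0 to 9.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_train = rng.integers(0, 10, size=(100,), dtype=np.uint8)

# Flatten each image to a 784-element vector and scale pixels to [0, 1]
x_flat = x_train.reshape(len(x_train), -1).astype("float32") / 255.0

# One-hot encode the labels: class 3 becomes [0, 0, 0, 1, 0, ...]
num_classes = 10
y_one_hot = np.eye(num_classes, dtype="float32")[y_train]

print(x_flat.shape, y_one_hot.shape)
```

In real code, keras.utils.to_categorical performs the same one-hot encoding; the np.eye indexing is used here only to keep the sketch free of a Keras dependency.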
Keras applications and examples
Keras also supplies ten well-known models, called Keras Applications, pretrained against ImageNet: Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, and MobileNetV2. You can use these to predict the classification of images, extract features from them, and fine-tune the models on a different set of classes.
By the way, fine-tuning existing models is a good way to speed up training. For example, you can add layers as you wish, freeze the base layers to train the new layers, then unfreeze some of the base layers to fine-tune the training. You can freeze a layer by setting layer.trainable = False.
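The freeze-then-unfreeze workflow can be sketched as follows, assuming Keras is installed with a working back end. To stay small and avoid downloading weights, a tiny two-layer network stands in for a pretrained base such as the Keras Applications models; the layer names and sizes are invented for the example.

```python
from keras.layers import Input, Dense
from keras.models import Model

# A small stand-in for a pretrained base model
inputs = Input(shape=(32,))
x = Dense(16, activation='relu', name='base_1')(inputs)
features = Dense(8, activation='relu', name='base_2')(x)
base = Model(inputs=inputs, outputs=features)

# Step 1: freeze every base layer so only new layers are trained
for layer in base.layers:
    layer.trainable = False

# Step 2: add a new classification head on top of the frozen base
outputs = Dense(3, activation='softmax', name='new_head')(base.output)
model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='sgd', loss='categorical_crossentropy')

# Only the new head's kernel and bias are trainable at this point
n_trainable_frozen = len(model.trainable_weights)

# Step 3: unfreeze the top base layer for fine-tuning, then recompile
# so that the change takes effect in training
base.get_layer('base_2').trainable = True
model.compile(optimizer='sgd', loss='categorical_crossentropy')
n_trainable_finetune = len(model.trainable_weights)

print(n_trainable_frozen, n_trainable_finetune)
```

Between steps 2 and 3 you would normally call model.fit() on the new data, and often lower the learning rate before the fine-tuning pass so the unfrozen weights are not disturbed too much.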
The Keras examples repository contains more than 40 sample models. They cover vision models, text and sequences, and generative models.
Deploying Keras
Keras models can be deployed across a vast range of platforms, perhaps more than any other deep learning framework. That includes iOS, via CoreML (supported by Apple); Android, via the TensorFlow Android runtime; in a browser, via Keras.js and WebDNN; on Google Cloud, via TensorFlow-Serving; in a Python webapp back end; on the JVM, via DL4J model import; and on Raspberry Pi.
To get started with Keras, read the documentation, check out the code repository, install TensorFlow (or another back-end engine) and Keras, and try out the Getting Started tutorial for the Keras Sequential model. From there you can advance to other tutorials, and eventually explore the Keras Examples.