Layer¶
A layer in Mocha is an isolated computation component that (optionally) takes some input blobs and (optionally) produces some output blobs. See Networks for an overview of the abstraction of layer and network in Mocha. Implementing a layer in Mocha means
- Characterizing the layer (e.g. does this layer define a loss function?) so that the network topology engine knows how to properly glue the layers together to build a network.
- Implementing the computation of the layer, either in a backend-independent way, or separately for each backend.
Defining a Layer¶
A layer, like many other computational components in Mocha, consists of two parts:
- A layer configuration, a subtype of Layer.
- A layer state, a subtype of LayerState.
Layer defines how a layer should be constructed and how it should behave, while LayerState is the realization of a layer which actually holds the data blobs.
Mocha has a helper macro @defstruct to define a Layer subtype. For example
@defstruct PoolingLayer Layer (
name :: AbstractString = "pooling",
(bottoms :: Vector{Symbol} = Symbol[], length(bottoms) > 0),
(tops :: Vector{Symbol} = Symbol[], length(tops) == length(bottoms)),
(kernel :: NTuple{2, Int} = (1,1), all([kernel...] .> 0)),
(stride :: NTuple{2, Int} = (1,1), all([stride...] .> 0)),
(pad :: NTuple{2, Int} = (0,0), all([pad...] .>= 0)),
pooling :: PoolingFunction = Pooling.Max(),
neuron :: ActivationFunction = Neurons.Identity(),
)
@defstruct can be used to define a general immutable struct. The first parameter is the struct name, the second parameter is the super-type, and then a list of struct fields follows. Each field requires a name, a type and a default value. Optionally, an expression can be added to verify that the user-supplied value meets the requirements.
This macro will automatically define a constructor with keyword arguments for each field. This makes the interface easier to use for the end-user.
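For example, the PoolingLayer defined above could then be constructed with keyword arguments as in the following sketch (the blob names :conv1 and :pool1 are placeholders):

layer = PoolingLayer(name="pool1", bottoms=[:conv1], tops=[:pool1],
                     kernel=(2,2), stride=(2,2))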
Each layer needs to have a field name. When the layer produces output blobs, it has to have a property tops, allowing the user to specify a list of names for the output blobs the layer is producing. If the layer takes input blobs, it should also have a property bottoms for the user to specify the names of the input blobs. Mocha will use the information specified in tops and bottoms to wire the blobs in a proper data path for network forward and backward iterations.
Correspondingly, a subtype of LayerState should be defined for each layer. For example
type PoolingLayerState <: LayerState
  layer      :: PoolingLayer   # the Layer configuration object
  blobs      :: Vector{Blob}   # output blobs, written during each forward iteration
  blobs_diff :: Vector{Blob}   # gradients w.r.t. the outputs, written by upper layers
  etc        :: Any            # layer-specific auxiliary data
end
A layer state should have a field layer referencing the corresponding Layer object. If the layer produces output blobs, the state should have a field called blobs, and the layer will write its output into blobs during each forward iteration. If the layer needs back-propagation from the upper layers, the state should also have a field called blobs_diff. Mocha will pass the blobs in blobs_diff to the function computing the backward iteration in the corresponding upper layer. The back-propagated gradients will be written into blobs_diff by the upper layer, and the layer can make use of them when computing its own backward iteration.
Other fields and/or behaviors are required depending on the layer type (see below).
Characterizing a Layer¶
A layer is characterized by applying the macro @characterize_layer to the defined subtype of Layer. The default characterizations are given by
@characterize_layer(Layer,
is_source => false, # data layer, takes no bottom blobs
is_sink => false, # top layer, produces no top blobs (loss, accuracy, etc.)
has_param => false, # contains trainable parameters
has_neuron => false, # has a neuron
can_do_bp => false, # can do back-propagation
is_inplace => false, # does inplace computation, does not have own top blobs
has_loss => false, # produces a loss
has_stats => false, # produces statistics
)
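For example, a pooling layer like the one defined earlier performs back-propagation and has an activation function, so it could be characterized roughly as follows (a sketch; the actual characterizations of Mocha's built-in pooling layer may differ):

@characterize_layer(PoolingLayer,
  can_do_bp  => true,
  has_neuron => true,
)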
Characterizing a layer can be omitted if all the behaviors are consistent with the default specifications. The characterizations should be self-explanatory from the names and comments above. Some characterizations come with extra requirements:
is_source
- The layer will be used as a source layer of a network. Thus it should take no input blobs, and the Layer object should have no bottoms property.

is_sink
- The layer will be used as a sink layer of a network. Thus it should produce no output blobs, and the Layer object should have no tops property.

has_param
- The layer has trainable parameters. The LayerState object should have a parameters field, containing a list of Parameter objects.

has_neuron
- The Layer object should have a property called neuron of type ActivationFunction.

can_do_bp
- Should be true if the layer has the ability to do back-propagation.

is_inplace
- An inplace Layer object should have no tops property because the output blobs are the same as the input blobs.

has_loss
- The LayerState object should have a loss field.

has_stats
- The layer computes statistics (e.g. accuracy). The statistics should be accumulated across multiple mini-batches, until the user explicitly resets them. The following functions should be implemented for the layer (a sketch follows their descriptions below):
dump_statistics(storage, layer_state, show)¶
- storage is a data storage (typically a CoffeeLounge object) that statistics are dumped into, via the function update_statistics(storage, key, value). show is a boolean value; when true, it indicates that a summary of the statistics should also be printed to stdout.

reset_statistics(layer_state)¶
- Reset the statistics.
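As a sketch, these two functions might look as follows for a hypothetical accuracy-style layer. AccuracyLayerState and its n_correct / n_total fields are assumptions; update_statistics is the Mocha function mentioned above:

# Hypothetical accuracy-style layer state; fields are assumptions.
function dump_statistics(storage, state::AccuracyLayerState, show::Bool)
  accuracy = state.n_correct / state.n_total
  update_statistics(storage, "$(state.layer.name)-accuracy", accuracy)
  if show
    # print a human-readable summary when requested
    println("  Accuracy (avg over $(state.n_total)) = $(accuracy * 100)%")
  end
end

function reset_statistics(state::AccuracyLayerState)
  state.n_correct = 0
  state.n_total   = 0
end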
Layer Computation API¶
The life cycle of a layer is
- The user defines a Layer.
- The user uses Layers to construct a Net. The Net will call setup_layer on each Layer to construct the corresponding LayerState.
- During training, the solver uses a loop to call the forward and backward functions of the Net. The Net will then call forward and backward of each layer in a proper order.
- The user destroys the Net, which will call the shutdown function of each layer (see the sketch after this list).
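In user code, this life cycle typically looks like the following sketch (data_layer, pool_layer and loss_layer are hypothetical Layer objects defined elsewhere):

backend = CPUBackend()
init(backend)

# constructing the Net calls setup_layer on each Layer
net = Net("example-net", backend, [data_layer, pool_layer, loss_layer])

# one training iteration; the solver runs this in a loop
forward(net)
backward(net)

# destroying the Net calls shutdown on each layer
destroy(net)
shutdown(backend)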
setup_layer(backend, layer, inputs, diffs)¶
- Construct a corresponding LayerState object given a Layer object. inputs is a list of blobs, corresponding to the blobs specified by the bottoms property of the Layer object. If the Layer does not have a bottoms property, it will be an empty list. diffs is a list of blobs. Each blob in diffs corresponds to a blob in inputs. When computing back-propagation, the back-propagated gradients for each input blob should be written into the corresponding blob in diffs. Blobs in inputs and diffs are taken from the blobs and blobs_diff fields of the LayerState objects of the lower layers. diffs is guaranteed to be a list of blobs of the same length as inputs. However, when some input blobs do not need back-propagated gradients, the corresponding blobs in diffs will be NullBlobs.
- This function should set up the layer's own blobs and blobs_diff (if any), matching the shape of its input blobs.
forward(backend, layer_state, inputs)¶
- Do the forward computation. It is guaranteed that the blobs in inputs have already been computed by the lower layers. The output blobs (if any) should be written into the blobs in the blobs field of the layer state.
backward(backend, layer_state, inputs, diffs)¶
- Do the backward computation. It is guaranteed that the back-propagated gradients with respect to all the output blobs of this layer have already been computed and written into the blobs in the blobs_diff field of the layer state. This function should compute the gradients with respect to its parameters (if any). It is also responsible for computing the back-propagated gradients and writing them into the blobs in diffs. If a blob in diffs is a NullBlob, the computation of the back-propagated gradients for that blob can be omitted.
- The contents of the blobs in inputs are the same as in the last call of forward, and can be used if necessary.
- If a layer does not do backward propagation (e.g. a data layer), an empty backward function still has to be defined explicitly.
shutdown(backend, layer_state)¶
- Release all the resources allocated in setup_layer.
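Putting the API together, the following is a minimal sketch of a pass-through layer, written in the same Mocha-era Julia syntax as the examples above. IdentityLayer and IdentityLayerState are hypothetical names, and make_blob, destroy and blob-to-blob copy! are assumed Mocha blob helpers:

@defstruct IdentityLayer Layer (
  name :: AbstractString = "identity",
  (bottoms :: Vector{Symbol} = Symbol[], length(bottoms) > 0),
  (tops :: Vector{Symbol} = Symbol[], length(tops) == length(bottoms)),
)

type IdentityLayerState <: LayerState
  layer      :: IdentityLayer
  blobs      :: Vector{Blob}
  blobs_diff :: Vector{Blob}
end

function setup_layer(backend::Backend, layer::IdentityLayer,
                     inputs::Vector{Blob}, diffs::Vector{Blob})
  # allocate output blobs matching the shapes of the input blobs
  blobs      = Blob[make_blob(backend, eltype(x), size(x)) for x in inputs]
  blobs_diff = Blob[make_blob(backend, eltype(x), size(x)) for x in inputs]
  return IdentityLayerState(layer, blobs, blobs_diff)
end

function forward(backend::Backend, state::IdentityLayerState, inputs::Vector{Blob})
  for i = 1:length(inputs)
    copy!(state.blobs[i], inputs[i])        # pass data through unchanged
  end
end

function backward(backend::Backend, state::IdentityLayerState,
                  inputs::Vector{Blob}, diffs::Vector{Blob})
  for i = 1:length(inputs)
    if !isa(diffs[i], NullBlob)             # skip blobs that need no gradient
      copy!(diffs[i], state.blobs_diff[i])  # gradient of identity is identity
    end
  end
end

function shutdown(backend::Backend, state::IdentityLayerState)
  # release the resources allocated in setup_layer
  map(destroy, state.blobs)
  map(destroy, state.blobs_diff)
end

Such a layer would also need the can_do_bp characterization for back-propagation to flow through it.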
Layer Parameters¶
If a layer has trainable parameters, it should define a parameters field in the LayerState object, containing a list of Parameter objects. It should also define the has_param characterization. The only computation the layer needs to do is to compute the gradients with respect to each parameter and write them into the gradient field of each Parameter object.
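As a sketch, the state of a hypothetical layer with trainable parameters might look like this (MyAffineLayer is a placeholder name):

type MyAffineLayerState <: LayerState
  layer      :: MyAffineLayer
  blobs      :: Vector{Blob}
  blobs_diff :: Vector{Blob}
  parameters :: Vector{Parameter}  # e.g. one Parameter for weights, one for bias
end

@characterize_layer(MyAffineLayer, has_param => true, can_do_bp => true)

During the backward iteration, the layer would write the computed gradients into parameters[i].gradient; Mocha's solver reads these blobs when updating the parameters.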
Mocha will handle the updating of parameters during training automatically. Other parameter-related issues like initialization, regularization and norm constraints will also be handled automatically.
Layer Activation Function¶
When it makes sense for a layer to have an activation function, it can add a neuron property to the Layer object and define the has_neuron characterization. Everything else will be handled automatically.
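For example, a user constructing the PoolingLayer defined at the top of this page could replace the default identity activation with Mocha's rectified linear unit (a usage sketch; the blob names are placeholders):

layer = PoolingLayer(name="pool1", bottoms=[:conv1], tops=[:pool1],
                     neuron=Neurons.ReLU())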