Individual assignment
Due Wednesday, October 11th 2023
Introduction
In this assignment you will explore components of modern convolutional networks by implementing
them in either PyTorch or TensorFlow. This will solidify important concepts discussed in class
and develop skills that will help you in your class projects.
Concepts covered in this assignment include skip connections, depthwise convolution, 1x1 convolution, and implementation of custom layers.
Background
EfficientNetV2 is a family of convolutional networks that are exceptionally parameter-efficient and
fast to train [2]. These networks were developed by Google using hyperparameter tuning to
search for the most performant architectures, producing a family of models that outperformed the vision transformers of that time. The EfficientNetV2 networks use several components,
including squeeze-and-excite networks (SE) and mobile inverted bottleneck layers (MBConv). In this assignment you will:
1. Implement the MBConv and fused-MBConv elements used by EfficientNetV2.
2. Create blocks composed of MBConv and fused-MBConv elements.
3. Train a simple network using these elements for a basic cell classification task.
I am providing you with a Jupyter notebook that illustrates how to load the data and train
a classification model using a pretrained network from Keras. This notebook also contains the
implementation of the squeeze-and-excite network as a custom layer for reference.
Step 1: Implement MBConv and fused-MBConv elements
Squeeze-and-excite network (SE): The SE essentially implements a “channel attention” mechanism that selectively re-weights channels to highlight those that are informative [1]. Refer to the
SE network illustrated in Figure 1 and compare to the implementation in the provided notebook.
This implementation is parameterized with the channel compression parameter s (default value
0.25).
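For students working in PyTorch, the SE idea can be sketched as follows. This is a minimal illustration rather than a port of the notebook code: the class name SqueezeExcite, the use of 1x1 convolutions in place of dense layers, the SiLU activation, and the rounding rule for the compressed channel count are all assumptions. Figure 1 and the provided notebook take precedence wherever they differ.

```python
import torch
from torch import nn


class SqueezeExcite(nn.Module):
    """Channel attention: pool each channel to a scalar, then predict per-channel weights."""

    def __init__(self, channels: int, s: float = 0.25):
        super().__init__()
        # Assumption: round down, but keep at least one channel on the squeezed branch.
        squeezed = max(1, int(channels * s))
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        self.reduce = nn.Conv2d(channels, squeezed, kernel_size=1)
        self.expand = nn.Conv2d(squeezed, channels, kernel_size=1)
        self.act = nn.SiLU()  # assumption: activation on the squeezed branch
        self.gate = nn.Sigmoid()  # maps each channel weight into (0, 1)

    def forward(self, x):
        w = self.pool(x)
        w = self.act(self.reduce(w))
        w = self.gate(self.expand(w))
        return x * w  # weights broadcast over the H and W dimensions


x = torch.randn(2, 16, 8, 8)
se = SqueezeExcite(16, s=0.25)
y = se(x)
print(y.shape)
```

The output has the same shape as the input; only the relative scale of the channels changes.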
Mobile inverted bottleneck (MBConv): The MBConv starts with an expansion of the input
channels by a 1x1 convolution. A depthwise convolution is then applied, followed by an SE network.
A 1x1 convolution compresses the channels again, followed by a dropout. Finally, a skip connection
combines this output with the MBConv inputs. Implement the MBConv network illustrated in
Figure 2. Parameterize your implementation with the expansion parameter e (default 4), the
depthwise convolution stride (default 2), the SE compression s, and the dropout rate d (default
0.1).
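The three convolutions named above can be tried in isolation before assembling the full element. The sketch below uses PyTorch and deliberately leaves out batch normalization, activations, the SE network, and dropout, so it is not the element in Figure 2. The variable names and the choice of 8 input channels are illustrative assumptions. A depthwise convolution is expressed here by setting groups equal to the number of channels, and stride 1 is used so that the skip connection has matching shapes.

```python
import torch
from torch import nn

c_in, e = 8, 4  # illustrative channel count and expansion factor
x = torch.randn(1, c_in, 32, 32)

# 1x1 convolution: mixes channels at each pixel, here expanding 8 -> 32 channels.
expand = nn.Conv2d(c_in, c_in * e, kernel_size=1, bias=False)
# Depthwise 3x3 convolution: groups == channels, so each channel gets its own filter.
depthwise = nn.Conv2d(c_in * e, c_in * e, kernel_size=3, padding=1,
                      groups=c_in * e, bias=False)
# 1x1 convolution: compresses 32 -> 8 channels again.
project = nn.Conv2d(c_in * e, c_in, kernel_size=1, bias=False)

h = expand(x)
h = depthwise(h)
out = project(h) + x  # the skip connection requires identical shapes


def n_params(module):
    """Count the trainable parameters of a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


print(out.shape, n_params(expand), n_params(depthwise), n_params(project))
```

Note how few parameters the depthwise layer has compared with the two 1x1 convolutions: it holds one 3x3 filter per channel and does no channel mixing.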
Fused-MBConv: The fused-MBConv replaces the first two layers of the MBConv with a standard
convolution. Implement the fused-MBConv network illustrated in Figure 3 with the same
parameters as the MBConv.
Suggestion: Implementing the MBConv and fused-MBConv as custom layers is the simplest approach. This process is similar in TensorFlow and PyTorch. Implementing these as simple functions makes it difficult to handle batch normalization, which behaves differently during training and inference. The layer class of your framework handles this correctly because it receives
information on whether the layer is being called in training or inference mode. Your custom layer
can inherit this behavior.
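To see why the layer class matters, here is a small PyTorch sketch of a custom layer that wraps a convolution and a batch normalization. The class name ConvBN and the tensor sizes are hypothetical. Calling train() or eval() on the custom layer propagates to the batch normalization inside it. In Keras the analogous mechanism is the training argument passed to a layer's call method.

```python
import torch
from torch import nn


class ConvBN(nn.Module):
    """Hypothetical custom layer: 3x3 convolution followed by batch normalization."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return self.bn(self.conv(x))


torch.manual_seed(0)
layer = ConvBN(3, 4)
x = torch.randn(8, 3, 16, 16)

# Training mode: BN normalizes with batch statistics and updates its running averages.
layer.train()
before = layer.bn.running_mean.clone()
layer(x)
updated = not torch.equal(before, layer.bn.running_mean)

# Inference mode: BN uses the stored running averages and leaves them untouched.
layer.eval()
frozen = layer.bn.running_mean.clone()
with torch.no_grad():
    layer(x)
unchanged = torch.equal(frozen, layer.bn.running_mean)

print(updated, unchanged, layer.bn.training)
```

Because the mode is stored on the module and inherited by its children, the forward method does not need any explicit training flag.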
Figure 1: Squeeze-and-excite (SE) network. The SE network generates “channel attentions” that
re-weight the channel dimension of the input. The network has a single parameter s ∈ (0, 1] that
controls the compression on this branch (default value 0.25).
Figure 2: Mobile inverted bottleneck (MBConv) network. The MBConv features convolutions with
batch normalization (BN), including a 3x3 depthwise convolution (dwconv), an SE network, and a
dropout layer. The parameters for the MBConv are the expansion factor e, the stride of the depthwise
convolution, the SE compression parameter s, and the dropout rate d.
Figure 3: Fused-MBConv network. The fused-MBConv replaces the separable depthwise convolution
of the MBConv with a full convolution, but is otherwise the same. This increases the number of
trainable parameters but can execute faster (we will discuss this in class).
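The parameter trade-off between the two elements can be estimated with simple arithmetic. The sketch below counts convolution weights only (no biases, batch normalization, or SE parameters) and assumes that the fused 3x3 convolution also performs the channel expansion by the factor e. The function names are illustrative, and the figures remain the authoritative description of each element.

```python
def mbconv_conv_params(c_in, c_out, e=4, k=3):
    """Convolution weights in an MBConv: 1x1 expand, k x k depthwise, 1x1 project."""
    mid = c_in * e
    expand = c_in * mid      # 1x1 convolution, c_in -> mid channels
    depthwise = k * k * mid  # one k x k filter per channel
    project = mid * c_out    # 1x1 convolution, mid -> c_out channels
    return expand + depthwise + project


def fused_mbconv_conv_params(c_in, c_out, e=4, k=3):
    """Convolution weights in a fused-MBConv: full k x k convolution, 1x1 project."""
    mid = c_in * e
    fused = k * k * c_in * mid  # full convolution, c_in -> mid channels
    project = mid * c_out
    return fused + project


for c in (16, 64, 256):
    print(c, mbconv_conv_params(c, c), fused_mbconv_conv_params(c, c))
```

Under these assumptions, with 16 input and output channels the MBConv has 2624 convolution weights and the fused-MBConv has 10240, and the gap widens as the channel count grows.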
Step 2: Create MBConv and fused-MBConv blocks
EfficientNetV2 arranges MBConv and fused-MBConv elements into blocks, which are sequences
of repeated MBConv or fused-MBConv elements. Using your elements from step 1, implement
functions that create blocks of repeated MBConv or fused-MBConv elements. Each function should
accept the number of repeats, the number of input channels, and the element parameters.
There are two important details when creating a block (see Figure 4): 1. The stride parameter is
only applied to the first element in the block (otherwise the H and W dimensions will shrink to
nothing); 2. The skip connection is removed from the first element in the block, which allows the
channel dimension to be altered by each block.
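The two rules can be written down as a small planning helper before any layers are built. This sketch uses only the Python standard library. The names block_plan and spatial_sizes are hypothetical, and the size calculation assumes same padding, where the output size is the input size divided by the stride, rounded up.

```python
import math


def block_plan(repeats, stride=2):
    """Per-element settings: only the first element strides, and it has no skip."""
    return [
        {"stride": stride if i == 0 else 1, "skip": i > 0}
        for i in range(repeats)
    ]


def spatial_sizes(size, strides):
    """Spatial size after each element, assuming same padding."""
    sizes = [size]
    for s in strides:
        sizes.append(math.ceil(sizes[-1] / s))
    return sizes


plan = block_plan(repeats=3, stride=2)
first_only = spatial_sizes(90, [p["stride"] for p in plan])
every_element = spatial_sizes(90, [2, 2, 2])
print(plan)
print(first_only, every_element)
```

For a 90x90 input and three repeats, striding only in the first element gives sizes 90, 45, 45, 45, while striding in every element would give 90, 45, 23, 12.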
Figure 4: A Fused-MBConv block. The block combines multiple fused-MBConv elements stacked
sequentially. The skip connection in the first element of the block is removed to allow the number
of channels to change between blocks. The stride is only applied to the first element of the block to
avoid shrinking the H and W dimensions too much.
Step 3: Train a simple cell classification network
The data provided contains 90x90 images of cell nuclei, split into training and validation sets.
Each image has one of three labels (tumor, stromal, TIL).
Implement the network depicted in Figure 5 using your blocks from step 2. Train the network
using the training and validation split provided in the data archive, and plot the training and
validation loss over 10 epochs. Display the model contents using model.summary() (TensorFlow)
or the pytorch-summary package (PyTorch).
Use "same" padding for all convolutions.
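A note for PyTorch users: Keras convolution layers accept padding="same" for any stride, but nn.Conv2d accepts padding="same" only for stride 1 and raises an error for strided convolutions. One common workaround for odd kernel sizes is to pass an explicit padding of half the kernel size, rounded down. The sketch below checks with plain arithmetic that this reproduces the same-padding output size. The helper names are hypothetical. TensorFlow may pad asymmetrically, so the two frameworks can agree on output size while differing slightly in pixel alignment.

```python
import math


def conv_out_size(size, kernel, stride, padding):
    """Output size of a convolution with symmetric padding and no dilation."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Explicit padding that mimics same padding for odd kernel sizes."""
    if kernel % 2 == 0:
        raise ValueError("this shortcut only applies to odd kernel sizes")
    return kernel // 2


results = {}
for size in (90, 45, 23):
    for stride in (1, 2):
        out = conv_out_size(size, kernel=3, stride=stride, padding=same_padding(3))
        results[(size, stride)] = out
        # Same padding should give ceil(size / stride).
        assert out == math.ceil(size / stride)

print(results)
```

With a 3x3 kernel this means padding=1, for both stride 1 and stride 2.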
Figure 5: A basic network.
References