Historical Notes
Fukushima’s Neocognitron
- Hierarchical feature extraction
- Local connectivity (local receptive fields)
- Hand-crafted weights (predates backpropagation)
LeNet
- Convolution
- Subsampling = pooling
Modern CNNs
General Architecture and Design Guide
ConvBlock (module): convolution, activation, batch normalization, pooling (see the sketch below)
Classification head: Linear + activation + softmax
Regression head: Linear
Accuracy: larger representation capacity $\to$ higher risk of overfitting
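A minimal PyTorch sketch of this pattern (module names, channel counts, and the 32×32 input size are illustrative assumptions, not from the notes):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution -> activation -> batch normalization -> pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_ch),
            nn.MaxPool2d(2),  # halve spatial resolution
        )

    def forward(self, x):
        return self.block(x)

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(ConvBlock(3, 32), ConvBlock(32, 64))
        # Classification head: Linear + activation (+ softmax, usually folded
        # into the loss, e.g. nn.CrossEntropyLoss). A regression head would
        # end in a bare Linear layer instead.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),  # assumes 32x32 input images
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

logits = SimpleCNN()(torch.randn(1, 3, 32, 32))  # -> shape (1, 10)
```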
AlexNet 2012
GPUs instead of CPUs
Ensemble modelling
ReLU: reduces the chance of vanishing gradients
Dropout: reduces overfitting
Image Augmentation
Dropout
Training
Randomly set some neurons to 0 with probability p
Different layers may have different dropout rates
Inference
Do nothing! Stay deterministic during prediction.
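"Do nothing" at inference works under the inverted-dropout convention: during training the surviving activations are rescaled by $1/(1-p)$ so their expected value is unchanged. A minimal sketch (the helper function is illustrative; in PyTorch, `nn.Dropout(p)` behaves this way and is toggled via `model.train()` / `model.eval()`):

```python
import torch

def dropout(x, p=0.5, training=True):
    # Inference: do nothing, stay deterministic.
    if not training or p == 0.0:
        return x
    # Training: zero each neuron with probability p, then rescale the
    # survivors by 1/(1-p) so expected activations are unchanged.
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)

x = torch.ones(4)
print(dropout(x, training=True))   # random zeros, survivors scaled to 2.0
print(dropout(x, training=False))  # unchanged: tensor([1., 1., 1., 1.])
```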
Image Augmentation
Increase the training data to cover more types of (test) data.
Training
Random augmentation operations
Test
No random operations; otherwise the model may produce different predictions for the same input across runs.
Make predictions by aggregating the results from all augmented images (test-time augmentation).
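A sketch of both halves, assuming torchvision; the specific operations (flip, padded crop) and logit-averaging aggregation are illustrative choices:

```python
import torch
import torchvision.transforms as T

# Training: random augmentation operations applied on the fly.
train_tf = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
])

# Test: only deterministic views; aggregate predictions over all of them.
@torch.no_grad()
def predict_tta(model, x):
    model.eval()  # no random operations at test time
    views = [x, torch.flip(x, dims=[-1])]  # original + horizontal flip
    logits = torch.stack([model(v) for v in views])
    return logits.mean(dim=0)
```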
VGG
Unified kernel size ($3\times 3$ throughout)
Computational cost: two stacked $3\times 3$ kernels are cheaper than one $5\times 5$ kernel covering the same receptive field (see the check below)
Deeper structure (16 or 19 weight layers: VGG-16, VGG-19)
More parameters ($\sim$138M for VGG-16)
More non-linear transformations; larger capacity
Consecutive convolutions
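A quick check of the cost claim: two stacked $3\times 3$ convolutions cover the same $5\times 5$ receptive field with fewer weights and one extra non-linearity (the channel count $C=64$ is an arbitrary example):

```python
import torch.nn as nn

C = 64
two_3x3 = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1), nn.ReLU(),
)
one_5x5 = nn.Conv2d(C, C, 5, padding=2)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(two_3x3))  # 2 * (3*3*64*64 + 64) =  73,856
print(count(one_5x5))  #      5*5*64*64 + 64  = 102,464
```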
Inception V1
Inception block: $1\times 1$ convolutions (reduce channel count); ensemble of multiple paths with different kernel sizes.
Fuses feature maps at a single level by concatenating the outputs of kernels of various sizes (see the sketch below).
- Average pooling: reduces model size & time complexity.
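A simplified Inception-style block (the branch channel counts are made up for illustration; GoogLeNet's actual widths differ per stage):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel paths with different kernel sizes, fused by channel-wise
    concatenation; 1x1 convolutions reduce channels before the costly kernels."""
    def __init__(self, in_ch):
        super().__init__()
        self.p1 = nn.Conv2d(in_ch, 32, 1)                         # 1x1 path
        self.p2 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),          # reduce
                                nn.Conv2d(16, 32, 3, padding=1))  # 3x3 path
        self.p3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),          # reduce
                                nn.Conv2d(16, 32, 5, padding=2))  # 5x5 path
        self.p4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))          # pool path

    def forward(self, x):
        # All paths preserve spatial size, so outputs concatenate on channels.
        paths = (self.p1, self.p2, self.p3, self.p4)
        return torch.cat([p(x) for p in paths], dim=1)

y = InceptionBlock(64)(torch.randn(1, 64, 28, 28))  # -> (1, 128, 28, 28)
```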