Peer Session Badge

My assignment: Semantic Segmentation

CNN architectures (2)

  • Deeper layers

    • Pros: larger receptive fields, more capacity and non-linearity
    • Cons: gradient vanishing/exploding, higher computational cost, degradation problem
  • GoogLeNet (CVPR 2015)

    • Inception module
      • 1x1 Conv
      • 1x1 Conv $\rightarrow$ 3x3 Conv
      • 1x1 Conv $\rightarrow$ 5x5 Conv
      • 3x3 pool $\rightarrow$ 1x1 Conv
      • Concatenate all filter outputs along the channel axis
    • 1x1 Conv (bottleneck)
      • reduces the number of channels (see the sketch after this section)
    • Architecture
      • Stem network (vanilla CNN)
      • Stacked inception modules
      • Auxiliary classifiers
        • injecting additional gradients into lower layers
        • used only during training
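
A minimal PyTorch sketch of an Inception-style module with 1x1 bottlenecks; the branch channel counts are illustrative assumptions, not prescribed values:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                   # 1x1 conv branch
        self.b2 = nn.Sequential(                                        # 1x1 bottleneck -> 3x3 conv
            nn.Conv2d(in_ch, 96, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(                                        # 1x1 bottleneck -> 5x5 conv
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(                                        # 3x3 pool -> 1x1 conv
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Every branch preserves the spatial size, so the outputs can be
        # concatenated along the channel axis (dim=1).
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Usage: a 192-channel input yields 64 + 128 + 32 + 32 = 256 output channels.
out = InceptionModule(192)(torch.randn(1, 192, 28, 28))  # -> (1, 256, 28, 28)
```

The 1x1 bottlenecks keep the 3x3 and 5x5 branches cheap by reducing the number of channels before the expensive spatial convolutions.
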
  • ResNet (CVPR 2016)

    • 152 layers
    • Degradation problem (an optimization problem)
      • Solved with residual blocks
    • Residual block
      • Shortcut (skip) connection: the block outputs $F(x) + x$ (see the sketch after this section)
    • Architecture
      • He initialization (keeps initial values from being too large)
      • 7x7 Conv at the beginning
      • Stack residual blocks
        • 3x3 Conv + Batch Norm
      • Stride 2 applied once per block group
        • doubles the number of filters
        • spatially down-samples by 2
      • Single FC layer
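
A minimal PyTorch sketch of a residual block, assuming the common practice of a 1x1 projection on the shortcut when the block uses stride 2 and doubles the filters:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes F(x) + x, where F is two (3x3 Conv + BatchNorm) layers."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # When the block down-samples (stride 2) and doubles the filters,
        # a 1x1 conv projects the shortcut so the shapes match for addition.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # skip connection: + x

# Usage: a stride-2 block halves the spatial size and doubles the channels.
y = ResidualBlock(64, 128, stride=2)(torch.randn(1, 64, 56, 56))  # -> (1, 128, 28, 28)
```
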
  • Beyond ResNet

    • DenseNet
      • Skip connections also carry information from much earlier layers
      • Concatenated along the channel axis (rather than added)
    • SENet (Squeeze and Excitation)
      • Recalibrates channel-wise responses by modeling interdependencies between channels (see the sketch after this list)
      • SE
        • Squeeze: capture the distribution of channel-wise responses with global average pooling
        • Excitation: gate the channels with channel-wise attention weights obtained from an FC layer
    • EfficientNet
      • Compound scaling
        • width scaling
          • more channels (GoogLeNet, DenseNet)
        • depth scaling
          • deeper layers (ResNet)
        • resolution scaling
          • higher input resolution
    • Deformable convolution
      • Standard convolution + grid sampling with learned 2D offsets (see the sketch after this list)
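
A minimal PyTorch sketch of a Squeeze-and-Excitation block; the reduction ratio of 16 is an assumed common default:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Recalibrates channel-wise responses with learned attention weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gating weights in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # Squeeze: global average pooling -> (N, C)
        w = self.fc(s).view(n, c, 1, 1)  # Excitation: FC layers -> attention weights
        return x * w                     # recalibrate each channel

# Usage: same shape out, but channels are reweighted by their importance.
out = SEBlock(256)(torch.randn(2, 256, 14, 14))
```
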
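
And a short sketch of deformable convolution using torchvision's DeformConv2d, where a small standard convolution predicts the 2D sampling offsets (channel counts are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# One (dx, dy) offset per kernel position: 2 * 3 * 3 = 18 offset channels.
offset_pred = nn.Conv2d(64, 2 * 3 * 3, kernel_size=3, padding=1)
deform = DeformConv2d(64, 64, kernel_size=3, padding=1)

x = torch.randn(1, 64, 32, 32)
offsets = offset_pred(x)   # learned 2D offsets, shape (1, 18, 32, 32)
y = deform(x, offsets)     # grid sampling shifted by the offsets -> (1, 64, 32, 32)
```
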

Semantic segmentation

  • Definition

    • Classify each pixel of an image into a semantic category
    • Applications
      • Medical image
      • Autonomous driving
      • Computational photography
  • FCN (CVPR 2015)

    • Fully Convolutional Networks
    • first end-to-end architecture for semantic segmentation
    • Input/Output
      • Input: arbitrary size
      • Output: segmentation map with the same spatial size as the input
    • Difference from fully connected layers
      • the output preserves spatial coordinates (a heat map)
    • Upsampling
      • First downsample to enlarge the receptive field, then upsample back to the input resolution (see the sketch after this section)
      • Transposed convolution alone causes overlap (checkerboard) artifacts
        • Combine it with interpolation such as nearest-neighbor or bilinear upsampling
    • Add skip connections to obtain a finer score map
      • Differences between layers
        • lower
          • fine, low-level, detail, local
        • higher
          • coarse, semantic, holistic, global
      • Integrates activations from lower layers into prediction
      • Preserves higher spatial resolution
      • Captures lower-level semantics at the same time
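
A minimal sketch of the FCN idea in PyTorch: 1x1 convolutions produce per-class score maps, a lower-layer score map is fused via a skip connection, and bilinear interpolation restores the input resolution. The channel counts and two-scale setup are illustrative assumptions, not the exact FCN-8s design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCNHead(nn.Module):
    """Fully convolutional head: score maps at two scales, fused by a skip connection."""
    def __init__(self, num_classes, low_ch=256, high_ch=512):
        super().__init__()
        self.score_high = nn.Conv2d(high_ch, num_classes, 1)  # coarse, semantic, global
        self.score_low = nn.Conv2d(low_ch, num_classes, 1)    # fine, detailed, local

    def forward(self, low_feat, high_feat, out_size):
        # Upsample the coarse scores to the lower layer's resolution and add them.
        high = self.score_high(high_feat)
        high = F.interpolate(high, size=low_feat.shape[2:], mode="bilinear", align_corners=False)
        fused = high + self.score_low(low_feat)  # skip connection
        # Upsample the fused score map back to the input resolution.
        return F.interpolate(fused, size=out_size, mode="bilinear", align_corners=False)

# Usage with made-up backbone features for a 224x224 input.
low = torch.randn(1, 256, 28, 28)   # stride-8 feature map
high = torch.randn(1, 512, 7, 7)    # stride-32 feature map
seg = TinyFCNHead(num_classes=21)(low, high, out_size=(224, 224))  # (1, 21, 224, 224)
```
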
  • U-Net (MICCAI 2015)

    • Characteristics
      • Shares the same fully convolutional properties as FCN
      • Predicts a dense map by concatenating feature maps from the contracting path
        • similar to the skip connections in FCN
      • Yields more precise segmentations
    • Architecture
      • Contracting path
      • Expanding path
    • Skip connection
      • Concatenation of feature maps provides localized information (see the sketch after this section)
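
A minimal sketch of one U-Net expanding-path step, showing how a contracting-path feature map is concatenated along the channel axis before the next convolutions; channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One expanding-path step: upsample, concatenate the contracting-path feature, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        # Skip connection by concatenation (not addition): the contracting-path
        # feature map contributes localized, high-resolution information.
        x = torch.cat([skip, x], dim=1)
        return self.conv(x)

# Usage: fuse a 128-channel contracting-path map with an upsampled 256-channel map.
out = UpBlock(256, 128, 128)(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 64, 64))
# -> (1, 128, 64, 64)
```
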
  • DeepLab (ICLR 2015)

    • Reference material
    • CRFs (Conditional Random Fields)
      • Post-processing that refines the coarse network output
      • In the paper's figure: the 1st row shows score maps (before softmax), the 2nd row belief maps (after softmax)
    • Dilated convolution
      • also called atrous convolution
      • Inserts spaces between kernel elements, controlled by the dilation factor
      • Exponentially expands the receptive field with the same number of parameters (see the sketch after this section)
    • Depthwise separable convolution
      1. Depthwise convolution: a spatial filter applied per channel
      2. Pointwise convolution: a 1x1 convolution that mixes channels
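
A minimal PyTorch sketch of the two convolution variants above: a dilated (atrous) 3x3 convolution, and a depthwise separable convolution split into its depthwise and pointwise steps; the channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Dilated (atrous) convolution: spaces are inserted between kernel elements, so a
# 3x3 kernel with dilation=2 covers a 5x5 area with the same 9 weights per channel.
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

# Depthwise separable convolution = depthwise step + pointwise step.
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)  # 1. one spatial filter per channel
pointwise = nn.Conv2d(64, 128, kernel_size=1)                       # 2. 1x1 conv mixes the channels

x = torch.randn(1, 64, 32, 32)
print(dilated(x).shape)               # (1, 64, 32, 32): resolution kept, receptive field enlarged
print(pointwise(depthwise(x)).shape)  # (1, 128, 32, 32)
```
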