Peer Session Badge

My assignment: Semantic Segmentation

CNN architectures (2)

  • Deeper layers

    • Pros: larger receptive fields, more capacity and non-linearity
    • Cons: gradient vanishing/exploding, higher computational cost, degradation problem
  • GoogLeNet (CVPR 2015)

    • Inception module
      • 1x1 Conv
      • 1x1 Conv $\rightarrow$ 3x3 Conv
      • 1x1 Conv $\rightarrow$ 5x5 Conv
      • 3x3 pool $\rightarrow$ 1x1 Conv
      • Concatenate all filter outputs along the channel axis
    • 1x1 Conv (bottleneck)
      • reduces the number of channels (see the sketch after this section)
    • Architecture
      • Stem network (vanilla CNN)
      • Stacked inception modules
      • Auxiliary classifiers
        • injecting additional gradients into lower layers
        • used only during training
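
A minimal PyTorch sketch of an Inception-style module with 1x1 bottlenecks; the branch channel counts are illustrative assumptions, not prescribed values:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                   # 1x1 conv branch
        self.b2 = nn.Sequential(                                        # 1x1 bottleneck -> 3x3 conv
            nn.Conv2d(in_ch, 96, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(                                        # 1x1 bottleneck -> 5x5 conv
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(                                        # 3x3 pool -> 1x1 conv
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Every branch preserves the spatial size, so the outputs can be
        # concatenated along the channel axis (dim=1).
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Usage: a 192-channel input yields 64 + 128 + 32 + 32 = 256 output channels.
out = InceptionModule(192)(torch.randn(1, 192, 28, 28))  # -> (1, 256, 28, 28)
```

The 1x1 bottlenecks keep the 3x3 and 5x5 branches cheap by reducing the number of channels before the expensive spatial convolutions.
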
  • ResNet (CVPR 2016)

    • 152 layers
    • Degradation problem (an optimization problem)
      • Solved with residual blocks
    • Residual block
      • Shortcut (skip) connection: the block outputs $F(x) + x$ (see the sketch after this section)
    • Architecture
      • He initialization (keeps initial values from being too large)
      • 7x7 Conv at the beginning
      • Stack residual blocks
        • 3x3 Conv + Batch Norm
      • Stride 2 applied once per block group
        • doubles the number of filters
        • spatially down-samples by 2
      • Single FC layer
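
A minimal PyTorch sketch of a residual block, assuming the common practice of a 1x1 projection on the shortcut when the block uses stride 2 and doubles the filters:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes F(x) + x, where F is two (3x3 Conv + BatchNorm) layers."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # When the block down-samples (stride 2) and doubles the filters,
        # a 1x1 conv projects the shortcut so the shapes match for addition.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # skip connection: + x

# Usage: a stride-2 block halves the spatial size and doubles the channels.
y = ResidualBlock(64, 128, stride=2)(torch.randn(1, 64, 56, 56))  # -> (1, 128, 28, 28)
```
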
  • Beyond ResNet

    • DenseNet
      • Skip connections also carry information from much earlier layers
      • Concatenated along the channel axis (rather than added)
    • SENet (Squeeze and Excitation)
      • Recalibrates channel-wise responses by modeling interdependencies between channels (see the sketch after this list)
      • SE
        • Squeeze: capture the distribution of channel-wise responses with global average pooling
        • Excitation: gate the channels with channel-wise attention weights obtained from an FC layer
    • EfficientNet
      • Compound scaling
        • width scaling
          • more channels (GoogLeNet, DenseNet)
        • depth scaling
          • deeper layers (ResNet)
        • resolution scaling
          • higher input resolution
    • Deformable convolution
      • Standard convolution + grid sampling with learned 2D offsets (see the sketch after this list)
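
A minimal PyTorch sketch of a Squeeze-and-Excitation block; the reduction ratio of 16 is an assumed common default:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Recalibrates channel-wise responses with learned attention weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gating weights in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # Squeeze: global average pooling -> (N, C)
        w = self.fc(s).view(n, c, 1, 1)  # Excitation: FC layers -> attention weights
        return x * w                     # recalibrate each channel

# Usage: same shape out, but channels are reweighted by their importance.
out = SEBlock(256)(torch.randn(2, 256, 14, 14))
```
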
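
And a short sketch of deformable convolution using torchvision's DeformConv2d, where a small standard convolution predicts the 2D sampling offsets (channel counts are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# One (dx, dy) offset per kernel position: 2 * 3 * 3 = 18 offset channels.
offset_pred = nn.Conv2d(64, 2 * 3 * 3, kernel_size=3, padding=1)
deform = DeformConv2d(64, 64, kernel_size=3, padding=1)

x = torch.randn(1, 64, 32, 32)
offsets = offset_pred(x)   # learned 2D offsets, shape (1, 18, 32, 32)
y = deform(x, offsets)     # grid sampling shifted by the offsets -> (1, 64, 32, 32)
```
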

Semantic segmentation

  • Definition

    • Classify each pixel of an image into a semantic category
    • Applications
      • Medical image
      • Autonomous driving
      • Computational photography
  • FCN (CVPR 2015)

    • Fully Convolutional Networks
    • first end-to-end architecture for semantic segmentation
    • Input/Output
      • Input: arbitrary size
      • Output: segmentation map with the same spatial size as the input
    • Difference from fully connected layers
      • the output preserves spatial coordinates (a heat map)
    • Upsampling
      • First downsample to enlarge the receptive field, then upsample back to the input resolution (see the sketch after this section)
      • Transposed convolution alone causes overlap (checkerboard) artifacts
        • Combine it with interpolation such as nearest-neighbor or bilinear upsampling
    • Add skip connections to obtain a finer score map
      • Differences between layers
        • lower
          • fine, low-level, detail, local
        • higher
          • coarse, semantic, holistic, global
      • Integrates activations from lower layers into prediction
      • Preserves higher spatial resolution
      • Captures lower-level semantics at the same time
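
A minimal sketch of the FCN idea in PyTorch: 1x1 convolutions produce per-class score maps, a lower-layer score map is fused via a skip connection, and bilinear interpolation restores the input resolution. The channel counts and two-scale setup are illustrative assumptions, not the exact FCN-8s design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCNHead(nn.Module):
    """Fully convolutional head: score maps at two scales, fused by a skip connection."""
    def __init__(self, num_classes, low_ch=256, high_ch=512):
        super().__init__()
        self.score_high = nn.Conv2d(high_ch, num_classes, 1)  # coarse, semantic, global
        self.score_low = nn.Conv2d(low_ch, num_classes, 1)    # fine, detailed, local

    def forward(self, low_feat, high_feat, out_size):
        # Upsample the coarse scores to the lower layer's resolution and add them.
        high = self.score_high(high_feat)
        high = F.interpolate(high, size=low_feat.shape[2:], mode="bilinear", align_corners=False)
        fused = high + self.score_low(low_feat)  # skip connection
        # Upsample the fused score map back to the input resolution.
        return F.interpolate(fused, size=out_size, mode="bilinear", align_corners=False)

# Usage with made-up backbone features for a 224x224 input.
low = torch.randn(1, 256, 28, 28)   # stride-8 feature map
high = torch.randn(1, 512, 7, 7)    # stride-32 feature map
seg = TinyFCNHead(num_classes=21)(low, high, out_size=(224, 224))  # (1, 21, 224, 224)
```
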
  • U-Net (MICCAI 2015)

    • Characteristics
      • Shares the same fully convolutional properties as FCN
      • Predicts a dense map by concatenating feature maps from the contracting path
        • similar to the skip connections in FCN
      • Yields more precise segmentations
    • Architecture
      • Contracting path
      • Expanding path
    • Skip connection
      • Concatenation of feature maps provides localized information (see the sketch after this section)
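
A minimal sketch of one U-Net expanding-path step, showing how a contracting-path feature map is concatenated along the channel axis before the next convolutions; channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One expanding-path step: upsample, concatenate the contracting-path feature, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        # Skip connection by concatenation (not addition): the contracting-path
        # feature map contributes localized, high-resolution information.
        x = torch.cat([skip, x], dim=1)
        return self.conv(x)

# Usage: fuse a 128-channel contracting-path map with an upsampled 256-channel map.
out = UpBlock(256, 128, 128)(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 64, 64))
# -> (1, 128, 64, 64)
```
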
  • DeepLab (ICLR 2015)

    • Reference material
    • CRFs (Conditional Random Fields)
      • Post-processing that refines the coarse network output
      • In the paper's figure: the 1st row shows score maps (before softmax), the 2nd row belief maps (after softmax)
    • Dilated convolution
      • also called atrous convolution
      • Inserts spaces between kernel elements, controlled by the dilation factor
      • Exponentially expands the receptive field with the same number of parameters (see the sketch after this section)
    • Depthwise separable convolution
      1. Depthwise convolution: a spatial filter applied per channel
      2. Pointwise convolution: a 1x1 convolution that mixes channels
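
A minimal PyTorch sketch of the two convolution variants above: a dilated (atrous) 3x3 convolution, and a depthwise separable convolution split into its depthwise and pointwise steps; the channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Dilated (atrous) convolution: spaces are inserted between kernel elements, so a
# 3x3 kernel with dilation=2 covers a 5x5 area with the same 9 weights per channel.
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

# Depthwise separable convolution = depthwise step + pointwise step.
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)  # 1. one spatial filter per channel
pointwise = nn.Conv2d(64, 128, kernel_size=1)                       # 2. 1x1 conv mixes the channels

x = torch.randn(1, 64, 32, 32)
print(dilated(x).shape)               # (1, 64, 32, 32): resolution kept, receptive field enlarged
print(pointwise(depthwise(x)).shape)  # (1, 128, 32, 32)
```
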