Boostcamp AI Tech (Day 032)
My assignment: Semantic Segmentation
CNN architectures (2)
-
Deeper layers
- Larger receptive fields
- More capacity and non-linearity
- Gradient vanishing/exploding
- Computationally complex
- Degradation problem
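The "larger receptive fields" point can be checked with a little arithmetic. The sketch below is a minimal illustration, assuming a plain chain of convolutions described by hypothetical `(kernel_size, stride)` pairs; the helper name `receptive_field` is made up for this note.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a chain of conv/pool layers.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between neighbouring outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


for depth in (1, 2, 3, 5):
    # Stack of 3x3 convolutions with stride 1
    print(depth, receptive_field([(3, 1)] * depth))
```

With stride 1 the receptive field grows by 2 pixels per 3x3 layer, so depth is what buys the wider view.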
-
GoogLeNet (CVPR 2015)
- Inception module
- 1x1 Conv
- 1x1 Conv $\rightarrow$ 3x3 Conv
- 1x1 Conv $\rightarrow$ 5x5 Conv
- 3x3 pool $\rightarrow$ 1x1 Conv
- Concatenate all filter outputs along the channel axis
- 1x1 Conv (bottleneck)
- reduce the number of channels
- Architecture
- Stem network (vanilla CNN)
- Stacked inception modules
- Auxiliary classifiers
- injecting additional gradients into lower layers
- used only during training
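To see why the 1x1 bottleneck matters, it helps to count weights. The following is a rough sketch, not the paper's accounting: the channel sizes are illustrative assumptions, biases are ignored, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def direct_5x5(c_in, c_out):
    """5x5 convolution applied directly to the input."""
    return conv_params(c_in, c_out, 5)


def bottleneck_5x5(c_in, c_mid, c_out):
    """1x1 convolution reducing channels, followed by the 5x5 convolution."""
    return conv_params(c_in, c_mid, 1) + conv_params(c_mid, c_out, 5)


# Illustrative channel sizes (assumed for this example only)
c_in, c_mid, c_out = 192, 16, 32
print(direct_5x5(c_in, c_out))
print(bottleneck_5x5(c_in, c_mid, c_out))

# Concatenation along the channel axis: output channels are the branch sum
branch_channels = {"1x1": 64, "1x1->3x3": 128, "1x1->5x5": 32, "pool->1x1": 32}
print(sum(branch_channels.values()))
```

Under these assumed sizes the bottlenecked branch needs roughly a tenth of the weights of the direct 5x5 convolution.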
-
ResNet (CVPR 2016)
- 152 layers
- Degradation problem (optimization problem)
- Solve with Residual block
- Residual block
- Shortcut connection (skip connection) ($+ x$)
- Architecture
- He initialization (not too high)
- 7x7 Conv at the beginning
- Stack residual blocks
- 3x3 Conv + Batch Norm
- Stride 2 once per block
- doubling no. of filters
- spatially down-sampling
- Single FC layer
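The shortcut connection can be sketched in a few lines. This is a toy version under stated assumptions: small fully connected layers stand in for the 3x3 Conv + Batch Norm pairs, and `residual_block` is a hypothetical helper, not the reference implementation.

```python
def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + v for f, v in zip(fx, x)])  # the "+ x" shortcut


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights the residual branch vanishes and the block
# passes its (non-negative) input straight through.
print(residual_block(x, zeros, zeros))
```

Because the block only has to learn the residual F(x), an identity mapping is trivially available, which is the intuition behind using it against the degradation problem.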
-
Beyond ResNet
- DenseNet
- Skip connections also carry information from much earlier layers
- Concatenate along the channel axis
- SENet (Squeeze and Excitation)
- Recalibrate channel-wise responses by modeling interdependencies between channels
- SE
- Squeeze: capturing distributions of channel-wise responses by global average pooling
- Excitation: gating channels by channel-wise attention weights obtained by a FC layer
- EfficientNet
- Compound scaling
- width scaling
- more channels (GoogLeNet, DenseNet)
- depth scaling
- deeper layers (ResNet)
- resolution scaling
- higher input resolution
- Deformable convolution
- standard CNN + grid sampling with 2D offsets
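The squeeze and excitation steps listed under SENet can be sketched as follows. This is a minimal pure-Python illustration; it assumes the common two-layer FC bottleneck (FC, ReLU, FC, sigmoid), whereas the notes above only mention an FC layer, and all function names are hypothetical.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel.

    feature_map: list of channels, each channel a list of rows.
    """
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def excite(z, w1, w2):
    """FC -> ReLU -> FC -> sigmoid: one gate in (0, 1) per channel."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-s)) for s in logits]


def se_block(feature_map, w1, w2):
    """Rescale every channel by its attention weight."""
    gates = excite(squeeze(feature_map), w1, w2)
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, feature_map)]


fm = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0
      [[2.0, 2.0], [2.0, 2.0]]]   # channel 1
w1 = [[0.0, 0.0]]                 # 2 channels -> 1 hidden unit (assumed sizes)
w2 = [[0.0], [0.0]]               # 1 hidden unit -> 2 channel gates
print(squeeze(fm))
print(se_block(fm, w1, w2))       # zero weights give a gate of 0.5 everywhere
```

In a trained network the gates differ per channel, so informative channels are amplified relative to the rest.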
Semantic segmentation
-
Definition
- Classify each pixel of an image into a semantic category
- Applications
- Medical image
- Autonomous driving
- Computational photography
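Classifying each pixel amounts to taking an argmax over the class axis of a per-class score map. A minimal sketch, assuming scores laid out as `scores[class][y][x]` with made-up values:

```python
def segment(scores):
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the score of class c at pixel (y, x);
    the result holds the index of the best-scoring class per pixel.
    """
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Made-up 2x2 score maps for three classes
scores = [
    [[0.9, 0.2], [0.1, 0.3]],  # class 0
    [[0.1, 0.7], [0.2, 0.3]],  # class 1
    [[0.0, 0.1], [0.7, 0.4]],  # class 2
]
print(segment(scores))  # [[0, 1], [2, 2]]
```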
-
FCN (CVPR 2015)
- Fully Convolutional Networks
- first end-to-end architecture for semantic segmentation
- Input/Output
- Input: arbitrary size
- Output: segmentation map (same size as the input)
- Difference from Fully connected
- output has spatial coordinates (heat map)
- Upsampling
- First shrink the feature map to enlarge the receptive field, then upsample back to the input resolution
- Using transposed convolution alone causes an overlap issue
- Additionally apply interpolation such as nearest-neighbor or bilinear
- Add skip connections for enlarging score map
- Differences between layers
- lower
- fine, low-level, detail, local
- higher
- coarse, semantic, holistic, global
- Integrates activations from lower layers into prediction
- Preserves higher spatial resolution
- Captures lower-level semantics at the same time
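The overlap issue of transposed convolution is easiest to see in one dimension. The sketch below is an illustration under simple assumptions (all-ones input and kernel, kernel size 3, stride 2); the helper names are hypothetical.

```python
def transposed_conv1d(x, kernel, stride):
    """Each input value stamps a scaled copy of the kernel onto the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w
    return out


def nearest_upsample1d(x, factor):
    """Nearest-neighbor interpolation: repeat every value `factor` times."""
    return [v for v in x for _ in range(factor)]


ones = [1.0] * 4
# Kernel size 3 is not divisible by stride 2, so stamps overlap unevenly
print(transposed_conv1d(ones, [1.0, 1.0, 1.0], stride=2))
# -> [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]
print(nearest_upsample1d(ones, 2))
```

A constant input turns into an alternating pattern after the transposed convolution, while the interpolated signal stays flat, which is why interpolation is commonly paired with a regular convolution.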
-
U-Net (MICCAI 2015)
- Characteristics
- Share same FCN property
- Predict a dense map by concatenating feature maps from the contracting path
- similar to skip connections in FCN
- Yield more precise segmentations
- Architecture
- Contracting path
- Expanding path
- Skip connection
- Concatenation of feature maps provides localized information
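How the contracting path, expanding path, and skip connections fit together can be traced with shapes alone. This is a simplified sketch: it assumes "same"-padded convolutions, 2x pooling, and 2x up-convolutions that halve the channel count, and `unet_shapes` is a hypothetical helper.

```python
def unet_shapes(in_size, base_channels, depth):
    """Trace (stage, channels, spatial size) through a U-Net-like network.

    in_size should be divisible by 2 ** depth so that sizes line up.
    """
    skips, trace = [], []
    ch, size = base_channels, in_size
    for _ in range(depth):                       # contracting path
        skips.append((ch, size))                 # kept for the skip connection
        trace.append(("down", ch, size))
        ch, size = ch * 2, size // 2             # pool, then double channels
    trace.append(("bottom", ch, size))
    for skip_ch, skip_size in reversed(skips):   # expanding path
        ch, size = ch // 2, size * 2             # up-convolution
        assert size == skip_size                 # maps must align to concat
        trace.append(("up", ch + skip_ch, size)) # concat on the channel axis
        # the convolutions after the concat bring the count back down to ch
    return trace


for stage in unet_shapes(in_size=256, base_channels=64, depth=4):
    print(stage)
```

Each "up" entry shows the channel count right after concatenation, where the decoder sees both upsampled semantic features and high-resolution features from the contracting path.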
-
DeepLab (ICLR 2015)
- Reference material
- CRFs (Conditional Random Fields)
- 1st row: score map (before softmax)
- 2nd row: belief map (after softmax)
- Dilated (expanded) convolution
- Atrous convolution
- Insert spaces between kernel elements (dilation factor)
- Exponentially expands the receptive field with the same number of parameters
- Depthwise separable convolution
- Depthwise convolution
- Pointwise convolution
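Both the dilated convolution and the depthwise separable convolution above can be illustrated with small pure-Python helpers. This is a sketch under stated assumptions: a 1D "valid" convolution stands in for the 2D case, biases are ignored in the parameter counts, and every function name and channel size is hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """Valid 1D convolution with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1      # effective kernel size
    return [
        sum(w * x[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 layers with the given dilation factors."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def standard_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution."""
    return c_in * c_out * k * k


def separable_params(c_in, c_out, k):
    """Weights of depthwise (per-channel k x k) plus pointwise (1x1) convs."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


print(dilated_conv1d(list(range(10)), [1, 1, 1], dilation=2))
print(stacked_receptive_field(3, [1, 1, 1, 1]))   # no dilation
print(stacked_receptive_field(3, [1, 2, 4, 8]))   # doubling dilation
print(standard_params(128, 256, 3), separable_params(128, 256, 3))
```

Doubling the dilation per layer widens the receptive field much faster than plain stacking while each layer still has three taps, and the separable variant covers the same input and output channels with far fewer weights.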