Boostcamp AI Tech (Day 034)

My assignment: Hourglass network

Segmentation (2)

Instance segmentation
- Mask R-CNN (CVPR 2017)
  - RoI Align (참고 자료)
  - Faster R-CNN + Mask branch
    - Mask: binary classification for each class (BG / FG)
- YOLACT (You Only Look at coefficientTs)
  - Real-time instance segmentation network
- YolactEdge
Panoptic segmentation
- Difference
  - Semantic segmentation
    - stuff + things
  - Panoptic segmentation
    - stuff + instances of things
- UPSNet (CVPR 2019)
  - semantic & instance head $\rightarrow$ panoptic head $\rightarrow$ panoptic logits
- VPSNet (for video)
Landmark localization
- Key point estimation
  - predicting the coordinates of keypoints
- Gaussian heatmap
- Hourglass network (ECCV 2016)
- DensePose (CVPR 2018)
  - Faster R-CNN + 3D suface regression branch
- RetinalFace (CVPR 2020)
  - FPN + Multi-task branches
Detecting objects as keypoints
- CornerNet (ECCV 2018)
- CenterNet (ICCV 2019)

Conditional generative model

생성모델 vs Conditional 생성모델
- 생성모델
  - 확률분포 모델링 $\rightarrow$ 새 이미지 샘플링
  - 조작 불가
  - $P(\text{random bag images})$
- Conditional 생성모델
  - 확률분포 모델링 $\rightarrow$ 조건 하에 새 이미지 샘플링
  - $P(X = \text{random bag images} \vert \text{sketch})$
  - generate random sample under the given condition
- 사례
  - low quality audio $\rightarrow$ high quality audio
  - machine translation
    - $P(\text{English sentence} \vert \text{Chinese sentence})$
  - article generation
Conditional GAN
- GAN (NIPS 2014)
- Generator 모델에 z(optional)와 조건 c를 넣음
- Image-to-Image translation
  - Style transfer
  - Super resolution
  - Colorization
- Super resolution
  - MAE/MSE loss produce a safe average-looking image (real data의 manifold에서 멀리 떨어진)
  - GAN loss: 평균적인 blur 데이터는 discriminator가 real data가 아니라고 판별하기 때문에 방지됨
Image translation
- Pix2Pix
  - translating image to corresponding image in another domain
  - loss
    - cGAN + L1
    - $G^* = \text{argmin}G \text{max}_D \mathcal{L}{cGAN}(G,D) + \lambda \mathcal{L}_{L1}(G)$
- CycleGAN
  - pariwise dataset 없이 학습 가능
  - loss
    - GAN + Cycle-consistency
    - $L_{GAN}(X \rightarrow Y) + L_{GAN}(Y \rightarrow X) + L_{cycle}(G,F)$
  - cycle-consistency: 원본 복원 해보고 비슷한 정도
    - $x \rightarrow \hat{Y} \rightarrow \hat{x}$
    - $y \rightarrow \hat{X} \rightarrow \hat{y}$
- Perceptual loss
  - GAN loss (Adversarial loss)
    - difficult to train (Generator & Discriminator)
    - pre-trained net 필요 없음
    - 다양한 application 가능
  - Perceptual loss
    - simple to train (simple foward & backward computation)
    - loss measure를 위해 pre-trained net 필요
  - 참고 자료
- GAN application
  - Deepfake
  - Face de-identification
- Video translation

조교님께 내 질문

Q

Faster R-CNN 다음에 배운 Mask R-CNN에서 RoI Align이 Mask layer 보다 훨씬 중요한 역할을 하는 것 같습니다. (Input을 받아 같은 class의 instance를 분리하는 역할을 하기 때문)
그러면 Mask R-CNN의 contribution이 단지 “RoI Align + R-CNN”에서 오는 것이고, Mask layer를 추가적으로 활용한 것은 성능 향상에만 관련된 건가요?
A

물론 RoI pooling 대신 RoI Align을 사용했기 때문에 instance segmentation이라는 task를 가능케 했다는 것은 맞지만, Mask R-CNN의 contribution은 단순히 Mask layer를 추가함으로써 현재까지도 의미 있을 만한 성능을 보여준다는 점도 매우 큽니다.

Segmentation (2)

Instance segmentation

Panoptic segmentation

Landmark localization

Detecting objects as keypoints

Conditional generative model

생성모델 vs Conditional 생성모델

Conditional GAN

Image translation

조교님께 내 질문

Q

A