Peer Session Badge

My assignment: VGGNet

Computer vision

  • “Inverse graphics”

    • Graphics
      • Rendering
      • 3D model $\rightarrow$ image
    • Computer vision
      • Inverse rendering
      • Image information $\rightarrow$ underlying essence (e.g., 3D model)
  • Visual perception

    • Perception
      • (input, output) data
      • visual world - sensing device - interpreting device - interpretation
    • Class of visual perception
      • Color
      • Motion
      • 3D
      • Semantic-level
      • Social (emotion)
      • Visuomotor
      • Understanding human visual perception, etc
  • Computer vision

    • Machine learning
      • Input - feature extraction - classification - output
    • Deep learning
      • End-to-End
        • Input - feature extraction & classification - output

CNN

  • Image classification

    • Input x $\xrightarrow{\text{classifier}}$ output f(x)
      • x: image
      • f(x): category level
    • k-NN (Nearest Neighbors)
      • Search all data
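      • Time complexity: O(n) per query for n stored samples (unbounded as n $\rightarrow \infty$, i.e. if all data in the world were stored)
      • Memory complexity: O(n), since every sample must be kept (likewise unbounded as n $\rightarrow \infty$)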
    • CNN
      • Locally connected NN
        • (cf. Fully connected NN)
        • Local feature learning
        • Parameter sharing
      • Backbone (base) of many CV tasks
        • Image-level classification
        • Classification + regression
        • Pixel-level classification
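
The cost of k-NN noted above can be made concrete with a minimal brute-force sketch. The toy 2-D feature vectors, the labels, and the function name `knn_predict` are assumptions for illustration and do not come from the lecture.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: the distance to every stored sample is computed, so one
    # query needs n distance evaluations and all n samples must stay in memory.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 2-D feature vectors standing in for images.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_x, train_y, (0.15, 0.15)))  # cat
print(knn_predict(train_x, train_y, (0.95, 1.0)))   # dog
```

Both the work per query and the stored data grow with the dataset. A CNN avoids this by compressing the training data into a fixed set of shared parameters.
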
  • CNN architectures for Image classification

    • See the daily notes
    • AlexNet
      • Conv - Pool - Conv - Pool - FC - FC
      • LRN (Local Response Normalization)
        • No longer used (replaced by Batch Normalization)
        • Normalizes the contrast (intensity) within the activation map
      • Receptive field
        • The region in the INPUT SPACE that a particular CNN feature is looking at
        • AlexNet uses 11 $\times$ 11 conv to get a bigger receptive field (no longer used)
    • VGGNet (ICLR 2015)
      • 16, 19 layers
      • Simpler architecture
        • No LRN
        • Blocks of 3 $\times$ 3 conv filters
          • many small conv layers instead of a small number of large conv filters
          • keeping receptive field sizes large enough
          • deeper with more non-linearities
          • fewer parameters
        • 2 $\times$ 2 max pooling
      • Better performance & generalization
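
The VGGNet claim that stacked 3 $\times$ 3 convs keep the receptive field large while using fewer parameters can be checked with a little arithmetic. The sketch below assumes stride-1 convs, equal input and output channel counts, and no biases. The channel width `C = 256` and the function names are illustrative assumptions.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convs of one kernel size."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1  # each layer widens the field by (k - 1) pixels
    return rf

def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) with `channels` input and output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels

C = 256  # assumed channel width, for illustration only

print(stacked_receptive_field(3, 2))     # 5  (same as one 5x5 conv)
print(stacked_receptive_field(3, 3))     # 7  (same as one 7x7 conv)
print(conv_weights(3, C, num_layers=3))  # 1769472  (27 * C^2)
print(conv_weights(7, C))                # 3211264  (49 * C^2)
```

Three 3 $\times$ 3 layers cover the same 7 $\times$ 7 input region as one 7 $\times$ 7 layer with roughly 55% of the weights, and they apply three non-linearities instead of one.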

Annotation data efficient learning

  • Data augmentation

    • Learning representation of data
      • Dataset is always biased
      • Training data is sparse samples of real data
    • Augmentation
      • Fill more space and close the gap (make a dataset denser)
    • Methods
      • crop
      • affine transformation (shear)
      • brightness
      • perspective
      • rotate
      • flip
      • CutMix
      • RandAugment
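
A few of the methods listed above can be sketched in plain Python. Images are represented here as nested lists of grayscale integers, which is an assumption made to keep the example dependency-free. Real pipelines normally rely on an image library. The CutMix box is fixed by hand for readability, whereas the actual method samples it randomly. All function names are illustrative.

```python
import random

def hflip(img):
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in img]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

def adjust_brightness(img, delta):
    """Shift pixel intensities by `delta` and clamp them to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def cutmix(img_a, label_a, img_b, label_b, box):
    """Paste a box from img_b into img_a and mix the labels by area ratio."""
    top, left, h, w = box
    mixed = [row[:] for row in img_a]  # copy so img_a is left untouched
    for r in range(top, top + h):
        mixed[r][left:left + w] = img_b[r][left:left + w]
    lam = 1 - (h * w) / (len(img_a) * len(img_a[0]))  # share of img_a kept
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
dark = [[0] * 4 for _ in range(4)]                       # second toy image

print(hflip(img)[0])                          # [3, 2, 1, 0]
print(adjust_brightness([[250, 5]], 10))      # [[255, 15]]
crop = random_crop(img, 2, random.Random(0))  # a 2x2 window at a random offset
mixed, label = cutmix(img, [1.0, 0.0], dark, [0.0, 1.0], (0, 0, 2, 2))
print(label)                                  # [0.75, 0.25]
```

Each transform produces a new, plausible sample from an existing one, which is how augmentation fills in the gaps between sparse training samples.
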
  • Leveraging pre-trained information

    • Needs
      • supervised learning requires a very large-scale dataset
      • annotating data is very expensive + quality not ensured
    • Motivation
      • similar datasets share common information
      • knowledge learned from one dataset can be applied to other datasets
    • Transfer learning
      • Approach 1
        • given pre-trained model
        • chop off the final FC layer and train only a new FC layer (the remaining layers stay frozen)
      • Approach 2
        • given pre-trained model
        • replace the final FC layer and fine-tune the whole model
        • low learning rate in Conv layers
        • high learning rate in FC layers
    • Knowledge distillation
      • Distill the knowledge of a trained model $\rightarrow$ another, smaller model
      • Used for Model compression
      • Used for pseudo-labeling
      • Teacher-Student network
      • Reference material
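
The teacher-student idea can be illustrated with a common formulation of the distillation loss: the student is trained to match the teacher's softened output distribution. The temperature-scaled softmax and the KL divergence used below are assumptions borrowed from the usual presentation of knowledge distillation and are not spelled out in these notes. The logits are made-up numbers.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """How far the student's soft prediction is from the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

def pseudo_label(teacher_logits):
    """Hard pseudo-label: index of the teacher's most confident class."""
    return max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])

teacher = [2.0, 1.0, 0.1]  # hypothetical teacher logits for one image
student = [0.1, 1.0, 2.0]  # hypothetical student logits for the same image

print(distillation_loss(teacher, teacher))      # 0.0 when the outputs match
print(distillation_loss(teacher, student) > 0)  # True
print(pseudo_label(teacher))                    # 0
```

The soft distribution drives model compression, while the hard `pseudo_label` output is what a teacher would supply when labeling unlabeled data.
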
  • Self-training

    • Component methods
      • Augmentation
      • Knowledge distillation (teacher-student)
      • Semi-supervised learning (pseudo-label)
    • Procedure

      T: Teacher model
      S: Student model

      1. Train the initial T on the labeled data
      2. Use T to pseudo-label the unlabeled data
      3. Train S on the labeled + pseudo-labeled data (+ augmentation)
      4. Set S as the new T, and set a new, larger model as S
      5. Repeat the steps above
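
The loop above can be sketched with a deliberately tiny model. A 1-D nearest-centroid classifier stands in for both T and S, which is an assumption made for brevity. The augmentation in step 3 and the growth of the student in step 4 are omitted, and all names and data are illustrative.

```python
def train_centroids(xs, ys):
    """Fit a nearest-centroid classifier: one mean per class (1-D features)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def self_training(labeled_x, labeled_y, unlabeled_x, rounds=3):
    teacher = train_centroids(labeled_x, labeled_y)            # step 1
    for _ in range(rounds):                                    # step 5
        pseudo_y = [predict(teacher, x) for x in unlabeled_x]  # step 2
        student = train_centroids(labeled_x + unlabeled_x,     # step 3
                                  labeled_y + pseudo_y)
        teacher = student                                      # step 4 (same size here)
    return teacher

labeled_x, labeled_y = [0.0, 10.0], [0, 1]
unlabeled_x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]

model = self_training(labeled_x, labeled_y, unlabeled_x)
print(model)                # {0: 1.5, 1: 8.5}
print(predict(model, 4.0))  # 0
print(predict(model, 6.0))  # 1
```

The centroids move from the two labeled points (0 and 10) toward the bulk of the unlabeled data, which is the effect self-training is after.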