Multi-modal: Captioning and speaking

  • Multi-modal learning

    • Modality
      • Vision, Audio, Taste, Texture, Odor, Social, Depth, Force, Text, …
    • Challenge
      • Different representations between modalities
      • Feature spaces are unbalanced across modalities
      • Models are easily biased toward a specific modality (training tends to pay less attention to data that is harder to predict)
  • Task (1) - Visual data & Text

    • Text embedding
      • word2vec (skip-gram model)
    • Joint embedding
      • text data $\rightarrow$ word counts $\rightarrow$ replicated softmax $\rightarrow$ joint embedding
      • image data $\rightarrow$ real-valued features $\rightarrow$ Gaussian model $\rightarrow$ joint embedding
      • applications
        • image tagging
        • image & food recipe retrieval
    • Metric learning
      • joint visual-semantic embedding space
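
To make the metric-learning item concrete, here is a minimal sketch of one common way to train a joint visual-semantic embedding space: a hinge-based ranking loss that pulls matching image/text pairs together and pushes mismatched pairs apart. The notes do not say which loss is used, so the loss form, the function names (`l2_normalize`, `triplet_ranking_loss`), and the `margin` value are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each vector to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def triplet_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Hinge ranking loss over a batch (illustrative, not the notes' stated method).

    Row i of img_emb and row i of txt_emb are assumed to be a matching pair;
    every other row in the batch is treated as a negative.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))

    sim = img @ txt.T          # (B, B) cosine similarities
    pos = np.diag(sim)         # similarity of each matching pair

    # For each image: penalize wrong captions scoring within `margin` of the right one.
    cost_txt = np.maximum(0.0, margin + sim - pos[:, None])
    # For each caption: penalize wrong images scoring within `margin` of the right one.
    cost_img = np.maximum(0.0, margin + sim - pos[None, :])

    # Ignore the diagonal, where the "negative" is the positive itself.
    mask = 1.0 - np.eye(sim.shape[0])
    return float(((cost_txt + cost_img) * mask).sum() / sim.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))   # stand-in for projected image features
    texts = rng.normal(size=(4, 8))    # stand-in for projected text features
    print("random pairs :", triplet_ranking_loss(images, texts))
    print("aligned pairs:", triplet_ranking_loss(images, images))
```

In practice the two inputs would come from learned projections of image features and text embeddings into the shared space, and the loss would be minimized with an autodiff framework. NumPy is used here only to keep the arithmetic visible.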