Peer Session Badge

My assignment: NMT model 전처리

Basics of RNN

  • Basic structure

    • $h_t = f_W(h_{t-1}, x_t)$
      • $h_{t-1}$: old hidden-state vector
      • $x_t$: input vector at some time step
      • $h_t$: new hidden-state vector
      • $f_w$: RNN function with parameters $W$
      • $y_t$: output vector at time step $t$
    • recurrence fomula를 매 step 마다 적용함으로써, sequence of vectors를 처리할 수 있음
    • $h_t = \text{tanh}(W_{xh}x_t) + W_{hh}h_{t-1}$
    • $y_t = W_{hy}h_t$
  • Types of RNN

    • one-to-one
      • standard nerual net
    • one-to-many
      • image captioning
    • many-to-one
      • sentiment classification
    • many-to-many
      • machine translation
      • video classification on frame level
  • Character-level language model

    • vocabulary: h, e, l, o
    • many to many
    • training sequence: “hello”
    • “h”: $[1, 0, 0, 0]$
    • “l”: $[0,0,1,0]$
    • 참고 자료
    • c 언어 code 학습 task의 경우 성능 괜찮음
    • Backpropagation through time (BPTT)
      1. forward throught entire sequence to compute loss, and then backward through entire sequence to compute gradient
      2. sequence를 split해 제한된 길이의 sequence 별로 학습하는 방법
  • Vanilla RNN의 한계

    • Multiplying the same matrix at each time step during backprop causes gradient vanishing or exploding
  • LSTM, GRU