Boostcamp AI Tech (Day 017)
My assignment: NMT model preprocessing
Basics of RNN
-
Basic structure
- $h_t = f_W(h_{t-1}, x_t)$
- $h_{t-1}$: old hidden-state vector
- $x_t$: input vector at some time step
- $h_t$: new hidden-state vector
- $f_W$: RNN function with parameters $W$
- $y_t$: output vector at time step $t$
- by applying the same recurrence formula at every time step, an RNN can process a sequence of vectors (see the sketch after this list)
- $h_t = \text{tanh}(W_{hh}h_{t-1} + W_{xh}x_t)$
- $y_t = W_{hy}h_t$
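- A minimal numpy sketch of the recurrence above; the sizes and random weights are arbitrary assumptions for illustration, and the names follow $W_{xh}$, $W_{hh}$, $W_{hy}$ from the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 4  # toy sizes (assumption)

# the parameters W of f_W, named after the formulas above
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_step(h_prev, x_t):
    """One step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# processing a sequence of vectors = applying the same f_W at every step
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # a toy length-5 sequence
    h, y = rnn_step(h, x_t)
```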
-
Types of RNN
- one-to-one
- standard neural net
- one-to-many
- image captioning
- many-to-one
- sentiment classification
- many-to-many
- machine translation
- video classification on frame level (see the sketch after this list)
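- A short PyTorch sketch (toy sizes and hypothetical output heads are assumptions) showing that many-to-one and many-to-many differ only in which outputs you keep: the final hidden state vs. every per-step output:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)  # toy sizes
x = torch.randn(2, 10, 8)      # (batch, seq_len, input_size)

output, h_n = rnn(x)           # output: (2, 10, 16); h_n: (1, 2, 16)

# many-to-one (e.g. sentiment classification): use only the last hidden state
sentiment_logits = nn.Linear(16, 2)(h_n[-1])   # (batch, num_classes)

# many-to-many (e.g. frame-level video classification): use every step's output
frame_logits = nn.Linear(16, 5)(output)        # (batch, seq_len, num_classes)
```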
-
Character-level language model
- vocabulary: h, e, l, o
- many-to-many
- training sequence: “hello”
- “h”: $[1, 0, 0, 0]$
- “l”: $[0, 0, 1, 0]$ (the encoding is sketched after this list)
- Reference material
- for the task of learning C-language source code, performance is reasonably good
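- A tiny sketch of the one-hot inputs and next-character targets for the training sequence “hello”; the index order of the vocabulary is an assumption:

```python
import numpy as np

vocab = ["h", "e", "l", "o"]                 # vocabulary from the notes
char_to_idx = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a one-hot vector, e.g. 'h' -> [1, 0, 0, 0]."""
    v = np.zeros(len(vocab))
    v[char_to_idx[ch]] = 1.0
    return v

# language-model pairs for "hello": each character predicts the next one
inputs  = [one_hot(c) for c in "hell"]       # h, e, l, l
targets = [char_to_idx[c] for c in "ello"]   # e, l, l, o
```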
- Backpropagation through time (BPTT)
- forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradients
- truncated BPTT: split the sequence into chunks of limited length and train on one chunk at a time (sketched below)
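- A minimal PyTorch sketch of the chunked training idea; the model, chunk length, and data are assumptions. The key move is detaching the hidden state between chunks so gradients flow only within each chunk:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)   # toy model
head = nn.Linear(8, 4)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(1, 100, 4)          # one long toy sequence
y = torch.randint(0, 4, (1, 100))   # toy per-step targets
chunk = 20                          # limited sequence length per update (assumption)

h = torch.zeros(1, 1, 8)
for t in range(0, x.size(1), chunk):
    h = h.detach()                  # cut the graph: backprop stays inside this chunk
    out, h = rnn(x[:, t:t + chunk], h)
    loss = nn.functional.cross_entropy(head(out).flatten(0, 1), y[:, t:t + chunk].flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```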
-
Limitations of the vanilla RNN
- multiplying by the same matrix at each time step during backprop causes vanishing or exploding gradients (illustrated below)
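- A numeric illustration of the claim: repeatedly multiplying a vector by the same matrix shrinks it toward zero or blows it up depending on whether the spectral radius is below or above 1 (the 0.9/1.1 scalings are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W /= np.max(np.abs(np.linalg.eigvals(W)))    # rescale so the spectral radius is 1

g = np.ones(8)                               # stand-in for a backpropagated gradient
for name, scale in [("vanishing", 0.9), ("exploding", 1.1)]:
    v = g.copy()
    for _ in range(50):                      # 50 time steps of the same matrix
        v = (scale * W) @ v
    print(f"{name}: |v| = {np.linalg.norm(v):.2e}")
```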
-
LSTM, GRU
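- Their gating mechanisms mitigate the gradient problem above. For reference, a minimal PyTorch usage sketch (toy sizes are assumptions): LSTM carries a separate cell state alongside the hidden state, while GRU keeps a single hidden state:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 10, 8)        # (batch, seq_len, input_size), toy sizes

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
out, (h_n, c_n) = lstm(x)        # the cell state c_n travels alongside h_n

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
out, h_n = gru(x)                # GRU has no separate cell state
```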