Boostcamp AI Tech (Day 017)
My assignment: NMT model preprocessing
Basics of RNN
-
Basic structure
- $h_t = f_W(h_{t-1}, x_t)$
- $h_{t-1}$: old hidden-state vector
- $x_t$: input vector at some time step
- $h_t$: new hidden-state vector
- $f_W$: RNN function with parameters $W$
- $y_t$: output vector at time step $t$
- by applying the same recurrence formula at every time step, an RNN can process a sequence of vectors (see the sketch after this list)
- $h_t = \text{tanh}(W_{hh}h_{t-1} + W_{xh}x_t)$
- $y_t = W_{hy}h_t$
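- A minimal numpy sketch of the recurrence above; the sizes and random weights are arbitrary assumptions for illustration, and the names follow $W_{xh}$, $W_{hh}$, $W_{hy}$ from the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 4  # toy sizes (assumption)

# the parameters W of f_W, named after the formulas above
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_step(h_prev, x_t):
    """One step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# processing a sequence of vectors = applying the same f_W at every step
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # a toy length-5 sequence
    h, y = rnn_step(h, x_t)
```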
-
Types of RNN
- one-to-one
- standard neural net
- one-to-many
- image captioning
- many-to-one
- sentiment classification
- many-to-many
- machine translation
- video classification on frame level (see the sketch after this list)
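- A short PyTorch sketch (toy sizes and hypothetical output heads are assumptions) showing that many-to-one and many-to-many differ only in which outputs you keep: the final hidden state vs. every per-step output:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)  # toy sizes
x = torch.randn(2, 10, 8)      # (batch, seq_len, input_size)

output, h_n = rnn(x)           # output: (2, 10, 16); h_n: (1, 2, 16)

# many-to-one (e.g. sentiment classification): use only the last hidden state
sentiment_logits = nn.Linear(16, 2)(h_n[-1])   # (batch, num_classes)

# many-to-many (e.g. frame-level video classification): use every step's output
frame_logits = nn.Linear(16, 5)(output)        # (batch, seq_len, num_classes)
```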
-
Character-level language model
- vocabulary: h, e, l, o
- many-to-many
- training sequence: “hello”
- “h”: $[1, 0, 0, 0]$
- “l”: $[0, 0, 1, 0]$ (the encoding is sketched after this list)
- Reference material
- for the task of learning C-language source code, performance is reasonably good
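- A tiny sketch of the one-hot inputs and next-character targets for the training sequence “hello”; the index order of the vocabulary is an assumption:

```python
import numpy as np

vocab = ["h", "e", "l", "o"]                 # vocabulary from the notes
char_to_idx = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a one-hot vector, e.g. 'h' -> [1, 0, 0, 0]."""
    v = np.zeros(len(vocab))
    v[char_to_idx[ch]] = 1.0
    return v

# language-model pairs for "hello": each character predicts the next one
inputs  = [one_hot(c) for c in "hell"]       # h, e, l, l
targets = [char_to_idx[c] for c in "ello"]   # e, l, l, o
```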
- Backpropagation through time (BPTT)
- forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradients
- truncated BPTT: split the sequence into chunks of limited length and train on one chunk at a time (sketched below)
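- A minimal PyTorch sketch of the chunked training idea; the model, chunk length, and data are assumptions. The key move is detaching the hidden state between chunks so gradients flow only within each chunk:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)   # toy model
head = nn.Linear(8, 4)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(1, 100, 4)          # one long toy sequence
y = torch.randint(0, 4, (1, 100))   # toy per-step targets
chunk = 20                          # limited sequence length per update (assumption)

h = torch.zeros(1, 1, 8)
for t in range(0, x.size(1), chunk):
    h = h.detach()                  # cut the graph: backprop stays inside this chunk
    out, h = rnn(x[:, t:t + chunk], h)
    loss = nn.functional.cross_entropy(head(out).flatten(0, 1), y[:, t:t + chunk].flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```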
-
Limitations of the vanilla RNN
- multiplying by the same matrix at each time step during backprop causes vanishing or exploding gradients (illustrated below)
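- A numeric illustration of the claim: repeatedly multiplying a vector by the same matrix shrinks it toward zero or blows it up depending on whether the spectral radius is below or above 1 (the 0.9/1.1 scalings are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W /= np.max(np.abs(np.linalg.eigvals(W)))    # rescale so the spectral radius is 1

g = np.ones(8)                               # stand-in for a backpropagated gradient
for name, scale in [("vanishing", 0.9), ("exploding", 1.1)]:
    v = g.copy()
    for _ in range(50):                      # 50 time steps of the same matrix
        v = (scale * W) @ v
    print(f"{name}: |v| = {np.linalg.norm(v):.2e}")
```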
-
LSTM, GRU
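- Their gating mechanisms mitigate the gradient problem above. For reference, a minimal PyTorch usage sketch (toy sizes are assumptions): LSTM carries a separate cell state alongside the hidden state, while GRU keeps a single hidden state:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 10, 8)        # (batch, seq_len, input_size), toy sizes

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
out, (h_n, c_n) = lstm(x)        # the cell state c_n travels alongside h_n

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
out, h_n = gru(x)                # GRU has no separate cell state
```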