by Wojtek Czarnecki
Designed for educational purposes. Please do not distribute without permission.
Questions/Correspondence: [email protected]
- basic (vanilla RNN) implementation
- observing exploding/vanishing gradients (a minimal NumPy sketch of these two items follows this list)
- interpretability by plotting and analysing the activations of a network (see the plotting sketch below):
  - identifying interpretable neurons
  - identifying neuron-gate interactions
  - identifying hidden state dynamics through time
- training an LSTM on a character-level language modelling task
- comparing the training of an LSTM and an RNN, playing with architectures
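As a taste of the first two items, the sketch below (plain NumPy; all names and sizes are illustrative choices of mine, not the notebook's actual code) implements a single vanilla RNN step and then backpropagates a gradient through time, showing how its norm vanishes or explodes depending on the scale of the recurrent weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden, T = 8, 64, 100

def rnn_step(x, h, W_xh, W_hh, b):
    # One vanilla RNN step: h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
    return np.tanh(x @ W_xh + h @ W_hh + b)

for scale in (0.5, 1.0, 2.0):  # rough spectral radius of W_hh
    W_xh = rng.normal(0.0, 0.1, (n_input, n_hidden))
    W_hh = rng.normal(0.0, scale / np.sqrt(n_hidden), (n_hidden, n_hidden))
    b = np.zeros(n_hidden)

    # Forward pass, keeping hidden states for backprop through time.
    hs = [np.zeros(n_hidden)]
    for _ in range(T):
        hs.append(rnn_step(rng.normal(size=n_input), hs[-1], W_xh, W_hh, b))

    # Push a unit gradient back from the last hidden state:
    # dL/dh_{t-1} = (dL/dh_t * (1 - h_t**2)) @ W_hh.T
    g = np.ones(n_hidden)
    for t in range(T, 0, -1):
        g = (g * (1.0 - hs[t] ** 2)) @ W_hh.T
    print(f"scale={scale}: |dL/dh_0| after {T} steps = {np.linalg.norm(g):.2e}")
```

With small recurrent weights the gradient shrinks geometrically towards zero, and with large ones it grows; this is exactly the behaviour section 2 asks you to observe in a trained network.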
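The plotting referred to above can be as simple as a heatmap of hidden activations over time. A minimal matplotlib sketch, with random placeholder data standing in for the states a trained network would produce:

```python
import numpy as np
import matplotlib.pyplot as plt

T, n_hidden = 80, 32
# Placeholder [time, units] activations; in the notebook these would come
# from running a trained network over an input sequence.
hidden_states = np.tanh(np.random.randn(T, n_hidden))

# Interpretable neurons often show up as rows that switch on/off in sync
# with input events (e.g. opening/closing quotes or brackets).
plt.imshow(hidden_states.T, aspect="auto", cmap="RdBu", vmin=-1, vmax=1)
plt.xlabel("time step")
plt.ylabel("hidden unit")
plt.colorbar(label="activation")
plt.show()
```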
The first three sections are almost independent, so one can switch between them without any code dependencies (apart from being unable to use the vanilla RNN in section 4 if it was not implemented in section 1).
Cells whose titles include "starting point" require filling in some code gaps; all remaining cells are complete (but feel free to play with them if you want!).
Please pay attention to the questions after each section. Answering them is crucial to making sure one understands the various modes of RNN operation.
The language model exercises are based on the Sonnet LSTM example.
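For orientation, here is a rough sketch of what such a model looks like when assembled from Sonnet pieces; it assumes the Sonnet 2 API (snt.LSTM, snt.Embed, snt.dynamic_unroll, snt.BatchApply) and illustrative sizes, and is not the example's actual code:

```python
import sonnet as snt
import tensorflow as tf

vocab_size, embed_dim, hidden_size = 128, 64, 256
seq_len, batch_size = 50, 32

embed = snt.Embed(vocab_size, embed_dim)   # character ids -> vectors
core = snt.LSTM(hidden_size)               # recurrent core
head = snt.Linear(vocab_size)              # logits over the next character

# Stand-in batch of character ids, time-major: [seq_len, batch].
chars = tf.random.uniform([seq_len, batch_size], maxval=vocab_size,
                          dtype=tf.int32)

state = core.initial_state(batch_size)
outputs, final_state = snt.dynamic_unroll(core, embed(chars), state)
logits = snt.BatchApply(head)(outputs)     # [seq_len, batch, vocab_size]
```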