Data Science at Home
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)
Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they all use the transformer architecture. That architecture is built on top of another important concept already familiar to the community: self-attention. In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
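For listeners who want a feel for the mechanism in code, here is a minimal sketch of scaled dot-product self-attention in NumPy. It is only an illustration of the idea discussed in the episode: the function name, weight matrices, and shapes are assumptions chosen for the example, not code from the show or from any particular library.

```python
# Minimal sketch of scaled dot-product self-attention (the core of the transformer).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k) projections."""
    Q = X @ Wq                       # queries
    K = X @ Wk                       # keys
    V = X @ Wv                       # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token similarities, scaled by sqrt(d_k)
    # Row-wise softmax: how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output is a weighted mix of all value vectors

# Toy example: 4 tokens with 8-dimensional embeddings projected down to d_k = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Every output row mixes information from the whole sequence, which is what lets transformers model long-range dependencies without recurrence.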
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.
References
- Attention is all you need https://arxiv.org/abs/1706.03762
- The illustrated transformer https://jalammar.github.io/illustrated-transformer
- Self-attention for generative models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf