
Kevin Lubick

Building an ML Transformer in a Spreadsheet

Attention Is All You Need introduced Transformer models, which have been wildly effective at solving various Machine Learning problems. However, the 10-page paper is incredibly dense. There are so many details that it was difficult for me to gain high-level insights about how Transformers work and why they are effective.

After several months of reading other blog posts about them, I understood them well enough to build a Transformer in a spreadsheet, and I made a video walking through it.

Diagram showing how Transformers have alternating "data converter" sublayers and "pattern finder" sublayers. Zooming in on the "data converter" sublayer shows how Query and Key matrices combine to form a self-attention matrix, which focuses on and remembers important parts of the value matrix

At a high level, Transformers are effective because they convert the data in a way that makes it easier to find patterns. They build on ideas from Convolutional Neural Networks and Recurrent Neural Networks (focus and memory, respectively), combining them in something called self-attention.
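To make the "data converter" idea concrete, here is a minimal NumPy sketch of single-head self-attention. This is my own illustration of the standard scaled dot-product formulation, not the spreadsheet's exact cell formulas; the weight matrices here are random stand-ins for learned parameters.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q = X @ W_q  # queries: what each token is looking for (focus)
    K = X @ W_k  # keys: what each token offers
    V = X @ W_v  # values: the information each token carries (memory)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to each other token
    # softmax over each row, so each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of the value vectors

# tiny example: 3 tokens, model dimension 4, random stand-in weights
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (3, 4): one mixed vector per input token
```

Each row of the attention matrix is a set of mixing weights, which is why self-attention can "focus on and remember" the important parts of the value matrix.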

The video covers these ideas in more detail, and this is the link to the spreadsheet with the implemented Transformer. Skip to the "Appendix" sheet if you want to see a layer with all the bells and whistles, including multi-headed attention and residual connections.
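For the two "bells and whistles" mentioned above, a hedged sketch of how they compose (again in NumPy with random stand-in weights, not the spreadsheet's formulas): each head runs its own self-attention, the head outputs are concatenated and mixed by an output matrix, and the residual connection adds the layer's input back onto its output.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads):
    """heads is a list of (W_q, W_k, W_v) tuples, one per attention head."""
    outs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        outs.append(A @ V)  # each head attends to different patterns
    return np.concatenate(outs, axis=-1)  # stitch the heads back together

def sublayer(X, heads, W_o):
    attended = multi_head_attention(X, heads) @ W_o  # mix the heads
    return X + attended  # residual connection: keep the original signal too

# 4 tokens, model dim 6, 2 heads of size 3 (hypothetical sizes for illustration)
rng = np.random.default_rng(1)
d, n_heads, d_head = 6, 2, 3
X = rng.normal(size=(4, d))
heads = [tuple(rng.normal(size=(d, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_head, d))
Y = sublayer(X, heads, W_o)
print(Y.shape)  # (4, 6): same shape as the input, so layers can stack
```

The residual addition is what lets many such layers stack without losing the original input signal.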

Implementing the Transformer really helped me understand all the components. I'm especially proud of my metaphor of "scoring points" for explaining self-attention.

