LLM Architectures & Transformers
Lecture Notes
## The Transformer Revolution Introduced by Google in the 2017 paper "Attention Is All You Need", the Transformer architecture fundamentally changed Natural Language Processing. It discarded Recurrent Neural Networks (RNNs) in favor of the **Self-Attention Mechanism**. ### Self-Attention Self-attention allows the model to look at other words in the input sequence to gain a better understanding of a specific word's context. For example, in the sentence: > "The animal didn't cross the street because **it** was too tired." The self-attention mechanism assigns a high weight between the word "it" and "animal". ### Encoder vs Decoder - **Encoder-Only (BERT)**: Great for classification and understanding tasks. - **Decoder-Only (GPT series)**: Auto-regressive models optimized for generating text one token at a time. - **Encoder-Decoder (T5)**: Good for translation and summarization.