Prashant Lakhera

🤖 100 Days of Generative AI - Day 2 - LLM Tokens vs Parameters? 🤖

We often hear or read statements like: Llama 3.1 comes in 8 billion, 70 billion, and 405 billion parameter versions and was trained on about 15 trillion tokens, whereas GPT-3 has 175 billion parameters and was trained on hundreds of billions of tokens (there is no official figure for GPT-4, but it is widely believed to use far more parameters and tokens). But what are parameters and tokens? Here is my attempt to explain them in simple terms.

✅ Parameters: Consider parameters as the settings or weights a language model learns during training. These settings determine how the model processes and generates text. More parameters generally mean a more powerful model that can understand and generate more complex language patterns.
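To make this concrete, here is a minimal sketch (assuming PyTorch is installed; the toy layer sizes are made up purely for illustration) that counts the parameters of a tiny model:

```python
import torch.nn as nn

# A tiny toy "language model": embedding -> hidden layer -> vocabulary logits.
toy_model = nn.Sequential(
    nn.Embedding(num_embeddings=1000, embedding_dim=64),  # 1,000-token vocabulary
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 1000),  # project back to vocabulary-sized logits
)

# Every learnable weight and bias tensor counts toward the parameter total.
total = sum(p.numel() for p in toy_model.parameters())
print(f"Total parameters: {total:,}")  # roughly 200 thousand for this toy model
```

Every weight and bias in those layers is a parameter; scaling such layers up (and stacking many more of them) is what takes you from a couple hundred thousand parameters to Llama 3.1's 405 billion.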

✅ Tokens: Tokens are the chunks of text that the model processes. Depending on the model's design, a token can be as short as a single character or as long as a word or more. During training, the model reads and learns from these tokens. More training tokens mean the model has been exposed to more text and can build a richer understanding of language.
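For example, here is a quick sketch (assuming the tiktoken library is installed; it exposes the tokenizers used by OpenAI models) that splits a sentence into tokens:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the GPT-3.5/GPT-4 family

text = "Tokens are chunks of text."
token_ids = enc.encode(text)

print(token_ids)                             # e.g. a list of integer IDs, one per token
print([enc.decode([t]) for t in token_ids])  # e.g. ['Tokens', ' are', ' chunks', ' of', ' text', '.']
print(len(token_ids), "tokens")
```

Notice that a short sentence becomes a handful of tokens; the "15 trillion tokens" figure for Llama 3.1 describes how much of this tokenized text the model saw during training.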

✅ Why do we need tokens? Can't we feed text directly to the model?
The main reason we use tokens instead of feeding raw text to a deep learning model is that neural networks cannot process strings; they require numerical inputs. Tokenization converts text into numerical representations (usually integer IDs) that the model can work with. After tokenization, these token IDs are typically converted into embeddings (dense vectors) before being processed by the model.
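Here is a minimal sketch (assuming PyTorch; the vocabulary size, embedding size, and token IDs are made up) of that last step, turning token IDs into embeddings:

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 1000, 8
embedding_layer = nn.Embedding(vocab_size, embedding_dim)

# Pretend a tokenizer already turned "Tokens are chunks of text." into these IDs.
token_ids = torch.tensor([[17, 42, 311, 9, 523, 4]])

# Each integer ID is looked up and replaced by a learned dense vector (its embedding).
embeddings = embedding_layer(token_ids)
print(embeddings.shape)  # torch.Size([1, 6, 8]) -> 1 sequence, 6 tokens, 8 numbers per token
```

Those dense vectors, not the raw characters, are what the rest of the network actually computes on.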
