LLM (Large Language Model)
Definition
A neural network trained on massive text datasets that can generate, understand and manipulate human language. Examples: GPT-4, Qwen, Claude.
Why It Matters
LLMs are the generic substrate behind almost every text-based AI feature shipped since 2022: chat, summarisation, translation, code generation, classification, and extraction. Whether the product calls itself an "AI assistant," a "copilot," or a "search engine," the underlying model is usually an LLM.
Key Points
- Pretraining objective: next-token prediction on trillions of tokens. The loss is cross-entropy; accuracy is not the training metric.
- Instruction tuning (supervised fine-tuning on instruction-following examples) converts a raw language model into a usable assistant. It typically uses 10K–1M curated examples.
- Major model families: GPT / o-series (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta), Qwen (Alibaba), Mistral (Mistral AI), DeepSeek (DeepSeek AI).
- Typical transformer depth in large models: GPT-3 has 96 layers and Llama 3 70B has 80. Depth tends to correlate with compositional reasoning ability.
- Chinchilla scaling laws (2022): compute-optimal training uses ~20× as many tokens as parameters. A 7B model should train on ~140B tokens. GPT-4's training set size is unpublished, but it likely exceeded 10T tokens.
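Two of the points above reduce to short arithmetic, sketched here in Python. The function names and the four-token probability vector are illustrative assumptions, not taken from any real model or library. The first function computes the cross-entropy loss at a single position (the negative log of the probability the model assigned to the true next token); the second applies the ~20 tokens-per-parameter rule of thumb.

```python
import math

def next_token_cross_entropy(probs, target_index):
    """Cross-entropy loss at one position: -log p(true next token)."""
    return -math.log(probs[target_index])

def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    """Rule-of-thumb compute-optimal token count (~20 tokens per parameter)."""
    return n_params * tokens_per_param

# Hypothetical model output over a 4-token vocabulary; index 2 is the true next token.
predicted = [0.1, 0.2, 0.6, 0.1]
loss = next_token_cross_entropy(predicted, target_index=2)
print(f"loss = {loss:.3f}")  # -ln(0.6), about 0.511

tokens = chinchilla_optimal_tokens(7_000_000_000)
print(f"7B model -> {tokens / 1e9:.0f}B tokens")  # 140B, matching the bullet above
```

The loss falls toward 0 as the probability on the correct token approaches 1, which is why training optimises this quantity rather than accuracy.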
Example
GPT-4 is reported, though not confirmed by OpenAI, to have ~1.7 trillion parameters; Qwen 2.5 7B has about 7 billion; the smallest open Llama has 1 billion. Bigger usually means smarter but slower and more expensive to run, which is the trade-off you face when picking a model for a task.
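One way to put a number on the cost side of that trade-off is the memory needed just to hold the weights. The sketch below assumes 16-bit weights (2 bytes per parameter) by default and ignores activations, the KV cache, and runtime overhead, so real requirements are higher. The function name is hypothetical.

```python
def weight_memory_gb(n_params, bytes_per_param=2):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# Parameter counts are round illustrative sizes, not exact model specs.
for name, n_params in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
    print(f"{name}: {weight_memory_gb(n_params):.0f} GB at 16-bit, "
          f"{weight_memory_gb(n_params, 0.5):.1f} GB at 4-bit")
```

Under these assumptions a 7B model needs roughly 14 GB for its weights at 16-bit precision, while a 70B model needs roughly 140 GB, which is usually more than a single consumer GPU provides.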
Common Misconception
LLMs do not look up facts from a database. They encode statistical correlations from training text. A model "knowing" a fact means it reproduces patterns consistent with that fact, not that it has a queryable knowledge base. This is why hallucination happens: the model produces what is statistically plausible, not what is verifiably true.
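A toy illustration of the point, in Python: the bigram model below is a drastically simplified stand-in for an LLM (a count table rather than a neural network), and the three-sentence corpus is made up for the example. It only records which word tends to follow which, then samples from those counts.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which; these counts are the model's only 'knowledge'."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample each next word in proportion to how often it followed the previous word."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation was ever observed for this word
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = "paris is in france . rome is in italy . berlin is in germany ."
model = train_bigram(corpus)
print(generate(model, "rome", 3, random.Random(0)))
```

After "rome is in", the model is equally likely to continue with "france", "italy", or "germany", because all three followed "in" once in the corpus. Every continuation is fluent; only one is true. Real LLMs condition on far more context, but the failure mode is analogous.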
Related Terms
- Transformer: The neural network architecture behind modern AI models. Introduced in the 2017 paper "Attention Is All You Need."
- Parameter: A trainable weight in an AI model. Larger models have more parameters (7B, 70B, 400B).
- Context Window: The maximum amount of text an AI model can process at once, measured in tokens. GPT-4o has a 128K-token context window.
LLM (Large Language Model) on Rewind.ai
Rewind.ai's chat picker exposes 400+ LLMs: self-hosted open-source models (Qwen, Mistral, DeepSeek, Llama) for free daily-pool calls, and premium models (Claude, GPT-4o, Gemini) for token-priced calls.
Quick Facts
| Term | LLM (Large Language Model) |
| Related | Transformer, Parameter, Context Window |