Parameter
Definition
A trainable weight in an AI model. Larger models have more parameters (7B, 70B, 400B).
Why It Matters
A model's parameter count is the rough proxy for both capability and cost. A 7B-parameter model fits on a single consumer GPU; a 70B needs 4–8 GPUs; a 400B model needs a data-centre rack. The math is roughly 2 bytes per parameter in 16-bit precision, half that quantised to 8-bit.
Key Points
- Storage rule of thumb: 1B parameters ≈ 2 GB in FP16, 1 GB in INT8, 0.5 GB in INT4.
- MoE (Mixture-of-Experts) models like Mixtral 8×7B have 47B total parameters but only ~12B active per forward pass, faster than a dense 47B at similar quality.
- Chinchilla scaling laws (2022): optimal training uses ~20× as many tokens as parameters, a 7B model should train on 140B tokens.
- Task fit rules of thumb: 7B = competent assistant, 14B–32B = strong reasoning, 70B+ = near-frontier, 400B+ = frontier-class.
- At the same model family and training recipe, doubling parameter count roughly provides a consistent, predictable quality improvement, the basis of scaling laws.
Example
Llama 3.1 ships in 8B, 70B and 405B variants, same architecture, same training recipe, just more parameters. The 8B answers a one-line question in 200 ms; the 405B takes 2–3 seconds but reasons through multi-step problems the 8B fumbles.
Common Misconception
More parameters do not guarantee better performance on your specific task. A 7B model fine-tuned on domain-specific data frequently outperforms a 70B general model on that same domain. Parameter count sets a capability ceiling, not a capability guarantee.
Related Terms
- LLM (Large Language Model)A neural network trained on massive text datasets that can generate, understand and manipulate human language. Examples: GPT-4, Qwen, Claude.
- Fine-TuningTraining a pre-trained AI model on specialized data to improve performance on specific tasks.
- QuantizationA technique to compress AI models (e.g., from 16-bit to 4-bit) so they use less memory while maintaining quality.
Parameter on Rewind.ai
The model picker shows parameter counts. For most chat workloads, a quantised 14B–32B beats a non-quantised 7B; the 70B+ tier matters when you genuinely need reasoning depth.
Explore the ToolsQuick Facts
| Term | Parameter |
| Related | LLM (Large Language Model), Fine-Tuning, Quantization |