Temperature
Definition
A parameter that controls AI output randomness. Low temperature = more focused. High temperature = more creative.
Why It Matters
Temperature controls the trade-off between deterministic and creative output. At temperature 0 the model picks the highest-probability next token every time (greedy decoding): reliable, consistent, often repetitive. Low nonzero values stay close to that behaviour. High temperature samples from a flatter distribution: varied, surprising, sometimes nonsensical. It's the single most useful generation parameter to know.
Key Points
- Temperature divides the logit vector by T before the softmax: values < 1 sharpen the distribution, values > 1 flatten it, and T = 1 leaves it unchanged.
- Top-p (nucleus sampling): sample only from the smallest set of tokens whose cumulative probability reaches p. It interacts with temperature, since a flatter distribution puts more tokens inside the nucleus, so the two should be tuned with each other in mind.
- Top-k: restrict sampling to the k highest-probability tokens. Temperature does not change which tokens make the cut, because scaling preserves their ranking; it only reshapes the probabilities among them.
- Practical presets: 0.0–0.2 for deterministic extraction / code; 0.3–0.5 for factual Q&A; 0.6–0.8 for general chat; 0.9–1.2 for creative writing / brainstorming.
- Repetition penalty and the closely related frequency penalty are complementary parameters: they reduce the probability of tokens already used, which helps prevent looping.
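As a minimal sketch of the controls listed above, the following pure-Python functions show temperature scaling, top-k and top-p filtering, and a frequency-style penalty on a toy logit vector. The function names and the example logits are illustrative assumptions, not any particular library's API. The penalty shown (subtracting penalty × count from each logit) is one common formulation, and others exist.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax instead")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalise."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def apply_frequency_penalty(logits, counts, penalty):
    """Subtract penalty * (times the token was already used) from each logit."""
    return [x - penalty * counts.get(i, 0) for i, x in enumerate(logits)]

# Toy logits for a four-token vocabulary (hypothetical values).
logits = [2.0, 1.0, 0.5, -1.0]
sharp = softmax_with_temperature(logits, 0.5)    # T < 1: top token gains mass
neutral = softmax_with_temperature(logits, 1.0)  # T = 1: plain softmax
flat = softmax_with_temperature(logits, 2.0)     # T > 1: mass spreads out

print([round(x, 3) for x in sharp])
print([round(x, 3) for x in neutral])
print([round(x, 3) for x in flat])
```

The top token's probability is highest in `sharp` and lowest in `flat`, which is the sharpening and flattening effect described in the first bullet.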
Example
Temperature 0.0–0.2: code generation, data extraction. 0.3–0.5: factual Q&A. 0.6–0.8: general chat, summarisation, balanced writing. 0.9–1.2: brainstorming, fiction, creative variations. Set temperature to 0 and run the same prompt twice, and you will usually get the same answer. At 1.0, expect two different answers.
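The "run it twice" behaviour can be imitated with a toy next-token sampler. This is a sketch under stated assumptions: `sample_token`, `run`, and the logits are hypothetical, and a real model repeats this step once per generated token.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token index: greedy at temperature 0, sampled otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-logit token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # unnormalised softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def run(seed, temperature, n):
    """Draw n tokens with a generator seeded by `seed` (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]

# Toy logits where the top three tokens are close (hypothetical values).
logits = [2.0, 1.8, 1.5, 0.2]

greedy_runs = run(seed=0, temperature=0.0, n=20)
sampled_runs = run(seed=0, temperature=1.0, n=200)

print(set(greedy_runs))   # a single token index: temperature 0 never varies here
print(set(sampled_runs))  # several token indices: temperature 1.0 varies
```

Calling `run` twice with the same seed and temperature returns the same sequence, which is what a fixed seed provides for the sampling step.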
Common Misconception
Temperature 0 is often assumed to make output fully deterministic. It does not guarantee that across all hardware configurations: floating-point arithmetic differences between GPU types and firmware versions can produce slightly different results at temperature 0. At nonzero temperatures, a fixed random seed (if the API supports it) makes the sampling step repeatable, but it does not remove these hardware-level differences.
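A toy illustration of why floating-point differences matter even with greedy decoding: addition is not associative in floating point, so the same values summed in a different order can differ in the last bit, and a near-tie between two tokens can then resolve differently. The two "devices" below are hypothetical; real divergence comes from much larger reductions inside the model.

```python
# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False: the results differ in the last bit

def greedy(logits):
    """Temperature-0 choice: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Two tokens whose logits are equal on paper. Each hypothetical device
# happens to accumulate them in a different order.
logits_device_a = [left_to_right, right_to_left]
logits_device_b = [right_to_left, left_to_right]

print(greedy(logits_device_a), greedy(logits_device_b))  # different token indices
```

A random seed has no influence here, because greedy selection draws no random numbers.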
Related Terms
- LLM (Large Language Model): A neural network trained on massive text datasets that can generate, understand and manipulate human language. Examples: GPT-4, Qwen, Claude.
- Inference: The process of running an AI model to generate a response. When you send a message to ChatGPT, the model performs inference.
- Prompt: The input text you give to an AI model. Better prompts lead to better outputs.
Temperature on Rewind.ai
Most Rewind.ai tools expose temperature in the advanced panel. Defaults are tuned per task (0.2 for code, 0.7 for writing); override them when the output feels too rigid or too unhinged.
Quick Facts
| Term | Temperature |
| Related | LLM (Large Language Model), Inference, Prompt |