Temperature
Temperature (sampling parameter)
A sampling parameter (typically 0–2) that controls the randomness of LLM output: lower values yield more deterministic responses, higher values more creative ones.
Temperature is a hyperparameter that controls how "adventurous" the model is when selecting the next token. At each step, the model produces a probability distribution over all possible next tokens. Temperature reshapes that distribution before sampling.
**The math (intuitive version)**
- *Temperature 0*: Always pick the most likely token. Output is deterministic (and often repetitive).
- *Temperature < 1.0*: Sharpen the distribution; high-probability tokens become even more likely. Output is more focused and predictable.
- *Temperature 1.0*: Sample from the raw distribution. Balanced between coherence and variety.
- *Temperature > 1.0*: Flatten the distribution; low-probability tokens become more likely. Output is more diverse but also more likely to be nonsensical.
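Concretely, temperature T divides the model's logits before the softmax. A minimal sketch in plain Python, using a toy three-token vocabulary rather than a real model:

```python
import math

def apply_temperature(logits, temperature):
    """Rescale logits by temperature, then softmax into probabilities.

    temperature -> 0 approaches greedy decoding (argmax);
    temperature > 1 flattens the distribution.
    """
    if temperature <= 0:
        # Greedy: put all probability mass on the most likely token.
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw logits from one decoding step.
logits = [2.0, 1.0, 0.1]

cold = apply_temperature(logits, 0.2)  # sharpened: top token dominates
warm = apply_temperature(logits, 1.0)  # raw softmax
hot = apply_temperature(logits, 2.0)   # flattened: tail tokens gain mass
```

Comparing the three distributions shows the effect directly: the top token's probability shrinks as temperature rises, while the tail tokens gain.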
**Practical settings**
| Use case | Recommended temperature |
|---|---|
| Code generation, data extraction | 0.0–0.2 |
| Factual Q&A, summarization | 0.3–0.5 |
| Conversational chat | 0.7–0.9 |
| Creative writing, brainstorming | 1.0–1.3 |
**Temperature vs top-p**
Top-p (nucleus sampling) is a related parameter: sample only from the smallest set of tokens whose cumulative probability exceeds p. Most APIs expose both, but the usual advice is to adjust one at a time and leave the other at its default.
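A minimal sketch of the nucleus cutoff described above, working on an explicit toy distribution (real implementations operate on tensors of logits, but the logic is the same):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. probs: list of (token, probability)."""
    ranked = sorted(probs, key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return [(token, prob / total) for token, prob in kept]

# Toy next-token distribution (hypothetical values).
dist = [("the", 0.5), ("a", 0.3), ("cat", 0.15), ("zzz", 0.05)]

# With p=0.9: "the" + "a" = 0.8 < 0.9, so "cat" is also kept
# (0.95 >= 0.9); the 5% tail token "zzz" is cut off entirely.
nucleus = top_p_filter(dist, 0.9)
```

Note how top-p truncates the tail outright, whereas temperature only reweights it; this is why the two interact and are usually tuned one at a time.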
**Pitfalls**
Temperature 0 is not "safe" — it can still hallucinate, it just hallucinates the same thing every time. High temperature does not make models more accurate, only more varied. For production data extraction or classification, keep temperature near 0 for consistency.