Tokens
Token (basic unit of language)
The basic units LLMs process — roughly word-pieces. Text is split into tokens before the model reads it, and token count determines API cost.
Tokens are the atomic units that LLMs process. Before your text reaches the model, a tokenizer splits it into tokens — these are not exactly words, but sub-word pieces. The tokenization depends on the model's vocabulary.
**How tokenization works**
Common tokenizers (such as OpenAI's cl100k_base encoding or Google's SentencePiece library) split text using byte-pair encoding (BPE) or similar sub-word algorithms. Frequent words become single tokens; rare words are split into sub-pieces. "Tokenization" might become ["Token", "ization"] or ["Token", "iz", "ation"], depending on the vocabulary.
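A minimal sketch of one BPE training step may make the idea concrete: count every adjacent symbol pair across the corpus, then merge the most frequent pair into a new symbol. Real tokenizers repeat this for tens of thousands of merges; the word list and merge count here are purely illustrative.

```python
from collections import Counter

def bpe_merge_step(words):
    """One BPE training step: find the most frequent adjacent symbol
    pair across all words and merge it into a single symbol."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged, best

# Start from single characters; repeated merges build a sub-word vocabulary.
words = [list("token"), list("tokens"), list("tokenize")]
for _ in range(3):
    words, pair = bpe_merge_step(words)
```

After three merges the shared prefix has fused into the sub-word "toke", which is exactly why frequent strings end up as single tokens.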
**Rough rules of thumb**
- 1 token ≈ 4 characters of English text
- 1 token ≈ 0.75 English words
- 1,000 tokens ≈ 750 words ≈ 1.5 pages of prose
- Code and non-English text typically use more tokens per word than English
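These rules of thumb translate directly into quick estimator helpers. A sketch, useful for budgeting but never for billing; the function names are our own:

```python
def estimate_tokens_by_chars(text: str) -> int:
    """Crude heuristic: ~4 characters of English per token."""
    return max(1, len(text) // 4)

def estimate_tokens_by_words(text: str) -> int:
    """Alternative heuristic: ~0.75 words per token,
    so tokens ≈ words / 0.75."""
    words = len(text.split())
    return max(1, round(words / 0.75))
```

The two estimates rarely agree exactly; treat them as a sanity range, and use the model's actual tokenizer for anything that matters.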
**Why tokens matter for costs**
API pricing is almost always per 1,000 tokens (or per million). Input tokens and output tokens may be priced differently — output is often more expensive. A production system processing millions of documents at 2K tokens each accumulates significant cost quickly.
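A small cost model shows how per-token pricing compounds at scale. The rates below are illustrative placeholders, not any provider's real prices:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, with separate input and output rates
    quoted per million tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
per_doc = api_cost_usd(2_000, 500, 3.0, 15.0)   # one 2K-token document
fleet = per_doc * 1_000_000                      # one million documents
```

Per document the number looks negligible, but across a million documents it becomes a five-figure line item, which is the point of the paragraph above.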
**Context window is measured in tokens**
A 128K context window holds 128,000 tokens total — input plus output. If your system prompt is 1,000 tokens and you paste a 120K-token document, you have only 7,000 tokens left for output.
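The arithmetic above can be wrapped in a one-line budget check (the function name is our own):

```python
def output_budget(context_window: int, system_tokens: int,
                  user_tokens: int) -> int:
    """Tokens left for the model's reply after the prompt is loaded.
    Never negative: an over-full prompt simply leaves zero room."""
    return max(0, context_window - (system_tokens + user_tokens))

# The scenario above: 128K window, 1K system prompt, 120K document.
left = output_budget(128_000, 1_000, 120_000)
```

Running this kind of check before every request is cheap insurance against truncated outputs.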
**Pitfalls**
Developers underestimate token counts and hit context limits unexpectedly. Prompts with large amounts of JSON, code, or structured data consume tokens faster than prose. Always count tokens programmatically (tiktoken for OpenAI, similar tools for others) before assuming a document will fit.
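A defensive counting helper might look like this. It uses tiktoken's real API when the package is available and falls back to the chars/4 heuristic otherwise; the broad `except` is deliberate, since tiktoken may also fail offline while fetching encoding data:

```python
def count_tokens(text: str) -> int:
    """Exact count via tiktoken's cl100k_base encoding when possible;
    otherwise an approximate ~4-chars-per-token fallback."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        # tiktoken missing or encoding data unavailable: approximate.
        return max(1, len(text) // 4)
```

Whatever tool you use, count before you send: a document that "looks about right" is exactly the one that overflows the context window in production.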