What Is Tokenization in AI?

Tokenization breaks text into smaller units, called tokens, that AI models process.

Definition

Tokens can be whole words, word pieces, single characters, or punctuation marks. Large language models read and generate sequences of tokens, not the words and sentences a human reader sees.

How it works

A tokenizer maps text into token IDs. The model processes those IDs inside a context window, then predicts new token IDs that are converted back into text.
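The round trip from text to token IDs and back can be sketched with a toy word-level tokenizer. This is a minimal illustration, not how production tokenizers work: real models typically use learned subword vocabularies, and the names build_vocab, encode, and decode are hypothetical helpers invented for this example.

```python
import re

# Toy pattern: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = r"\w+|[^\w\s]"

def build_vocab(corpus):
    """Assign an integer ID to each distinct token seen in the corpus."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for tokens not in the vocabulary
    for tok in re.findall(TOKEN_PATTERN, corpus.lower()):
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab):
    """Map text to a list of token IDs."""
    tokens = re.findall(TOKEN_PATTERN, text.lower())
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

def decode(ids, vocab):
    """Map token IDs back to text (spacing is not restored exactly)."""
    id_to_token = {i: t for t, i in vocab.items()}
    return " ".join(id_to_token[i] for i in ids)

vocab = build_vocab("The model reads tokens, not words.")
ids = encode("The model reads words.", vocab)
print(ids)                 # [1, 2, 3, 7, 8]
print(decode(ids, vocab))  # the model reads words .
```

The model itself only ever sees the list of integers; the mapping to and from text happens entirely in the tokenizer.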

Why it matters at work

Tokenization affects cost, latency, context limits, and output quality, because usage is typically metered in tokens and the context window is a token budget. Long documents, tables, and code can consume more tokens than teams expect.
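A rough budget estimate can be sketched as follows. Everything numeric here is an assumption for illustration: the ratio of about 4 characters per token is only a common rule of thumb for English text, and the per-1,000-token prices are made-up placeholders, not any vendor's rates. For real planning, count tokens with the tokenizer of the model you actually use.

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; chars_per_token is an assumed heuristic."""
    return math.ceil(len(text) / chars_per_token)

def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost when input and output tokens are priced separately per 1,000 tokens."""
    return (input_tokens / 1000 * price_in_per_1k
            + output_tokens / 1000 * price_out_per_1k)

document = "word " * 2000          # stand-in document, 10,000 characters
input_tokens = estimate_tokens(document)

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
cost = estimate_cost(input_tokens, output_tokens=500,
                     price_in_per_1k=0.01, price_out_per_1k=0.03)

print(input_tokens)    # 2500
print(f"{cost:.4f}")   # 0.0400
```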

Workplace example

A legal team summarizing a 120-page contract may need chunking or retrieval, because the whole document can exceed the model's context window.
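Chunking against a token budget can be sketched like this. The function chunk_tokens and its parameters are hypothetical, and the token list here is a placeholder; in practice the tokens would come from the model's own tokenizer, and the budget would leave room for the prompt and the generated summary.

```python
def chunk_tokens(tokens, max_tokens, overlap=0):
    """Split a token list into chunks of at most max_tokens.

    Consecutive chunks share `overlap` tokens so that text cut at a
    boundary still appears with some context in the next chunk.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in the range [0, max_tokens)")

    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks

tokens = [f"t{i}" for i in range(10)]  # placeholder for real tokens
chunks = chunk_tokens(tokens, max_tokens=4, overlap=1)
for chunk in chunks:
    print(chunk)
# ['t0', 't1', 't2', 't3']
# ['t3', 't4', 't5', 't6']
# ['t6', 't7', 't8', 't9']
```

Each chunk can then be summarized separately and the partial summaries combined, or the chunks can be indexed so that only the relevant ones are retrieved for a given question.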

Frequently Asked Questions

Is one token the same as one word?

No. A token is often a short word or a fragment of a longer word, and the exact split depends on the tokenizer. This is why the token count for a text usually differs from its word count.
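The gap between word count and token count can be illustrated with a greedy longest-match split over a tiny, hand-picked vocabulary. This is a simplified sketch loosely in the spirit of subword tokenizers, not the algorithm of any particular model; the vocabulary and the function split_into_pieces are invented for this example.

```python
def split_into_pieces(word, vocab):
    """Greedily split a word into the longest pieces found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is a known vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            end = start + 1  # no known piece: fall back to one character
        pieces.append(word[start:end])
        start = end
    return pieces

vocab = {"the", "token", "ization", "un", "believ", "able"}  # toy vocabulary
words = "the tokenization unbelievable".split()
pieces = [p for w in words for p in split_into_pieces(w, vocab)]

print(len(words), words)    # 3 ['the', 'tokenization', 'unbelievable']
print(len(pieces), pieces)  # 6 ['the', 'token', 'ization', 'un', 'believ', 'able']
```

Three words become six tokens here; a different vocabulary would produce a different split and a different count.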
