Token

AI & Generative Search

Also: AI Token · LLM Token · Language Model Token

What it is: The unit an AI reads and writes in

Size: Roughly three quarters of a word

Why it matters: Determines cost and context limits

Watch for: Context window limits in long prompts

Quick definition

A token is the smallest unit a large language model (LLM) processes when reading or generating text. Tokens are often fragments of words rather than whole words. Most English words are one or two tokens; longer or rarer words can be more. Token counts set the cost and the context limit for every AI interaction.

How it varies across Australia

Token consumption varies sharply by use case. Short conversational prompts use far fewer tokens than long-form content generation, document summarisation or retrieval-augmented generation (RAG) pipelines. Australian businesses using AI for content at scale typically find their token costs grow faster than expected once they move from experimentation to production.

The three token concepts that matter for marketers

Input tokens

Every word, instruction and piece of context you send to the model counts. Longer prompts and pasted documents push this up fast.

Output tokens

The text the model generates in response. On most platforms, output tokens are billed at a higher rate than input tokens.

Context window

The total number of tokens a model can hold in memory at once, covering both input and output. When a conversation reaches the limit, chat tools typically drop earlier content to make room.

What it actually means

When you type a prompt into an AI tool, the model does not read your words the way you wrote them. It breaks the text into tokens first, where a token is roughly a syllable or a short word. A short common word such as 'the' is one token. A longer word such as 'marketing' may be one or two, and 'tokenisation' might be three or four, depending on the model's tokeniser. The model reads the token sequence, predicts the next token, and repeats until the output is complete.
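For a quick feel for the numbers, the three-quarters-of-a-word rule of thumb can be turned into a rough estimator. This is a minimal sketch, not a real tokeniser: the function name `estimate_tokens` and the constant `WORDS_PER_TOKEN` are illustrative, and actual counts depend on the model's tokeniser and on the kind of text.

```python
import math

# Rule of thumb from this entry: one token is about 0.75 of an English word.
# This is an approximation only; a real tokeniser will give different counts.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate a token count from the whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


prompt = "Write a subject line for our winter sale email"
print(estimate_tokens(prompt))
```

Use an estimate like this for planning only. When the exact figure matters, check the count with the platform's own tokeniser.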

This matters for three practical reasons. First, cost. Every major AI application programming interface (API), including OpenAI, Anthropic, and Google Gemini, bills by the token. Input tokens and output tokens are priced separately, and the output rate is usually higher. A prompt that includes a 10,000-word document costs far more than one that references only a summary.
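The cost arithmetic can be sketched as follows. The rates here are hypothetical placeholders chosen to make the sums easy to follow, not any provider's actual pricing, and the `Rates` class and `request_cost` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per one million tokens in US dollars (placeholder values)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Cost of one request, with input and output priced separately."""
    input_cost = input_tokens / 1_000_000 * rates.input_per_million
    output_cost = output_tokens / 1_000_000 * rates.output_per_million
    return input_cost + output_cost


# Hypothetical rates: output priced higher than input.
example_rates = Rates(input_per_million=2.00, output_per_million=8.00)

# A 10,000-word document is roughly 13,300 tokens at 0.75 words per token.
full_document = request_cost(13_300, 500, example_rates)
# A short summary of the same document, assumed here to be about 400 tokens.
summary_only = request_cost(400, 500, example_rates)

print(f"Full document: ${full_document:.4f}")
print(f"Summary only:  ${summary_only:.4f}")
```

The per-request figures look small. The gap matters once the same request runs thousands of times.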

Second, context limits. Every model has a context window, the maximum number of tokens it can process in one interaction. Once you hit it, the model cannot see earlier parts of the conversation. This is why chatbots occasionally forget what you told them at the start of a long session.
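One common way an application stays inside the window is to keep only the most recent messages that fit. The sketch below assumes that approach and uses the rough word-based estimate rather than a real tokeniser; `trim_history` is an illustrative name, and actual tools may handle the limit differently.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 0.75 words per token (not a real tokeniser)."""
    return math.ceil(len(text.split()) / 0.75)


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits max_tokens."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first
    # message that would push the total over the budget.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Our brand voice is warm and direct",
    "The campaign targets small business owners",
    "Draft three subject lines for the launch email",
]
print(trim_history(history, max_tokens=20))
```

Anything outside the returned list is invisible to the model, which is why instructions given at the start of a long session can stop having any effect.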

Third, quality. Cramming too much into a prompt degrades output because the model dilutes its attention across more tokens. Tighter prompts that give the model focused context tend to produce better results than long ones that dump everything in.

For marketers building AI workflows, whether prompt engineering, content generation or retrieval-augmented generation (RAG) systems, token awareness is the difference between a workflow that scales economically and one that becomes expensive before it becomes useful.

A token is not a word, not a character, not a sentence. It is the unit the model actually thinks in, and understanding it is the first step to using AI without wasting money.

How it shows up

Tokens show up in your AI platform billing dashboard as the first thing that explains an unexpected invoice. They show up in your prompt engineering when a long instruction starts producing worse results than a short one. They show up in your RAG or AI content pipeline when document chunking decisions affect how much context the model can see at once.
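To make the chunking decision concrete, here is a minimal sketch that splits a document into pieces sized to a token budget. It assumes the 0.75 words-per-token rule of thumb and splits on whitespace only; production pipelines often split on sentences or headings and add overlap between chunks. The name `chunk_words` is illustrative.

```python
def chunk_words(text: str, max_tokens: int, words_per_token: float = 0.75) -> list[str]:
    """Split text into chunks of at most roughly max_tokens each."""
    # Convert the token budget into a word budget, with a floor of one word.
    words_per_chunk = max(1, int(max_tokens * words_per_token))
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


document = "Tokens set the cost and the context limit for every AI interaction"
for chunk in chunk_words(document, max_tokens=8):
    print(chunk)
```

Smaller chunks let more distinct passages fit in the window at once, while larger chunks keep more surrounding context per passage. The budget is the lever.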

If you are building with a large language model (LLM) API directly, most platforms include a tokeniser tool that lets you paste text and see the exact token count before you send it. For marketers using wrapped tools like ChatGPT or Claude.ai, token costs are bundled into the subscription, but context window limits still apply and will cause degraded responses in very long threads.

The Australian context

Australian businesses using AI tools via US-based APIs face pricing in US dollars and occasionally higher latency due to routing through overseas infrastructure. Many enterprise teams using Microsoft Azure OpenAI or AWS Bedrock can access models with Australian region endpoints, which reduces latency without changing token economics.

For Australian content teams generating large volumes of copy, the token cost of using AI across a full content calendar can be meaningfully higher than teams realise at the pilot stage. Running a token estimate against a production content brief before committing to a workflow is worth the ten minutes it takes.
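A production estimate of this kind can be sketched in a few lines. Every number below is a placeholder: the volumes, the per-million-token rates and the exchange rate are assumptions to be replaced with figures from your own brief and your provider's current pricing. The function name `monthly_cost_aud` and the `runs_per_piece` multiplier (covering regenerations and revisions) are illustrative.

```python
def monthly_cost_aud(
    pieces_per_month: int,
    runs_per_piece: int,
    input_tokens_per_run: int,
    output_tokens_per_run: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    aud_per_usd: float,
) -> float:
    """Estimate a month of token spend in Australian dollars."""
    runs = pieces_per_month * runs_per_piece
    input_usd = runs * input_tokens_per_run / 1_000_000 * usd_per_million_input
    output_usd = runs * output_tokens_per_run / 1_000_000 * usd_per_million_output
    return (input_usd + output_usd) * aud_per_usd


# Placeholder figures only: swap in your own brief and current pricing.
estimate = monthly_cost_aud(
    pieces_per_month=400,
    runs_per_piece=5,
    input_tokens_per_run=6_000,
    output_tokens_per_run=1_500,
    usd_per_million_input=2.00,
    usd_per_million_output=8.00,
    aud_per_usd=1.50,
)
print(f"Estimated monthly spend: A${estimate:.2f}")
```

Pilots usually run one piece once. The `runs_per_piece` and `pieces_per_month` multipliers are where production costs diverge from the pilot.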

Where people get this wrong

Assuming tokens equal words. They do not. A rough rule is that one token is about three quarters of a word in English. Technical content, non-English text and code all tokenise differently, sometimes at much higher token-per-word ratios.

Ignoring context window limits in long workflows. When a conversation or document exceeds the model's context window, chat tools typically drop earlier content without warning you, and output quality degrades without explanation. If you call an API directly, an oversized request is usually rejected with an error instead.

Treating longer prompts as always better. More tokens in a prompt do not mean better output. Models spread attention across all tokens. A focused 200-token prompt often outperforms a padded 2,000-token one for the same task.

Common questions

How many tokens is a typical marketing email?

A standard 300-word marketing email is roughly 400 to 500 tokens. Add a detailed system prompt and persona instruction on top of that and a generation request can easily reach 700 to 900 tokens before the model writes a single word of output.

Do I need to understand tokens to use AI tools?

Not for basic use. But if you are building workflows, integrating APIs, or using AI at any kind of production volume, token awareness is how you control costs, avoid context window surprises, and write better prompts. It is the foundational unit of how the technology works.

What happens when I hit the context window limit?

In most chat tools, the application silently drops the oldest tokens from the conversation to make room for new ones. You will not get an error. You will get responses that appear to forget earlier context, repeat themselves, or give contradictory answers. If you call an API directly, a request that exceeds the limit is typically rejected with an error instead. Either way, the fix is to start a fresh session or summarise earlier context into a shorter block.
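The summarise-and-continue fix can be sketched as below. This is an illustration under stated assumptions: `compact_history` is a hypothetical helper, token counts use the rough 0.75 words-per-token estimate, and the `summarise` argument is whatever function you supply (in practice, usually another model call). The stand-in used here simply keeps the first few words.

```python
import math
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 0.75 words per token (not a real tokeniser)."""
    return math.ceil(len(text.split()) / 0.75)


def compact_history(
    messages: list[str],
    max_tokens: int,
    keep_recent: int,
    summarise: Callable[[str], str],
) -> list[str]:
    """Replace older messages with one summary when the history is too long."""
    total = sum(estimate_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    if total <= max_tokens or split <= 0:
        return list(messages)  # already fits, or nothing old enough to fold
    older, recent = messages[:split], messages[split:]
    summary = summarise("\n".join(older))
    return [f"Summary of earlier conversation: {summary}"] + recent


def first_words(text: str, limit: int = 12) -> str:
    """Stand-in summariser for the example; a real one would call a model."""
    return " ".join(text.split()[:limit])


history = [
    "Our brand voice is warm and direct and never uses jargon",
    "The campaign targets small business owners in regional areas",
    "We agreed the offer is a free audit",
    "Draft three subject lines for the launch email",
]
print(compact_history(history, max_tokens=30, keep_recent=2, summarise=first_words))
```

The summary costs tokens too, so keep it short. The aim is to carry forward decisions and constraints, not the full transcript.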

Is a higher token limit always better in a model?

A larger context window gives you more flexibility, but models with very large context windows sometimes show reduced attention quality on content buried in the middle of the window. For most marketing tasks, a focused shorter prompt inside a moderate context window will outperform a sprawling prompt that fills a large one.

About New Rebellion

New Rebellion is a marketing intelligence consultancy. We build tools, score Australian businesses on how their marketing actually performs, and publish Debrief every day. This dictionary is part of how we work in the open.
