The economics of generative AI are increasingly being measured in tokens. Every prompt submitted, document analysed, agent action executed, and response generated consumes tokens.
Tokens are the price you pay for a language model to read, reason over, and generate a response. When you pay for tokens, you are paying for the model's attention. And like all forms of attention, it can be focused well or wasted entirely.
The problem with how most enterprises currently think about token spend is that they treat it like a utility bill. Electricity in, output out. That analogy breaks down the moment you ask the question that should be at the centre of every AI governance conversation: what did we actually get for that?
Token consumption is a vanity metric
When token consumption is measured as a standalone metric, it behaves like a vanity number. It tells you something about activity. It tells you almost nothing about value. The moment an organisation starts reporting "tokens consumed this quarter" as a performance indicator without pairing it with an output or outcome metric, it is introducing a metric that will actively degrade decision-making. Tokenmaxxing, the habit of maximising token usage in the belief that doing so maximises capability, surfaces in several ways:
- Bloated context windows: Every API call is packed with the entire document, the full conversation history, all the metadata, and all the edge cases because no further thinking has gone into what is actually needed. The assumption is that more context provides better answers. Sometimes that is true. Often it is not. Irrelevant context degrades response quality, and it costs you every single time.
- Retry-by-default: When a model returns something suboptimal, the path of least resistance is to call it again. No error handling, no routing logic, no reflection on whether the prompt was the problem. Just more tokens, more cost, same underlying issue.
- Model mismatching: Using frontier models for tasks that a smaller, faster, cheaper model could handle with identical fidelity. Summarising a short email with a $20-per-million-token model is tokenmaxxing in its purest, most expensive form.
- Prompt sprawl: System prompts that have grown through accretion — layer after layer of instruction, correction, and caveat — until they cost thousands of tokens before the actual user request even begins.
These are not theoretical inefficiencies. In production AI systems we have reviewed, prompt and context overhead alone accounts for 40 to 60 percent of total token cost, with no corresponding uplift in output quality.
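One way to check a figure like this against your own logs is to split each call's token count into overhead and everything else, then price the two parts separately. The sketch below is a minimal illustration under stated assumptions: the `RequestTokens` breakdown, the `overhead_share` helper, the sample counts, and the per-million prices are all hypothetical, and the calculation treats every system prompt and context token as overhead, which overstates the share wherever some of that context is genuinely needed.

```python
from dataclasses import dataclass


@dataclass
class RequestTokens:
    """Token counts for one model call (hypothetical breakdown)."""
    system_prompt: int  # standing instructions sent on every call
    context: int        # attached documents, history, metadata
    user_request: int   # the actual question or task
    output: int         # tokens the model generated


def overhead_share(requests, input_price_per_m, output_price_per_m):
    """Return the fraction of total cost spent on system prompt and context."""
    overhead_cost = 0.0
    total_cost = 0.0
    for r in requests:
        # Overhead tokens are billed at the input rate.
        overhead = (r.system_prompt + r.context) * input_price_per_m / 1_000_000
        # Everything else: the request itself plus the generated output.
        useful = (
            r.user_request * input_price_per_m + r.output * output_price_per_m
        ) / 1_000_000
        overhead_cost += overhead
        total_cost += overhead + useful
    return overhead_cost / total_cost if total_cost else 0.0


# Illustrative numbers only, not measurements from any real system.
sample = [
    RequestTokens(system_prompt=3_000, context=4_000, user_request=200, output=800),
    RequestTokens(system_prompt=3_000, context=1_000, user_request=150, output=600),
]
share = overhead_share(sample, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"Overhead share of cost: {share:.0%}")
```

Tracking this share per workflow, rather than as one organisation-wide number, makes it easier to see which system prompts and context strategies are worth trimming first.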
The valuemaxxing reframe
Valuemaxxing starts from a different question. Not "how many tokens did we spend?" but "what outcome did those tokens produce, and what would we have paid for that outcome through any other means?"
This reframe matters because it changes what you measure, what you optimise, and, crucially, what you consider a success. The discipline is to ensure that every token you spend is in service of an outcome worth having, and that the architecture of your AI systems is designed around that principle from the ground up.
A workflow that spends 50,000 tokens to draft a regulatory report that would otherwise take a compliance analyst three hours is not expensive. It is extraordinarily cheap. A workflow that spends 50,000 tokens to produce a summary a junior employee could have written in ten minutes is not impressive. It is wasteful — and the fact that it happened automatically does not make it less wasteful.
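The comparison can be made concrete with a little arithmetic. The sketch below is a rough illustration, not a costing method: it reuses the $20-per-million-token price mentioned earlier as a flat rate, and the hourly rates and the helper names (`token_cost`, `labour_cost`, `value_ratio`) are hypothetical. It also ignores review time and whether a cheaper model or a shorter prompt would have produced the same result.

```python
def token_cost(tokens, price_per_million):
    """Dollar cost of a given number of tokens at a flat per-million price."""
    return tokens * price_per_million / 1_000_000


def labour_cost(minutes, hourly_rate):
    """Dollar cost of the human time the workflow replaces."""
    return minutes / 60 * hourly_rate


def value_ratio(tokens, price_per_million, minutes_saved, hourly_rate):
    """Labour cost displaced per dollar of token spend."""
    spend = token_cost(tokens, price_per_million)
    if spend <= 0:
        raise ValueError("token spend must be positive")
    return labour_cost(minutes_saved, hourly_rate) / spend


PRICE_PER_MILLION = 20.0  # flat rate taken from the example price above
ANALYST_RATE = 80.0       # hypothetical hourly cost of a compliance analyst
JUNIOR_RATE = 30.0        # hypothetical hourly cost of a junior employee

report = value_ratio(50_000, PRICE_PER_MILLION, minutes_saved=180, hourly_rate=ANALYST_RATE)
summary = value_ratio(50_000, PRICE_PER_MILLION, minutes_saved=10, hourly_rate=JUNIOR_RATE)

print(f"Regulatory report: {report:.0f}x return on token spend")
print(f"Short summary:     {summary:.0f}x return on token spend")
```

Under these assumed rates, both workflows spend the same dollar on tokens, yet the report displaces roughly 240 dollars of analyst time and the summary roughly 5 dollars of junior time. The token count is identical; the value per token is not.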
Ultimately, tokens are best understood as an input cost, not a measure of success. The most mature AI organisations are beginning to recognise that the objective is not to minimise tokens at all costs, nor to maximise their use in pursuit of capability. The objective is to maximise the value generated per token spent. Viewed through this lens, the question is no longer whether an AI system consumed 10,000 or 100,000 tokens. The question is whether those tokens produced an outcome that was materially better, faster, cheaper, or otherwise unattainable through conventional means.
That is the foundation of valuemaxxing: not spending fewer tokens, but ensuring that every token spent contributes to an outcome worth paying for.
Read the full series
This is part two of a four-part series on the economics of enterprise AI:
- Tokenmaxxing vs Valuemaxxing: The AI Cost Debate Every Enterprise Is About to Get Wrong
- From Token Count to Economic Value: The Limits of Token Accounting
- The Capital Allocation Question: How to Measure the Economics of AI Operations
- The Architecture of Token Efficiency