Management theorist Peter Drucker is widely credited with the observation that "what gets measured gets managed." The problem, as organisations have repeatedly discovered in the decades since, is that what gets measured is not always what matters. Generative AI presents the latest version of this challenge: can token cost be considered a relevant operational metric? The answer is yes, but not in isolation and not in the rudimentary way in which it is currently being discussed.
Token cost as a standalone metric is like measuring a consulting engagement by the number of slides produced. It tells you something about activity. It tells you almost nothing about value. The moment an organisation starts reporting "tokens consumed this quarter" as a performance indicator without pairing it with an output or outcome metric, it has introduced a metric that will actively degrade decision-making.
The metric that actually works: token efficiency
The metric that actually works is token efficiency: the ratio of value delivered to tokens consumed. Expressed operationally, this looks like:
- Cost per resolved query: for internal helpdesks, support tools, knowledge retrieval
- Cost per completed deliverable: for content generation, document drafting, code generation
- Cost per decision supported: for analytics, risk flagging, compliance review
- Time saved per token spent: for augmentation use cases where the alternative is human hours
These ratios are measurable. They require you to define what "value" means for each workflow, which is the productive work that most organisations skip. Once you have defined it, token cost becomes a meaningful input into a meaningful calculation, rather than a number that sits in an infrastructure dashboard and quietly terrifies people.
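As a rough illustration of how one of these ratios might be computed, the sketch below derives cost per resolved query from raw usage figures. The class name, the blended token price, and the helpdesk volumes are all hypothetical, chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class WorkflowUsage:
    """Usage for one workflow over a reporting period (all figures hypothetical)."""
    name: str
    tokens_consumed: int
    price_per_million_tokens_gbp: float  # assumed blended input/output price
    units_of_value: int                  # e.g. resolved queries, completed deliverables

    @property
    def token_cost_gbp(self) -> float:
        """Total token spend for the period."""
        return self.tokens_consumed / 1_000_000 * self.price_per_million_tokens_gbp

    @property
    def cost_per_unit_gbp(self) -> float:
        """Token efficiency expressed as cost per unit of value delivered."""
        if self.units_of_value == 0:
            return float("inf")  # tokens were spent but nothing of value was delivered
        return self.token_cost_gbp / self.units_of_value


# Hypothetical internal helpdesk: 40M tokens, GBP 5 per million, 8,000 resolved queries.
helpdesk = WorkflowUsage("internal helpdesk", 40_000_000, 5.0, 8_000)
print(f"{helpdesk.name}: GBP {helpdesk.cost_per_unit_gbp:.4f} per resolved query")
```

The same structure serves the other ratios by changing what counts as a unit of value. A workflow that consumes tokens and delivers nothing reports an infinite cost per unit, which is the signal a raw token count hides.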
The ROI calculation in practice
The ROI on tokens looks exactly like any other capital return calculation, adjusted for the time and human effort displaced. For example, take a workflow common in financial services: first-pass transaction-monitoring alert triage. A compliance analyst at a mid-size asset manager, costed internally at £60 to £80 per hour, spends between 45 minutes and two hours reviewing and categorising each alert batch against current AML typologies before escalating or clearing it.
A well-designed AI workflow (with appropriately scoped context, the right model tier, and structured output validation) can produce a comparable first-pass triage in under three minutes, at a token cost of roughly £0.05 to £0.15. The ROI is not 100 percent. It is not 200 percent. Measured against token cost alone, it is at least in the thousands of percent, for every analyst, every week, across a compliance function.
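The arithmetic behind that claim can be sketched using only the ranges quoted above. The function name is hypothetical, and the calculation is deliberately narrow: it counts token spend as the only cost and ignores human review of the AI output, integration, and governance overhead, all of which would lower the result.

```python
def token_roi_percent(analyst_rate_gbp_per_hour: float,
                      manual_minutes: float,
                      token_cost_gbp: float) -> float:
    """ROI = (labour cost displaced - token cost) / token cost, as a percentage."""
    labour_cost_gbp = analyst_rate_gbp_per_hour * manual_minutes / 60
    return (labour_cost_gbp - token_cost_gbp) / token_cost_gbp * 100


# Most conservative pairing: lowest rate, shortest manual review, highest token cost.
low = token_roi_percent(60, 45, 0.15)
# Most favourable pairing: highest rate, longest manual review, lowest token cost.
high = token_roi_percent(80, 120, 0.05)

print(f"Conservative case: {low:,.0f}%")  # 29,900%
print(f"Favourable case:   {high:,.0f}%")  # 319,900%
```

Even the conservative pairing clears the "thousands of percent" threshold comfortably, which leaves considerable room for the costs this sketch leaves out.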
The question is not whether there is a return. The question is whether the organisation has the governance infrastructure to capture it — and whether the AI system is designed to preserve answer integrity rather than cut corners on the path to those savings.
Answer integrity is non-negotiable. An AI workflow that reduces analyst hours but introduces a 5 percent false-negative rate on suspicious activity is not delivering positive ROI. In a regulated financial services environment, it is accumulating regulatory liability. Valuemaxxing requires you to hold both dimensions simultaneously: the cost of tokens, and the quality of outputs. You cannot optimise one without measuring the other.
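One way to hold both dimensions in a single number is to subtract the expected cost of missed cases from the labour saving. The sketch below does that under loudly stated assumptions: the number of genuinely suspicious alerts per batch and the cost attached to a miss are invented for illustration, and real regulatory exposure rarely reduces to a tidy expected value.

```python
def net_value_per_batch(labour_saved_gbp: float,
                        token_cost_gbp: float,
                        suspicious_alerts_per_batch: float,
                        false_negative_rate: float,
                        expected_cost_per_miss_gbp: float) -> float:
    """Labour saving minus token cost minus the expected cost of missed suspicious alerts."""
    expected_miss_cost_gbp = (suspicious_alerts_per_batch
                              * false_negative_rate
                              * expected_cost_per_miss_gbp)
    return labour_saved_gbp - token_cost_gbp - expected_miss_cost_gbp


# Labour saving and token cost use the conservative figures from the triage example.
# ASSUMPTIONS (hypothetical): 2 suspicious alerts per batch, GBP 10,000 per missed case.
with_misses = net_value_per_batch(45.0, 0.15, 2, 0.05, 10_000)
without_misses = net_value_per_batch(45.0, 0.15, 2, 0.0, 10_000)

print(f"5% false-negative rate: GBP {with_misses:,.2f} per batch")
print(f"0% false-negative rate: GBP {without_misses:,.2f} per batch")
```

Under these assumed figures the workflow with a 5 percent false-negative rate is net negative, despite the same labour saving and token cost as the clean one. The exact numbers matter less than the structure: quality enters the calculation as a cost term, not as a footnote.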
Tokens as part of the people budget
The picture gets more complex when tokens are evaluated as part of the compensation model for a development team, with a budget requirement attached. The question being asked in forward-thinking engineering teams is this: if I am allocating budget to a developer role, should I think about that allocation in two parts, the person cost and the model cost? And if so, how do I balance them?
The simple version of the argument runs like this: a mid-level developer augmented with a meaningful AI tooling budget might outperform a more senior developer with no AI tooling, at a comparable or lower total cost. This is the kind of claim that gets shared at conferences and retweeted widely. It is not entirely wrong. It is also not a framework. The moment you try to operationalise it — to decide what the right split is, for which roles, on which tasks — the simplicity collapses, and what you are left with is a series of questions that require genuine thinking about how work actually gets done.
The more honest version is considerably more nuanced, and it runs like this:
- Junior developers do not need fewer tokens; they need different tokens. The assumption that a junior engineer requires more AI assistance and should therefore receive a larger token allocation misunderstands what good augmentation looks like. A junior developer using a frontier model for first-draft code generation, debugging assistance, and test case creation is arguably learning faster than any previous cohort in the profession. The token cost is not a subsidy for their inexperience. It is an investment in their acceleration.
- Senior developers do not need fewer tokens because they are more capable; they need more tokens because they are tackling harder problems. A principal engineer reasoning through a distributed systems architecture, stress-testing a security model, or reviewing a new regulatory requirement for its technical implications is engaged in exactly the kind of complex, high-context work where frontier models deliver the most value. Reducing their allocation in the name of efficiency is the wrong trade.
- The stratification risk is real, and it mirrors the broader inequality that AI tooling could be used to address. If organisations allocate token budgets on a seniority curve, with more tokens for senior roles and fewer for juniors, they risk encoding the same asymmetries that already exist in knowledge access and career progression. The senior team compound their advantage. The junior team get just enough to keep pace, but not enough to close the gap. This is not a hypothetical. It is a predictable organisational outcome of applying cost-optimisation logic to a learning and development question.
The more useful approach is to work backwards from outcomes rather than forwards from seniority. Define what you want each role tier to produce (the deliverables, the quality bar, and the time frame), then calculate the token budget that enables that outcome at the appropriate level of fidelity.
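A minimal sketch of that outcome-first budgeting follows. Every figure in it is hypothetical: the deliverable counts, the tokens per deliverable, the rework factor standing in for the quality bar, and the token price. In practice these would come from an organisation's own usage data.

```python
from dataclasses import dataclass


@dataclass
class RoleTarget:
    """What a role tier is expected to produce in a month (all figures hypothetical)."""
    tier: str
    deliverables_per_month: int
    tokens_per_deliverable: int  # assumed average for this tier's kind of work
    rework_factor: float         # >1.0 allows for iterations needed to hit the quality bar


def monthly_token_budget_gbp(target: RoleTarget,
                             price_per_million_tokens_gbp: float) -> float:
    """Token budget derived from the required outcome, not from seniority."""
    tokens = (target.deliverables_per_month
              * target.tokens_per_deliverable
              * target.rework_factor)
    return tokens / 1_000_000 * price_per_million_tokens_gbp


targets = [
    RoleTarget("junior", deliverables_per_month=20,
               tokens_per_deliverable=500_000, rework_factor=1.5),
    RoleTarget("principal", deliverables_per_month=6,
               tokens_per_deliverable=3_000_000, rework_factor=1.2),
]

for target in targets:
    budget = monthly_token_budget_gbp(target, price_per_million_tokens_gbp=10.0)
    print(f"{target.tier}: GBP {budget:,.2f} per month")
```

The budgets fall out of the work each tier is asked to do. If a junior tier's targets demand more iteration, its allocation rises accordingly, which is the opposite of a fixed seniority curve.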
From technology metric to capital allocation decision
The most important shift organisations must make is to stop thinking about tokens as a technology metric and start treating them as a capital allocation decision. Viewed this way, token budgets begin to resemble any other investment portfolio. Some expenditures generate exceptional returns by accelerating high-value work, improving decision quality, reducing operational risk, or unlocking entirely new capabilities. Others simply automate low-value activity at scale. Both consume tokens. Only one creates meaningful value.
The organisations that succeed in the next phase of AI adoption will not be those that spend the fewest tokens or even those that spend the most. They will be those that develop the capability to allocate tokens with the same rigour they apply to capital, talent, and strategic investment.
Read the full series
This is part three of a four-part series on the economics of enterprise AI:
- Tokenmaxxing vs Valuemaxxing: The AI Cost Debate Every Enterprise Is About to Get Wrong
- From Token Count to Economic Value: The Limits of Token Accounting
- The Capital Allocation Question: How to Measure the Economics of AI Operations
- The Architecture of Token Efficiency