Tokenmaxxing vs Valuemaxxing: The AI Cost Debate Every Enterprise Is About to Get Wrong

A recent note from Goldman Sachs Research projects that the rapid deployment of AI agents will materially lift free cash flow across the technology sector as inference volumes scale into the trillions of tokens. The forecast is clean, directional, and already shaping how capital is being allocated. It also describes precisely the supply-side dynamic that will most effectively mask a crisis on the demand side, among the enterprises actually funding the explosion. What the supply-side narrative omits, a growing body of enterprise data lays bare: spending on generative AI is scaling dramatically, but the link between that expenditure and hard business outcomes remains dangerously thin.

Consider the numbers. Enterprise spending on generative AI surged roughly 500% year-over-year in 2024, reaching $13.8 billion, according to Menlo Ventures' annual survey of enterprise adoption. Yet in a parallel McKinsey Global Survey, only 11% of organisations reported that their generative AI initiatives had contributed more than 5% to revenue or other bottom-line metrics.

The conversation has become increasingly predictable: spending rises, usage grows, and dashboards fill with activity metrics. What remains elusive is a clear line between consumption and value. Until that gap is understood, the most fundamental question remains unanswered: what did the organisation get for all those tokens?

Tokenmaxxing: a structurally flawed operating model

The failure to ask what all those tokens bought is not a curiosity. It is the precise mechanism by which a structurally flawed operating model embeds itself into enterprise cost bases for years. I call that model tokenmaxxing: an organisational reflex that treats token volume, context window expansion, and inference frequency as legitimate proxies for deployed intelligence.

The data flatly contradicts this assumption. In 2024, Gartner predicted that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025, with escalating costs and unclear business value among the reasons cited. Internal telemetry from enterprise deployments adds granularity to the picture: inference-cost analyses regularly show that a significant fraction of the tokens in production AI systems goes to unnecessarily long contexts or multi-step reasoning chains that yield no improvement on the target task. Researchers at Stanford's Institute for Human-Centered AI have highlighted that the same output quality can often be achieved with a fraction of the compute when prompting and retrieval are systematically optimised, yet most enterprises default to "more inference" rather than "smarter inference."

The result is not greater intelligence. It is merely more noise. And it is being funded by a budgetary tolerance that will not survive the current cycle of scrutiny, no matter how bullish the sector-wide cash-flow projections look from the outside.

Valuemaxxing: the counter-model

The counter-model is not cost-cutting. Cost-cutting alone leaves the correlation problem untouched; it shrinks the token bill while preserving the same negligible yield per token. The alternative is what I call valuemaxxing: a systematic re-engineering of AI expenditure around the value captured per unit of inference.
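To make the arithmetic behind that distinction concrete, here is a minimal sketch. Every figure in it is hypothetical, and the names (Workload, value_per_1k_tokens, return_on_spend) are illustrative assumptions rather than an established metric or any vendor's API. It compares a baseline workload with a blanket usage cut and with a redesign that trims wasted tokens while holding attributed value constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One AI workload over a reporting period. All figures are hypothetical."""
    name: str
    tokens: int          # tokens consumed in the period
    cost_per_1k: float   # blended cost per 1,000 tokens, in dollars (assumed)
    value: float         # business value attributed to the workload, in dollars

    @property
    def cost(self) -> float:
        """Total inference spend for the period."""
        return self.tokens / 1000 * self.cost_per_1k

    @property
    def value_per_1k_tokens(self) -> float:
        """Value captured per 1,000 tokens of inference."""
        return self.value / (self.tokens / 1000)

    @property
    def return_on_spend(self) -> float:
        """Dollars of attributed value per dollar of inference spend."""
        return self.value / self.cost


# Baseline: high volume, thin yield.
baseline = Workload("baseline", tokens=10_000_000, cost_per_1k=0.01, value=150.0)

# Blanket cost-cutting: usage is halved, and attributed value falls with it.
cost_cut = Workload("cost_cut", tokens=5_000_000, cost_per_1k=0.01, value=75.0)

# Valuemaxxing: wasted tokens are removed, attributed value is preserved.
value_max = Workload("value_max", tokens=4_000_000, cost_per_1k=0.01, value=150.0)

for w in (baseline, cost_cut, value_max):
    print(
        f"{w.name:>10}: cost=${w.cost:,.2f} "
        f"value/1k tokens=${w.value_per_1k_tokens:.4f} "
        f"return on spend={w.return_on_spend:.2f}x"
    )
```

Under these assumed numbers, the blanket cut lowers the bill but leaves value per token and return on spend exactly where they were, while the redesign raises both. The hard part in practice is the value field: attributing business value to a workload is a measurement problem this sketch deliberately assumes away.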

Early evidence from organisations that have adopted this discipline is compelling. McKinsey's research on AI scaling has found that companies that tightly govern their AI spending and measure returns rigorously generate up to three times the bottom-line impact of their peers per dollar invested. Separately, the State of AI report by Nathan Benaich and Air Street Capital has documented that leading enterprises are now explicitly managing AI cost as a first-order metric, using caching, retrieval-augmented generation, and small fine-tuned models to reduce token consumption while improving task accuracy. This combination creates precisely the kind of structural cost-of-intelligence advantage that no amount of model commoditisation can erode.

This series is not an argument for spending less on AI. It is an examination of how to make AI yield: how to design systems, metrics, and governance that treat every token as an investment with an expected return, not an entitlement. The distinction will define which enterprises build durable advantage from artificial intelligence, and which ones simply finance an expensive, well-documented experiment in consumption without consequence, while their vendors report record cash flow.

Read the full series

This is part one of a four-part series on the economics of enterprise AI: