The AI industry has begun confusing consumption with intelligence. Bigger context windows became a feature war. More tokens became a sign of sophistication. Quietly, token usage became a proxy for progress.
That should concern us.
We’re normalizing AI systems that repeatedly ask for the same context and spend compute solving problems they should already remember how to solve. The result is an emerging anti-pattern that teams now call “token maxxing”: treating higher token consumption as evidence of deeper intelligence or better productivity. It isn’t. In many cases, it signals the opposite.
A stateless system is not intelligent simply because it generates a lot of activity. If anything, excessive token consumption often signals that the underlying architecture is failing.
I’ve seen this pattern before. We once measured engineering productivity by lines of code written. Then we learned that more code meant more complexity and more ways for systems to break. Mature engineering organizations eventually stopped rewarding volume and started rewarding elegance, efficiency, and reliability instead. I believe AI systems are heading toward the same reckoning.
Stateless systems are creating artificial work
Right now, many teams are building workflows where the model spends more time rebuilding context than solving the actual problem. Every prompt starts from zero, every session requires rehydrating history, and orchestration layers inject more context and tools just to recreate the understanding the model already had five minutes ago.
Ask a coding assistant about a bug you were debugging yesterday, and it behaves as if the conversation never happened. You paste the same repository structure into multiple prompts because the system forgot it. You keep re-explaining the same internal APIs and rewriting prompts, not because the task changed, but because the model lost the thread. Then we wonder why token counts explode.
A working paper from the Stanford Digital Economy Lab states that agentic AI tasks consume 1,000x more tokens than standard code chat, driven by input tokens, because the agent must re-read the entire conversation history before every action. This creates a dangerous illusion. Teams start believing that the growing complexity of the interaction is itself evidence that meaningful reasoning is happening. Large prompts and orchestration graphs look sophisticated. Huge token consumption starts to feel like computational seriousness. But often, the system is simply compensating for missing memory. And the person on the other end (the developer, the customer, the end user) absorbs that cost in slower responses, broken context, and interactions that start over every time.
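The cost of re-reading history compounds. If every action re-sends the whole conversation, turn i pays for all i turns so far, so total input grows roughly with the square of the number of turns, while a system that sends only the new turn plus a fixed-size memory grows linearly. The sketch below illustrates that arithmetic under loudly stated assumptions: a constant 500 tokens per turn and a hypothetical 2,000-token memory summary. These numbers are illustrative, not figures from the Stanford paper.

```python
def stateless_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every action re-reads the full history.

    Turn i sends all i turns so far, so the total is
    tokens_per_turn * (1 + 2 + ... + turns) = tokens_per_turn * turns * (turns + 1) / 2.
    """
    return sum(tokens_per_turn * i for i in range(1, turns + 1))


def memory_backed_input_tokens(turns: int, tokens_per_turn: int, memory_tokens: int) -> int:
    """Total input tokens when each turn sends only the new turn plus a
    fixed-size memory summary (a hypothetical design, assumed for illustration)."""
    return turns * (tokens_per_turn + memory_tokens)


if __name__ == "__main__":
    # Assumed values: 500 tokens per turn, 2,000-token memory summary.
    for n in (10, 50, 100):
        s = stateless_input_tokens(n, 500)
        m = memory_backed_input_tokens(n, 500, 2_000)
        print(f"{n:>3} turns: stateless={s:,} memory-backed={m:,} ratio={s / m:.1f}x")
```

With these assumptions the gap grows from about 1.1x at 10 turns to about 10x at 100 turns. The exact ratio depends entirely on the assumed sizes; the point is the shape of the curve, not the constants.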
A surprising amount of what’s marketed today as “agentic intelligence” is context-reconstruction overhead. A workflow that needs multiple agents and repeated prompt injection just to answer a deterministic question is not scaling intelligence. It’s scaling inefficiency.
Bigger context windows aren’t the same thing as memory
This problem becomes even more obvious in enterprise environments, where AI systems operate across fragmented tools, codebases, tickets, documents, chats, and operational systems. Without durable memory, every interaction becomes expensive reassembly work.
The irony is that software engineering solved versions of this problem decades ago. Databases don’t recompute everything from scratch for every query, because rebuilding context repeatedly is inefficient, expensive, and unnecessary. Yet many AI systems effectively operate like goldfish with huge vocabularies.
The current obsession with context windows risks making this worse. Expanding the amount of information a model can consume is useful, but bigger context windows aren’t the same thing as memory. Feeding more tokens into a stateless system doesn’t magically create continuity. It simply increases the short-term information the model must process before forgetting it again.
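A toy model makes the distinction concrete. The sketch below treats a context window as a sliding buffer that evicts the oldest messages once a token budget is exceeded. It is a simplifying assumption, not how any particular product manages context, and it approximates tokens as whitespace-separated words. The class and function names are hypothetical.

```python
from collections import deque


class SlidingContext:
    """Toy stateless context window: keeps only the most recent messages that
    fit in max_tokens, evicting the oldest first. Tokens are approximated as
    whitespace-separated words (an assumption for illustration)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages: deque[str] = deque()
        self.used = 0

    def add(self, message: str) -> None:
        self.messages.append(message)
        self.used += len(message.split())
        # Evict the oldest messages until the window fits the budget again.
        while self.used > self.max_tokens and self.messages:
            evicted = self.messages.popleft()
            self.used -= len(evicted.split())

    def remembers(self, phrase: str) -> bool:
        return any(phrase in m for m in self.messages)


def turns_until_forgotten(max_tokens: int) -> int:
    """Count how many unrelated turns it takes before an early fact falls out."""
    ctx = SlidingContext(max_tokens)
    ctx.add("fact: the billing API uses idempotency keys")
    turn = 0
    while ctx.remembers("idempotency keys"):
        turn += 1
        ctx.add(f"turn {turn}: some unrelated debugging chatter here")
    return turn


if __name__ == "__main__":
    for size in (20, 200, 2_000):
        print(size, turns_until_forgotten(size))
```

With these strings, a 20-word window loses the fact after 2 turns and a 200-word window after 28. A bigger window delays forgetting; it doesn’t replace a durable place to keep the fact.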
In their Tokenomics paper, researchers from the Data-driven Analysis of Software (DAS) Lab at Concordia University found that input tokens average 53.9% of total consumption, a cost created by re-reading accumulated context, not by generating new answers. Developers should be careful not to confuse short-term context accumulation with durable intelligence. At some point, they will stop asking how many tokens a workflow consumes and start asking why it needed so many in the first place.
AI development is becoming a systems design problem
Instead of treating AI primarily as a prompting problem, we need to start treating it as a systems design problem. Then the important questions change. How do we reduce redundant inference cycles? How do we maintain persistent context across sessions and preserve codebase memory over time?
These are infrastructure and architecture questions, not prompt engineering tricks. In my experience, the teams making real progress have already figured that out.
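One minimal way to picture “persistent context across sessions” is a small store of durable project facts that outlives any single chat and is injected into each new prompt in compact form, instead of the developer re-pasting the same explanations. The sketch below assumes a plain JSON file as the store; `ProjectMemory`, its methods, and the example fact are hypothetical, and a real system would need retrieval, invalidation, and access control that this omits.

```python
import json
import tempfile
from pathlib import Path


class ProjectMemory:
    """Hypothetical persistent memory: durable facts about a codebase are
    stored on disk so a new session can start from them instead of from zero."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.facts: dict[str, str] = {}
        if path.exists():
            self.facts = json.loads(path.read_text(encoding="utf-8"))

    def remember(self, key: str, fact: str) -> None:
        # Persist immediately so the fact survives the end of the session.
        self.facts[key] = fact
        self.path.write_text(json.dumps(self.facts, indent=2), encoding="utf-8")

    def build_prompt(self, question: str) -> str:
        # Only the compact, curated facts are injected, not the full chat history.
        context = "\n".join(f"- {k}: {v}" for k, v in sorted(self.facts.items()))
        return f"Known project context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp()) / "memory.json"

    # Session 1: the developer explains the internal API once.
    ProjectMemory(store).remember("auth", "internal calls go through the shared token helper")

    # Session 2 (a separate process in practice): the fact is reloaded from disk.
    print(ProjectMemory(store).build_prompt("Why does the nightly job return 401?"))
```

The design choice worth noticing is where the cost moves: the explanation is paid for once and stored, and each later prompt carries a short summary rather than the whole history that produced it.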
Effective AI systems will likely start to look less like endlessly chatting assistants and more like memory-aware computational systems. They will preserve the relationships between decisions, code changes, incidents, workflows, and operational history. They will understand continuity without requiring developers to restate everything. Most importantly, they will shift the value equation away from interaction volume and toward outcome quality. Developers aren’t paid to generate tokens. They’re paid to solve problems.
The future belongs to systems that remember
The current AI cycle rewards activity more visibly than outcomes. I see organizations celebrating AI usage rather than engineering results. Teams increasingly measure progress by interaction volume: more prompts, more orchestration layers, more agents, and more generated output. In some cases, developers spend more time managing AI than doing the work that actually matters: the architectural decisions, the product thinking, the customer impact.
The best infrastructure systems are often the ones you barely notice, because they remove friction instead of creating ceremony. A truly intelligent development system shouldn’t require developers to constantly reconstruct context, supervise orchestration chains, or perform prompt gymnastics just to maintain continuity. For me, that is the real test: the best systems remember enough to stop asking the same questions.
