Engineering

Why Stateless Agents Recompute Everything

Every time a returning user connects with a stateless AI agent, the agent redoes work it already completed. It retrieves context already retrieved yesterday. It re-establishes facts already confirmed last week. It reconstructs reasoning already articulated and agreed upon.

For a single user, this redundancy is invisible. Across thousands of returning users, the cumulative cost is significant and entirely avoidable.

The Hidden Cost of Forgetting

Stateless agents process every session from zero. No prior work carries forward because no prior work was stored.

Consider a user managing a multi-week project. In session one, the agent retrieves documentation, establishes requirements, and recommends an approach. In session two, the user returns with a follow-up — and the agent must retrieve the same documentation, re-establish the same requirements, and reconstruct the same reasoning before addressing the new question.

The agent is not doing new work. It is redoing old work that could have been persisted and accessed directly.

Where Costs Accumulate

Three categories compound with each repeated session.

API calls multiply. Each session triggers fresh retrieval queries — the same chunks fetched again because the agent has no record of having seen them. Token usage inflates as prior context must be re-injected into every prompt. And latency increases as the agent reconstructs context that should have been immediately available.

A user's project context that could be summarized in 200 tokens from persistent memory instead requires re-processing thousands of tokens of source documents — every single session.
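A back-of-the-envelope sketch makes the gap concrete. All token counts below are illustrative assumptions, not measurements:

```python
# Illustrative numbers only: token counts are assumptions for the sketch.
SOURCE_DOC_TOKENS = 8_000  # tokens of source documents re-processed each stateless session
SUMMARY_TOKENS = 200       # tokens of stored summary read back from persistent memory
RETURN_SESSIONS = 5        # return visits by a single user

# Token cost of the return sessions alone, under each architecture.
stateless_return_cost = SOURCE_DOC_TOKENS * RETURN_SESSIONS  # re-process everything
stateful_return_cost = SUMMARY_TOKENS * RETURN_SESSIONS      # read the summary

print(stateless_return_cost)  # 40000
print(stateful_return_cost)   # 1000
```

Under these assumed numbers, the stateless agent spends 40x the tokens of the stateful one on the same five return visits.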

The Scale Problem

These costs scale linearly with returning users. An agent serving a hundred daily users, each returning five times, runs five hundred return sessions. If each return session spends 30% of its computation on redundant re-retrieval, that is one hundred fifty sessions' worth of wasted computation every day.
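The arithmetic from the paragraph above, spelled out:

```python
# Worked example of the waste estimate; all inputs are the assumed
# figures from the text, not measured values.
daily_users = 100
returns_per_user = 5
redundant_fraction = 0.30  # share of each return session spent re-retrieving

return_sessions = daily_users * returns_per_user
wasted_session_equivalents = return_sessions * redundant_fraction

print(return_sessions)             # 500
print(wasted_session_equivalents)  # 150.0
```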

At enterprise scale — thousands of users, dozens of returns per week — the waste becomes a material line item in API bills, latency percentiles, and user feedback about slow follow-up responses.

Amortizing Through Persistence

Stateful agents amortize context-gathering cost across sessions. Information retrieved once is stored for future access. Decisions extracted from conversations persist as structured memory, accessible in milliseconds rather than re-derived from knowledge base queries.

The first session costs the same as in a stateless architecture. Every subsequent session costs less because the agent builds on prior work. The more a user returns, the greater the advantage — the opposite of stateless systems, where frequent users are the most expensive.
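The pattern can be sketched in a few lines. This is a minimal illustration, assuming a key-value store keyed by user ID; `retrieve_documents` is a hypothetical placeholder for an expensive retrieval-and-summarization pipeline, and a plain dict stands in for a real database:

```python
# Minimal cross-session memory sketch. In production, `memory` would be
# a durable store (database, vector store), not an in-process dict.
memory: dict[str, dict] = {}

def retrieve_documents(query: str) -> dict:
    # Hypothetical placeholder for an expensive retrieval + summarization pass.
    return {"summary": f"context for {query!r}", "requirements": ["req-1", "req-2"]}

def get_context(user_id: str, query: str) -> dict:
    if user_id in memory:
        return memory[user_id]           # fast path: prior work, read directly
    context = retrieve_documents(query)  # slow path: paid only on first session
    memory[user_id] = context
    return context
```

The first call for a user pays full retrieval cost; every later call reads the stored context instead of re-deriving it, which is the amortization described above.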

Frequently Asked Questions

How much does recomputation actually cost?

Teams with high returning-user rates report that 30-50% of per-session computation goes to redundant context re-establishment. That share grows as conversations span more sessions.

Does caching solve this?

Caching helps for short-term repetition but not cross-session persistence. Cache entries expire or get evicted. Persistent memory retains and evolves context indefinitely.
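The distinction can be shown directly. This is a toy contrast, assuming an in-process TTL cache and a plain mapping standing in for a durable store; the entry lifetime is an arbitrary illustration:

```python
import time

class TTLCache:
    """Toy time-to-live cache: entries vanish after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[float, object]] = {}

    def put(self, key: str, value: object) -> None:
        self.store[key] = (time.monotonic(), value)

    def get(self, key: str):
        entry = self.store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self.store[key]  # expired: context must be rebuilt from scratch
            return None
        return value

# Persistent memory has no expiry; a plain dict stands in for a database.
persistent_memory: dict[str, object] = {}
```

A cache entry that expires between sessions forces a full re-retrieval on the next visit; the persistent store returns the same context no matter how much time has passed.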

Conclusion

Stateless agents pay full context assembly cost on every interaction regardless of how many times the same user has connected. This recomputation is invisible at prototype scale and significant in production. Stateful architectures eliminate the waste by persisting context — making returning users cheaper to serve rather than more expensive.