Why Top-K Retrieval Is a Terrible Proxy for What Your Agent Actually Needs - HydraDB


Engineering


You set K=10. Your agent queries your vector database and gets back 10 results, ranked by cosine similarity. Looks clean. Feels scientific. And it's almost certainly wrong for what you're actually trying to do.

The problem isn't the ranking. It's the K part. You're asking your system to return a fixed number of results regardless of what the query actually needs.
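Concretely, fixed top-K looks something like the minimal NumPy sketch below: rank by cosine similarity, then truncate at K no matter what the query is. The function name and shapes are illustrative, not any particular database's API.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=10):
    """Fixed top-K: always return exactly k results, regardless of query."""
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    # Sort descending and truncate at k -- the "K" in top-K.
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]
```

Note that `k=10` is baked in before the query is ever seen. Nothing in this code, or in the database it stands in for, asks whether 10 is the right number for this query.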

Some queries need one answer. Others need twenty. Your agent is planning a workflow and needs to understand all available actions and constraints. Top-K=10 cuts off crucial information. The 11th result might be the one that prevents failure.

K is arbitrary because queries are different. A smart system could look at query complexity and adjust K dynamically. Standard vector databases don't do that. You pick a number and hope it works for everything. It doesn't.

Here's what happens in production. You run your agent on a small test set with K=10 and it works great. You ship to production. On day three, you hit a query type you didn't test extensively. Your agent gets 10 results when it needed 1 or 30. Performance tanks. You either accept degraded behavior or raise K globally, wasting resources on queries that never needed it.

The deeper issue is that top-K assumes relevance is a continuous spectrum where the Kth result is always "barely relevant" and the (K+1)th is "not relevant." That's not how information works. A result is either relevant to what your agent needs right now or it isn't. Sometimes you have 3 relevant results. Sometimes you have 15. K doesn't care.
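One way to see the binary-relevance point in code: instead of truncating at a fixed K, cut the ranked list where the scores drop off. This toy sketch cuts at the largest gap between consecutive similarity scores; the gap heuristic is illustrative, not a production policy.

```python
def cut_at_gap(sims, min_sim=0.0):
    """Given similarity scores sorted descending, keep the prefix up to
    the largest score gap instead of a fixed K."""
    sims = [s for s in sims if s >= min_sim]
    if len(sims) <= 1:
        return sims
    # Find the biggest drop between consecutive scores.
    gaps = [sims[i] - sims[i + 1] for i in range(len(sims) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return sims[:cut]

# Three clearly relevant results, then a cliff: the cut lands at 3,
# not at whatever K happened to be configured.
# cut_at_gap([0.91, 0.88, 0.85, 0.41, 0.38]) -> [0.91, 0.88, 0.85]
```

With a different query, the same function might return 1 result or 15. The result count follows the shape of the scores, not a config value.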

This is why vector database limitations become painful at scale. You're making a bet about the structure of all future queries. When that bet is wrong—and it eventually is—you're stuck tuning K upward or accepting incompleteness.

The real cost shows up in context assembly. If 7 of 10 results are irrelevant, you're paying for context your agent never needed: every irrelevant result adds tokens to the prompt, raises latency, and gives the model more material to be distracted by.
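The arithmetic is simple enough to write down. The chunk size below is an illustrative assumption, not a measurement; plug in your own numbers.

```python
def wasted_tokens(k, relevant, tokens_per_chunk=500):
    """Back-of-envelope: prompt tokens spent on retrieved chunks the
    agent never needed. tokens_per_chunk is an assumed average."""
    return (k - relevant) * tokens_per_chunk

# 7 of 10 results irrelevant, ~500 tokens per chunk:
# wasted_tokens(10, 3) -> 3500 wasted tokens on every single query.
```

At 3,500 wasted tokens per query, the overhead compounds across every retrieval your agent makes, and you pay it in both dollars and latency.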

What should you do instead? Stop thinking of retrieval as "return K things that are similar" and start thinking of it as "return what this query actually needs." That requires understanding query intent, entity relationships, temporal constraints, and user context. Your vector database can't do that.

You need smarter context assembly: a system that understands context the way your agent does, retrieves based on relevance to the task rather than similarity to an embedding, and varies the result count with the structure of each query.
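As a sketch of what "vary result count based on query structure" could mean, here's a toy retrieval plan: the query's apparent intent sets an upper bound and a relevance floor, and the floor, not K, decides how many results come back. The keyword heuristic and the `plan_for` function are hypothetical stand-ins for real intent understanding.

```python
from dataclasses import dataclass

@dataclass
class RetrievalPlan:
    max_results: int   # upper bound, not a fixed K
    min_score: float   # relevance floor for this query type

def plan_for(query: str) -> RetrievalPlan:
    """Hypothetical intent heuristic: broad planning-style queries get a
    wide net; lookup-style queries get a tight one."""
    broad = any(w in query.lower() for w in ("all", "every", "list", "plan"))
    if broad:
        return RetrievalPlan(max_results=30, min_score=0.3)
    return RetrievalPlan(max_results=3, min_score=0.6)

def retrieve(query, scored_results):
    """scored_results: (doc_id, score) pairs, sorted descending by score."""
    plan = plan_for(query)
    kept = [(d, s) for d, s in scored_results if s >= plan.min_score]
    return kept[: plan.max_results]
```

The same corpus answers differently depending on the query: a "list all available actions" query can return 30 results, while a point lookup returns one or two. That's the behavior fixed K structurally can't give you.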

FAQ

Should I set K higher to be safe? No. Higher K adds cost, latency, and noise. If K=10 isn't enough, your retrieval approach is wrong, not your K value.

How do I know if my K is too low? If your agent frequently reports missing information or makes worse decisions with certain query types, K is probably cutting off relevant results.

Can reranking fix this? Reranking helps with ranking order but doesn't solve the fundamental problem: K results might be the wrong number regardless of how you rank them.

Conclusion

Top-K is a convenient default, not a solution. It pretends that all queries have the same structure and that you can optimize globally for something that's deeply local—what each specific agent needs right now.

The systems winning in production treat K as a variable, not a constant. They understand that retrieval isn't about similarity at scale. It's about assembling the exact context that matters.

Stop defending K. Start building systems that know what they actually need to retrieve.
