You've added a reranker to your pipeline — a small LLM that takes your top-100 vector database results and reorders them by actual relevance. In theory, this fixes ranking. In practice, it's a partial solution at best.
Here's why: reranking can't fix a retrieval set that doesn't contain the right answer.
Your vector database returns top-K results by cosine similarity. The reranker evaluates and reorders those candidates. Better reranking means better ordering of what you already have. But if the documents that actually answer your query aren't in that top-K, the reranker is powerless. It can't create information that wasn't retrieved.
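This recall ceiling is easy to see in a toy sketch. All the scores, document ids, and helper names below are made up for illustration; the point is only that the reranker sorts the candidate list it is given, so a document outside that list can never surface:

```python
# Toy illustration: reranking reorders retrieved candidates,
# so recall is capped by the initial retrieval step.

def retrieve_top_k(scores, k):
    """Return ids of the k highest-similarity documents."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def rerank(candidates, relevance):
    """Reorder the candidates by a (simulated) reranker score."""
    return sorted(candidates, key=lambda d: relevance.get(d, 0.0), reverse=True)

# Hypothetical similarity scores from the vector database.
similarity = {"doc_a": 0.91, "doc_b": 0.88, "doc_c": 0.85, "answer_doc": 0.40}
# The reranker would score the true answer highest -- if it ever saw it.
reranker_relevance = {"answer_doc": 0.99, "doc_a": 0.3, "doc_b": 0.2, "doc_c": 0.1}

candidates = retrieve_top_k(similarity, k=3)       # answer_doc misses the cut
reordered = rerank(candidates, reranker_relevance)

print("answer_doc retrieved:", "answer_doc" in reordered)  # False
```

No matter how good `reranker_relevance` is, `answer_doc` cannot appear in the output because it was never in `candidates`.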
This is the difference between ranking and retrieval. Ranking optimizes order. Retrieval decides which documents surface in the first place. Reranking solves a ranking problem. It can't solve a retrieval problem.
The failure mode is subtle. Your reranker works beautifully on the results you show it. Metrics improve. But those metrics only cover the candidate set; they say nothing about the documents that should have been retrieved and weren't.
This becomes obvious with low-performing queries. You investigate, discover the right document exists in your knowledge base — it just wasn't in the top-100 from the vector database. The reranker never saw it.
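One way to catch this systematically is to measure retrieval recall@K on its own, before any reranking. A minimal sketch, assuming you have a small labeled set of queries with known-good document ids (the data below is hypothetical):

```python
def recall_at_k(retrieved_ids, gold_ids, k):
    """Fraction of queries whose gold document appears in the top-k retrieval."""
    hits = sum(1 for retrieved, gold in zip(retrieved_ids, gold_ids)
               if gold in retrieved[:k])
    return hits / len(gold_ids)

# Hypothetical eval data: per-query retrieved ids and the known-good doc id.
retrieved = [["d1", "d2", "d3"], ["d4", "d9", "d5"], ["d7", "d8", "d6"]]
gold = ["d2", "d9", "d0"]  # the third query's answer was never retrieved

print(recall_at_k(retrieved, gold, k=3))  # 2 of 3 gold docs are in the top-3
```

If recall@100 is low, no amount of reranking on top can fix the pipeline; the reranker's ceiling is exactly this number.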
Where does this happen? Constantly. Technical questions retrieve marketing content because it has more keyword overlap. Rare edge-case questions retrieve common issues because similar doesn't mean relevant. Queries about deprecated features retrieve current documentation because "current" occupies more semantic space.
The real fix requires improving retrieval, not ranking. Multi-signal retrieval that doesn't rely exclusively on similarity. Document relationships for meaningful context. Metadata, temporal information, and structural context in initial candidate selection.
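One way to blend those signals is a weighted score over similarity, metadata overlap, and recency. This is a sketch, not a recipe: the weights, field names, and decay curve below are illustrative assumptions, not tuned values:

```python
from datetime import date

def blended_score(doc, query_tags, today, w_sim=0.6, w_meta=0.25, w_rec=0.15):
    """Combine vector similarity with metadata overlap and recency.
    Weights are illustrative, not tuned values."""
    meta_overlap = len(query_tags & set(doc["tags"])) / max(len(query_tags), 1)
    age_days = (today - doc["updated"]).days
    recency = 1.0 / (1.0 + age_days / 365)  # decays over roughly a year
    return w_sim * doc["similarity"] + w_meta * meta_overlap + w_rec * recency

# Hypothetical candidates: a stale marketing page with high similarity,
# and a fresher API guide whose metadata matches the query.
docs = [
    {"id": "marketing", "similarity": 0.90, "tags": ["product"],
     "updated": date(2021, 1, 1)},
    {"id": "api_guide", "similarity": 0.78, "tags": ["api", "errors"],
     "updated": date(2024, 6, 1)},
]
query_tags = {"api", "errors"}
ranked = sorted(docs, key=lambda d: blended_score(d, query_tags, date(2024, 7, 1)),
                reverse=True)
print([d["id"] for d in ranked])  # ['api_guide', 'marketing']
```

Here the API guide outranks the marketing page despite lower cosine similarity, because the metadata and recency signals enter the initial candidate selection instead of being bolted on afterward.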
Teams often skip this because reranking is easier. No reindexing, no architecture changes — just add a model on top. For some queries it helps. For others, you're polishing results that should never have been retrieved.
This ties directly to why similarity is not context. If your retrieval relies solely on similarity, you're choosing the wrong documents for many queries. A reranker improves the ordering of wrong documents but can't correct the fundamental strategy. Understanding why top-K is a terrible proxy explains why larger K plus reranking is often less effective than better retrieval. And seeing how similar results still fail agents shows the full scope of what similarity-based retrieval misses.
FAQ
Should I use a reranker or improve retrieval? Both, in that order. Good retrieval gets the right documents into the candidate set. Reranking optimizes their ordering. If you only have budget for one, invest in retrieval.
How much does reranking actually improve metrics? If your top-100 already contains the right answers, reranking improves metrics by 5-15%. If the relevant documents are missing entirely, the improvement is under 2%.
Conclusion
Reranking is popular because it's accessible. You don't have to rethink your pipeline; you just add a model on top. But you're optimizing within constraints that shouldn't exist.
Fix retrieval first. Reranking second. That's the order that matters.