Your retrieval system found the right document. It is sitting in the results, ranked fourth out of ten. The cosine similarity score is high, and the correct answer is technically in the context window.
The agent still gets it wrong.
This is the lost-in-the-middle problem. Research has consistently shown that large language models pay disproportionate attention to information at the beginning and end of their context windows, with content in the middle receiving significantly less processing weight.
Why Retrieval Metrics Mislead
Most teams measure retrieval quality with recall@k or precision@k. These check whether the relevant document appears somewhere in the top k results. By this standard, the retrieval succeeded.
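To make the metric concrete, here is a minimal sketch of recall@k. It shows exactly why the scenario in the opening passes: the correct document only has to appear somewhere in the top k. The document IDs are invented for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear anywhere in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)

# The correct document ("d1") is ranked fourth out of ten.
retrieved = ["d7", "d2", "d9", "d1", "d5", "d3", "d8", "d4", "d6", "d0"]
print(recall_at_k(retrieved, relevant_ids={"d1"}, k=10))  # 1.0 -- retrieval "succeeded"
```

A perfect score at k=10, even though the answer sits in the dead zone of the context window.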
But the agent uses whatever lands in its context window, weighted by position. A relevant chunk buried at position six might as well not exist if the model's attention never reaches it with sufficient focus.
Your retrieval pipeline can report excellent accuracy while end-to-end agent performance is mediocre. The disconnect happens because retrieval and context assembly are treated as the same step when they should be separate, deliberate processes.
Position Matters More Than Relevance
Several studies have demonstrated that even state-of-the-art models lose accuracy when key information is placed in the middle third of the context window. Moving the same relevant passage from position one to position five reduced answer accuracy by over 20 percentage points in some experiments.
This is not a model bug that will be fixed with the next generation; it is a consequence of how attention mechanisms distribute focus across long sequences. Teams need retrieval systems that do not just find relevant content but place it where the model will actually use it, based on what the agent needs for the current task rather than what is merely semantically similar.
What Better Context Assembly Looks Like
Effective context assembly goes beyond ranked retrieval. First, deduplication — removing chunks that convey the same information in slightly different words. Second, prioritization — placing the most task-critical information at the beginning of the context window. Third, compression — summarizing supporting details rather than including full chunks.
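The first two steps can be sketched in a few lines. This is a simplified illustration, not a production pipeline: it uses `difflib` string similarity as a stand-in for real semantic deduplication, assumes each chunk arrives with a task-relevance score, and drops the tail instead of summarizing it (a real system would compress those chunks instead).

```python
from difflib import SequenceMatcher

def assemble_context(chunks, max_chunks=5, near_dup_threshold=0.9):
    """Deduplicate near-identical chunks, then prioritize: highest-scored
    chunks go first so task-critical content lands at the start of the
    context window. `chunks` is a list of (text, task_relevance_score) pairs."""
    deduped = []
    for text, score in sorted(chunks, key=lambda c: c[1], reverse=True):
        if any(SequenceMatcher(None, text, kept).ratio() > near_dup_threshold
               for kept, _ in deduped):
            continue  # near-duplicate of a higher-scored chunk already kept
        deduped.append((text, score))
    # Keep the top chunks verbatim; in a fuller pipeline the remainder
    # would be summarized (compression) rather than dropped outright.
    return [text for text, _ in deduped[:max_chunks]]
```

Note the design choice: ordering happens by task relevance, not by raw retrieval rank, so the chunk the agent most needs never ends up buried in the middle.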
Some architectures use iterative retrieval, where the agent processes an initial context window, identifies gaps, and issues follow-up queries. This treats retrieval as a conversation between the agent and the knowledge base, not a single lookup. Agents that synthesize across multiple retrieved chunks benefit most from this approach.
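The loop structure of iterative retrieval is easy to sketch. Here `retrieve(query)` and `identify_gaps(query, context)` are hypothetical stand-ins for your retriever and a model call that names missing information and returns follow-up queries (or an empty list when the context is sufficient).

```python
def iterative_retrieve(query, retrieve, identify_gaps, max_rounds=3):
    """Treat retrieval as a conversation: fetch, ask the agent what is
    still missing, and issue follow-up queries until no gaps remain
    or the round budget is exhausted."""
    context = []
    pending = [query]
    for _ in range(max_rounds):
        if not pending:
            break  # the agent reported no remaining gaps
        for q in pending:
            context.extend(retrieve(q))
        pending = identify_gaps(query, context)  # follow-up queries, or []
    return context
```

The `max_rounds` cap matters in practice: without it, an agent that keeps finding gaps can loop indefinitely against the knowledge base.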
Frequently Asked Questions
Does increasing context window size fix this problem?
Not reliably. Larger windows give more room, but the attention distribution problem persists. Models still prioritize the edges over the middle, and adding more content can push critical information further from the attention peaks.
How can I detect if my agent is losing information in the middle?
Test with the relevant chunk at different positions. If accuracy drops significantly when the answer moves from position one to position five, your system has a context assembly problem, not a retrieval problem.
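That test is a short harness. In this sketch, `ask(question, context_chunks) -> bool` is a stand-in for calling your agent and grading its answer; the positions and filler chunks are up to you.

```python
def position_sweep(question, answer_chunk, filler_chunks, ask, positions=(0, 4, 9)):
    """Place the answer-bearing chunk at each position among filler chunks
    and record whether the agent still answers correctly. A dip at the
    middle positions signals a context assembly problem, not a retrieval one."""
    results = {}
    for pos in positions:
        context = list(filler_chunks)
        context.insert(pos, answer_chunk)
        results[pos] = ask(question, context)
    return results  # maps position -> correct? e.g. True at the edges, False mid-window
```

Run it against real queries from your logs rather than synthetic ones; the lost-in-the-middle effect varies with how distracting the surrounding chunks are.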
Conclusion
Finding the right information is only half the problem. Placing it where the agent will actually use it is the other half. The lost-in-the-middle problem requires treating context assembly as a distinct engineering challenge — one where position and structure matter as much as relevance scores.