A product manager asks an AI agent: "How did our churn rate change after we revised the onboarding flow last quarter?" Answering this requires finding the onboarding change date, pulling churn metrics before and after, and comparing the two periods.
Vector search handles each lookup independently. It has no mechanism for the first retrieval to inform the second, no way to carry context between steps.
Single-Query Architecture Meets Multi-Step Problems
Vector databases are optimized for one pattern: query in, ranked results out. This works for factual lookups. It fails when the answer requires synthesizing information from multiple retrieval steps.
Multi-step reasoning appears in virtually every production use case beyond FAQ lookup. Financial analysis combines data from different periods. Legal review cross-references clauses across documents. Customer support escalations require understanding case history.
In each scenario, retrievals depend on each other. Vector databases treat every query as stateless — no mechanism passes context from one search to the next.
Why Application-Layer Chaining Is Fragile
Teams build multi-step logic at the application layer. The agent issues a first query, parses results, formulates a second based on what it found, and repeats.
This works for predictable chains. It breaks when the chain is dynamic — when the agent does not know how many steps are needed or what each query should look like. Each handoff introduces latency and error potential. The approach also lacks the ability to model relationships between retrieved facts, forcing the agent to infer connections through prompt engineering.
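A minimal sketch of such a chain, using the churn question from the introduction. The `vector_search` function, its canned results, and the parsing logic are all illustrative assumptions standing in for a real vector-database client, not any particular library's API:

```python
# Hypothetical application-layer chain. `vector_search` is a stand-in
# for a real vector-database call; results here are canned.

def vector_search(query: str, top_k: int = 3) -> list[str]:
    # A real implementation would embed `query` and hit a vector index.
    canned = {
        "onboarding flow revision date": ["Onboarding v2 shipped 2024-04-01"],
        "monthly churn rate before 2024-04-01": ["Q1 churn: 5.2%"],
        "monthly churn rate after 2024-04-01": ["Q2 churn: 4.1%"],
    }
    return canned.get(query, [])

def answer_churn_question() -> str:
    # Step 1: find the change date. If this retrieval is wrong,
    # every later step inherits the error.
    date_hits = vector_search("onboarding flow revision date")
    change_date = date_hits[0].split()[-1]  # brittle positional parsing

    # Steps 2-3: each query is built from the previous result; the
    # database itself carries no state between the calls.
    before = vector_search(f"monthly churn rate before {change_date}")
    after = vector_search(f"monthly churn rate after {change_date}")
    return f"Before: {before[0]}; After: {after[0]}"
```

Note how the state lives entirely in local variables of the orchestrating code: the parsing step, the query templates, and the step count are all hard-coded, which is exactly what makes the chain fragile when the problem shape changes.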
The Compounding Error Problem
In multi-step retrieval, errors compound multiplicatively. If each step is 85% accurate, a three-step chain is correct only about 61% of the time (0.85³). A five-step chain falls to 44%.
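The arithmetic behind those figures is a straight multiplication of independent per-step accuracies:

```python
def chain_accuracy(per_step: float, steps: int) -> float:
    # With independent steps, errors compound: overall accuracy = p^n.
    return per_step ** steps

print(round(chain_accuracy(0.85, 3), 2))  # 0.61
print(round(chain_accuracy(0.85, 5), 2))  # 0.44
```

This is a simplifying model; in practice steps are not fully independent, but correlation between failures usually makes long chains worse, not better, because a wrong intermediate result poisons every downstream query.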
Vector databases offer no error correction across steps. Each retrieval is independent, so there is no mechanism to detect inconsistency between intermediate results. Systems with learning loops can identify which patterns fail and adjust. Static search repeats the same failure every time.
What Enables Multi-Step Reasoning
Architectures that support multi-step reasoning maintain state across retrieval steps — tracking what has been retrieved and how the current query relates to the overall goal.
Knowledge graphs are effective here because they encode explicit relationships. Instead of searching for "churn rate after onboarding change" as a semantic query, the system traverses connections: onboarding change → date → churn metrics → comparison. Temporal models ensure the system retrieves from the right time periods at each step.
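The traversal can be sketched with a toy graph. The node names, relations, and adjacency-map representation below are illustrative assumptions, not a real graph database schema:

```python
# Toy knowledge graph as an adjacency map: (node, relation) -> value.
graph = {
    ("onboarding_change", "occurred_on"): "2024-04-01",
    ("churn_metrics", "before:2024-04-01"): "5.2%",
    ("churn_metrics", "after:2024-04-01"): "4.1%",
}

def traverse(node: str, relation: str) -> str:
    # Follow one explicit edge rather than running a semantic search.
    return graph[(node, relation)]

# Each hop reuses the previous hop's result -- the carried state
# that independent vector lookups lack.
date = traverse("onboarding_change", "occurred_on")
before = traverse("churn_metrics", f"before:{date}")
after = traverse("churn_metrics", f"after:{date}")
print(f"Churn moved from {before} to {after} after {date}")
```

The point of the sketch is the shape of the computation: each edge lookup is exact and keyed by the result of the prior hop, so there is no ranked-results ambiguity to compound across steps.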
Frequently Asked Questions
Can an LLM orchestrate multi-step retrieval over a vector database?
Many frameworks do this. The limitation is that every step incurs a separate call with no shared context, and the LLM must infer relationships a graph could provide directly. This works for simple chains but becomes unreliable as complexity grows.
How many steps can systems reliably handle?
Three to four with well-optimized retrieval. Beyond that, compounding errors require architectural support — state management, relationship models, and error detection — rather than sequential vector lookups.
Conclusion
Multi-step reasoning is the default for any task beyond single-fact lookup. Vector databases were not designed for stateful, dependent retrievals. Agents answering complex questions need infrastructure that understands reasoning as a process, not a single query.