Mem0 vs Zep vs Letta: Which AI Memory Solution Is Right for You?
Your AI agent just solved a complex customer problem. Tomorrow, that customer returns—but the agent has no idea what happened last week.
This is the memory problem. Every conversational AI, every agent, every LLM-powered system faces it. Without persistent memory, agents start fresh every session, losing context, relationships, and learned patterns. Your users notice. Your costs spike.
Three frameworks have emerged as the leading solutions: Mem0, Zep, and Letta. Each takes a fundamentally different approach to how agents should remember. One acts like a bolt-on external service. One is a complete runtime. One models memory as a temporal knowledge graph that tracks how facts change over time.
I've analyzed all three to help you understand which fits your architecture, your team, and your timeline. Let's start with the philosophies.
The Three Philosophies of Agent Memory
Mem0: Memory as a Service
Mem0 treats memory as infrastructure you bolt onto your existing agent. You keep your agent framework (LangGraph, CrewAI, AutoGen). You keep your runtime. Mem0 becomes a managed layer that sits between your agent and your vector database.
The core idea is simple: add memory in one line of code. Your agent stays yours. The memory layer stays theirs.
Mem0's published benchmarks report a 26% accuracy boost over baseline LLM memory, 91% faster responses than full-context loading, and 90% lower token usage. It uses a two-phase pipeline: extraction (pulling salient facts from conversations) and update (intelligently merging new facts with existing ones).
How the extraction phase works: When a conversation happens, Mem0 ingests three context sources—the latest exchange, a rolling summary, and the most recent messages. An LLM extracts candidate memories (discrete facts) from this context. These are fed into the update phase.
How the update phase works: For every candidate fact, the system fetches the K most similar memories from your vector database using embedding similarity. An LLM compares the candidate against these neighbors and decides to ADD (new information), UPDATE (modify existing fact), or DELETE (superseded).
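The update decision can be sketched as a small routine. Everything here is illustrative — the function names, verdict labels, and the toy comparator stand in for Mem0's internal LLM calls, not its actual API:

```python
from dataclasses import dataclass

@dataclass
class StoredMemory:
    id: str
    text: str

def decide_update(candidate: str, neighbors: list[StoredMemory], compare) -> dict:
    """Sketch of Mem0-style fact reconciliation.

    compare(candidate, neighbor_text) is assumed to return one of:
    'duplicate', 'refines', 'contradicts', or 'unrelated'.
    """
    for mem in neighbors:
        verdict = compare(candidate, mem.text)
        if verdict == "duplicate":
            return {"op": "NOOP", "target": mem.id}    # already known
        if verdict == "refines":
            return {"op": "UPDATE", "target": mem.id}  # modify existing fact
        if verdict == "contradicts":
            return {"op": "DELETE", "target": mem.id}  # superseded fact
    return {"op": "ADD", "target": None}               # genuinely new information

# Toy comparator: crude subject matching standing in for an LLM judgment
def toy_compare(a: str, b: str) -> str:
    if a == b:
        return "duplicate"
    if a.split()[0] == b.split()[0]:
        return "contradicts"  # same subject, different claim
    return "unrelated"

neighbors = [StoredMemory("m1", "employer: Acme Corp")]
print(decide_update("employer: Beta Inc", neighbors, toy_compare))
# → {'op': 'DELETE', 'target': 'm1'}
```

In the real pipeline, the comparator is an LLM call and the neighbors come from a vector similarity query, but the control flow is the same shape: compare the candidate against its nearest neighbors, then pick one of the four operations.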
Mem0 also extracts entities and relationships from every memory write, storing embeddings in your vector database and mirroring relationships in a graph backend (Neo4j, Memgraph). This graph layer lets you query both semantic relationships and structural connections.
This is attractive if you've already built an agent and want to drop in memory without rearchitecting.
Letta: Memory as Runtime
Letta inverts the problem. Instead of bolting memory onto your agent, Letta becomes the platform where your agent lives.
Think of it like an operating system. Your agent runs inside Letta. Letta manages the memory system, the context window, the tool execution loop—everything. The agent has direct access to three memory tiers inspired by computer architecture: core memory (like RAM, lives in the context window), recall memory (like disk cache, searchable conversation history), and archival memory (like cold storage, queried via tools).
Core memory is fixed-size, in-context, writeable only via function calls. It's for key facts and persona. Recall memory is a table that preserves the complete interaction history—searchable and retrievable even when not in the active context window. Archival memory is a vector database containing long-running memories and external data the agent needs, indexed for rapid retrieval.
The agent edits its own memory. If a customer preference changes, the agent calls a memory-edit tool to update core or archival blocks. This is radically different from Mem0's automatic extraction. You decide when and what to persist.
Letta's genius is treating the LLM's context window as a constrained resource (like RAM in an OS). The agent can move data between in-context core memory and externally stored recall/archival memory, creating an illusion of unlimited memory while respecting fixed context limits.
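The paging idea can be modeled in a few lines. The class and method names below are invented for illustration — this is not Letta's API, just the tiered-memory pattern it implements:

```python
class TieredMemory:
    """Toy model of Letta-style memory tiers: a fixed-size core that
    lives in the context window, plus unbounded external storage."""

    def __init__(self, core_limit: int):
        self.core_limit = core_limit     # max facts pinned in-context
        self.core: list[str] = []        # analogous to RAM
        self.archival: list[str] = []    # analogous to disk, queried via tools

    def remember(self, fact: str) -> None:
        if len(self.core) >= self.core_limit:
            # Evict the oldest core fact to archival to respect the limit
            self.archival.append(self.core.pop(0))
        self.core.append(fact)

    def recall(self, keyword: str) -> list[str]:
        # Stand-in for embedding or full-text search over archival memory
        return [f for f in self.archival if keyword in f]

mem = TieredMemory(core_limit=2)
for fact in ["prefers email", "timezone UTC+2", "renewal due in March"]:
    mem.remember(fact)

print(mem.core)             # ['timezone UTC+2', 'renewal due in March']
print(mem.recall("email"))  # ['prefers email']
```

The core never exceeds its limit, yet nothing is lost — evicted facts remain reachable through search, which is the "illusion of unlimited memory" the OS analogy describes.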
The upside: your agent has full control over what it remembers and when. The downside: you're moving from your existing agent framework into Letta's ecosystem.
Zep: Memory as Knowledge Graph
Zep models memory as a temporal knowledge graph. Facts aren't just stored—they're timestamped. When information changes, the old fact is marked invalid, not deleted.
A customer tells you "I work at Acme." A year later, "I work at Beta Inc." Zep doesn't overwrite. It records both facts with validity windows. Query Zep for what's true now, or what was true at any point in time. Every fact has a validity window: when it became true, and when (if ever) it was superseded.
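A temporal fact store of this kind can be sketched compactly. The field and method names are illustrative, not Zep's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None = still true

class TemporalStore:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, value: str, at: datetime):
        # Invalidate (never delete) any currently-valid fact this supersedes
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate and f.valid_to is None:
                f.valid_to = at
        self.facts.append(Fact(subject, predicate, value, valid_from=at))

    def query(self, subject: str, predicate: str, at: datetime) -> Optional[str]:
        # Point-in-time query: what was true at this moment?
        for f in self.facts:
            if (f.subject == subject and f.predicate == predicate
                    and f.valid_from <= at and (f.valid_to is None or at < f.valid_to)):
                return f.value
        return None

store = TemporalStore()
store.assert_fact("customer_123", "employer", "Acme", at=datetime(2023, 1, 1))
store.assert_fact("customer_123", "employer", "Beta Inc", at=datetime(2024, 1, 1))

print(store.query("customer_123", "employer", at=datetime(2023, 6, 1)))  # Acme
print(store.query("customer_123", "employer", at=datetime(2024, 6, 1)))  # Beta Inc
```

Both facts survive: the Acme fact simply acquires a closed validity window when the Beta Inc fact arrives, so historical queries keep working.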
Zep's Graphiti engine combines conversational history with structured business data, layering them into a cohesive graph. It simultaneously handles chat histories, structured JSON data, and unstructured text, with all data sources feeding into a single temporal graph.
Retrieval is where Zep shines. Instead of pure vector search, Graphiti uses a hybrid approach combining semantic embeddings, BM25 keyword search, and direct graph traversal. The system reranks results using graph distance (how close facts are in the graph), episode mentions (which conversation turns referenced this fact), and cross-encoder LLMs for final relevance scoring.
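One common way to fuse multiple rankers like these is reciprocal rank fusion (RRF). Zep's exact reranking formula may differ — this is a generic sketch of the hybrid-fusion idea:

```python
def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge several ranked candidate lists into one.

    rankings maps a retriever name (e.g. 'semantic', 'bm25', 'graph')
    to its ranked list of document ids. Each appearance contributes
    1 / (k + rank + 1), so items ranked well by several retrievers win.
    """
    scores: dict[str, float] = {}
    for ranked in rankings.values():
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion({
    "semantic": ["fact_a", "fact_b", "fact_c"],
    "bm25":     ["fact_b", "fact_a", "fact_d"],
    "graph":    ["fact_b", "fact_c", "fact_a"],
})
print(fused[0])  # fact_b: ranked highly by all three retrievers
```

In a full pipeline, a fused shortlist like this is what you would hand to the expensive cross-encoder for final scoring, which is why eliminating false positives early pays off.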
Performance is impressive: on the Deep Memory Retrieval (DMR) benchmark, Zep reports 94.8% accuracy versus 93.4% for MemGPT. Response latency drops by roughly 90% versus vector-only retrieval, with P95 latency around 300ms even on complex queries.
Zep is the newest and least battle-tested in production, but the temporal tracking and graph-native retrieval are compelling for compliance-heavy scenarios.
Architecture Deep Dive
Storage and Retrieval Comparison
Mem0 stores memories in a combination of vector stores (Qdrant, Pinecone, ChromaDB, PGVector) and graph databases (Neo4j, Memgraph). Retrieval happens via vector similarity over embeddings, optionally enhanced by graph traversal for relationship-aware queries. Latency is typically 50–200ms for recall, depending on your backend scale.
The extraction + update pipeline introduces a slight write delay (a few hundred milliseconds) as facts are processed by the LLM before persisting. This eventual consistency model is acceptable for most agent workloads where perfect real-time consistency isn't required.
Letta stores memory in its own database (Postgres by default). Core memories are pinned in the system prompt, never requiring a lookup. When an agent needs archival or recall memories, it calls a search tool, which queries by embedding or full-text search. Latency is similar to Mem0 (50–300ms) but depends on your database performance.
Because the agent explicitly edits memory via function calls, every write is immediately persisted and immediately readable. This transactional consistency guarantees that an agent's next action always sees the memory it just wrote.
Zep uses temporal knowledge graphs stored in graph-native databases. Retrieval combines semantic search, keyword matching, and graph structure reasoning. Benchmarks show 90% latency reduction versus vector-only methods—the hybrid approach is faster because it eliminates false positives early using graph distance before expensive cross-encoder scoring.
For scalability: Mem0 and Zep both scale horizontally via managed infrastructure or self-hosted cloud deployments. Letta scales with your database; self-hosting means your ops team absorbs the overhead of growing memory tables.
Consistency models matter differently:
Mem0 (eventually consistent): Facts are extracted, compared, and merged asynchronously. Good for conversational agents where a 500ms delay is acceptable.
Letta (transactional): Every memory edit is atomic. Good for agents that build on their own memories in the same session.
Zep (temporal/transactional): Facts are timestamped at write. Good for compliance where you need to prove exactly when information became true.
SDK and Integration Experience
Mem0 has the gentlest integration curve. Adding memory is literally one line:
```python
from mem0 import Memory

memory = Memory.from_config(config)
memory.add("user message", user_id="customer_123")
messages = memory.search("What does customer_123 need?")
```
No rearchitecture. Works with OpenAI, LangGraph, CrewAI. Python and JavaScript both supported. You can drop it into an existing agent in minutes. The extraction pipeline runs asynchronously in the background.
Letta requires more setup because you're moving your agent into Letta. You define an agent with memory blocks instead of just adding memory to an existing agent:
```python
from letta import Letta

client = Letta.default()
agent = client.create_agent(
    name="support_agent",
    tools=["memory_edit", "web_search", "memory_search"],
    memory_blocks={
        "core": {"max_tokens": 2000},
        "recall": {"max_tokens": 8000},
        "archival": {"max_tokens": 50000}
    }
)

# The agent now calls memory_edit to persist facts
agent.core_memory.edit("Customer prefers email over phone")
```

You define memory block sizes and max tokens. The agent edits them as it learns. Your existing agent logic moves into Letta's tool-calling framework. For greenfield projects, this control is elegant. For existing agents, it's a bigger lift.
Zep sits between the two. It's a service you query, but it requires thinking in graphs:
```python
from zep_python import ZepClient

client = ZepClient(api_url="https://api.getzep.com")
session = client.memory.add_session("session_id")
client.memory.add_messages(session.session_id, messages)

# Get facts with temporal context
facts = client.memory.get_facts(session.session_id, limit=5)
# facts include: fact, validity_window, importance_score
```
Zep sessions are graph-scoped. Adding messages automatically triggers fact extraction and graph updates. Retrieving facts returns structured data with timestamps and validity windows, letting your agent reason about recency.
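On the consuming side, filtering those facts down to what is currently true is straightforward. The field names below are assumptions about the response shape, not Zep's exact schema:

```python
from datetime import datetime

# Hypothetical shape of returned facts; Zep's actual response fields may differ
facts = [
    {"fact": "works at Acme", "valid_from": "2023-01-01", "valid_to": "2024-01-01"},
    {"fact": "works at Beta Inc", "valid_from": "2024-01-01", "valid_to": None},
]

def currently_valid(facts: list[dict], now: datetime) -> list[str]:
    """Keep only facts whose validity window contains `now`."""
    out = []
    for f in facts:
        started = datetime.fromisoformat(f["valid_from"]) <= now
        not_ended = f["valid_to"] is None or now < datetime.fromisoformat(f["valid_to"])
        if started and not_ended:
            out.append(f["fact"])
    return out

print(currently_valid(facts, datetime(2024, 6, 1)))  # ['works at Beta Inc']
```

The same function with a historical timestamp answers "what did we believe then?", which is the audit-trail use case.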
Framework compatibility:
| Solution | LangGraph | CrewAI | AutoGen | LlamaIndex | Standalone |
|---|---|---|---|---|---|
| Mem0 | ✓ | ✓ | ✓ | ✓ | ✓ |
| Letta | ✗ | ✗ | ✗ | ✗ | ✓ (only) |
| Zep | ✓ | ✓ | ✓ | ✓ | ✓ |
Mem0 and Zep are drop-in layers. Letta is replacement, not addition.
Quick Comparison Table
| Aspect | Mem0 | Letta | Zep |
|---|---|---|---|
| Integration Type | Bolt-on service | Complete runtime | Service layer |
| Setup Time | Hours | Days/Weeks | Hours |
| Framework Compatibility | LangGraph, CrewAI, AutoGen, LlamaIndex | Standalone only | LangGraph, CrewAI, AutoGen, LlamaIndex |
| Hosting | Managed SaaS or self-hosted | Self-hosted or cloud | Managed cloud or self-hosted |
| Storage Backend | Vector DB + Graph DB | PostgreSQL | Graph database (native) |
| Consistency Model | Eventually consistent | Transactional | Temporal/Transactional |
| Write Latency | 100–500ms (async) | 10–50ms (immediate) | 50–100ms (timestamped) |
| Read Latency | 50–200ms | 50–300ms | 50–150ms (with reranking) |
| Memory Control | Automatic extraction | Agent-controlled | Automatic with temporal tracking |
| Best For | Fast deployment, existing agents | Control, greenfield projects | Temporal reasoning, audit trails |
| Cost Model | Per API call | Infrastructure only | Cloud SaaS or self-hosted |
| Open Source | Yes (client) | Yes (MIT) | Yes (fully) |
| Production Maturity | Highly battle-tested | Growing adoption | Newest, rapid development |
The table shows why Mem0 wins for speed, Letta wins for control, and Zep wins for temporal intelligence.
Decision Framework: Choosing the Right Tool
Choose Mem0 When
You have an existing agent running. Mem0 works with LangGraph, CrewAI, AutoGen, LlamaIndex. If your agent is already deployed and working, Mem0 adds memory without requiring rewrites.
Speed to production is critical. You need memory yesterday, not in three sprints. Mem0's integration takes hours, not weeks. The extraction pipeline is battle-tested and production-stable.
You want a managed service. Mem0 handles scaling, compliance, and uptime. You route memory calls through their API. They manage the vector database and graph backend. You pay per API call.
Budget for SaaS is available. Mem0 is a pay-as-you-go managed service. No infrastructure to operate. No database to scale yourself.
You need zero rearchitecture. Your tech stack stays the same. No migration. No rebuilding agents.
Mem0 is the pragmatist's choice—it ships fast and solves the immediate problem. Ideal for teams with working agents who just need memory bolted on.
Choose Letta When
You're building an agent from scratch. Greenfield projects where memory is part of the initial design, not an afterthought. Starting with Letta's memory-first architecture is cleaner than retrofitting.
Your agent needs to reason about its own memory. Some agents need to decide what to remember, not just have memory extracted automatically. Letta's explicit memory_edit calls let agents control their state.
You want full open source control. Letta is MIT-licensed and self-hostable. No vendor lock-in. No API costs. Just PostgreSQL and your servers.
You want memory versioning. Letta supports Git-backed memory states. You can review, diff, and revert memory changes. Useful for debugging agent behavior.
Your timeline is longer and you're willing to run newer infrastructure. Letta is mature but newer than Mem0. You're betting on its trajectory. You have time to integrate deeply.
Letta is the architect's choice—maximum control, longer ramp, unlimited scale on your infrastructure.
Choose Zep When
Temporal reasoning is core to your use case. You need to know when facts changed, not just what they are now. If compliance, audit trails, or historical accuracy matter, Zep's temporal graph is unmatched.
You're integrating structured business data with conversations. Zep's Graphiti engine handles both JSON schemas and chat history simultaneously. If you're building context graphs that link customers to organizations to tickets, Zep's multi-source graph is designed for this.
You need hybrid retrieval with low latency. Zep's combination of semantic search, BM25, graph traversal, and reranking beats pure vector similarity. If response time is critical and false positives hurt, Zep's multi-layer retrieval shines.
You're comfortable with newer infrastructure. Zep is the newest of the three and has the smallest production footprint. You're okay running cutting-edge tooling. The team is responsive and the project is evolving fast.
Multi-user shared context matters. Zep's graph can model shared knowledge (team memory, organizational memory) alongside individual session context. If agents need to learn from each other, Zep's model supports this.
Zep is the researcher's choice—highest sophistication, best for temporal reasoning and complex retrieval scenarios.
Consider HydraDB When
You're building B2B AI with multi-tenant requirements.

You need serverless infrastructure with zero ops overhead, and response latency is critical (ultra low-latency in-memory).

You want relational context graphs that understand relationships between entities, not just embeddings.

Temporal evolution of context matters—versioned facts instead of overwrites.

You need plug-and-play memory without framework lock-in.
HydraDB builds intelligent relationship graphs instead of fragmented storage. It records relationships between entities so it can understand that "you work at Company A" and "you live in New York" belong to the same person's experience. It uses a Git-style versioned temporal graph, adding new states to timelines instead of overwriting.
HydraDB is the infrastructure choice—serverless, relational, temporal.
Frequently Asked Questions
Can I use Mem0 and Zep together?
Yes, but rarely. Mem0 handles the immediate memory layer with automatic extraction; Zep adds temporal fact tracking for compliance or audit. Layering them creates a two-tier system where Mem0 feeds facts to Zep, which enriches them with timestamps and validity windows. This works technically but increases operational complexity (two APIs, two databases, two potential failure modes). Most teams pick one based on their primary need.
Which has the best free tier?
Letta is fully open source and free to self-host (no free cloud tier). Mem0 offers a free tier on their managed platform with rate limits (around 100 API calls/month). Zep is also open source with optional cloud hosting, and the open source version is fully featured.
Does Mem0 work with local LLMs?
Yes. Mem0 abstracts the LLM provider. You can use Ollama, LlamaCPP, or any OpenAI-compatible API. The extraction and update phases work with any LLM capable of function calling or structured output. Memory extraction doesn't require Claude or GPT-4.
Can Letta run on serverless functions?
Letta is stateful and requires persistent database access. Lambda or serverless containers work for the agent runtime, but you need persistent storage (RDS, DynamoDB, or similar). Pure serverless (no persistent DB) breaks Letta's architecture because agents must immediately read the memories they just wrote.
How does Zep handle multi-user conversations?
Zep separates sessions by session ID. Each user or conversation gets its own temporal graph. Within a session, facts are linked and timestamped. Shared knowledge graphs (team memory, organizational knowledge) are under development but not yet standard.
What happens if I run out of memory space in Letta?
Core memory has a hard limit (usually around 2,000 tokens). When it fills, the agent must move facts to recall or archival memory. It does this through memory_edit calls, which means it needs a strategy for what stays in core versus what gets archived. Recall and archival are much larger and rarely fill.
Which is best for customer support bots?
Mem0 is the fastest to deploy—hours to integration. Letta is best if agents need to manage context logic or if you want Git-versioned memory for debugging. Zep is best if you need audit trails of what the agent knew when (compliance, SLA disputes). HydraDB is best if you need relational context linking customers to organizations, tickets, and interaction history into a unified graph.
Can I migrate from Mem0 to Letta later?
Partially. Mem0 stores facts; Letta stores structured memory blocks. Migrating requires exporting Mem0 facts and reformatting them into core/recall/archival blocks in Letta. There's no automated path, but the two formats are similar enough that a one-time migration script is feasible.
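A one-time migration script might look like the sketch below. Both data shapes are invented for illustration — neither is Mem0's real export format nor Letta's real import format, and the core-size cutoff is arbitrary:

```python
def migrate(mem0_facts: list[dict], core_size: int = 5) -> dict:
    """Sketch of a Mem0 → Letta migration.

    Assumes each exported fact looks like {'memory': str, 'score': float}
    and that the target is a core/archival block split. The highest-signal
    facts go into the small in-context core block; the rest are archived.
    """
    blocks: dict[str, list[str]] = {"core": [], "archival": []}
    for f in sorted(mem0_facts, key=lambda f: f["score"], reverse=True):
        target = "core" if len(blocks["core"]) < core_size else "archival"
        blocks[target].append(f["memory"])
    return blocks

blocks = migrate([
    {"memory": "prefers email", "score": 0.9},
    {"memory": "asked about pricing once", "score": 0.2},
])
print(blocks["core"])  # ['prefers email', 'asked about pricing once']
```

The real work in such a migration is deciding the ranking signal (relevance score, recency, access frequency) that determines what earns a core slot.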
Conclusion
Three frameworks, three philosophies. Mem0 is a memory service layer bolted onto your existing agent. Letta is a complete runtime where agents self-manage memory. Zep is a temporal knowledge graph that tracks how facts evolve. HydraDB is serverless relational context infrastructure for multi-tenant B2B.
None is objectively better. Each excels in different scenarios.
Start by asking: Do I have an agent already, or am I building from scratch? Do I prioritize speed to production or control over memory logic? Does temporal reasoning matter for my use case?
Answer those questions first. The right tool will follow.