Agent Memory API
Store, recall, and forget. Multi-tier memory with semantic search, GraphRAG, and automatic consolidation. One API for AI agents that remember.
npx zerodb-cli init
Used by developers from leading tech companies and universities
What Is an Agent Memory API?
AI agents lose context when conversations end. They can't remember what you told them yesterday, what your preferences are, or what they learned from previous tasks.
An agent memory API solves this by giving agents persistent storage for facts, preferences, and relationships — with semantic search to recall the right information at the right time.
ZeroMemory goes further with multi-tier memory (working, episodic, semantic), automatic consolidation between tiers, importance-weighted decay, and GraphRAG — hybrid search that combines vector similarity with knowledge graph traversal for multi-hop reasoning.
Core API Primitives
Three operations. That's all your agent needs to remember everything.
Remember
POST /memory/v2/remember
Store a fact, observation, or interaction. Auto-generates embeddings (free), extracts entities, assigns importance scores, and builds knowledge graph edges.
Recall
POST /memory/v2/recall
Search memories by meaning. Blended scoring combines vector similarity, importance weight, and recency. Filter by user, session, or metadata.
Forget
POST /memory/v2/forget
Delete memories by ID, user, session, or time range. GDPR-friendly: remove all memories for a user with one call.
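The three primitives can be called from any HTTP client. Below is a minimal Python sketch that builds (but does not send) the requests using only the standard library. The base URL and the X-API-Key header come from the curl example on this page. The `build_request` helper, the Content-Type header, and the body fields used for recall and forget are assumptions for illustration, so check the API reference before relying on them.

```python
import json
import urllib.request

# Base URL as it appears in the curl example on this page.
BASE_URL = "https://api.ainative.studio/api/v1/public/memory/v2"


def build_request(operation: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for remember, recall, or forget without sending it."""
    if operation not in {"remember", "recall", "forget"}:
        raise ValueError(f"unknown operation: {operation}")
    return urllib.request.Request(
        url=f"{BASE_URL}/{operation}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            # Assumed: the API expects a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


remember = build_request(
    "remember",
    {"content": "User prefers dark mode", "metadata": {"user_id": "u_123"}},
    api_key="YOUR_API_KEY",
)
# Hypothetical body fields; the real recall and forget schemas may differ.
recall = build_request("recall", {"query": "UI preferences"}, api_key="YOUR_API_KEY")
forget = build_request("forget", {"user_id": "u_123"}, api_key="YOUR_API_KEY")
```

Passing any of these objects to `urllib.request.urlopen` would send the request; that step is left out so the sketch runs without network access or a real key.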
Additional endpoints: /reflect (agent self-reflection), /profile (user profiles from memories), /relate (entity relationships), /graph/* (16 GraphRAG endpoints)
Why AINative for Agent Memory
Not just store-and-search. A complete cognitive memory system built for production agents.
Multi-Tier Memory
Working memory for active tasks, episodic memory for past interactions, semantic memory for long-term knowledge. Automatic consolidation between tiers.
Semantic Recall
Search by meaning, not keywords. Free embeddings included — no OpenAI key required. Blended scoring: similarity + importance + recency.
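To make the idea of blended scoring concrete, here is a minimal Python sketch that combines the three signals into one ranking score. The page does not state how the signals are weighted, so the weights, the 7-day recency half-life, and the function names are all hypothetical.

```python
# Hypothetical weights; the page does not say how the three signals are combined.
WEIGHTS = {"similarity": 0.6, "importance": 0.25, "recency": 0.15}


def recency_score(days_since_access: float, half_life_days: float = 7.0) -> float:
    """Map age to (0, 1]: 1.0 when just accessed, 0.5 after one half-life."""
    return 0.5 ** (days_since_access / half_life_days)


def blended_score(similarity: float, importance: float, days_since_access: float) -> float:
    """Weighted sum of similarity, importance, and recency (all in 0..1)."""
    return (
        WEIGHTS["similarity"] * similarity
        + WEIGHTS["importance"] * importance
        + WEIGHTS["recency"] * recency_score(days_since_access)
    )


# A fresh memory can outrank a slightly more similar but stale one.
fresh = blended_score(similarity=0.80, importance=0.5, days_since_access=0)
stale = blended_score(similarity=0.85, importance=0.5, days_since_access=70)
```

With these assumed weights, `fresh` scores higher than `stale`, which is the behavior a recency term is meant to produce.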
GraphRAG Hybrid Search
Combines vector search with multi-hop knowledge graph traversal. Finds connections that flat search misses — people, orgs, concepts, and their relationships.
Auto Consolidation & Decay
Memories strengthen with access, decay with time. Working memory consolidates into long-term storage. Importance scores adapt based on usage patterns.
Entity Graphs & Profiles
Auto-extracts entities and relationships from stored memories. Builds user profiles, agent reflections, and knowledge graphs — zero configuration.
MCP + Framework Native
18-tool MCP server for Claude Code, Cursor, and VS Code. LangChain and LlamaIndex integrations. REST API for any stack.
Connect Your Data Sources
Ingest memory from the tools your team already uses. ZeroMemory connectors pull context automatically — so agents know what happened without manual ingestion.
From raw data to agent-ready context
ZeroMemory doesn't just store — it synthesizes. Connector ingestion automatically extracts entities, builds relationships, and scores importance so your agents always have the most relevant context at recall time.
- Entity extraction from unstructured text
- Relationship graph auto-population
- Importance scoring based on recency + access patterns
- Cross-source deduplication and merging
# Synthesized context output
{
  "entities": [
    { "name": "Alice", "type": "person", "role": "engineer" },
    { "name": "ZeroDB", "type": "product" }
  ],
  "relationships": [
    { "from": "Alice", "to": "ZeroDB", "rel": "works_on" }
  ],
  "importance": 0.87,
  "sources": ["slack", "github"]
}
Add Memory in Minutes
REST API, Python SDK, or MCP server — pick your integration path. Free embeddings, no infrastructure to manage.
Quick Setup
npx zerodb-cli init
MCP Server (Memory Tools)
npm i ainative-zerodb-memory-mcp
Python SDK
pip install langchain-zerodb
- Free embeddings: BAAI/bge models, no OpenAI costs
- Multi-tier memory with automatic consolidation
- GraphRAG: vector + knowledge graph hybrid search
- MCP server for Claude Code, Cursor, VS Code
- LangChain + LlamaIndex integrations
# Store a memory
curl -X POST https://api.ainative.studio/api/v1/public/memory/v2/remember \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers dark mode and uses Python for backend work",
    "metadata": { "user_id": "u_123", "source": "onboarding" }
  }'
# Response:
{
  "memory_id": "mem_abc...",
  "importance": 0.72,
  "entities_extracted": ["dark mode", "Python"],
  "tier": "working"
}
Use Cases
From chatbots to autonomous research agents — persistent memory changes what agents can do.
Copilots & Chat Assistants
Remember user preferences, past conversations, and context across sessions. No more "As an AI, I don't have memory of previous conversations."
Example
A support chatbot recalls a customer's past tickets, product purchases, and preferred resolution method — before the customer says a word.
Autonomous Agents
Long-running agents that accumulate knowledge over days and weeks. Working memory for the current task, episodic memory for what happened, semantic memory for what they've learned.
Example
A research agent builds a knowledge graph of papers, authors, and findings across hundreds of sessions — and uses GraphRAG to discover connections.
Multi-Agent Systems
Shared memory across agent swarms. One agent stores a finding, another agent recalls it. User-scoped and project-scoped memory isolation built in.
Example
A coding agent stores architecture decisions. A review agent recalls them when evaluating PRs. A docs agent uses them to generate documentation.
RAG Pipelines
Go beyond document retrieval. Combine vector search with entity relationships for answers that require multi-hop reasoning across your knowledge base.
Example
Query: "What pricing does Company X use?" GraphRAG traverses Company X → negotiates_with → Competitor Y → uses → pricing strategy Z.
Building from Scratch vs. Using ZeroMemory
Compare agent memory approaches side by side.
| Feature | ZeroMemory | Mem0 | Letta | Build Custom |
|---|---|---|---|---|
| Memory store & recall | ||||
| Multi-tier memory (working/episodic/semantic) | — | — | — | |
| Automatic consolidation & decay | — | — | — | |
| Namespace isolation (global/session/project) | — | — | — | |
| Semantic search with free embeddings | — | — | ||
| GraphRAG (vector + graph hybrid) | — | — | — | |
| Knowledge graph auto-population | — | — | — | |
| Auto-context injection | — | — | — | |
| Context synthesis (LLM) | — | — | — | |
| Write-back actions (Slack, Gmail, etc.) | — | — | — | |
| Decision traces & audit trail | — | — | — | |
| Entity timeline & graph endpoints | — | — | — | |
| MCP server (18 agent tools) | — | — | — | |
| Entity extraction & profiles | — | — | ||
| Ontology templates | — | — | — | |
| No infrastructure to manage | — | — |
Namespace Isolation
Scope memories to global, session, or project contexts. Cross-namespace recall only when you ask for it.
Global Scope
global
Organization-wide knowledge. Shared across all agents, sessions, and projects. Ideal for company facts, policies, and permanent preferences.
Session Scope
session:<id>
Working context for one conversation. Auto-cleaned when the session ends. Perfect for task-specific state that shouldn't leak across chats.
Project Scope
project:<id>
Shared within a project boundary. All agents in the same project see these memories. Great for codebase knowledge and team decisions.
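The three scopes can be sketched as a small resolver that decides which namespaces a recall covers. This is an illustration under stated assumptions: the function name, the characters allowed in an id, and the rule that a cross-namespace recall adds only the global scope are guesses based on the descriptions on this page, not the service's actual implementation.

```python
import re

# Namespace formats described on this page: "global", "session:<id>", "project:<id>".
# The set of characters allowed in <id> is an assumption.
NAMESPACE_PATTERN = re.compile(r"^(global|(session|project):[A-Za-z0-9_-]+)$")


def namespaces_to_search(namespace: str, allow_cross_namespace: bool = False) -> list[str]:
    """Return the namespaces a recall would cover under the assumed rules."""
    if not NAMESPACE_PATTERN.match(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    if allow_cross_namespace and namespace != "global":
        # Assumed: cross-namespace recall adds the global scope and nothing else.
        return [namespace, "global"]
    return [namespace]
```

By default a recall stays inside the namespace it names, so `namespaces_to_search("project:backend")` returns only that project; passing `allow_cross_namespace=True` adds `"global"`.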
Cross-Namespace Recall
POST /memory/v2/recall
{
  "query": "deployment procedures",
  "namespace": "project:backend",
  "allow_cross_namespace": true // also searches global
}
Consolidation Engine
Two automated processes turn raw memories into structured knowledge — no manual intervention required.
Clustering
Daily at 03:30 UTC
Groups related episodic memories by semantic similarity. When a cluster crosses the threshold (5+ memories, similarity > 0.78), it merges into a single semantic memory.
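The merge rule can be illustrated with a short Python sketch. The thresholds (5 or more memories, similarity above 0.78) come from the description above; the greedy single-pass grouping against each cluster's first member is a stand-in chosen for brevity, since the page does not say which clustering algorithm the service uses.

```python
import math

SIMILARITY_THRESHOLD = 0.78  # from the description above
MIN_CLUSTER_SIZE = 5         # "5+ memories"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_mergeable_clusters(embeddings):
    """Greedy grouping: each vector joins the first cluster whose seed it resembles.

    Returns lists of indices for clusters large enough to merge.
    """
    clusters = []  # each cluster is a list of indices; the first index is the seed
    for idx, vec in enumerate(embeddings):
        for cluster in clusters:
            if cosine(embeddings[cluster[0]], vec) > SIMILARITY_THRESHOLD:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return [c for c in clusters if len(c) >= MIN_CLUSTER_SIZE]


# Toy 2-D embeddings: five memories about one topic, two about another.
episodic = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.95, 0.0], [0.8, 0.2],
            [0.0, 1.0], [0.1, 1.0]]
mergeable = find_mergeable_clusters(episodic)
```

Here only the first five memories form a cluster that crosses both thresholds; the remaining two are similar to each other but too few to merge.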
Reflection
Weekly — Sunday 04:45 UTC
LLM-powered synthesis extracts patterns, contradictions, and insights across all memories. Generates new semantic memories from cross-cutting themes.
Memory Decay by Type
Each memory tier decays at a different rate. Accessing a memory resets its clock, so memories that are accessed often enough never reach the archive threshold.
| Memory Type | Decay Rate | Half-Life | Archive After |
|---|---|---|---|
| Working | 0.5 | ~1.4 days | ~6 days |
| Episodic | 0.1 | ~7 days | ~30 days |
| Semantic | 0.02 | ~35 days | ~150 days |
Formula: effective_importance = importance × exp(-decay_rate × days_since_access)
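When a memory's effective importance drops below 0.05, it is soft-deleted: archived, excluded from recall, and still available for consolidation. The Archive After values in the table correspond to a memory that starts at importance 1.0.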
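The formula and the table values can be checked with a short Python sketch. The decay rates and the 0.05 threshold come from this section; the helper names are illustrative, and the archive calculation assumes a starting importance of 1.0.

```python
import math

ARCHIVE_THRESHOLD = 0.05
DECAY_RATES = {"working": 0.5, "episodic": 0.1, "semantic": 0.02}


def effective_importance(importance, decay_rate, days_since_access):
    """effective_importance = importance * exp(-decay_rate * days_since_access)"""
    return importance * math.exp(-decay_rate * days_since_access)


def half_life_days(decay_rate):
    """Days without access until effective importance halves."""
    return math.log(2) / decay_rate


def days_until_archived(importance, decay_rate):
    """Days without access until effective importance falls to the threshold."""
    if importance <= ARCHIVE_THRESHOLD:
        return 0.0
    return math.log(importance / ARCHIVE_THRESHOLD) / decay_rate


for tier, rate in DECAY_RATES.items():
    print(f"{tier}: half-life {half_life_days(rate):.1f} d, "
          f"archived after {days_until_archived(1.0, rate):.0f} d")
```

A memory that starts below importance 1.0 is archived sooner: for example, an episodic memory stored at 0.5 reaches the threshold after about 23 days rather than 30.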
Auto-Context Injection
Relevant memories appear in your agent's context window automatically — no explicit recall calls needed.
User sends message
The latest user message is embedded in real-time
Relevant memories found
Semantic search surfaces memories above your relevance threshold
Context injected
Top memories are prepended to the agent prompt before LLM call
Decision trace recorded
Optionally logs which memories were used and why
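The selection and injection steps above can be sketched in Python. The sketch assumes the candidate memories have already been scored against the latest user message, so the embedding step is left out. The defaults mirror the `max_memories` and `min_relevance` values in the MCP configuration on this page; the `ScoredMemory` type, the `inject_context` function, and the prompt layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredMemory:
    content: str
    relevance: float  # similarity to the latest user message, 0..1


def inject_context(user_message, candidates, max_memories=8, min_relevance=0.7):
    """Return (prompt, used): the prompt with top memories prepended, and the memories chosen."""
    relevant = [m for m in candidates if m.relevance >= min_relevance]
    used = sorted(relevant, key=lambda m: m.relevance, reverse=True)[:max_memories]
    if not used:
        # Nothing clears the threshold: pass the message through unchanged.
        return user_message, []
    lines = "\n".join(f"- {m.content}" for m in used)
    prompt = f"Relevant memories:\n{lines}\n\n{user_message}"
    return prompt, used


candidates = [
    ScoredMemory("prefers dark mode", 0.91),
    ScoredMemory("likes tea", 0.40),
    ScoredMemory("uses Python", 0.75),
]
prompt, used = inject_context("Set up my editor", candidates)
```

The returned `used` list is what a decision trace would record: which memories were injected for this turn and how relevant each was judged to be.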
Configure Auto-Context (MCP)
zerodb_configure_auto_context({
"enabled": true,
"max_memories": 8,
"min_relevance": 0.7,
"auto_trace": true
})
// Result: every agent turn now includes
// up to 8 relevant memories automatically
Write-Back Actions
Agents don't just remember — they act. Memory-triggered actions push insights back to your tools.
Example: Memory-triggered Slack notification
// Agent consolidation finds a pattern:
// "User has 3 overdue tasks from this week"
zerodb_slack_send({
"channel": "#eng-alerts",
"message": "Reminder: 3 overdue tasks detected for @alice"
})
Enterprise Security
Defense-in-depth at every layer. Built for SOC 2 compliance from day one.
Namespace Isolation
Memories in one namespace are invisible to others. User-scoped and project-scoped boundaries enforced at the query layer.
Input Validation
All inputs validated via Pydantic at the API boundary. Content length, namespace format, importance range — all enforced.
SSRF Prevention
Webhook URLs are blocked from resolving to private IP ranges (RFC 1918, loopback, IPv6 ULA).
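A check of this kind can be sketched with Python's standard `ipaddress` module. The blocked ranges below are the ones named above (RFC 1918, loopback, IPv6 ULA); the function name and the handling of IPv4-mapped IPv6 addresses are assumptions, and a real implementation would run the check on every address the webhook hostname resolves to.

```python
import ipaddress

# Ranges named in the text: RFC 1918 private space, loopback, IPv6 unique local.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
    ipaddress.ip_network("fc00::/7"),
]


def is_blocked_address(ip_text: str) -> bool:
    """True if an already-resolved IP address falls in a blocked range."""
    ip = ipaddress.ip_address(ip_text)
    # Assumed extra step: unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it
    # cannot be used to smuggle a private IPv4 address past the check.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in network for network in BLOCKED_NETWORKS)
```

For example, `is_blocked_address("192.168.1.10")` and `is_blocked_address("fd12::1")` return `True`, while a public address such as `8.8.8.8` returns `False`.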
Injection Prevention
Subprocess execution uses exec (not shell). HTML sanitization strips script/style/iframe before storage.
Graph Integrity
Phantom edge prevention, deduplication, and confidence bounds. Both entities must exist before an edge can be created.
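The three graph rules can be sketched as a small in-memory structure. The class and method names are hypothetical, confidence is assumed to lie between 0 and 1, and keeping the highest confidence when the same edge is added twice is one possible deduplication policy, not necessarily the one the service applies.

```python
class KnowledgeGraph:
    """Toy graph that enforces the three integrity rules described above."""

    def __init__(self):
        self.entities = set()
        self.edges = {}  # (source, relation, target) -> confidence

    def add_entity(self, name):
        self.entities.add(name)

    def add_edge(self, source, relation, target, confidence):
        # Phantom edge prevention: both endpoints must already exist.
        missing = [e for e in (source, target) if e not in self.entities]
        if missing:
            raise KeyError(f"unknown entities: {missing}")
        # Confidence bounds (assumed range 0..1).
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        # Deduplication: one edge per triple, keeping the highest confidence.
        key = (source, relation, target)
        self.edges[key] = max(confidence, self.edges.get(key, 0.0))


graph = KnowledgeGraph()
graph.add_entity("Alice")
graph.add_entity("ZeroDB")
graph.add_edge("Alice", "works_on", "ZeroDB", 0.9)
graph.add_edge("Alice", "works_on", "ZeroDB", 0.6)  # duplicate, does not add a second edge
```

After the two calls the graph still holds a single `works_on` edge with confidence 0.9, and an edge to an entity that was never added raises `KeyError`.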
Embedding Isolation
Each user's vectors are stored in user-scoped partitions. Consolidation only processes the authenticated user's memories.
Context Synthesis
Go beyond raw memory recall. An LLM synthesizes a natural-language context summary from your memories — ready to inject into any agent prompt.
Request
POST /memory/v2/synthesize
{
  "query": "What do we know about this user?",
  "max_sources": 10
}
Response
{
  "synthesis": "Senior backend engineer who prefers Python/FastAPI, uses dark mode, works in Pacific timezone, focuses on API performance.",
  "sources": ["mem_abc", "mem_def", "mem_ghi"],
  "confidence": 0.89
}
Unlike raw recall, synthesis produces a coherent narrative, well suited to system prompts, onboarding summaries, and agent briefings.
Decision Traces
Track exactly why your agent made each decision. Decision traces log the memories recalled, alternatives considered, and confidence scores — making agent behavior auditable and debuggable.
Trace Logging
- Which memories were recalled
- Confidence scores for each
- Alternatives that were ranked but not used
- Timestamps and latency per step
Skill Candidates
When a decision trace reveals repeated patterns, ZeroMemory surfaces them as skill candidates — reusable knowledge the agent can apply in future interactions.
Entity Timeline & Knowledge Graph
Every entity has a timeline. Every relationship has a history. Query the evolution of knowledge over time.
Entity Timeline
Track how knowledge about an entity evolves. See when facts were learned, updated, or contradicted — with links to source memories.
Graph Endpoints
- /graph/entities: List all entities (people, orgs, concepts)
- /graph/relationships: Query edges between entities
- /graph/neighbors: Get connected entities (1-hop or multi-hop)
- /graph/timeline: Chronological history of an entity
- /graph/graphrag: Hybrid vector + graph retrieval
- /graph/communities: Detect clusters of related entities
Frequently Asked Questions
What is an agent memory API?
An agent memory API lets AI agents store, search, and retrieve information across sessions. Instead of losing context when a conversation ends, agents persist facts, preferences, and relationships — and recall them later using semantic search. ZeroMemory provides this with multi-tier memory, automatic consolidation, and GraphRAG hybrid retrieval.
How do AI agents store memory?
Agents call POST /memory/v2/remember with text content and optional metadata. ZeroMemory auto-generates embeddings (free), extracts entities and relationships, assigns importance scores, and stores everything in Postgres with pgvector indexes. No separate embedding API or graph database needed.
What database should I use for agent memory?
Use a purpose-built memory API like ZeroMemory rather than raw vector databases. Memory APIs handle embedding, scoring, consolidation, and retrieval in one call. ZeroDB is Postgres-native, so you get relational data, vector search, and knowledge graphs without managing separate infrastructure.
How does GraphRAG improve memory recall?
Standard vector search finds semantically similar memories. GraphRAG adds a second stage — it traverses entity relationships in a knowledge graph to surface structurally connected information. For example, querying about a person finds their team, projects, tools, and collaborators through multi-hop graph traversal, even if those memories don't share similar text.
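The second stage can be sketched as a breadth-first traversal. In this Python illustration, the first stage (vector search) is represented only by its output, a list of seed entities. The function name, the hop limit, and the choice to follow edges in both directions are assumptions; apart from the Alice and ZeroDB edge from the sample output on this page, the entities are made up.

```python
from collections import deque


def expand_via_graph(seed_entities, edges, max_hops=2):
    """Breadth-first traversal from seed entities; returns {entity: hop distance}."""
    # Build an undirected adjacency map from (source, relation, target) triples.
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    distance = {entity: 0 for entity in seed_entities}
    queue = deque(seed_entities)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


edges = [
    ("Alice", "works_on", "ZeroDB"),
    ("Bob", "works_on", "ZeroDB"),            # hypothetical
    ("Bob", "member_of", "Platform Team"),    # hypothetical
]
# Stage 1 (vector search) is assumed to have matched memories about Alice.
reachable = expand_via_graph(["Alice"], edges, max_hops=2)
```

With a two-hop limit the traversal reaches ZeroDB and Bob but stops before Platform Team, which is three hops away. Memories attached to the reachable entities would then be merged with the vector results.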
Is there a free tier?
Yes. ZeroDB Free includes 500K vectors, 2GB storage, full memory API access, and free embeddings. No credit card required. Get started with npx zerodb-cli init.
Does it work with Claude, GPT, and open-source models?
Yes. ZeroMemory is LLM-agnostic. Use the REST API from any stack, the MCP server with Claude Code or Cursor, or the Python SDK with LangChain and LlamaIndex. The memory layer is independent of which model generates or consumes the memories.
Give Your Agents Persistent Memory
One API for memory, semantic search, and GraphRAG. Start free — no credit card, no signup wall.