Persistent Memory Infrastructure for AI Agents

Agent Memory API

Store, recall, and forget. Multi-tier memory with semantic search, GraphRAG, and automatic consolidation. One API for AI agents that remember.

npx zerodb-cli init

Used by developers from leading tech companies and universities

100% Recall@1 (LongMemEval)
<400ms retrieval latency
3 memory tiers: working + episodic + semantic

What Is an Agent Memory API?

AI agents lose context when conversations end. They can't remember what you told them yesterday, what your preferences are, or what they learned from previous tasks.

An agent memory API solves this by giving agents persistent storage for facts, preferences, and relationships — with semantic search to recall the right information at the right time.

ZeroMemory goes further with multi-tier memory (working, episodic, semantic), automatic consolidation between tiers, importance-weighted decay, and GraphRAG — hybrid search that combines vector similarity with knowledge graph traversal for multi-hop reasoning.

Core API Primitives

Three operations. That's all your agent needs to remember everything.

Remember

POST /memory/v2/remember

Store a fact, observation, or interaction. Auto-generates embeddings (free), extracts entities, assigns importance scores, and builds knowledge graph edges.

Recall

POST /memory/v2/recall

Search memories by meaning. Blended scoring combines vector similarity, importance weight, and recency. Filter by user, session, or metadata.
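To make the blended score concrete, here is a minimal Python sketch that ranks memories by a weighted sum of cosine similarity, stored importance, and an exponential recency term. The weights and the 7-day recency half-life are hypothetical placeholders, not ZeroMemory's documented values, and `Memory`, `blended_score`, and `recall` are illustrative names rather than SDK functions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Memory:
    memory_id: str
    embedding: list[float]
    importance: float  # stored importance score, 0.0 to 1.0
    days_since_access: float


# Hypothetical weights and recency half-life, for illustration only.
W_SIMILARITY, W_IMPORTANCE, W_RECENCY = 0.6, 0.25, 0.15
RECENCY_HALF_LIFE_DAYS = 7.0


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def blended_score(query: list[float], m: Memory) -> float:
    # Recency is 1.0 for a memory accessed just now and halves every RECENCY_HALF_LIFE_DAYS.
    recency = 0.5 ** (m.days_since_access / RECENCY_HALF_LIFE_DAYS)
    return (
        W_SIMILARITY * cosine(query, m.embedding)
        + W_IMPORTANCE * m.importance
        + W_RECENCY * recency
    )


def recall(query: list[float], memories: list[Memory], limit: int = 5) -> list[Memory]:
    # Highest blended score first.
    return sorted(memories, key=lambda m: blended_score(query, m), reverse=True)[:limit]
```

With equal similarity, the more important and more recently accessed memory wins; tuning the weights trades topical relevance against freshness.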

Forget

POST /memory/v2/forget

Delete memories by ID, user, session, or time range. GDPR-friendly — remove all memories for a user with one call.

Plus: /reflect (agent self-reflection), /profile (user profiles from memories), /relate (entity relationships), /graph/* (16 GraphRAG endpoints)

Why AINative for Agent Memory

Not just store-and-search. A complete cognitive memory system built for production agents.

Multi-Tier Memory

Working memory for active tasks, episodic memory for past interactions, semantic memory for long-term knowledge. Automatic consolidation between tiers.

Semantic Recall

Search by meaning, not keywords. Free embeddings included — no OpenAI key required. Blended scoring: similarity + importance + recency.

GraphRAG Hybrid Search

Combines vector search with multi-hop knowledge graph traversal. Finds connections that flat search misses — people, orgs, concepts, and their relationships.

Auto Consolidation & Decay

Memories strengthen with access, decay with time. Working memory consolidates into long-term storage. Importance scores adapt based on usage patterns.

Entity Graphs & Profiles

Auto-extracts entities and relationships from stored memories. Builds user profiles, agent reflections, and knowledge graphs — zero configuration.

MCP + Framework Native

18-tool MCP server for Claude Code, Cursor, and VS Code. LangChain and LlamaIndex integrations. REST API for any stack.

Connect Your Data Sources

Ingest memory from the tools your team already uses. ZeroMemory connectors pull context automatically — so agents know what happened without manual ingestion.

✉️ Gmail: email threads & action items
💬 Slack: channel messages & decisions
📄 Notion: docs, wikis & meeting notes
🐙 GitHub: PRs, issues & code context
📁 Google Drive: files & shared documents

Context Synthesis

From raw data to agent-ready context

ZeroMemory doesn't just store — it synthesizes. Connector ingestion automatically extracts entities, builds relationships, and scores importance so your agents always have the most relevant context at recall time.

  • Entity extraction from unstructured text
  • Relationship graph auto-population
  • Importance scoring based on recency + access patterns
  • Cross-source deduplication and merging

# Synthesized context output

{
  "entities": [
    { "name": "Alice", "type": "person",
      "role": "engineer" },
    { "name": "ZeroDB", "type": "product" }
  ],
  "relationships": [
    { "from": "Alice", "to": "ZeroDB",
      "rel": "works_on" }
  ],
  "importance": 0.87,
  "sources": ["slack", "github"]
}
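Cross-source deduplication can be sketched as keying entities on a normalized name plus type, and relationships on a normalized (from, to, rel) triple, so the same person mentioned in Slack and GitHub collapses into one record. This is an illustrative merge policy, not ZeroMemory's connector pipeline: the case-insensitive normalization and the "first source wins, later sources fill gaps" rule are assumptions, and importance scoring is left out.

```python
from __future__ import annotations


def normalize(name: str) -> str:
    # Assumed normalization: collapse whitespace and compare case-insensitively.
    return " ".join(name.split()).casefold()


def merge_sources(batches: dict[str, dict]) -> dict:
    """Merge per-connector extractions into one deduplicated record.

    `batches` maps a source name such as "slack" to a dict with
    "entities" and "relationships" lists, using the field names from
    the synthesized output example.
    """
    entities: dict[tuple[str, str], dict] = {}
    relationships: dict[tuple[str, str, str], dict] = {}
    sources: list[str] = []

    for source, batch in batches.items():
        sources.append(source)
        for ent in batch.get("entities", []):
            key = (normalize(ent["name"]), ent["type"])
            merged = entities.setdefault(key, {"name": ent["name"], "type": ent["type"]})
            # First source wins for each attribute; later sources only fill gaps.
            for attr, value in ent.items():
                merged.setdefault(attr, value)
        for rel in batch.get("relationships", []):
            key = (normalize(rel["from"]), normalize(rel["to"]), rel["rel"])
            relationships.setdefault(key, dict(rel))

    return {
        "entities": list(entities.values()),
        "relationships": list(relationships.values()),
        "sources": sources,
    }
```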

Add Memory in Minutes

REST API, Python SDK, or MCP server — pick your integration path. Free embeddings, no infrastructure to manage.

Quick Setup

npx zerodb-cli init

MCP Server (Memory Tools)

npm i ainative-zerodb-memory-mcp

Python SDK

pip install langchain-zerodb

  • Free embeddings: BAAI/bge models, no OpenAI costs
  • Multi-tier memory with automatic consolidation
  • GraphRAG: vector + knowledge graph hybrid search
  • MCP server for Claude Code, Cursor, VS Code
  • LangChain + LlamaIndex integrations

# Store a memory
curl -X POST https://api.ainative.studio/api/v1/public/memory/v2/remember \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers dark mode and uses Python for backend work",
    "metadata": { "user_id": "u_123", "source": "onboarding" }
  }'

# Response:
{
  "memory_id": "mem_abc...",
  "importance": 0.72,
  "entities_extracted": ["dark mode", "Python"],
  "tier": "working"
}
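The same call from Python, as a sketch using only the standard library (no official SDK is assumed). It reuses the curl request's endpoint and `X-API-Key` header and sets `Content-Type: application/json` explicitly, since the body is JSON. `build_remember_request` and `remember` are hypothetical helper names; building the request is kept separate from sending it so it can be inspected offline.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.ainative.studio/api/v1/public"


def build_remember_request(
    api_key: str, content: str, metadata: Optional[dict] = None
) -> urllib.request.Request:
    """Build (without sending) a POST to the remember endpoint."""
    body: dict = {"content": content}
    if metadata:
        body["metadata"] = metadata
    return urllib.request.Request(
        url=f"{API_BASE}/memory/v2/remember",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def remember(api_key: str, content: str, metadata: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    req = build_remember_request(api_key, content, metadata)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Calling `remember(key, text, {"user_id": "u_123"})` sends the request; the returned dict should carry fields like those in the response above.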

Use Cases

From chatbots to autonomous research agents — persistent memory changes what agents can do.

Copilots & Chat Assistants

Remember user preferences, past conversations, and context across sessions. No more "As an AI, I don't have memory of previous conversations."

Example

A support chatbot recalls a customer's past tickets, product purchases, and preferred resolution method — before the customer says a word.

Autonomous Agents

Long-running agents that accumulate knowledge over days and weeks. Working memory for the current task, episodic memory for what happened, semantic memory for what they've learned.

Example

A research agent builds a knowledge graph of papers, authors, and findings across hundreds of sessions — and uses GraphRAG to discover connections.

Multi-Agent Systems

Shared memory across agent swarms. One agent stores a finding, another agent recalls it. User-scoped and project-scoped memory isolation built in.

Example

A coding agent stores architecture decisions. A review agent recalls them when evaluating PRs. A docs agent uses them to generate documentation.

RAG Pipelines

Go beyond document retrieval. Combine vector search with entity relationships for answers that require multi-hop reasoning across your knowledge base.

Example

Query: "What pricing does Company X use?" GraphRAG traverses Company X → negotiates_with → Competitor Y → uses → pricing strategy Z.
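The graph half of that query can be sketched as a breadth-first traversal from seed entities. In a hybrid GraphRAG pipeline the seeds would come from vector search; here that stage is stubbed out by assuming it returned `Company X`. The edge list reuses the example's relations plus one hypothetical distractor edge (`headquartered_in`), and `multi_hop_paths` is an illustrative name, not one of the `/graph/*` endpoints.

```python
from collections import deque

# Edges from the example query, plus one hypothetical distractor edge.
EDGES = [
    ("Company X", "negotiates_with", "Competitor Y"),
    ("Competitor Y", "uses", "pricing strategy Z"),
    ("Company X", "headquartered_in", "City W"),
]


def build_adjacency(edges):
    adjacency = {}
    for subject, relation, obj in edges:
        adjacency.setdefault(subject, []).append((relation, obj))
    return adjacency


def multi_hop_paths(seeds, edges, max_hops=2):
    """Breadth-first traversal from seed entities (assumed to come from vector search).

    Returns every acyclic path of 1..max_hops edges as a list of
    (subject, relation, object) triples.
    """
    adjacency = build_adjacency(edges)
    paths = []
    queue = deque((seed, [], {seed}) for seed in seeds)
    while queue:
        node, path, seen = queue.popleft()
        if len(path) == max_hops:
            continue
        for relation, nxt in adjacency.get(node, []):
            if nxt in seen:
                continue  # skip cycles
            new_path = path + [(node, relation, nxt)]
            paths.append(new_path)
            queue.append((nxt, new_path, seen | {nxt}))
    return paths
```

Bounding `max_hops` keeps traversal cost predictable; the two-hop path is what links Company X to pricing strategy Z even if no single memory mentions both.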

Building from Scratch vs. Using ZeroMemory

Compare agent memory approaches side by side.

Feature | ZeroMemory | Mem0 | Letta | Build Custom
Memory store & recall
Multi-tier memory (working/episodic/semantic)
Automatic consolidation & decay
Namespace isolation (global/session/project)
Semantic search with free embeddings
GraphRAG (vector + graph hybrid)
Knowledge graph auto-population
Auto-context injection
Context synthesis (LLM)
Write-back actions (Slack, Gmail, etc.)
Decision traces & audit trail
Entity timeline & graph endpoints
MCP server (18 agent tools)
Entity extraction & profiles
Ontology templates
No infrastructure to manage

Namespace Isolation

Scope memories to global, session, or project contexts. Cross-namespace recall only when you ask for it.

Global Scope

global

Organization-wide knowledge. Shared across all agents, sessions, and projects. Ideal for company facts, policies, and permanent preferences.

Session Scope

session:<id>

Working context for one conversation. Auto-cleaned when the session ends. Perfect for task-specific state that shouldn't leak across chats.

Project Scope

project:<id>

Shared within a project boundary. All agents in the same project see these memories. Great for codebase knowledge and team decisions.

Cross-Namespace Recall

Set allow_cross_namespace to true to also search the global namespace:

POST /memory/v2/recall
{
  "query": "deployment procedures",
  "namespace": "project:backend",
  "allow_cross_namespace": true
}
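Namespace selection can be sketched as a small resolver: validate the namespace string against the three documented forms, then add `global` only when cross-namespace recall is requested, as the recall example above describes. The allowed `<id>` character set is an assumption, and `validate_namespace` and `namespaces_to_search` are hypothetical helpers, not API calls.

```python
import re

# The three documented forms: "global", "session:<id>", "project:<id>".
# The characters allowed in <id> are an assumption.
NAMESPACE_RE = re.compile(r"global|(session|project):[A-Za-z0-9_-]+")


def validate_namespace(ns: str) -> str:
    if not NAMESPACE_RE.fullmatch(ns):
        raise ValueError(f"invalid namespace: {ns!r}")
    return ns


def namespaces_to_search(namespace: str, allow_cross_namespace: bool = False) -> list:
    """Return the namespaces a recall request would search.

    Cross-namespace recall is opt-in and adds only the global namespace.
    """
    validate_namespace(namespace)
    scopes = [namespace]
    if allow_cross_namespace and namespace != "global":
        scopes.append("global")
    return scopes
```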

Consolidation Engine

Two automated processes turn raw memories into structured knowledge — no manual intervention required.

Clustering

Daily at 03:30 UTC

Groups related episodic memories by semantic similarity. When a cluster crosses the threshold (5+ memories, similarity > 0.78), it merges into a single semantic memory.

episodic cluster → semantic merge → graph edge update
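A minimal sketch of the clustering pass, using the thresholds stated above (5+ memories, similarity > 0.78). The text does not say which similarity is compared, so this assumes cosine similarity between a memory's embedding and its cluster's centroid, and uses greedy single-pass assignment as a stand-in for whatever clustering algorithm ZeroMemory actually runs. Each returned ID list is a cluster that would be merged into one semantic memory.

```python
import math

# Thresholds from the text; the similarity measure is an assumption.
MIN_CLUSTER_SIZE = 5
SIMILARITY_THRESHOLD = 0.78


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def find_merge_candidates(episodic):
    """Greedy single-pass clustering of (memory_id, embedding) pairs.

    Each memory joins the most similar existing cluster whose centroid
    similarity exceeds the threshold, otherwise it starts a new cluster.
    Returns the ID lists of clusters large enough to merge.
    """
    clusters = []  # each: {"ids": [...], "vecs": [...]}
    for mem_id, vec in episodic:
        best, best_sim = None, SIMILARITY_THRESHOLD
        for cluster in clusters:
            sim = cosine(vec, centroid(cluster["vecs"]))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"ids": [mem_id], "vecs": [vec]})
        else:
            best["ids"].append(mem_id)
            best["vecs"].append(vec)
    return [c["ids"] for c in clusters if len(c["ids"]) >= MIN_CLUSTER_SIZE]
```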

Reflection

Weekly — Sunday 04:45 UTC

LLM-powered synthesis extracts patterns, contradictions, and insights across all memories. Generates new semantic memories from cross-cutting themes.

all memories → LLM reflection → insights + corrections

Memory Decay by Type

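Each memory tier decays at a different rate. Accessing a memory resets its clock, so frequently used memories never expire.

Memory Type | Decay Rate (per day) | Half-Life | Archive After (importance 1.0)
Working | 0.5 | ~1.4 days | ~6 days
Episodic | 0.1 | ~7 days | ~30 days
Semantic | 0.02 | ~35 days | ~150 days

Formula: effective_importance = importance × exp(-decay_rate × days_since_access)

Half-life is ln(2) / decay_rate. "Archive After" is the time for a memory with importance 1.0 to fall below the 0.05 threshold, ln(20) / decay_rate; lower-importance memories are archived sooner.

Memories whose effective importance drops below 0.05 are soft-deleted: archived, excluded from recall, but still available for consolidation.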
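The decay formula and the table values can be checked with a few lines of Python. The rates and the 0.05 threshold come from the text; the function names are illustrative.

```python
import math

# Decay rates per day and the archive threshold, from the text.
DECAY_RATES = {"working": 0.5, "episodic": 0.1, "semantic": 0.02}
ARCHIVE_THRESHOLD = 0.05


def effective_importance(importance: float, memory_type: str, days_since_access: float) -> float:
    return importance * math.exp(-DECAY_RATES[memory_type] * days_since_access)


def half_life_days(memory_type: str) -> float:
    # Solve exp(-rate * t) = 0.5 for t.
    return math.log(2) / DECAY_RATES[memory_type]


def days_until_archive(importance: float, memory_type: str) -> float:
    # Solve importance * exp(-rate * t) = ARCHIVE_THRESHOLD for t.
    if importance <= ARCHIVE_THRESHOLD:
        return 0.0
    return math.log(importance / ARCHIVE_THRESHOLD) / DECAY_RATES[memory_type]


def is_archived(importance: float, memory_type: str, days_since_access: float) -> bool:
    return effective_importance(importance, memory_type, days_since_access) < ARCHIVE_THRESHOLD
```

For example, an episodic memory with importance 0.72 (like the response in the setup example) reaches the archive threshold after ln(0.72 / 0.05) / 0.1 ≈ 26.7 days unless it is accessed again, which resets `days_since_access` to 0.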

Auto-Context Injection

Relevant memories appear in your agent's context window automatically — no explicit recall calls needed.

1. User sends message: the latest user message is embedded in real time.

2. Relevant memories found: semantic search surfaces memories above your relevance threshold.

3. Context injected: top memories are prepended to the agent prompt before the LLM call.

4. Decision trace recorded: optionally logs which memories were used and why.

Configure Auto-Context (MCP)

zerodb_configure_auto_context({
  "enabled": true,
  "max_memories": 8,
  "min_relevance": 0.7,
  "auto_trace": true
})

// Result: every agent turn now includes
// up to 8 relevant memories automatically
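The four steps and the configuration above can be sketched as a pure function over already-scored candidates. It assumes relevance scores were computed by the semantic search in step 2, treats `min_relevance` as an inclusive cutoff, and uses a hypothetical "Relevant memories:" prompt layout; `AutoContextConfig` and `inject_context` are illustrative names, not MCP tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AutoContextConfig:
    # Field names and defaults copied from the zerodb_configure_auto_context example.
    enabled: bool = True
    max_memories: int = 8
    min_relevance: float = 0.7
    auto_trace: bool = True


@dataclass
class InjectionResult:
    prompt: str
    used: list[str] = field(default_factory=list)
    trace: Optional[dict] = None


def inject_context(user_message: str, candidates: list[tuple[str, str, float]],
                   cfg: AutoContextConfig) -> InjectionResult:
    """candidates: (memory_id, text, relevance) tuples from semantic search."""
    if not cfg.enabled:
        return InjectionResult(prompt=user_message)
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    # Keep memories at or above the threshold, capped at max_memories.
    used = [c for c in ranked if c[2] >= cfg.min_relevance][: cfg.max_memories]
    if used:
        context = "\n".join(f"- {text}" for _, text, _ in used)
        prompt = f"Relevant memories:\n{context}\n\n{user_message}"
    else:
        prompt = user_message
    trace = None
    if cfg.auto_trace:
        # Record what was injected and what ranked but was left out.
        used_ids = {mid for mid, _, _ in used}
        trace = {
            "used": [(mid, rel) for mid, _, rel in used],
            "skipped": [(mid, rel) for mid, _, rel in ranked if mid not in used_ids],
        }
    return InjectionResult(prompt=prompt, used=[mid for mid, _, _ in used], trace=trace)
```

The trace keeps both the injected memories and the ones that ranked but were left out, mirroring the "alternatives that were ranked but not used" field of a decision trace.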

Write-Back Actions

Agents don't just remember — they act. Memory-triggered actions push insights back to your tools.

💬 Slack: send message
✉️ Gmail: reply to thread
📅 Calendar: create event
🐙 GitHub: create issue
📄 Notion: create page

Example: Memory-triggered Slack notification

// Agent consolidation finds a pattern:
// "User has 3 overdue tasks from this week"

zerodb_slack_send({
  "channel": "#eng-alerts",
  "message": "Reminder: 3 overdue tasks detected for @alice"
})

Enterprise Security

Defense-in-depth at every layer. Built for SOC 2 compliance from day one.

Namespace Isolation

Memories in one namespace are invisible to others. User-scoped and project-scoped boundaries enforced at the query layer.

Input Validation

All inputs validated via Pydantic at the API boundary. Content length, namespace format, importance range — all enforced.

SSRF Prevention

Webhook URLs that resolve to private IP ranges (RFC 1918, loopback, IPv6 ULA) are rejected.
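A minimal sketch of the webhook check using Python's `ipaddress` module. The blocked networks are exactly the ranges named above (RFC 1918, loopback, and IPv6 ULA `fc00::/7`), with IPv4-mapped IPv6 addresses unwrapped first. The resolver is injectable so the check can be tested without DNS; `is_blocked_address` and `check_webhook_url` are illustrative names.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Ranges named in the text: RFC 1918, loopback, IPv6 unique local addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                 "127.0.0.0/8", "::1/128", "fc00::/7")
]


def is_blocked_address(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) so it cannot bypass the IPv4 ranges.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in net for net in BLOCKED_NETWORKS)


def check_webhook_url(url: str, resolver=socket.getaddrinfo) -> None:
    """Raise ValueError unless every resolved address for the URL's host is allowed."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("webhook URL must be http(s) and include a host")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    for *_, sockaddr in resolver(parsed.hostname, port, proto=socket.IPPROTO_TCP):
        if is_blocked_address(sockaddr[0]):
            raise ValueError(f"webhook host resolves to blocked address {sockaddr[0]}")
```

A check like this is vulnerable to DNS rebinding if the HTTP client resolves the hostname again later, so a hardened implementation connects to the address it validated. Blocking link-local ranges such as 169.254.0.0/16, where many cloud metadata services live, is a common addition.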

Injection Prevention

Subprocesses are launched with an argument list via exec, never through a shell. HTML sanitization strips script, style, and iframe elements before storage.
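The HTML half of this can be sketched with the standard library's `html.parser`: drop `script`, `style`, and `iframe` elements along with their contents, re-emit other tags, and re-escape text so decoded entities like `&lt;script&gt;` cannot turn back into markup. This is an illustration only; it does not filter attributes such as `onerror`, so production code would normally rely on a vetted sanitizer library. The subprocess half corresponds to passing an argument list, as in `subprocess.run(["git", "status"])` with the default `shell=False`.

```python
import html
from html.parser import HTMLParser

DROP_TAGS = {"script", "style", "iframe"}


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: keep unless dropped or inside a dropped element.
        if tag not in DROP_TAGS and not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # convert_charrefs decoded entities such as &lt;, so escape again.
            self.out.append(html.escape(data, quote=False))


def sanitize_html(text: str) -> str:
    parser = _Sanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```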

Graph Integrity

Phantom edge prevention, deduplication, and confidence bounds. Both entities must exist before an edge can be created.
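The three graph-integrity rules can be sketched as checks inside an `add_edge` method. The specific policies are assumptions: out-of-range confidence is rejected rather than clamped, and a duplicate edge keeps the higher of the two confidences. `KnowledgeGraph` is a hypothetical in-memory class, not the ZeroDB storage layer.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    entities: set = field(default_factory=set)
    # (source, relation, target) -> confidence
    edges: dict = field(default_factory=dict)

    def add_entity(self, name: str) -> None:
        self.entities.add(name)

    def add_edge(self, source: str, relation: str, target: str, confidence: float) -> bool:
        """Add an edge; return True if a new edge was created."""
        # Phantom edge prevention: both endpoints must already exist.
        missing = [e for e in (source, target) if e not in self.entities]
        if missing:
            raise KeyError(f"unknown entities: {missing}")
        # Confidence bounds (assumed policy: reject instead of clamping).
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        key = (source, relation, target)
        # Deduplication (assumed policy: keep the higher confidence).
        if key in self.edges:
            self.edges[key] = max(self.edges[key], confidence)
            return False
        self.edges[key] = confidence
        return True
```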

Embedding Isolation

Each user's vectors stored in user-scoped partitions. Consolidation only processes authenticated user's memories.

Context Synthesis

Go beyond raw memory recall. An LLM synthesizes a natural-language context summary from your memories — ready to inject into any agent prompt.

Request

POST /memory/v2/synthesize
{
  "query": "What do we know about this user?",
  "max_sources": 10
}

Response

{
  "synthesis": "Senior backend engineer who prefers Python/FastAPI, uses dark mode, works in Pacific timezone, focuses on API performance.",
  "sources": ["mem_abc", "mem_def", "mem_ghi"],
  "confidence": 0.89
}

Unlike raw recall, synthesis produces a coherent narrative — perfect for system prompts, onboarding summaries, and agent briefings.

Decision Traces

Track exactly why your agent made each decision. Decision traces log the memories recalled, alternatives considered, and confidence scores — making agent behavior auditable and debuggable.

Trace Logging

  • Which memories were recalled
  • Confidence scores for each
  • Alternatives that were ranked but not used
  • Timestamps and latency per step

Skill Candidates

When a decision trace reveals repeated patterns, ZeroMemory surfaces them as skill candidates — reusable knowledge the agent can apply in future interactions.

pattern detected → skill candidate → agent refines → permanent skill

Entity Timeline & Knowledge Graph

Every entity has a timeline. Every relationship has a history. Query the evolution of knowledge over time.

Entity Timeline

Track how knowledge about an entity evolves. See when facts were learned, updated, or contradicted — with links to source memories.

Mar 1: Alice joins the backend team
Mar 15: Alice takes ownership of auth service
Apr 2: Alice prefers async patterns over sync
Apr 20: Alice promoted to tech lead

Graph Endpoints

/graph/entities: list all entities (people, orgs, concepts)
/graph/relationships: query edges between entities
/graph/neighbors: get connected entities (1-hop or multi-hop)
/graph/timeline: chronological history of an entity
/graph/graphrag: hybrid vector + graph retrieval
/graph/communities: detect clusters of related entities

Frequently Asked Questions

What is an agent memory API?

An agent memory API lets AI agents store, search, and retrieve information across sessions. Instead of losing context when a conversation ends, agents persist facts, preferences, and relationships — and recall them later using semantic search. ZeroMemory provides this with multi-tier memory, automatic consolidation, and GraphRAG hybrid retrieval.

How do AI agents store memory?

Agents call POST /memory/v2/remember with text content and optional metadata. ZeroMemory auto-generates embeddings (free), extracts entities and relationships, assigns importance scores, and stores everything in Postgres with pgvector indexes. No separate embedding API or graph database is needed.

What database should I use for agent memory?

Use a purpose-built memory API like ZeroMemory rather than raw vector databases. Memory APIs handle embedding, scoring, consolidation, and retrieval in one call. ZeroDB is Postgres-native, so you get relational data, vector search, and knowledge graphs without managing separate infrastructure.

How does GraphRAG improve memory recall?

Standard vector search finds semantically similar memories. GraphRAG adds a second stage — it traverses entity relationships in a knowledge graph to surface structurally connected information. For example, querying about a person finds their team, projects, tools, and collaborators through multi-hop graph traversal, even if those memories don't share similar text.

Is there a free tier?

Yes. ZeroDB Free includes 500K vectors, 2GB storage, full memory API access, and free embeddings. No credit card required. Get started with npx zerodb-cli init.

Does it work with Claude, GPT, and open-source models?

Yes. ZeroMemory is LLM-agnostic. Use the REST API from any stack, the MCP server with Claude Code or Cursor, or the Python SDK with LangChain and LlamaIndex. The memory layer is independent of which model generates or consumes the memories.

Give Your Agents Persistent Memory

One API for memory, semantic search, and GraphRAG. Start free — no credit card, no signup wall.