A persistent, shared memory backend for AI Agent pipelines, providing REST API, MCP protocol, and knowledge graph with hybrid search, autonomous memory consolidation, and multi-agent collaboration — fully self-hosted with zero cloud cost.
## Core Capabilities
- Persistent Semantic Memory: Cross-session persistent AI context storage with ~5ms semantic search retrieval
- Hybrid Search: BM25 + vector semantic search
- Local Embeddings: ONNX Runtime + sentence-transformers (MiniLM-L6-v2), data stays on-premises
- Autonomous Consolidation: Auto-compress old memories with decay-based lifecycle management
- Knowledge Graph: Typed edges (causes, fixes, contradicts, etc.) for sharing causal chains
- SHODH Compatibility: Compliant with SHODH Unified Memory API Specification v1.0.0 (emotion metadata, episodic memory, source tracking)
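The typed-edge knowledge graph above can be sketched as a minimal in-memory structure. The `Edge` shape and the traversal helper here are illustrative only, not the service's actual schema:

```python
from dataclasses import dataclass

# Illustrative edge types from the list above; not the service's real schema.
EDGE_TYPES = {"causes", "fixes", "contradicts"}

@dataclass(frozen=True)
class Edge:
    source: str      # memory id of the originating node
    relation: str    # one of EDGE_TYPES
    target: str      # memory id of the related node

def causal_chain(edges, start):
    """Follow 'causes' edges from `start`, returning the chain of node ids."""
    chain = [start]
    current = start
    while True:
        nxt = next((e.target for e in edges
                    if e.source == current and e.relation == "causes"), None)
        if nxt is None or nxt in chain:  # stop at dead ends and cycles
            break
        chain.append(nxt)
        current = nxt
    return chain

edges = [
    Edge("mem:disk-full", "causes", "mem:write-failure"),
    Edge("mem:write-failure", "causes", "mem:data-loss"),
    Edge("mem:cleanup-job", "fixes", "mem:disk-full"),
]
print(causal_chain(edges, "mem:disk-full"))
# ['mem:disk-full', 'mem:write-failure', 'mem:data-loss']
```

Sharing chains like this is what lets one agent hand a diagnosed root cause to another instead of a flat list of notes.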
## Protocols & Interfaces
- MCP Protocol: Native Model Context Protocol + Remote MCP (Streamable HTTP) for claude.ai browser
- REST API: 15 endpoints, callable by any HTTP client without MCP libraries
- SSE Events: Real-time notifications for memory store/delete events
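Because the REST API accepts any HTTP client, a store call can be built with nothing but the standard library. The endpoint path `/api/memories` and the payload field names below are assumptions for illustration; check the dashboard's API docs for the real shapes:

```python
import json
import urllib.request

def build_store_request(base_url: str, content: str, tags=None):
    """Build (but don't send) a JSON POST request to store a memory."""
    payload = {"content": content, "tags": tags or []}
    return urllib.request.Request(
        url=f"{base_url}/api/memories",   # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_store_request("http://localhost:8000",
                          "deploy failed on node-3", tags=["incident"])
# Sending it is one call away: urllib.request.urlopen(req)
```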
## Multi-Agent Collaboration & Security
- `X-Agent-ID` header auto-tags memory source, with scoped retrieval per agent
- Tag system as an inter-agent communication bus (e.g., `msg:cluster`)
- OAuth 2.0 + DCR (Dynamic Client Registration) for enterprise-grade auth
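A minimal sketch of the collaboration primitives above: each agent sends its identity in the `X-Agent-ID` header, and a shared tag (here `msg:cluster`, as in the example above) acts as the bus channel. The payload field names are illustrative:

```python
def agent_message(agent_id: str, text: str, channel: str = "msg:cluster"):
    """Headers and body for posting a bus message as a specific agent."""
    headers = {
        "X-Agent-ID": agent_id,          # source auto-tagging
        "Content-Type": "application/json",
    }
    body = {"content": text, "tags": [channel]}  # tag doubles as the channel
    return headers, body

headers, body = agent_message("planner-1", "task 42 claimed")
```

A consumer agent would then retrieve by the same tag, scoped to the senders it trusts.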
## Data Import & Visualization
- PDF auto-chunking and vectorized import (pypdf)
- Built-in Web Dashboard: semantic search, tag browser, document import, analytics, quality scoring, API docs
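The import pipeline's chunking step can be sketched in plain Python. The extraction itself would come from pypdf (`PdfReader(...).pages[i].extract_text()`), and the chunk size and overlap values here are arbitrary, not the service's defaults:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50):
    """Split extracted PDF text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

pages_text = "lorem " * 200          # stand-in for pypdf-extracted text
chunks = chunk_text(pages_text)
# Each chunk would then be embedded (MiniLM-L6-v2) and stored.
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides.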
## Storage Backends
- Default: SQLite-vec (recommended, zero-config)
- Optional: Milvus / Milvus Lite / Zilliz Cloud
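Backend selection is typically an environment-variable switch; the variable name and values below are assumptions for illustration, so verify them against the project's configuration docs:

```shell
# Assumed variable name; sqlite_vec is the zero-config default
MCP_MEMORY_STORAGE_BACKEND=sqlite_vec memory server --http

# Switching to Milvus (connection settings omitted)
MCP_MEMORY_STORAGE_BACKEND=milvus memory server --http
```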
## Use Cases
- AI coding assistant persistent memory (Claude Desktop / Cursor / VS Code)
- Multi-agent pipeline shared state (LangGraph / CrewAI / AutoGen)
- Inter-agent communication bus
- Self-hosted enterprise deployment (Docker + Cloudflare Tunnel + OAuth 2.0)
- Zero-cloud-cost RAG and document knowledge base
## Client Compatibility
Claude Desktop, Claude Code, claude.ai (browser), VS Code, Cursor, Windsurf, ChatGPT (Developer Mode), Gemini CLI, OpenCode, Goose, Aider, GitHub Copilot CLI — 14+ clients.
## Quick Start
```bash
pip install mcp-memory-service
MCP_ALLOW_ANONYMOUS_ACCESS=true memory server --http
# Runs at http://localhost:8000
```
Optional installs:
- `pip install mcp-memory-service[sqlite]` — SQLite-vec + ONNX (recommended)
- `pip install mcp-memory-service[milvus]` — Milvus backend
- `pip install mcp-memory-service[full]` — All dependencies
## Architecture Highlights
- Web layer: FastAPI + Uvicorn + SSE-Starlette
- MCP layer: `mcp>=1.8.0,<2.0.0` SDK
- Storage layer: SQLite-vec (aiosqlite) or Milvus, hybrid BM25 + vector retrieval
- Embedding layer: sentence-transformers MiniLM-L6-v2 + ONNX Runtime
- Memory lifecycle: Consolidation module + APScheduler
- Service discovery: zeroconf (mDNS / Bonjour)
- Auth: Authlib + PyJWT + cryptography
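The decay-based lifecycle mentioned under consolidation can be illustrated with a simple exponential-decay score; the formula, half-life, and threshold below are illustrative, not the consolidation module's actual policy:

```python
import math

def retention_score(importance: float, age_days: float,
                    half_life_days: float = 30.0) -> float:
    """Importance decays exponentially; the score halves every half_life_days."""
    return importance * math.exp(-math.log(2) * age_days / half_life_days)

def due_for_compression(score: float, threshold: float = 0.25) -> bool:
    """Memories below the threshold get compressed/archived by the scheduler."""
    return score < threshold

fresh = retention_score(1.0, age_days=0)     # 1.0
month = retention_score(1.0, age_days=30)    # 0.5
old   = retention_score(1.0, age_days=90)    # 0.125
```

A scheduler (here, APScheduler per the layer list above) would run a scoring pass periodically and compress whatever falls below the cutoff.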
## Unverified Information
- No standalone documentation site URL found in README
- SHODH specification original link not provided
- ChatGPT Developer Mode MCP support is a third-party feature, availability unverified
- LongMemEval benchmark data not independently verified
Version 10.48.0, by Heinrich Krupp, Apache-2.0 license, 2,551+ commits as of July 2025, Production/Stable maturity.