An open-source multi-agent AI system for Platform Engineering teams, in which persona-driven, specialized agents collaborate to automate platform operations such as incident management, continuous deployment, and workflow orchestration.
CAIPE (pronounced "cape") is an open-source multi-agent system from the CNOE (Cloud Native Operational Excellence, CNCF ecosystem) community, designed for Platform Engineering, SRE, and DevOps teams. The system uses a Supervisor orchestration architecture: a central Supervisor dispatches requests to specialized sub-agents for ArgoCD, PagerDuty, GitHub, Jira, Kubernetes, Slack, and more, so that operations can be automated across systems over the A2A (Agent-to-Agent) or MCP (Model Context Protocol) protocols.
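The Supervisor pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not CAIPE's implementation: the `Supervisor` class, the handler functions, and the keyword-based routing are all hypothetical stand-ins (a real supervisor would typically let an LLM choose the sub-agent and would reach it over A2A or MCP rather than through a local function call).

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical sub-agent handlers. Real sub-agents would call the external
# system (ArgoCD, PagerDuty, ...) instead of returning a canned string.
def argocd_agent(task: str) -> str:
    return f"[argocd] handled: {task}"


def pagerduty_agent(task: str) -> str:
    return f"[pagerduty] handled: {task}"


def jira_agent(task: str) -> str:
    return f"[jira] handled: {task}"


@dataclass
class Supervisor:
    """Central dispatcher that forwards each request to one sub-agent."""

    agents: Dict[str, Callable[[str], str]]

    def route(self, request: str) -> str:
        # Assumption: naive keyword matching stands in for LLM-based routing.
        lowered = request.lower()
        for name in self.agents:
            if name in lowered:
                return name
        raise LookupError(f"no sub-agent matches request: {request!r}")

    def dispatch(self, request: str) -> str:
        # Look up the chosen sub-agent and hand the request over.
        return self.agents[self.route(request)](request)


supervisor = Supervisor(
    agents={
        "argocd": argocd_agent,
        "pagerduty": pagerduty_agent,
        "jira": jira_agent,
    }
)
```

Calling `supervisor.dispatch("Sync the ArgoCD app to the latest commit")` forwards the request to the hypothetical `argocd_agent`, while a request that names no known system raises `LookupError`.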
CAIPE offers two deployment modes: Multi-Node (production, with remote communication over A2A) and Single-Node (development, with in-process communication over MCP stdio). Hybrid setups that mix the two are supported via the DISTRIBUTED_AGENTS variable. The persona-driven design provides predefined roles such as "Platform Engineer" and "Incident Engineer", each with a curated prompt library. Personas are configured through YAML declarations, and policy.lp policy files constrain agent behavior.
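To make the hybrid mode concrete, the sketch below decides a transport per agent from DISTRIBUTED_AGENTS. The value format is an assumption made for illustration (a comma-separated list of agent names that run remotely); the source does not specify it, and the function name `plan_transports` and the transport labels are hypothetical as well.

```python
import os
from typing import Dict, Iterable, Mapping


def plan_transports(
    agent_names: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Map each agent to 'a2a' (remote) or 'mcp-stdio' (in-process).

    Assumption: DISTRIBUTED_AGENTS holds a comma-separated list of agent
    names that should run remotely. The real format may differ.
    """
    raw = env.get("DISTRIBUTED_AGENTS", "")
    # Normalize the list: trim whitespace, ignore case, drop empty entries.
    remote = {name.strip().lower() for name in raw.split(",") if name.strip()}
    return {
        name: ("a2a" if name.lower() in remote else "mcp-stdio")
        for name in agent_names
    }
```

Under this assumed format, `plan_transports(["argocd", "github"], {"DISTRIBUTED_AGENTS": "argocd"})` would place the ArgoCD agent on A2A and keep the GitHub agent in-process. An empty or unset variable keeps every agent in-process, which corresponds to the Single-Node mode.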
For knowledge management, CAIPE supports a basic RAG mode (Milvus + Redis) and a Graph RAG mode (Neo4j + knowledge graph), with API-driven automatic content ingestion. Security features include secure inter-agent communication, API RBAC, compliance with Kubernetes Pod Security Standards, and Vault secret management. Observability is built in through Langfuse tracing and Prometheus metrics, and LLM state can be persisted to Redis, Postgres, or MongoDB backends. The frontend provides A2A protocol programming interfaces and Backstage portal integration.
The backend core is Python 3.13 + FastAPI + LangGraph + LangChain, and the frontend uses Next.js 16 + React 19 + Zustand. Infrastructure supports deployment with both Docker Compose and Kubernetes Helm, and uv handles Python package management. The project is at version 0.4.8 with 62 releases; the language breakdown is approximately Python 44.6%, TypeScript 41.6%, and Go 10.0%. It is licensed under Apache-2.0.
Typical Use Cases
- Incident Management: Acknowledge PagerDuty incidents, query on-call schedules
- Continuous Deployment: Sync ArgoCD apps to latest commit, check application status
- Version Control: Create GitHub repositories, merge Pull Requests
- Project Management: Create Jira tickets, assign tasks
- Team Communication: Send Slack messages, create channels
Quick Start
git clone https://github.com/cnoe-io/ai-platform-engineering.git
cd ai-platform-engineering
cp .env.example .env
docker compose --profile caipe-ui up
After startup, the Web UI is available at http://localhost:3001 and the API listens on port 8000. Optional profiles enable tracing (--profile tracing) or the RAG knowledge base (--profile rag).
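A quick way to confirm that the stack came up is to check whether the two ports mentioned above accept TCP connections. The following is a minimal sketch using only the Python standard library; the function names `port_open` and `check_caipe_ports` are hypothetical, and the check only tests connectivity, not whether the services are healthy.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_caipe_ports(host: str = "localhost") -> Dict[str, bool]:
    """Probe the Web UI (3001) and API (8000) ports from the Quick Start."""
    return {
        "web_ui": port_open(host, 3001),
        "api": port_open(host, 8000),
    }
```

Running `print(check_caipe_ports())` shortly after `docker compose --profile caipe-ui up` shows which of the two ports is already reachable; containers may need some time before both report `True`.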