macOS GUI automation tool powered by AI vision — captures screenshots, detects UI elements via multi-provider LLMs, and executes clicks/types/scrolls through natural language or scripted workflows
Peekaboo is an AI-powered GUI automation tool for macOS, developed by the openclaw organization. It captures pixel-level screenshots via ScreenCaptureKit (full screen, window, menu bar, with Retina 2x support, 20–100 ms latency), sends them to multi-provider AI vision models (GPT-5.1, Claude 4.x, Grok 4-fast, Gemini 2.5, local Ollama) for element detection and semantic annotation, outputs structured JSON snapshots (snapshot_id + UI element list), and executes a full set of GUI operations (click, type, scroll, swipe, drag, hotkey, etc.) via the macOS Accessibility API.
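A snapshot might look like the sketch below. Only the `snapshot_id` field and the presence of a UI element list are stated above, so every field inside the elements (the IDs, `role`, `label`, `frame`) is an illustrative assumption, not the documented schema:

```json
{
  "snapshot_id": "snap_001",
  "elements": [
    { "id": "B1", "role": "AXButton",    "label": "Save", "frame": { "x": 512, "y": 88, "w": 96,  "h": 32 } },
    { "id": "T1", "role": "AXTextField", "label": "Name", "frame": { "x": 120, "y": 88, "w": 240, "h": 28 } }
  ]
}
```

A structure like this lets subsequent actions reference elements by ID within a snapshot, which is also what makes per-snapshot caching of detection results worthwhile.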
Core Capabilities
- Natural Language Agent: `peekaboo agent` supports natural language tasks with an automatic see → decide → do loop for multi-step GUI automation
- Scripted Execution: `peekaboo run` executes `.peekaboo.json` automation scripts and can be integrated into CI/CD
- MCP Server Mode: exposes the full toolset over the MCP protocol via the `@steipete/peekaboo` npm package, for seamless integration with Claude Desktop, Cursor, and other AI coding tools
- Structured Menu Discovery: lists app menus and the system menu bar as structured JSON without clicking
- Multi-Monitor Support: v3 adds cross-screen automation
- Snapshot Caching: element detection results are cached per snapshot to avoid redundant AI calls
- Visualization Feedback Layer: the standalone PeekabooVisualizer module provides non-blocking animation feedback via NSDistributedNotificationCenter
- Shell Completions: `peekaboo completions` generates zsh/bash/fish completion scripts
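To make the scripted-execution mode concrete, here is a hypothetical `.peekaboo.json` script. The file format is not specified in this document, so the step names and fields below are assumptions for illustration only; the command names (`see`, `click`, `type`) mirror the operations listed above:

```json
{
  "description": "Open a document and type a greeting (hypothetical schema)",
  "steps": [
    { "command": "see",   "app": "TextEdit" },
    { "command": "click", "element": "B1" },
    { "command": "type",  "text": "Hello from Peekaboo" }
  ]
}
```

A script like this would then be run with `peekaboo run <script>.peekaboo.json`, making the same workflow repeatable in CI/CD.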
Architecture
Peekaboo follows a service locator + dependency injection pattern with these modules:
- PeekabooAutomation: Core automation directly calling macOS Accessibility & ScreenCaptureKit APIs — config, screenshots, app/menu/window services, snapshot management, typed models
- Tachikoma: AI model management (dependency injection), abstracting OpenAI/Anthropic/Grok/Ollama providers
- PeekabooAgentRuntime: MCP tool registration, ToolRegistry, Agent service orchestration
- PeekabooVisualizer: Independent visualization feedback layer (VisualizationClient, event store)
- PeekabooCore: Umbrella module with exports + PeekabooServices convenience container
All UI operations are bound to the MainActor, as required by the macOS Accessibility API, with layered error handling (service → orchestration → agent → client).
Submodules (git submodules): AXorcist (Accessibility wrapper), Commander (CLI framework), Swiftdansi (ANSI terminal output), Tachikoma (AI model abstraction), TauTUI (TUI components)
Installation & Usage
- Homebrew: `brew install steipete/tap/peekaboo`
- MCP Server: `npx -y @steipete/peekaboo` (Node 22+ required)
- Prerequisites: macOS 15+ (Sequoia), Screen Recording + Accessibility permissions, and at least one AI provider credential or a local Ollama instance
- Config priority: CLI args > env vars > credentials file > config file > defaults
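For MCP integration, a client entry along these lines is typical. The `npx -y @steipete/peekaboo` command comes from the instructions above; the surrounding `mcpServers` wrapper is the common MCP client convention (e.g. Claude Desktop's `claude_desktop_config.json`), shown here as a sketch rather than verified against Peekaboo's own docs:

```json
{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo"]
    }
  }
}
```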
Boundaries: macOS only (an unofficial Windows rewrite, PeekabooWin, exists); vision understanding relies on external AI models, with no built-in inference; requires Screen Recording and Accessibility system permissions. MIT licensed.