macOS GUI automation tool powered by AI vision — captures screenshots, detects UI elements via multi-provider LLMs, and executes clicks/types/scrolls through natural language or scripted workflows
Peekaboo is an AI-powered GUI automation tool for macOS, developed by the openclaw organization. It captures pixel-level screenshots via ScreenCaptureKit (full screen, window, menu bar, with Retina 2x support, 20–100 ms latency), sends them to multi-provider AI vision models (GPT-5.1, Claude 4.x, Grok 4-fast, Gemini 2.5, local Ollama) for element detection and semantic annotation, outputs structured JSON snapshots (snapshot_id + UI element list), and executes a full set of GUI operations (click, type, scroll, swipe, drag, hotkey, etc.) via the macOS Accessibility API.
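A snapshot might look like the sketch below. Only the `snapshot_id` field and the presence of a UI element list are stated above, so every field inside the elements (the IDs, `role`, `label`, `frame`) is an illustrative assumption, not the documented schema:

```json
{
  "snapshot_id": "snap_001",
  "elements": [
    { "id": "B1", "role": "AXButton",    "label": "Save", "frame": { "x": 512, "y": 88, "w": 96,  "h": 32 } },
    { "id": "T1", "role": "AXTextField", "label": "Name", "frame": { "x": 120, "y": 88, "w": 240, "h": 28 } }
  ]
}
```

A structure like this lets subsequent actions reference elements by ID within a snapshot, which is also what makes per-snapshot caching of detection results worthwhile.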
Core Capabilities
- Natural Language Agent: `peekaboo agent` supports natural language tasks with an automatic see → decide → do loop for multi-step GUI automation
- Scripted Execution: `peekaboo run` executes `.peekaboo.json` automation scripts and can be integrated into CI/CD
- MCP Server Mode: exposes the full toolset over the MCP protocol via the `@steipete/peekaboo` npm package, for seamless integration with Claude Desktop, Cursor, and other AI coding tools
- Structured Menu Discovery: lists app menus and the system menu bar as structured JSON without clicking
- Multi-Monitor Support: v3 adds cross-screen automation
- Snapshot Caching: element detection results are cached per snapshot to avoid redundant AI calls
- Visualization Feedback Layer: the standalone PeekabooVisualizer module provides non-blocking animation feedback via NSDistributedNotificationCenter
- Shell Completions: `peekaboo completions` generates zsh/bash/fish completion scripts
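To make the scripted-execution mode concrete, here is a hypothetical `.peekaboo.json` script. The file format is not specified in this document, so the step names and fields below are assumptions for illustration only; the command names (`see`, `click`, `type`) mirror the operations listed above:

```json
{
  "description": "Open a document and type a greeting (hypothetical schema)",
  "steps": [
    { "command": "see",   "app": "TextEdit" },
    { "command": "click", "element": "B1" },
    { "command": "type",  "text": "Hello from Peekaboo" }
  ]
}
```

A script like this would then be run with `peekaboo run <script>.peekaboo.json`, making the same workflow repeatable in CI/CD.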
Architecture
Peekaboo follows a service locator + dependency injection pattern with these modules:
- PeekabooAutomation: Core automation directly calling macOS Accessibility & ScreenCaptureKit APIs — config, screenshots, app/menu/window services, snapshot management, typed models
- Tachikoma: AI model management (dependency injection), abstracting OpenAI/Anthropic/Grok/Ollama providers
- PeekabooAgentRuntime: MCP tool registration, ToolRegistry, Agent service orchestration
- PeekabooVisualizer: Independent visualization feedback layer (VisualizationClient, event store)
- PeekabooCore: Umbrella module with exports + PeekabooServices convenience container
All UI operations are bound to the MainActor, as required by the macOS Accessibility API, with layered error handling (service → orchestration → agent → client).
Submodules (git submodules): AXorcist (Accessibility wrapper), Commander (CLI framework), Swiftdansi (ANSI terminal output), Tachikoma (AI model abstraction), TauTUI (TUI components)
Installation & Usage
- Homebrew: `brew install steipete/tap/peekaboo`
- MCP Server: `npx -y @steipete/peekaboo` (Node 22+ required)
- Prerequisites: macOS 15+ (Sequoia), Screen Recording + Accessibility permissions, and at least one AI provider credential or a local Ollama instance
- Config priority: CLI args > env vars > credentials file > config file > defaults
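For MCP integration, a client entry along these lines is typical. The `npx -y @steipete/peekaboo` command comes from the instructions above; the surrounding `mcpServers` wrapper is the common MCP client convention (e.g. Claude Desktop's `claude_desktop_config.json`), shown here as a sketch rather than verified against Peekaboo's own docs:

```json
{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo"]
    }
  }
}
```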
Boundaries: macOS only (an unofficial Windows rewrite, PeekabooWin, exists); vision understanding relies on external AI models, with no built-in inference; requires Screen Recording and Accessibility system permissions. MIT licensed.