
Peekaboo

Added May 8, 2026
Agent & Tooling
Open Source
Python · Workflow Automation · Desktop Apps · Model Context Protocol · Multimodal · AI Agents · CLI · Agent & Tooling · Model & Inference Framework · Automation, Workflow & RPA · Computer Vision & Multimodal

macOS GUI automation tool powered by AI vision — captures screenshots, detects UI elements via multi-provider LLMs, and executes clicks/types/scrolls through natural language or scripted workflows

Peekaboo is an AI-powered GUI automation tool for macOS, developed by the openclaw organization. It captures pixel-level screenshots via ScreenCaptureKit (full screen, window, or menu bar, with Retina 2x support and 20–100 ms latency) and sends them to multi-provider AI vision models (GPT-5.1, Claude 4.x, Grok 4-fast, Gemini 2.5, or local Ollama) for element detection and semantic annotation. The result is a structured JSON snapshot (a snapshot_id plus a list of UI elements), against which Peekaboo executes a full set of GUI operations (click, type, scroll, swipe, drag, hotkey, etc.) via the macOS Accessibility API.
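The capture → detect → act pipeline can be pictured roughly as follows. This is a minimal Python illustration only: the snapshot fields, element structure, and helper functions here are assumptions for the sketch, not Peekaboo's actual API.

```python
import json

# Hypothetical snapshot in the Peekaboo-style structured JSON shape:
# a snapshot_id plus a list of detected UI elements.
SNAPSHOT = json.loads("""
{
  "snapshot_id": "snap-001",
  "elements": [
    {"id": "B1", "role": "button", "label": "Save", "frame": [640, 480, 80, 24]},
    {"id": "T1", "role": "textfield", "label": "Name", "frame": [100, 100, 200, 24]}
  ]
}
""")

def find_element(snapshot, label):
    """Pick the element whose label matches (the 'decide' step)."""
    for el in snapshot["elements"]:
        if el["label"] == label:
            return el
    return None

def click(element):
    """Stand-in for the 'do' step (Peekaboo drives the Accessibility API)."""
    x, y, w, h = element["frame"]
    return (x + w // 2, y + h // 2)  # click at the element's center

target = find_element(SNAPSHOT, "Save")
print(click(target))  # (680, 492)
```

The point of the structured snapshot is that the "decide" step can address elements by id or label instead of raw pixel coordinates.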

Core Capabilities

  • Natural Language Agent: the peekaboo agent command takes natural-language tasks and runs an automatic see → decide → do loop for multi-step GUI automation
  • Scripted Execution: peekaboo run executes .peekaboo.json automation scripts and can be integrated into CI/CD pipelines
  • MCP Server Mode: exposes the full toolset over the MCP protocol via the @steipete/peekaboo npm package, for seamless integration with Claude Desktop, Cursor, and other AI coding tools
  • Structured Menu Discovery: lists app menus and the system menu bar as structured JSON without clicking anything
  • Multi-Monitor Support: v3 adds cross-screen automation
  • Snapshot Caching: element detection results are cached per snapshot to avoid redundant AI calls
  • Visualization Feedback Layer: the independent PeekabooVisualizer module provides non-blocking animation feedback via NSDistributedNotificationCenter
  • Shell Completions: peekaboo completions generates zsh/bash/fish completion scripts
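A script for peekaboo run might look something like the fragment below. The schema shown is a hypothetical sketch to convey the idea of a declarative step list, not the documented .peekaboo.json format.

```json
{
  "description": "Open a document and save it",
  "steps": [
    { "command": "click", "target": "File menu" },
    { "command": "click", "target": "Save" },
    { "command": "type", "text": "report.txt" },
    { "command": "hotkey", "keys": ["cmd", "s"] }
  ]
}
```

Because the script is plain JSON, it can be version-controlled and replayed in CI/CD the same way as any other fixture.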

Architecture

The architecture follows a service locator + dependency injection pattern with these modules:

  • PeekabooAutomation: Core automation directly calling macOS Accessibility & ScreenCaptureKit APIs — config, screenshots, app/menu/window services, snapshot management, typed models
  • Tachikoma: AI model management (dependency injection), abstracting OpenAI/Anthropic/Grok/Ollama providers
  • PeekabooAgentRuntime: MCP tool registration, ToolRegistry, Agent service orchestration
  • PeekabooVisualizer: Independent visualization feedback layer (VisualizationClient, event store)
  • PeekabooCore: Umbrella module with exports + PeekabooServices convenience container

All UI operations are bound to the MainActor, as required by the macOS Accessibility API, with layered error handling (service → orchestration → agent → client).
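The layered error handling can be pictured like this. Peekaboo itself is Swift; the following is a Python sketch of the pattern, and the exception and function names are invented for illustration.

```python
# Each layer wraps lower-level failures in its own error type, so the client
# sees one coherent chain: service -> orchestration -> agent -> client.

class ServiceError(Exception): pass          # e.g. an Accessibility API call failed
class OrchestrationError(Exception): pass    # a tool invocation failed
class AgentError(Exception): pass            # the agent task could not complete

def service_click(element_id):
    # Lowest layer: talks to the system APIs and fails with a ServiceError.
    raise ServiceError(f"element {element_id} not found")

def orchestrate(tool, element_id):
    # Middle layer: wraps service failures with tool-level context.
    try:
        return service_click(element_id)
    except ServiceError as e:
        raise OrchestrationError(f"tool '{tool}' failed") from e

def run_agent(task):
    # Agent layer: wraps orchestration failures with task-level context.
    try:
        return orchestrate("click", "B42")
    except OrchestrationError as e:
        raise AgentError(f"task '{task}' aborted") from e

try:
    run_agent("save the document")
except AgentError as e:
    # The client layer can walk the __cause__ chain down to the root error.
    root = e
    while root.__cause__ is not None:
        root = root.__cause__
    print(type(root).__name__)  # ServiceError
```

Each layer adds its own context without discarding the original cause, which is what makes cross-layer failures diagnosable from the client side.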

Submodules (git submodules): AXorcist (Accessibility wrapper), Commander (CLI framework), Swiftdansi (ANSI terminal output), Tachikoma (AI model abstraction), TauTUI (TUI components)

Installation & Usage

  • Homebrew: brew install steipete/tap/peekaboo
  • MCP Server: npx -y @steipete/peekaboo (Node 22+ required)
  • Prerequisites: macOS 15+ (Sequoia), Screen Recording + Accessibility permissions, at least one AI provider credential or local Ollama
  • Config priority: CLI args > env vars > credentials file > config file > defaults
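That precedence order can be expressed as a simple lookup chain. The following Python sketch shows the pattern only, not Peekaboo's actual implementation, and the key names are invented:

```python
import os

def resolve(key, cli_args, credentials, config_file, defaults):
    """Return the first value found, following the stated precedence:
    CLI args > env vars > credentials file > config file > defaults."""
    for source in (cli_args, os.environ, credentials, config_file, defaults):
        if key in source and source[key] is not None:
            return source[key]
    return None

defaults = {"provider": "ollama", "model": "llava"}
config_file = {"provider": "openai"}
credentials = {}
cli_args = {"model": "gpt-5.1"}

print(resolve("provider", cli_args, credentials, config_file, defaults))  # openai
print(resolve("model", cli_args, credentials, config_file, defaults))     # gpt-5.1
```

The first layer that defines a key wins, so a CLI flag always overrides the same setting in a config file.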

Boundaries: macOS only (an unofficial Windows rewrite, PeekabooWin, exists); vision understanding relies on external AI models, with no built-in inference; requires Screen Recording and Accessibility system permissions. MIT licensed.

