An AI Agent platform that automates browser workflows using Vision LLMs, extending Playwright with natural language commands for web automation, workflow orchestration, and structured data extraction.
Skyvern is an AI Agent platform that automates browser workflows using Vision LLMs and computer vision. Its core innovation is that it does not depend on traditional XPath/CSS selectors: it uses Vision LLMs to interpret the visual elements of a page and plan actions, which makes it resilient to changes in page layout.
As an AI extension of Playwright, Skyvern provides natural language APIs including page.act(), page.extract(), page.validate(), and page.prompt(). All standard Playwright operations accept an optional prompt parameter for AI-assisted element targeting. This gives a dual mode ("traditional selector → AI fallback") in which the traditional selector is tried first and AI targeting is used as the fallback.
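The "traditional selector → AI fallback" idea can be sketched without any browser at all. The snippet below is a minimal illustration of the control flow only, not Skyvern's implementation: locate_with_fallback, by_selector, and by_prompt are hypothetical names, and the "AI" lookup is a fixed lookup table standing in for a Vision LLM call.

```python
from typing import Callable, Optional, Tuple

# A lookup takes a selector or a prompt and returns an element, or None.
# Both lookups are hypothetical stand-ins, not Skyvern or Playwright calls.
Lookup = Callable[[str], Optional[str]]


def locate_with_fallback(
    selector: Optional[str],
    prompt: Optional[str],
    by_selector: Lookup,
    by_prompt: Lookup,
) -> Tuple[str, str]:
    """Return (element, strategy), trying the selector before the prompt."""
    if selector is not None:
        element = by_selector(selector)
        if element is not None:
            return element, "selector"
    if prompt is not None:
        element = by_prompt(prompt)
        if element is not None:
            return element, "ai"
    raise LookupError("element not found by selector or prompt")


# Toy page: the old "#submit" id was renamed, so the stale selector misses.
DOM = {"#send-btn": "<button id='send-btn'>Submit</button>"}
LABELS = {"the submit button": "#send-btn"}


def by_selector(selector: str) -> Optional[str]:
    return DOM.get(selector)


def by_prompt(prompt: str) -> Optional[str]:
    # Stand-in for a Vision LLM call: a fixed description-to-element table.
    target = LABELS.get(prompt)
    return DOM.get(target) if target is not None else None


element, strategy = locate_with_fallback(
    "#submit", "the submit button", by_selector, by_prompt
)
print(strategy)  # the stale selector misses, so the prompt path is used
```

The point of the ordering is cost: a selector lookup is cheap and deterministic, so the slower model-based lookup only runs when the page has drifted away from the selector.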
For orchestration, a Task is the basic execution unit (URL + prompt + optional schema), while a Workflow chains multiple Tasks and Blocks (for loops, file parsing, email sending, HTTP requests, custom code, etc.) into a complete automation pipeline. page.agent additionally wraps higher-level capabilities such as login and file download, and integrates with Bitwarden and 1Password for credential management.
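The relationship between Tasks, Blocks, and Workflows can be pictured as plain data. The sketch below is an illustrative model under stated assumptions, not the Skyvern SDK: the class names mirror the terms above, but every field name, the ForLoopBlock shape, and the expand() helper are hypothetical, and the example.com URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Union


@dataclass
class Task:
    """Basic execution unit: URL + prompt + optional output schema."""
    url: str
    prompt: str
    schema: Optional[Dict[str, Any]] = None


@dataclass
class ForLoopBlock:
    """Hypothetical loop block: builds one Task per item."""
    items: List[str]
    make_task: Callable[[str], Task]


@dataclass
class Workflow:
    """An ordered chain of Tasks and Blocks."""
    steps: List[Union[Task, ForLoopBlock]] = field(default_factory=list)

    def expand(self) -> List[Task]:
        """Flatten the workflow into the ordered list of Tasks to run."""
        tasks: List[Task] = []
        for step in self.steps:
            if isinstance(step, ForLoopBlock):
                tasks.extend(step.make_task(item) for item in step.items)
            else:
                tasks.append(step)
        return tasks


login = Task(url="https://example.com/login", prompt="Log in to the portal")
download_all = ForLoopBlock(
    items=["INV-001", "INV-002"],
    make_task=lambda invoice_id: Task(
        url="https://example.com/invoices",
        prompt=f"Download invoice {invoice_id}",
        schema={"type": "object", "properties": {"file_name": {"type": "string"}}},
    ),
)

workflow = Workflow(steps=[login, download_all])
tasks = workflow.expand()
print([t.prompt for t in tasks])
```

Modelling the loop as a block that produces Tasks keeps the Task itself minimal, which matches the "URL + prompt + optional schema" description.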
Skyvern supports both local deployment (pip / Docker Compose / Kubernetes) and Skyvern Cloud hosting. The Cloud version includes built-in anti-bot detection, proxy networks, and CAPTCHA solvers, and supports running multiple instances in parallel. The local version has defaulted to SQLite since v1.0.31, so it requires zero configuration to start. The backend is written in Python and supports multiple LLM providers as well as local models via LiteLLM/Ollama. The frontend is a React application, and both TypeScript (@skyvern/client) and Python SDKs are provided.
Typical use cases include batch invoice downloads, automated form filling (government, insurance, and job applications), structured data extraction from websites without APIs, e-commerce procurement automation, and replacing fragile traditional RPA solutions. The project reports 64.4% accuracy on the WebBench benchmark. It also supports MCP (Model Context Protocol) integration and connections to Zapier, Make, n8n, and Workato.
Skyvern adopts an Agent Swarm multi-agent architecture (Planner-Agent-Validator). Its design has roots in the task-driven autonomous agent paradigm of BabyAGI and AutoGPT, extended with browser interaction capabilities. It is released under AGPL-3.0 and maintained by the Skyvern-AI organization.
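A Planner-Agent-Validator split is commonly run as a loop: plan some steps, execute them, check the goal, and re-plan if it is not yet met. The text above does not describe how Skyvern wires these roles together, so the sketch below is only one plausible reading. run_goal and its three callables are hypothetical, and the toy planner, agent, and validator are plain functions rather than LLM calls.

```python
from typing import Callable, List

Planner = Callable[[str, List[str]], List[str]]   # (goal, history) -> next steps
Agent = Callable[[str], str]                      # step -> recorded result
Validator = Callable[[str, List[str]], bool]      # (goal, history) -> goal met?


def run_goal(
    goal: str,
    plan: Planner,
    act: Agent,
    validate: Validator,
    max_rounds: int = 3,
) -> List[str]:
    """Plan, act, and validate in rounds until the goal is confirmed."""
    history: List[str] = []
    for _ in range(max_rounds):
        for step in plan(goal, history):
            history.append(act(step))
        if validate(goal, history):
            return history
    raise RuntimeError("goal not validated within max_rounds")


# Toy roles (assumptions for illustration only).
REQUIRED = ["open page", "fill form", "submit"]


def plan(goal: str, history: List[str]) -> List[str]:
    # Plan at most two of the remaining steps per round.
    return [step for step in REQUIRED if step not in history][:2]


def act(step: str) -> str:
    # A real agent would drive the browser; here the step is just recorded.
    return step


def validate(goal: str, history: List[str]) -> bool:
    return all(step in history for step in REQUIRED)


history = run_goal("submit the form", plan, act, validate)
print(history)
```

With these toy roles the first round completes two steps and fails validation, so the planner is called again for the remaining one. Bounding the loop with max_rounds keeps a goal that can never be validated from running forever.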