
RCLI

Added Apr 23, 2026
Agent & Tooling
Open Source
Python · Desktop Apps · PyTorch · Large Language Models · Multimodal · Transformers · RAG · AI Agents · CLI · Agent & Tooling · Model & Inference Framework · Automation, Workflow & RPA · Knowledge Management, Retrieval & RAG · Computer Vision & Multimodal

A fully on-device voice AI assistant for macOS Apple Silicon, integrating STT, LLM, TTS, VLM, RAG, and system control with zero cloud dependency.

RCLI is an on-device voice AI assistant developed by RunAnywhere, Inc. (a Y Combinator-backed company), built exclusively for macOS on Apple Silicon. It chains voice activity detection (Silero VAD), streaming and offline speech recognition (Zipformer / Whisper / Parakeet), LLM inference (Qwen3 / LFM2), text-to-speech synthesis (Piper / Kokoro, among others), vision-language models (Qwen3 VL / SmolVLM), local RAG document Q&A (hybrid vector + BM25 retrieval, ~4 ms latency), and macOS system control (40 predefined actions) into an end-to-end pipeline with sub-200 ms latency. All inference runs locally, with no cloud dependency and no API keys.
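The stage chaining described above can be pictured as each component's output feeding the next. Below is a minimal Python sketch of that VAD → STT → LLM → TTS flow; all function names and behaviors here are illustrative placeholders, not the actual RCLI API.

```python
# Hypothetical end-to-end pipeline sketch. Each stage is a stub that
# stands in for the real model (Silero VAD, Whisper, Qwen3, Piper, ...).

def detect_speech(frames):
    """VAD: keep only frames that contain speech."""
    return [f for f in frames if f != "silence"]

def transcribe(frames):
    """STT: turn speech frames into text."""
    return " ".join(frames)

def generate(prompt):
    """LLM: produce a reply for the transcribed text."""
    return f"reply to: {prompt}"

def synthesize(text):
    """TTS: render the reply to audio (represented as a tagged string)."""
    return f"<audio:{text}>"

def pipeline(frames):
    """Chain VAD -> STT -> LLM -> TTS into one end-to-end call."""
    speech = detect_speech(frames)
    text = transcribe(speech)
    reply = generate(text)
    return synthesize(reply)
```

In the real system each stage streams into the next rather than waiting for the full utterance, which is what makes the sub-200 ms end-to-end latency plausible.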

The core differentiator is the proprietary MetalRT GPU inference engine, built on Metal 3.1 with hand-written kernels (qmv.metal, attention_decode.metal, rope.metal, swiglu.metal, kv_cache.metal, etc.) tuned for Apple Silicon. It achieves 550+ tok/s LLM decode (668 tok/s claimed on an M4 Max) and 714× real-time STT inference. MetalRT requires M3 or newer chips; M1/M2 devices automatically fall back to llama.cpp. TTS uses a double-buffered sentence-level pipeline: the next sentence is pre-rendered while the current one plays, eliminating inter-sentence gaps.
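The double-buffered TTS idea can be sketched with a single background worker: while one sentence's audio is playing, the next sentence is already being rendered. This is an illustrative Python sketch, not RCLI's C++ implementation; `render` and `play` are placeholder stubs.

```python
from concurrent.futures import ThreadPoolExecutor

def render(sentence):
    """Placeholder TTS render: sentence -> audio buffer."""
    return f"<audio:{sentence}>"

def play(audio, log):
    """Placeholder playback: record what was played, in order."""
    log.append(audio)

def speak(sentences):
    """Double-buffered playback: pre-render sentence i+1 while i plays."""
    log = []
    if not sentences:
        return log
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(render, sentences[0])   # prime the first buffer
        for i in range(len(sentences)):
            audio = future.result()                  # wait for current render
            if i + 1 < len(sentences):
                # kick off the next render before playback starts
                future = pool.submit(render, sentences[i + 1])
            play(audio, log)
    return log
```

As long as rendering a sentence takes less time than playing the previous one, playback never stalls between sentences.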

VLM capabilities cover image file analysis, real-time camera analysis, and screen-region screenshot analysis, with Qwen3 VL 2B, Liquid LFM2 VL 1.6B, and SmolVLM 500M supported. VLM inference currently runs on the llama.cpp engine; a MetalRT VLM backend has not yet been released. macOS system control is bridged via AppleScript / shell commands, covering productivity, communication, media, system, and web categories.
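Bridging system control through AppleScript typically means templating a small script per action and handing it to macOS's `osascript` command. The sketch below shows that pattern in Python; the action names, templates, and `dry_run` flag are hypothetical illustrations, not RCLI's actual action catalog.

```python
import subprocess

# Hypothetical action templates in the spirit of RCLI's predefined actions.
ACTIONS = {
    "open_app": 'tell application "{name}" to activate',
    "set_volume": "set volume output volume {level}",
}

def run_applescript(script, dry_run=False):
    """Execute an AppleScript snippet via osascript.

    With dry_run=True, return the command that would run instead of
    executing it (useful off-macOS and for testing).
    """
    cmd = ["osascript", "-e", script]
    if dry_run:
        return cmd
    return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

def system_action(action, dry_run=False, **params):
    """Look up a predefined action, fill its parameters, and run it."""
    return run_applescript(ACTIONS[action].format(**params), dry_run=dry_run)
```

A voice command like "set the volume to forty percent" would then map to `system_action("set_volume", level=40)` after the LLM resolves the intent and parameters.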

The project is written primarily in C++ (91.3%) and uses a CMake build system; all dependencies are vendored or fetched via FetchContent. It can be installed with a one-line Homebrew command or built from source (source builds are CPU-only, without MetalRT). The default model download is ~1 GB (LFM2 1.2B, Whisper base.en, Piper Lessac/Amy, Silero, Snowflake); VLM models are downloaded on demand. The project itself is MIT-licensed, while MetalRT is under a proprietary license. Latest version: v0.3.7.
