vllm-mlx
🧠 A vLLM-style inference server for Apple Silicon with a native MLX backend, exposing both OpenAI- and Anthropic-compatible APIs in a single process, featuring unified multimodal serving, continuous batching, paged KV cache, and SSD-tiered caching.
Multimodal · Large Language Models · Python
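Because this entry advertises an OpenAI-compatible API, a client would talk to it using the standard chat-completions schema. The sketch below builds such a request payload; the base URL, port, and model name are assumptions for illustration, not values documented by vllm-mlx:

```python
import json

def build_chat_request(model: str, user_message: str) -> dict:
    """Build a payload in the standard OpenAI /v1/chat/completions shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }

# Hypothetical local endpoint; the server's actual default host/port may differ.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = build_chat_request(
    "mlx-community/Llama-3.2-3B-Instruct-4bit",  # assumed example model id
    "Hello!",
)
body = json.dumps(payload)
# One would POST `body` to BASE_URL with Content-Type: application/json,
# e.g. via urllib.request, or point the official openai client at the
# server's base URL instead of api.openai.com.
```

Any tool that speaks this schema (SDKs, chat UIs, agent frameworks) should then work against the local server unchanged.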
PromptHub
✨ An open-source, free all-in-one workspace for AI Prompt and Skill management, featuring versioned Prompt editing, multi-platform Skill distribution, multi-model parallel testing, and local-first data synchronization.
Model & Inference Framework · Large Language Models · Electron
UncommonRoute
✨ A local proxy that automatically routes each LLM request to the cheapest model still capable of handling it.
Model & Inference Framework · AI Agents · Large Language Models
Hyperspace AGI
✨ The first experimental fully peer-to-peer distributed AGI system where intelligence compounds continuously through autonomous agent networks, supporting decentralized training across heterogeneous devices, P2P inference routing, and a built-in blockchain micropayment economy.
Model & Inference Framework · Multi-Agent System · AI Agents
Rapid-MLX
✨ A local AI inference engine for Apple Silicon with an OpenAI-compatible API, supporting multimodal inputs, tool calling, and smart cloud routing.
AI Agents · Large Language Models · Model Context Protocol
verl
🧠 A flexible, efficient, and production-ready post-training reinforcement learning framework for LLMs.
Other · Deep Learning · Multimodal
BullshitBench
✨ A benchmark measuring whether AI models challenge nonsensical prompts rather than confidently answering them, featuring 100 questions across 5 domains with a 3-tier judgment system and a multi-judge panel.
Model & Inference Framework · Natural Language Processing · Large Language Models
AutoRound
✨ An advanced post-training quantization toolkit for LLMs and VLMs by Intel, leveraging SignRound optimization to support 2–4 bit weight quantization and automatic mixed-precision scheme generation across Intel CPU/GPU, NVIDIA GPU, and Habana Gaudi.
Multimodal · Large Language Models · Transformers
vLLM-Omni
🧠 A fully disaggregated multimodal model inference and serving framework that extends vLLM with unified any-to-any modality inference and high-performance deployment.
Deep Learning · Multimodal · FastAPI
Harbor
🧠 A Docker Compose-based CLI orchestrator for local LLM stacks — spin up pre-wired inference backends, frontend UIs, RAG, voice, image generation, and more with a single command.
Model & Inference Framework · Multimodal · Large Language Models