vllm-mlx
🧠 A vLLM-style inference server for Apple Silicon with a native MLX backend. It exposes OpenAI- and Anthropic-compatible APIs in a single process, with unified multimodal serving, continuous batching, a paged KV cache, and SSD-tiered caching.
Multimodal · Large Language Models · Python
Rapid-MLX
✨ A local AI inference engine for Apple Silicon with an OpenAI-compatible API, supporting multimodal input, tool calling, and smart cloud routing.
AI Agents · Large Language Models · Model Context Protocol
vLLM-Omni
🧠 A fully disaggregated multimodal inference and serving framework that extends vLLM with unified any-to-any modality inference and high-performance deployment.
Deep Learning · Multimodal · FastAPI
mlx-openai-server
✨ A high-performance OpenAI-compatible API server for MLX models on Apple Silicon, supporting text, vision, audio transcription, and image generation/editing.
Deep Learning · Large Language Models · Multimodal
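Because several of these servers expose OpenAI-compatible endpoints, any standard OpenAI-style HTTP client can talk to them. The sketch below assembles a chat-completion request body for such a server; the base URL and model name are placeholder assumptions for illustration, not values documented by any of the projects above.

```python
import json

# Assumed local server address -- a placeholder, not documented by these projects.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, user_text: str, stream: bool = False) -> dict:
    """Assemble the JSON body for a POST to {BASE_URL}/chat/completions.

    This follows the OpenAI chat-completions request shape that
    OpenAI-compatible servers (e.g. vllm-mlx, mlx-openai-server) accept.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }

payload = build_chat_request("my-local-model", "Hello from Apple Silicon!")
print(json.dumps(payload, indent=2))
```

Any HTTP client (or the official `openai` Python SDK pointed at a custom `base_url`) can then POST this payload to the server's `/chat/completions` route.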
DeepChat
✨ A powerful open-source AI agent platform that unifies models, tools, and agents, with multi-LLM chat, MCP tool calling, and ACP agent integration.
Agent & Tooling · Electron · Python