vllm-mlx
🧠 A vLLM-style inference server for Apple Silicon with a native MLX backend, exposing OpenAI- and Anthropic-compatible APIs in a single process, with unified multimodal serving, continuous batching, a paged KV cache, and SSD-tiered caching.
Python · Large Language Models · Multimodal
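Because the server speaks the OpenAI wire protocol, the standard `openai` Python client can talk to it directly. A minimal sketch; the base URL, port, dummy API key, and model id are assumptions, so check the project's docs for the actual defaults:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server instead of api.openai.com.
# Host, port, and model id below are placeholders, not documented defaults.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(resp.choices[0].message.content)
```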
Rapid-MLX
✨ A local AI inference engine for Apple Silicon with an OpenAI-compatible API, supporting multimodal inputs, tool calling, and smart cloud routing.
Python · Large Language Models · Model Context Protocol
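Tool calling over an OpenAI-compatible API follows the standard function-calling schema. A minimal sketch under the same assumptions as above (local base URL, placeholder model id); the `get_weather` tool is hypothetical, purely for illustration:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Declare a tool in the standard OpenAI function-calling schema; the engine
# decides whether to emit a tool call or a plain text reply.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose to call the tool
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:               # the model answered directly
    print(msg.content)
```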
mlx-openai-server
✨ A high-performance OpenAI-compatible API server for MLX models on Apple Silicon, supporting text, vision, audio transcription, and image generation/editing.
Python · PyTorch · Large Language Models
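If the server mirrors the OpenAI audio endpoints as described, transcription should work through the same client. A sketch assuming it does; the model id and audio file name are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# OpenAI-style transcription call; the model id is a placeholder, and the
# endpoint shape is assumed to match OpenAI's /v1/audio/transcriptions.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # placeholder model id
        file=audio,
    )
print(transcript.text)
```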