vllm-mlx
🧠 A vLLM-style inference server for Apple Silicon with a native MLX backend, exposing both OpenAI- and Anthropic-compatible APIs in a single process, with unified multimodal serving, continuous batching, a paged KV cache, and SSD-tiered caching.
Multimodal, Large Language Models, Python
Rapid-MLX
✨ A local AI inference engine for Apple Silicon with an OpenAI-compatible API, supporting multimodal input, tool calling, and smart cloud routing.
AI Agents, Large Language Models, Model Context Protocol
AutoRound
✨ An advanced post-training quantization toolkit for LLMs and VLMs by Intel, leveraging SignRound optimization to support 2–4 bit weight quantization and automatic mixed-precision scheme generation across Intel CPU/GPU, NVIDIA GPU, and Habana Gaudi.
Multimodal, Large Language Models, Transformers
vLLM-Omni
🧠 A fully disaggregated multimodal inference and serving framework that extends vLLM to support unified any-to-any modality inference and high-performance deployment.
Deep Learning, Multimodal, FastAPI
mlx-openai-server
✨ A high-performance OpenAI-compatible API server for MLX models on Apple Silicon, supporting text, vision, audio transcription, and image generation/editing.
Deep Learning, Large Language Models, Multimodal
RLinf
✨ Flexible and scalable reinforcement learning training infrastructure for embodied and agentic AI post-training, decoupling logical workflow composition from efficient physical execution via the M2Flow paradigm.
Multimodal, AI Agents, Reinforcement Learning
Roboflow Trackers
✨ A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms such as SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.) and supports video files, cameras, RTSP streams, and more. Provides a unified CLI and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
Multimodal, Deep Learning, SDK
MiniCPM-o
✨ An on-device omnimodal LLM by Tsinghua THUNLP supporting vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment with performance rivaling Gemini 2.5 Flash.
Large Language Models, Multimodal, Transformers
Vision-Agents
✨ An open-source framework by Stream for building vision AI agents that work with any model or video provider, leveraging Stream's edge network for ultra-low latency video experiences.
Agent & Tooling, Python, PyTorch
my-neuro
✨ A customizable AI desktop companion with character settings, voice conversations, long-term memory, and sub-1-second response times. Integrates with Live2D models for visual presentation.
Agent & Tooling, Python, Electron