verl
🧠A flexible, efficient, and production-ready post-training reinforcement learning framework for LLMs
A flexible, efficient, and production-ready post-training reinforcement learning framework for LLMs
A fully disaggregated multimodal model inference and serving framework that extends vLLM to support any-to-any modality unified inference and high-performance deployment.
A high-performance OpenAI-compatible API server for MLX models on Apple Silicon, supporting text, vision, audio transcription, and image generation/editing.
A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.
Microsoft's family of open-source frontier voice AI models including both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models, designed for long-form audio processing with multilingual support.
A project focused on TTS generation models, providing an API server and Gradio-based WebUI with support for multiple voice synthesis, voice cloning, and audio enhancement capabilities.
A curated list of embodied AI research papers maintained by the Human Communication and Perception Laboratory at SYSU, providing researchers with the latest academic findings in the embodied intelligence field.
A video content discovery tool developed by Microsoft that uses deep learning technology to automatically identify and extract key content from videos, helping users efficiently browse and understand video information。
LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.
Page 1 / 2 · 12 total
Get the latest AI tools and trends delivered straight to your inbox. No spam, just intelligence.