DISCOVER THE FUTURE OF AI AGENTS

All Projects

12 projects

verl

A flexible, efficient, and production-ready post-training reinforcement learning framework for LLMs

Other · Deep Learning · Multimodal

vLLM-Omni

A fully disaggregated multimodal model inference and serving framework that extends vLLM to support any-to-any modality unified inference and high-performance deployment.

Deep Learning · Multimodal · FastAPI

mlx-openai-server

A high-performance OpenAI-compatible API server for MLX models on Apple Silicon, supporting text, vision, audio transcription, and image generation/editing.

Deep Learning · Large Language Models · Multimodal

Roboflow Trackers

A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).

Multimodal · Deep Learning · SDK

WiFi DensePose

A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.

Multimodal · Deep Learning · Docker

VibeVoice

Microsoft's family of open-source frontier voice AI models including both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models, designed for long-form audio processing with multilingual support.

Model & Inference Framework · PyTorch · Python

Speech-AI-Forge

A project focused on TTS generation models, providing an API server and Gradio-based WebUI with support for multiple voice synthesis, voice cloning, and audio enhancement capabilities.

Model & Inference Framework · Python · Gradio

Embodied_AI_Paper_List

A curated list of embodied AI research papers maintained by the Human Communication and Perception Laboratory at SYSU, providing researchers with the latest academic findings in the embodied intelligence field.

Docs, Tutorials & Resources · Python · Multimodal

DeepVideoDiscovery

A video content discovery tool developed by Microsoft that uses deep learning to automatically identify and extract key content from videos, helping users browse and understand video content efficiently.

Agent & Tooling · Python · PyTorch

LLaVA-Plus

A multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.

Model & Inference Framework · Python · PyTorch

Page 1 / 2 · 12 total