npcpy
✨A Python library providing key functional primitives for research in multimodal language models, agentic AI, and knowledge graphs, featuring unified model invocation, multi-agent collaboration and debate, knowledge graph lifecycle management, and multimodal generation.
Model & Inference Framework, Large Language Models, Multimodal
ClawProBench
✨A transparent, live-first benchmark harness for evaluating LLM agent capability inside the OpenClaw runtime, with deterministic scoring and multi-dimensional assessment.
Model & Inference Framework, Large Language Models, AI Agents
verl
🧠A flexible, efficient, and production-ready post-training reinforcement learning framework for LLMs.
Other, Deep Learning, Multimodal
BullshitBench
✨A benchmark measuring whether AI models challenge nonsensical prompts rather than confidently answering them, featuring 100 questions across 5 domains with a 3-tier judgment system and multi-judge panel.
Model & Inference Framework, Natural Language Processing, Large Language Models
ARIS — Auto-Research-In-Sleep
✨A zero-dependency, Markdown-native autonomous ML research workflow system covering the full research lifecycle from idea discovery to rebuttal via cross-model adversarial collaboration.
Model & Inference Framework, Large Language Models, Machine Learning
PaperFarm
✨An AI agent-driven automated experiment framework that can be pointed at any code repository: it autonomously analyzes the code, designs and runs experiments, and keeps the improvements that work.
Model & Inference Framework, Machine Learning, Multi-Agent System
agents-radar
✨A daily automated AI ecosystem aggregation and digest tool. It crawls 10+ data sources including GitHub, HN, ArXiv, and HuggingFace, generates bilingual (CN/EN) reports via LLM analysis, and distributes them through GitHub Issues, a Web UI, RSS, and an MCP Server.
Model & Inference Framework, Large Language Models, Model Context Protocol
Designing Multi-Agent Systems
✨A teaching-oriented multi-agent framework (PicoAgents) with a companion book, covering the full path from building LLM agents from scratch to production deployment, with 50+ examples, a DAG workflow engine, autonomous orchestration, a Computer Use Agent, and an evaluation framework.
Model & Inference Framework, RAG, Multi-Agent System
models
✨An all-in-one AI ecosystem browser for the terminal: explore models, benchmarks, coding agents, and provider status via a TUI or CLI.
Model & Inference Framework, Large Language Models, Rust
Rankify
✨A modular Python toolkit developed by the University of Innsbruck that integrates information retrieval, re-ranking, and RAG generation, featuring 40+ pre-processed datasets and single-line pipeline construction.
Model & Inference Framework, Natural Language Processing, SDK