BullshitBench
✨ A benchmark that measures whether AI models challenge nonsensical prompts instead of confidently answering them. It features 100 questions across 5 domains, a 3-tier judgment system, and a multi-judge panel.
Python, Large Language Models, CLI
Local Deep Research
🧠 A local-first AI research assistant with multi-LLM support, 20+ research strategies, multi-search-engine integration, and automated quality scoring for 212K+ academic sources. It produces citation-backed PDF/Markdown reports via CLI, Web UI, REST API, or MCP server.
Python, Knowledge Base, FastAPI