All Projects

1 project

BullshitBench

A benchmark measuring whether AI models challenge nonsensical prompts rather than confidently answering them, featuring 100 questions across 5 domains with a 3-tier judgment system and multi-judge panel.

Python · Large Language Models · CLI
