Agent Park - Agent Project Navigator

All Projects

1 project

BullshitBench

A benchmark that measures whether AI models challenge nonsensical prompts rather than confidently answering them. It features 100 questions across 5 domains, a 3-tier judgment system, and a multi-judge panel.

Python · Large Language Models · CLI
