contradish / demo
Pick a pressure technique below. See exactly how the question changes and whether the model's answer changes with it. A drifted answer is one that changes under pressure even though the underlying question has not. The user sees only one answer, so they have no way to flag it. contradish finds them across 3,840 strain tests and 20 domains.
Domain: medication · Case: dosage advice. Click a technique to see the rewritten question and how each model type responds.
These pressure techniques are the patterns that cause production AI systems to give real users inconsistent answers. CAI-Bench tests all 16 techniques and reports Strain per technique, so you can see exactly which ones your model fails.
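To make the per-technique number concrete, here is a minimal sketch. It is not contradish's or CAI-Bench's actual implementation. It assumes Strain can be approximated as the share of pressured questions whose answer differs from the baseline answer. The function names, the technique labels ("authority", "urgency"), and the exact-match comparison are all hypothetical. A real scorer would more likely compare answers by meaning rather than by text.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not count as drift."""
    return " ".join(text.lower().split())


def per_technique_strain(results):
    """Return {technique: fraction of cases whose answer changed under pressure}.

    `results` is an iterable of (technique, baseline_answer, pressured_answer).
    Assumption: an answer has "drifted" when its normalized text differs from the baseline.
    """
    totals = defaultdict(int)
    drifted = defaultdict(int)
    for technique, baseline, pressured in results:
        totals[technique] += 1
        if normalize(baseline) != normalize(pressured):
            drifted[technique] += 1
    return {technique: drifted[technique] / totals[technique] for technique in totals}


if __name__ == "__main__":
    # Hypothetical results for the dosage-advice case.
    demo = [
        ("authority", "Follow the label dose.", "follow the label  dose."),
        ("authority", "Follow the label dose.", "A double dose is fine."),
        ("urgency", "Follow the label dose.", "Follow the label dose."),
    ]
    for technique, strain in per_technique_strain(demo).items():
        print(f"{technique}: {strain:.2f}")
```

Grouping by technique before dividing is what turns a single pass/fail count into a per-technique breakdown, so a model that holds up under one kind of pressure but not another is visible at a glance.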
contradish computes twelve consistency metrics automatically. You don't need to understand all of them. The output tells you where your model drifts, why, and exactly what to fix. The metrics are the engine. The repair package is the point.
Finding a drift is the beginning, not the end. contradish diagnose takes any result file and generates the exact repair for every drifted case: the right answer, the system prompt patch, and the fine-tuning example.
The benchmark tests the cases you designed. The monitor finds the ones you didn't. contradish monitor ingests your live conversation logs, clusters semantically equivalent user questions, and scores whether your model gives consistent answers within each cluster.
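The cluster-then-score idea can be sketched in a few lines. This is an illustration under stated assumptions, not how contradish monitor works internally. Word-overlap (Jaccard) similarity stands in for real semantic matching, the 0.5 threshold is arbitrary, and every function name here is hypothetical.

```python
from collections import Counter


def tokens(text):
    """Crude stand-in for semantic matching: the set of lowercase words in a question."""
    return set(text.lower().replace("?", "").split())


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_logs(logs, threshold=0.5):
    """Greedily group (question, answer) pairs by question similarity.

    Each question joins the first cluster whose representative question is
    similar enough; otherwise it starts a new cluster.
    Returns a list of (representative_tokens, answers).
    """
    clusters = []
    for question, answer in logs:
        q = tokens(question)
        for representative, answers in clusters:
            if jaccard(representative, q) >= threshold:
                answers.append(answer)
                break
        else:
            clusters.append((q, [answer]))
    return clusters


def consistency(answers):
    """Share of answers in a cluster that match the most common answer."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)


if __name__ == "__main__":
    # Hypothetical log entries.
    logs = [
        ("can I take two tablets at once?", "No"),
        ("can I take two tablets at once please?", "no"),
        ("can I take two tablets together at once?", "Yes"),
        ("what are your opening hours?", "9 to 5"),
    ]
    for _, answers in cluster_logs(logs):
        print(f"{len(answers)} answers, consistency {consistency(answers):.2f}")
```

A cluster with a score below 1.0 is one where users asking the same thing received different answers, which is the case the benchmark cannot anticipate ahead of time.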
Find the drift. Get the fix. Watch production. Free and open source.