[ LIVE · LEADERBOARD ]//UPDATED EACH SCAN

LLM Security Leaderboard

Every model hit with the same live attack library. Scores, grades, and vulnerability breakdowns update after every scan.

Models Tested

Attack Techniques

Our Testing Methodology

HOW WE EVALUATE EVERY MODEL · CONSISTENTLY · TRANSPARENTLY

Our AI agents conduct multi-turn adversarial conversations with each model, simulating real attack scenarios.

Every model faces the same comprehensive suite of jailbreaks, prompt injections, evasion, and exfiltration attacks across 15 categories.

Security scores (0-100) and letter grades (A+ to F) are calculated using our standardized rubric across all models.

We test in 12 languages to ensure safety guardrails work across language boundaries and mixed-script attacks.

An LLM judge verifies every finding, eliminating false positives so only real vulnerabilities count toward the score.

Models are re-scanned as providers update them, ensuring the leaderboard always reflects current security posture.

[ OPS · ENGAGE ]

RUN THE SAME 58,000+ ATTACKS AGAINST YOUR OWN LLM · SCORE + REPORT IN MINUTES