Anthropic publishes its constitutional AI principles. We publish the full offensive scanner methodology. No vendor in this category does this. Trust as moat.
The benchmark harness and ground-truth set are in the repo; the public numbers are published after a reviewed real-jury run.
backend/benchmarks/
backend/benchmarks/judge_groundtruth/
3-tier fallback: Anthropic Opus 4 when explicitly enabled -> NVIDIA nvidia/llama-3.3-nemotron-super-49b-v1 -> NVIDIA meta/llama-3.3-70b-instruct -> deterministic template
sha256(target_url || scan_mode || sorted(categories) || corpus_version)
Expected variance: +/- 5% on confidence scores due to LLM non-determinism
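The sha256 expression above can be sketched in Python as follows. This is a minimal illustration, not the scanner's actual code: the function name `scan_fingerprint`, the comma join for categories, the NUL field separator, and UTF-8 encoding are all assumptions, since the formula does not say how the fields are serialized.

```python
import hashlib
from typing import Iterable


def scan_fingerprint(target_url: str, scan_mode: str,
                     categories: Iterable[str], corpus_version: str) -> str:
    """Hash the four inputs named in the formula into one hex digest.

    Assumption: "||" means concatenation. A NUL separator is added here so
    that field boundaries stay unambiguous; the real encoding is not given.
    """
    parts = [
        target_url,
        scan_mode,
        ",".join(sorted(categories)),  # sorting makes category order irrelevant
        corpus_version,
    ]
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
```

Because the categories are sorted before hashing, two requests that list the same categories in a different order produce the same digest, while a change to any field, including corpus_version, produces a different one.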
Methodology is open-source — CC-BY 4.0. Implementation is proprietary.
Default path is the configured NVIDIA analysis fallback model; Anthropic Opus is used only when analysis_provider enables Anthropic and the NVIDIA path is unavailable or not preferred.
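The fallback order can be sketched roughly as below. This is an illustration under stated assumptions, not the production code: `build_chain`, `analyze_with_fallback`, and the callables passed in are hypothetical names, and the sketch reads "when explicitly enabled" as placing Anthropic first in the chain only when a flag is set.

```python
from typing import Callable, List, Tuple

Provider = Tuple[str, Callable[[str], str]]

# Model identifiers as listed in the fallback description.
NEMOTRON = "nvidia/llama-3.3-nemotron-super-49b-v1"
LLAMA = "meta/llama-3.3-70b-instruct"


def build_chain(anthropic_enabled: bool,
                anthropic: Callable[[str], str],
                nemotron: Callable[[str], str],
                llama: Callable[[str], str]) -> List[Provider]:
    """Assemble the ordered provider list (hypothetical helper)."""
    chain: List[Provider] = []
    if anthropic_enabled:  # assumption: driven by the analysis_provider setting
        chain.append(("anthropic-opus", anthropic))
    chain.append((NEMOTRON, nemotron))
    chain.append((LLAMA, llama))
    return chain


def analyze_with_fallback(prompt: str, chain: List[Provider],
                          template: Callable[[str], str]) -> Tuple[str, str]:
    """Try each provider in order; fall back to the deterministic template."""
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception:
            continue  # provider unavailable: move to the next tier
    return "deterministic-template", template(prompt)
```

Returning the provider name alongside the text lets a caller record which tier actually produced an analysis.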
Jury runs 4 LLM calls per finding (classification + StrongREJECT × 2 PINE passes + independent) vs. 1 for the legacy single-judge. Free tier stays on legacy to preserve its latency/cost envelope; paid tiers get the jury for its higher fidelity + SAGE transparency.
StrongREJECT's rubric has explicit ordered list items, so its juror is called twice — original + swapped order — and the final score is the mean. The classification + independent jurors use a single call because their rubric asks a verdict question with no positional structure; swapping payload ↔ response would change prompt semantics rather than measure position bias. Their reliability is observed via cross-juror IPI instead.
Wang et al., 2024 (PINE); Souly et al., NeurIPS 2024 (StrongREJECT §A.3)
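The two-pass scoring described for the StrongREJECT juror can be illustrated with a small sketch. The function name `position_debiased_score` and the `score_fn` callable are hypothetical, and reversing the list is an assumed reading of "swapped order"; the real juror prompt and rubric are not shown here.

```python
from statistics import mean
from typing import Callable, Sequence


def position_debiased_score(score_fn: Callable[[Sequence[str]], float],
                            rubric_items: Sequence[str]) -> float:
    """Score once with the rubric as given and once with it reversed,
    then average the two results to dampen position bias."""
    original = score_fn(list(rubric_items))
    swapped = score_fn(list(reversed(rubric_items)))  # assumption: swap = reverse
    return mean([original, swapped])


# Toy juror that over-weights whichever item appears first.
def biased_juror(items: Sequence[str]) -> float:
    return 1.0 if items[0] == "refusal" else 0.0


score = position_debiased_score(
    biased_juror, ["refusal", "specificity", "convincingness"]
)
```

With the toy juror, the original order scores 1.0 and the reversed order scores 0.0, so the averaged score is 0.5. A juror with no positional preference would return the same value on both passes and the mean would leave it unchanged.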
python -m benchmarks.run_judge_benchmark --backend real
app.security.mcp_scanner.phase_3
app.security.mcp_scanner.phase_4
app.security.mcp_battery.probe_mcp_c_cross_server_trust
app.security.mcp_battery.probe_mcp_d_resource_uri_ssrf
app.security.mcp_battery.probe_mcp_e_prompt_injection_via_resource
app.security.mcp_battery.probe_mcp_f_oauth_injection
app.security.mcp_battery.probe_mcp_g_command_injection_stdio
app.security.mcp_battery.probe_mcp_h_settings_poisoning
Vulnerability or technique | Category
mcp-remote OAuth Endpoint RCE | oauth injection
Microsoft 365 Copilot EchoLeak Zero-Click Exfiltration | prompt injection resource
MCP Inspector CSRF → RCE | command injection
Claude Code / Copilot autoApprove Privilege Escalation | settings poisoning
GPT Researcher MCP Consumer Vulnerability | command injection
DocsGPT MCP Consumer RCE | command injection
LiteLLM MCP-related Vulnerability | command injection
Flowise Authenticated RCE via MCP Adapters | command injection
Tool description prompt injection | tool poisoning
Tool name shadowing across servers | tool poisoning
Mid-session capability mutation (rug pull) | rug pull
Sampling result trust chain | cross server
Resource URI directory traversal | resource ssrf
Symlink escape from declared roots | resource ssrf
Cloud metadata SSRF via resource URIs | resource ssrf
Server name spoofing during initialization | cross server
Prompt injection via shared log resources | prompt injection resource
Resource change notification injection | prompt injection resource
OAuth flow open redirect to attacker-controlled URL | oauth injection
Dynamic Client Registration confusion | oauth injection
Environment variable injection in spawned MCP servers | command injection
Argument injection via embedded newlines | command injection
Credential exfiltration via debug logs | settings poisoning
Agent settings.json poisoning via tool invocation | settings poisoning
Roots permission scope escalation | settings poisoning
Anthropic MCP SDK STDIO Transport Arbitrary Command Execution (by design) | command injection