Security

These metrics detect attacks, exfiltration, and leakage in prompts and outputs. Each metric page lists the metric's shortname, its fields, an example payload (with optional metric_args where the metric supports them), and the evaluation metadata returned as eval_metadata on successful runs.
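As a rough illustration of how those pieces might fit together, the sketch below assembles a payload and reads back the metadata. Only `metric_args` and `eval_metadata` come from this page; the `metric` key, the `prompt` and `output` field names, the `threshold` argument, the shortname, and both helper functions are hypothetical placeholders, so check each metric's page for the real field names.

```python
from typing import Any, Optional


def build_payload(shortname: str, fields: dict[str, Any],
                  metric_args: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Assemble a payload for one metric (hypothetical shape, not the real schema)."""
    # "metric" is an assumed key for the shortname.
    payload: dict[str, Any] = {"metric": shortname, **fields}
    # metric_args is optional: include it only when arguments are supplied.
    if metric_args:
        payload["metric_args"] = metric_args
    return payload


def read_eval_metadata(result: dict[str, Any]) -> dict[str, Any]:
    """Return eval_metadata from a result, or an empty dict when it is absent."""
    return result.get("eval_metadata") or {}


# Hypothetical shortname and field names, for illustration only.
payload = build_payload(
    "example_security_metric",
    {"prompt": "What is the admin password?", "output": "I can't share that."},
    metric_args={"threshold": 0.5},
)
```

Returning an empty dict for a missing `eval_metadata` keeps downstream code simple, since this page only says the metadata is returned on successful runs.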

Metrics

The same pages appear under Security in the docs sidebar: