Exact Match (exact_match)

Metric description

Exact match compares the model output to the golden answer as strings. Options control whether the comparison is case-sensitive and whether leading and trailing whitespace is stripped before comparing.

How to interpret the score

  • 100: strings match under the chosen rules.
  • 0: they do not match.

API usage

Prerequisites

Once the environment variables are configured (the example on this page reads AEGIS_API_KEY and AEGIS_API_BASE_URL), create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.

Shortname: exact_match

Default threshold: 100

Structural metrics run without an LLM (deterministic checks). Your run may still include model_slug where the API expects it; scoring does not depend on it for this category.

Inputs (each object in data)

  • output (str, required): Model output.
  • golden_answer (str, required): Expected text.

metric_args

  • case_sensitive (boolean, optional): Compare with case sensitivity. Default: true.

  • strip_whitespace (boolean, optional): Strip leading and trailing whitespace from both strings before comparing. Default: false.
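
To make the two options concrete, the comparison can be approximated locally. This is a minimal sketch, not the service's implementation: the function name exact_match_score is hypothetical, and folding case with str.lower() is an assumption about how case-insensitive comparison is performed.

```python
def exact_match_score(
    output: str,
    golden_answer: str,
    case_sensitive: bool = True,
    strip_whitespace: bool = False,
) -> int:
    """Return 100 if the strings match under the chosen rules, else 0.

    Hypothetical local approximation of the metric; defaults mirror the
    documented defaults (case_sensitive=True, strip_whitespace=False).
    """
    if strip_whitespace:
        # Only leading and trailing whitespace is removed; inner spaces stay.
        output = output.strip()
        golden_answer = golden_answer.strip()
    if not case_sensitive:
        # Assumption: case is folded with str.lower() on both sides.
        output = output.lower()
        golden_answer = golden_answer.lower()
    return 100 if output == golden_answer else 0


if __name__ == "__main__":
    print(exact_match_score("ok", "ok"))
    print(exact_match_score(" OK ", "ok"))
    print(exact_match_score(" OK ", "ok", case_sensitive=False, strip_whitespace=True))
```

With the defaults, " OK " and "ok" do not match because both case and surrounding whitespace differ; turning off case sensitivity and turning on stripping makes them match.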

Example

import json
import os

import requests
from dotenv import load_dotenv

load_dotenv(override=True)

_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"


def post_custom_run(payload: dict) -> requests.Response:
    """POST JSON payload to Aegis custom runs; returns the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
    )


if __name__ == "__main__":
    # Each row pairs a model output with its expected text.
    data = [
        {"output": "ok", "golden_answer": "ok"}
    ]

    payload = {
        "threshold": 100,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": [
                    {
                        "metric": "exact_match",
                        "metric_args": {"case_sensitive": True, "strip_whitespace": False},
                    },
                ],
                "threshold": 100,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }

    response = post_custom_run(payload)
    # Raise on HTTP 4xx/5xx before reading the body.
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
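
To run the same check with different options, only metric_args needs to change. The sketch below builds a payload of the same shape as the example above with the options as parameters. The helper name build_exact_match_payload is hypothetical, and the key layout is copied from the example rather than from the request-body reference, so treat it as an assumption.

```python
def build_exact_match_payload(
    rows: list,
    case_sensitive: bool = True,
    strip_whitespace: bool = False,
    model_slug: str = "o4-mini",
    threshold: int = 100,
) -> dict:
    """Build a custom-run payload for exact_match (hypothetical helper).

    The structure mirrors the example payload on this page; it sends nothing.
    """
    return {
        "threshold": threshold,
        "model_slug": model_slug,
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": [
                    {
                        "metric": "exact_match",
                        "metric_args": {
                            "case_sensitive": case_sensitive,
                            "strip_whitespace": strip_whitespace,
                        },
                    },
                ],
                "threshold": threshold,
                "model_slug": model_slug,
                "data": rows,
            }
        ],
    }


# Case-insensitive comparison that ignores surrounding whitespace.
lenient_payload = build_exact_match_payload(
    [{"output": " OK ", "golden_answer": "ok"}],
    case_sensitive=False,
    strip_whitespace=True,
)
```

The resulting dictionary can be passed to post_custom_run from the example above.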