Entity Faithfulness (entity_faith)
Metric Description
Entity faithfulness evaluates whether the entities in the model's output are present in the input text. It detects when the model hallucinates entities, introduces facts not supported by the input, or distorts entity representations. The metric is designed for use cases where the output is expected to extract or reference only entities that appear (explicitly or in equivalent form) in the source input.
The score runs from 0 (many entities not grounded in the input) to 100 (all extracted entities are present in the input). The implementation combines (1) LLM-as-a-Judge assessments, (2) heuristic measures, and (3) word-boundary substring checks followed—depending on match_mode—by strict regex-only rejection (exact) or LLM-based grounding (strict / flexible, default).
How to interpret the score
- Closer to 100: All or nearly all entities in the output can be traced back to the input; little or no hallucination or unsupported references.
- Closer to 0: Many entities in the output are not found in the input; the model is introducing or distorting entity information.
Entity faithfulness checks that entities mentioned in the output exist in the input, but it does not assess correctness of the answer or semantic faithfulness of claims. Pair this with faithfulness, factfulness, or answer correctness when evaluating full response quality.
API usage
Prerequisites
After the environment variables are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
Shortname: entity_faith
Default threshold: 80
Inputs (each object in data)
- `input` (str, required): The source text from which entities should be grounded (e.g., document, note, or context).
- `output` (str or dict, required): The model-generated output that extracts or references entities to evaluate. If a dict is provided, it is serialized to JSON.
- `prompt` (str, optional): Extra instructions or formatting context. When provided, the metric may extract additional entities and additional instructions from it (e.g., abbreviation definitions) to help grounding.
metric_args
- `domain` (str, optional): Domain hint for entity extraction (passed through to extraction prompts). Default: omitted (`None`). If set, it must be a string (non-string values are rejected).
- `match_mode` (str, optional): How strictly leftover entities are matched after an initial word-boundary, case-insensitive exact pass against `input` (plus any `prompt`-derived extra entities). Must be exactly one of `exact`, `strict`, or `flexible`. Default: `flexible`.
  - `exact`: Any entity not matched exactly against the source corpus is counted missing (regex-only; no LLM matching step).
  - `strict`: LLM matching allows well-established proper-noun abbreviations (e.g., USA, NASA) and abbreviations explicitly defined in the prompt; casual shorthand is rejected.
  - `flexible`: LLM matching that also accepts common professional shorthand and widely understood equivalences (e.g., "2 wks" vs "2 weeks", "temp" vs "temperature").
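To make the initial exact pass concrete, here is a minimal sketch of a word-boundary, case-insensitive substring check. The function name `exact_match` is hypothetical and not part of the Aegis API; it only illustrates the kind of regex test that runs before any LLM-based matching in `strict` or `flexible` mode.

```python
import re

def exact_match(entity: str, source: str) -> bool:
    # Case-insensitive, word-boundary match of the entity surface form
    # against the source text (a sketch of the "exact" pass).
    pattern = r"\b" + re.escape(entity) + r"\b"
    return re.search(pattern, source, flags=re.IGNORECASE) is not None

source = "Nokia's regional team met with Vodafone RO to discuss the 5G rollout in Iași."
print(exact_match("Nokia", source))             # True: exact surface form present
print(exact_match("5G", source))                # True
print(exact_match("Vodafone Romania", source))  # False: only LLM matching could accept this
```

Under `exact` mode, "Vodafone Romania" would be counted missing; under `strict` or `flexible`, the leftover entity is handed to the LLM matching step, which may accept it as an equivalent of "Vodafone RO".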
Evaluation metadata
On successful evaluation, the metric returns eval_metadata with grounding failures:
- `entities_not_found_in_source` (list[dict]): Entities appearing in the output that could not be grounded in the input. Each item has `text` (the entity surface form from the output) and `reason` (why it was not found or accepted in the source).
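A short sketch of how a caller might inspect these grounding failures. The surrounding result shape (`score`, `eval_metadata`) is an assumption for illustration; only the `entities_not_found_in_source` item fields (`text`, `reason`) follow the description above.

```python
# Hypothetical per-row result; the outer keys are assumed for illustration.
result = {
    "score": 75,
    "eval_metadata": {
        "entities_not_found_in_source": [
            {
                "text": "fifth-generation network deployment",
                "reason": "Paraphrase of '5G rollout' not accepted as an equivalence.",
            },
        ],
    },
}

# List every output entity that failed grounding, with the judge's reason.
for miss in result["eval_metadata"]["entities_not_found_in_source"]:
    print(f"{miss['text']}: {miss['reason']}")
```

An empty `entities_not_found_in_source` list corresponds to a score at or near 100, since every extracted entity was grounded in the input.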
Example
import json
import os
import requests
from dotenv import load_dotenv
load_dotenv(override=True)
_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"
def post_custom_run(payload: dict) -> requests.Response:
    """POST a JSON payload to the Aegis custom-runs endpoint; return the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
    )

if __name__ == "__main__":
    # Records must match the columns this metric expects (input, output).
    data = [
        {
            "input": "Nokia's regional team met with Vodafone RO to discuss the 5G rollout in Iași.",
            "output": "Entities: Nokia; Vodafone Romania; fifth-generation network deployment; Iași.",
        },
    ]
    payload = {
        "threshold": 80,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": [
                    {
                        "metric": "entity_faith",
                        "metric_args": {
                            "domain": "telecom",
                            "match_mode": "flexible",
                        },
                    },
                ],
                "threshold": 80,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }
    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))