Context Waste (ctx_waste)
Metric Description
Context waste measures how efficiently the provided context is used in generating the model's output. A context is wasteful when it includes passages that go unused in the output, duplicate information across chunks, or mutually incompatible claims.
The score runs from 0 (highly wasteful context) to 100 (efficient context). The implementation combines (1) an LLM-as-a-Judge that interprets the content with (2) heuristic calculations that aggregate the judge's findings into component scores and a final score.
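The exact aggregation is internal to the metric, but the minimal sketch below illustrates one plausible scheme, assuming per-chunk usage judgments plus the redundant groups and contradictory pairs described under Evaluation metadata below. The function name, penalty terms, and equal weighting are all hypothetical.

def context_waste_score(n_chunks, unused_ids, redundant_groups, contradictory_pairs):
    """Hypothetical aggregation: penalize unused, redundant, and contradictory chunks."""
    if n_chunks == 0:
        return 0.0
    unused_penalty = len(unused_ids) / n_chunks
    # A redundant group of size k wastes k - 1 chunks (one copy carries the information).
    redundant_penalty = sum(len(group) - 1 for group in redundant_groups) / n_chunks
    contradiction_penalty = min(1.0, len(contradictory_pairs) / n_chunks)
    waste = min(1.0, unused_penalty + redundant_penalty + contradiction_penalty)
    return round(100 * (1 - waste), 2)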
How to interpret the score
- Closer to 100: Most chunks are relevant to the output, contain little redundancy, and do not contradict each other. The context is efficient.
- Closer to 0: Many chunks are unused, redundant, or contradictory. The context is wasteful and could be improved by retrieval or deduplication.
Context waste does not measure answer quality or factual correctness. Pair this with metrics like faithfulness and context recall when you need to evaluate answer grounding and retrieval coverage.
API usage
Prerequisites
The example below reads AEGIS_API_KEY and AEGIS_API_BASE_URL from the environment (loaded from a .env file via python-dotenv). Once these variables are configured, the next step is to create a JSON payload for the custom-runs request. For a field-by-field description of the payload (top-level keys, evaluations, and each row in data), see Custom run request body.
Shortname: ctx_waste
Default threshold: 80
Inputs (each object in data)
- output (str, required): The model-generated answer to evaluate.
- context (str or list, required): The context chunks provided to the model. Can be a string or a list of strings (one string per chunk); both shapes are sketched below.
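For illustration, here are the two accepted shapes of a data row (values are made up; a single string is presumably treated as one chunk):

row_single_chunk = {
    "output": "The Eiffel Tower is about 330 metres tall.",
    "context": "The Eiffel Tower stands roughly 330 metres (1,083 ft) tall.",
}
row_chunked = {
    "output": "The Eiffel Tower is about 330 metres tall.",
    "context": [
        "The Eiffel Tower stands roughly 330 metres (1,083 ft) tall.",
        "The tower is located on the Champ de Mars in Paris.",
    ],
}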
Evaluation metadata
On successful evaluation, the metric returns eval_metadata highlighting structural inefficiencies in the context:
- redundant_groups (list[list[int]]): Groups of chunk IDs (each inner list has at least two IDs) judged to say the same thing or to overlap strongly. Chunk IDs are 0-based indices into the chunk list passed to the metric.
- contradictory_pairs (list[dict]): Pairs of chunks judged to contain incompatible information. Each object has a (int) and b (int), the 0-based chunk IDs, and reason (str), a short explanation of the conflict.
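As an illustration only (not actual API output), a context where chunks 0 and 2 repeat each other and chunks 1 and 3 disagree might produce metadata shaped like this:

eval_metadata = {
    "redundant_groups": [[0, 2]],
    "contradictory_pairs": [
        {"a": 1, "b": 3, "reason": "Chunk 1 and chunk 3 report conflicting population figures."},
    ],
}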
Example
import json
import os
import requests
from dotenv import load_dotenv
load_dotenv(override=True)
_API_KEY = os.getenv("AEGIS_API_KEY")
_BASE_URL = os.getenv("AEGIS_API_BASE_URL")
_CUSTOM_RUN_URL = f"{_BASE_URL}/runs/custom"
def post_custom_run(payload: dict) -> requests.Response:
    """POST a JSON payload to the Aegis custom-runs endpoint; returns the raw response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {_API_KEY}",
    }
    return requests.post(
        _CUSTOM_RUN_URL,
        headers=headers,
        data=json.dumps(payload),
        timeout=60,  # fail fast instead of hanging on a stalled connection
    )
if __name__ == "__main__":
    data = [
        {
            "output": "Paris is the capital of France and has a population of about 2.1 million.",
            "context": [
                "Paris is the capital and largest city of France.",
                "As of 2024, Paris has a population of approximately 2.1 million within city limits.",
            ],
        },
    ]
    payload = {
        "threshold": 80,
        "model_slug": "o4-mini",
        "is_blocking": True,
        "data_collection_id": None,
        "evaluations": [
            {
                "metrics": ["ctx_waste"],
                "threshold": 80,
                "model_slug": "o4-mini",
                "data": data,
            }
        ],
    }
    response = post_custom_run(payload)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
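Continuing the example, the snippet below pulls the per-row score and eval_metadata out of the parsed response. The response schema is not documented here, so the "results" and "score" keys are assumptions for illustration; only the eval_metadata fields match the documented structure above.

result = response.json()
for row in result.get("results", []):  # "results" and "score" are assumed key names
    print("score:", row.get("score"))
    meta = row.get("eval_metadata", {})
    for group in meta.get("redundant_groups", []):
        print("redundant chunks:", group)
    for pair in meta.get("contradictory_pairs", []):
        print(f"chunks {pair['a']} and {pair['b']} conflict: {pair['reason']}")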