Run one evaluation

Endpoint: POST /evaluate

Description
Runs a single evaluation. You pass a metric short name, an optional model and threshold, and the input fields the metric needs. The response echoes those fields and adds the score, an explanation, the evaluation cost, and start and finish timestamps.

Parameters

  • Body (application/json):
{
"metric_shortname": "string",
"model_slug": "string | null",
"threshold": "integer | null",
"metric_args": "object | null",
"prompt": "string | null",
"input": "string | null",
"context": "string | null",
"output": "string | null",
"golden_answer": "string | null"
}
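
As an illustration of the body above, the following minimal Python sketch serializes the nine documented fields. The helper name `build_evaluate_body` and the parameter name `input_text` (used to avoid shadowing Python's built-in `input`) are hypothetical. Treating `metric_shortname` as required is an assumption drawn from it being the only field not marked nullable.

```python
import json
from typing import Any, Dict, Optional


def build_evaluate_body(
    metric_shortname: str,
    model_slug: Optional[str] = None,
    threshold: Optional[int] = None,
    metric_args: Optional[Dict[str, Any]] = None,
    prompt: Optional[str] = None,
    input_text: Optional[str] = None,
    context: Optional[str] = None,
    output: Optional[str] = None,
    golden_answer: Optional[str] = None,
) -> str:
    """Serialize the documented body fields to a JSON string (hypothetical helper)."""
    # Assumption: metric_shortname is required because it is not nullable in the schema.
    if not metric_shortname:
        raise ValueError("metric_shortname is required")
    # The schema types threshold as an integer; bool is rejected explicitly
    # because Python treats True and False as ints.
    if threshold is not None and (
        isinstance(threshold, bool) or not isinstance(threshold, int)
    ):
        raise ValueError("threshold must be an integer or None")
    body = {
        "metric_shortname": metric_shortname,
        "model_slug": model_slug,
        "threshold": threshold,
        "metric_args": metric_args,
        "prompt": prompt,
        "input": input_text,
        "context": context,
        "output": output,
        "golden_answer": golden_answer,
    }
    # Unset optional fields are sent as JSON null, matching the "| null" types.
    return json.dumps(body)


example_body = build_evaluate_body(
    "answer_correctness",
    model_slug="gpt-4o",
    threshold=70,
    prompt="What is 2+2?",
    output="4",
    golden_answer="4",
)
```

Sending explicit nulls keeps the payload aligned with the schema without assuming that omitted keys are handled the same way.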

Error responses

  • 401, 402 — Authentication or insufficient balance; shared with other routes (Introduction — status codes).
  • 404 — No metric with the given metric_shortname.
  • 400 — Metric is not active; duplicate alias for your account; other DB integrity errors.
  • 500 — Error during evaluation or persistence.
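
A client might turn the status codes listed above into a single exception type. The sketch below is one possible approach, not part of the API: `EvaluateError` and `check_status` are hypothetical names, the meanings are paraphrased from the list, and the split of 401 (authentication) versus 402 (balance) is an assumption based on standard HTTP semantics. 201 is the success status documented for this endpoint.

```python
# Meanings paraphrased from the documented error list.
# Assumption: 401 covers authentication and 402 covers insufficient balance.
ERROR_MEANINGS = {
    400: "metric not active, duplicate alias, or other DB integrity error",
    401: "authentication failed",
    402: "insufficient balance",
    404: "no metric with the given metric_shortname",
    500: "error during evaluation or persistence",
}


class EvaluateError(Exception):
    """Hypothetical exception carrying the HTTP status and its documented meaning."""

    def __init__(self, status: int, meaning: str) -> None:
        super().__init__(f"POST /evaluate failed with {status}: {meaning}")
        self.status = status
        self.meaning = meaning


def check_status(status: int) -> None:
    """Return silently on 201; raise EvaluateError for any other status."""
    if status == 201:
        return
    raise EvaluateError(status, ERROR_MEANINGS.get(status, "undocumented status"))
```

Calling `check_status(404)` raises an `EvaluateError` whose `status` attribute is 404, so callers can branch on the code without parsing the message.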

Responses

  • 201 — JSON object with the shape below.

Example response (201)

{
"id": 1542,
"run_id": 991,
"model_slug": "gpt-4o",
"metric_args": null,
"is_success": true,
"is_gte_threshold": true,
"threshold": 70,
"prompt": "What is 2+2?",
"input": null,
"context": null,
"output": "4",
"golden_answer": "4",
"started_at": "2026-04-01T09:10:03Z",
"result": 100,
"explanation": "Answer exactly matches the expected value.",
"evaluation_cost": "0.0004",
"finished_at": "2026-04-01T09:10:04Z",
"metric_shortname": "answer_correctness",
"eval_metadata": null
}

Response schema (201)

{
"id": 0,
"run_id": 0,
"model_slug": "string",
"metric_args": "object | null",
"is_success": "boolean | null",
"is_gte_threshold": "boolean | null",
"threshold": 0,
"prompt": "string | null",
"input": "string | null",
"context": "string | null",
"output": "string | null",
"golden_answer": "string | null",
"started_at": "date",
"result": "number | null",
"explanation": "string | null",
"evaluation_cost": "string | null",
"finished_at": "date | null",
"metric_shortname": "string",
"eval_metadata": "object | null"
}
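
To show how the response fields might be consumed, here is a minimal Python sketch that extracts a few of them. It assumes timestamps are ISO 8601 strings with a trailing `Z`, as in the example response (the schema only says "date"), and it parses `evaluation_cost` as a `Decimal` because the schema types it as a string. The names `EvaluationSummary` and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Any, Dict, Optional


@dataclass
class EvaluationSummary:
    id: int
    metric_shortname: str
    result: Optional[float]
    is_gte_threshold: Optional[bool]
    evaluation_cost: Optional[Decimal]
    duration_seconds: Optional[float]


def _parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    # Assumption: ISO 8601 with a trailing "Z", as in the example response.
    # Replacing "Z" keeps fromisoformat working on Python versions before 3.11.
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(payload: Dict[str, Any]) -> EvaluationSummary:
    started = _parse_timestamp(payload.get("started_at"))
    finished = _parse_timestamp(payload.get("finished_at"))  # nullable in the schema
    duration = None
    if started is not None and finished is not None:
        duration = (finished - started).total_seconds()
    cost = payload.get("evaluation_cost")
    return EvaluationSummary(
        id=payload["id"],
        metric_shortname=payload["metric_shortname"],
        result=payload.get("result"),
        is_gte_threshold=payload.get("is_gte_threshold"),
        evaluation_cost=Decimal(cost) if cost is not None else None,
        duration_seconds=duration,
    )


# Subset of the example response shown above.
EXAMPLE_RESPONSE = {
    "id": 1542,
    "metric_shortname": "answer_correctness",
    "result": 100,
    "is_gte_threshold": True,
    "evaluation_cost": "0.0004",
    "started_at": "2026-04-01T09:10:03Z",
    "finished_at": "2026-04-01T09:10:04Z",
}
summary = summarize(EXAMPLE_RESPONSE)
```

Using `Decimal` rather than `float` avoids rounding drift if costs are summed across many evaluations.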

metric_args echoes the per-metric arguments that were used for this evaluation. Keys must be strings and must match argument names declared by the metric — unknown keys are rejected. Each value follows the type declared by the metric (string, boolean, integer, number, list, or object). See the metric's doc page for accepted argument names and types.

eval_metadata is a metric-specific JSON object with extra signals about how the score was produced (for example, intermediate computations or token usage). It is null when the metric does not emit metadata or when the evaluation failed before metadata could be collected.

curl

curl -X POST "https://api.aegisevals.ai/api/v1/evaluate" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-H "Content-Type: application/json" \
-d '{
"metric_shortname": "answer_correctness",
"model_slug": "gpt-4o",
"threshold": 70,
"prompt": "What is 2+2?",
"output": "4",
"golden_answer": "4"
}'
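
The same request can be built with Python's standard library. This is a sketch under stated assumptions rather than an official client: the URL, headers, and body come from the curl example above, while the function names `make_request` and `send` and the 30-second timeout are hypothetical choices. The placeholder key is the one from the curl example and must be replaced with a real one before sending.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.aegisevals.ai/api/v1/evaluate"  # from the curl example


def make_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx statuses,
    so reaching the return statement implies a success status.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_request(
    "sk_00000000000000000000000000000000",  # placeholder key from the curl example
    {
        "metric_shortname": "answer_correctness",
        "model_slug": "gpt-4o",
        "threshold": 70,
        "prompt": "What is 2+2?",
        "output": "4",
        "golden_answer": "4",
    },
)
# result = send(request)  # uncomment to perform the network call
```

Building the request separately from sending it makes the headers and body easy to inspect or log before any network traffic occurs.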