Run one evaluation
Endpoint: POST /evaluate
Description
Runs a single evaluation. You pass a metric short name, an optional model and threshold, and whichever input fields the metric needs. The response echoes those fields and adds the score, an explanation, the evaluation cost, the credits consumed, and start and finish timestamps.
Parameters
- Body (application/json):
{
"metric_shortname": "string",
"model_slug": "string | null",
"threshold": "integer | null",
"metric_args": "object | null",
"prompt": "string | null",
"input": "string | null",
"context": "string | null",
"output": "string | null",
"golden_answer": "string | null"
}
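As a minimal sketch of assembling this body in Python: the helper `build_evaluate_body` and the constant `OPTIONAL_FIELDS` are hypothetical names, not part of the API. The sketch assumes that omitting a nullable field is equivalent to sending it as null, which matches the curl example on this page, where unused fields are simply left out.

```python
from typing import Any

# Nullable fields from the request schema above.
OPTIONAL_FIELDS = (
    "model_slug", "threshold", "metric_args", "prompt",
    "input", "context", "output", "golden_answer",
)


def build_evaluate_body(metric_shortname: str, **fields: Any) -> dict:
    """Build a request body dict (hypothetical helper, not part of the API)."""
    if not metric_shortname:
        raise ValueError("metric_shortname is required")
    unknown = set(fields) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    threshold = fields.get("threshold")
    # The schema types threshold as "integer | null"; bool is rejected explicitly
    # because Python treats bool as a subclass of int.
    if threshold is not None and (isinstance(threshold, bool) or not isinstance(threshold, int)):
        raise TypeError("threshold must be an integer or None")
    body = {"metric_shortname": metric_shortname}
    # Assumption: leaving a field out is treated the same as sending null.
    body.update({key: value for key, value in fields.items() if value is not None})
    return body
```

Which input fields a given metric actually needs is not stated in this section, so the helper checks only field names and the threshold type.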
Error responses
- 401, 402: As in Error responses above.
- 404: No metric with the given metric_shortname.
- 400: Metric is not active; duplicate alias for your account; other DB integrity errors.
- 500: Error during evaluation or persistence.
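One way a client might turn these status codes into exceptions is sketched below. `EvaluateError`, `EVALUATE_ERRORS`, and `check_evaluate_status` are hypothetical names. The messages for 400, 404, and 500 are copied from the list above; for any other code, including 401 and 402, the sketch falls back to the standard HTTP reason phrase, since their API-specific meaning is documented elsewhere.

```python
from http import HTTPStatus

# Messages taken from the error list above.
EVALUATE_ERRORS = {
    400: "metric is not active, duplicate alias for the account, or another DB integrity error",
    404: "no metric with the given metric_shortname",
    500: "error during evaluation or persistence",
}


class EvaluateError(Exception):
    """Hypothetical exception type for non-201 responses."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def check_evaluate_status(status: int) -> None:
    """Return silently on 201; raise EvaluateError for anything else."""
    if status == 201:
        return
    detail = EVALUATE_ERRORS.get(status)
    if detail is None:
        try:
            # Standard reason phrase, e.g. "Unauthorized" for 401.
            detail = HTTPStatus(status).phrase
        except ValueError:
            detail = "unexpected status"
    raise EvaluateError(status, detail)
```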
Responses
- 201: JSON object with the shape below.
Example response (201)
{
"id": 1542,
"run_id": 991,
"model_slug": "gpt-4o",
"metric_args": null,
"is_success": true,
"is_gte_threshold": true,
"threshold": 70,
"prompt": "What is 2+2?",
"input": null,
"context": null,
"output": "4",
"golden_answer": "4",
"started_at": "2026-04-01T09:10:03Z",
"result": 100,
"explanation": "Answer exactly matches the expected value.",
"evaluation_cost": 0.0004,
"credits_consumed": 1,
"finished_at": "2026-04-01T09:10:04Z",
"metric_shortname": "answer_correctness"
}
Response schema
{
"id": 0,
"run_id": 0,
"model_slug": "string",
"metric_args": "object | null",
"is_success": "boolean | null",
"is_gte_threshold": "boolean | null",
"threshold": 0,
"prompt": "string | null",
"input": "string | null",
"context": "string | null",
"output": "string | null",
"golden_answer": "string | null",
"started_at": "date",
"result": "number | null",
"explanation": "string | null",
"evaluation_cost": "number | null",
"credits_consumed": "integer | null",
"finished_at": "date | null",
"metric_shortname": "string"
}
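The response can be reduced to the few values a caller usually checks. The sketch below is illustrative only: `parse_timestamp` and `summarize_evaluation` are hypothetical helpers, and it assumes timestamps always arrive in the ISO 8601 form seen in the example response (with a trailing `Z`). The schema itself only says "date".

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    # Assumption: ISO 8601 with a trailing "Z", as in the example response.
    # datetime.fromisoformat only accepts "Z" from Python 3.11 on, so swap it
    # for an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_evaluation(response: dict) -> dict:
    """Pick out the score, threshold flag, credits, and wall-clock duration."""
    started = parse_timestamp(response["started_at"])
    finished_raw: Optional[str] = response.get("finished_at")
    duration = None
    if finished_raw is not None:  # finished_at is nullable in the schema
        duration = (parse_timestamp(finished_raw) - started).total_seconds()
    return {
        "result": response.get("result"),
        "is_gte_threshold": response.get("is_gte_threshold"),
        "credits_consumed": response.get("credits_consumed"),
        "duration_seconds": duration,
    }
```

Applied to the example response above, `duration_seconds` is 1.0, the gap between `started_at` and `finished_at`.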
curl
curl -X POST "https://api.aegisevals.ai/api/v1/evaluate" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-H "Content-Type: application/json" \
-d '{
"metric_shortname": "answer_correctness",
"model_slug": "gpt-4o",
"threshold": 70,
"prompt": "What is 2+2?",
"output": "4",
"golden_answer": "4"
}'
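The same request can be sketched in Python using only the standard library. The URL and headers are taken from the curl command above; `make_evaluate_request` and `run_evaluation` are hypothetical helper names, and the 30-second timeout is an arbitrary choice, not a documented limit.

```python
import json
import urllib.request

# URL copied from the curl example above.
API_URL = "https://api.aegisevals.ai/api/v1/evaluate"


def make_evaluate_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_evaluation(api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = make_evaluate_request(api_key, body)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx statuses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any credits are spent.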