Create run from a dataset
Endpoint: POST /runs/dataset
Description Starts a run using an existing dataset: the server takes the dataset’s records and selected metrics.
Sharing rules
The new run is always owned by you (your account is the individual owner). Sharing isn't accepted directly on this endpoint — the run inherits visibility from the dataset (and its data collection). Runs created here are never org-only (org-only resources have no individual owner).
- The run's
data_collection_idis copied from the dataset. - The run's
org_idis set in this order:- The dataset's
org_id, if any. - Otherwise, the
org_idof the dataset's data collection, if any. - Otherwise, if the dataset belongs to a different user and you're in an organization, your own
org_id(so the dataset owner can see the run via the same organization). - Otherwise the run is created private.
- The dataset's
Parameters
- Body —
application/json:
{
"dataset_id": 1,
"threshold": "integer | null",
"model_slug": "string | null",
"metric_args": "object | null",
"is_blocking": false,
"alias": "string | null"
}
alias
- Optional display name for the run (max 255 characters). Must be unique per account.
- If omitted, the server generates one from the dataset name and creation timestamp.
metric_args
- Optional per-run override of metric arguments, applied on top of the dataset's stored
selected_metricsdefaults. - Shape:
{ "<metric_id>": { "<arg_name>": <value>, ... }, ... }. Keys are metric IDs as strings ("1", not1). Non-integer keys return400. - Only metric IDs already in the dataset's
selected_metricsare honored — entries for other metrics are ignored. - Unknown argument names for a metric are rejected; missing required args (with no dataset-level default) are rejected. Each value's type must match the type declared by the metric.
- Omit the field (or pass
null) to use the dataset-level defaults unchanged.
is_blocking
false(default)— The run and evaluation records are saved, then evaluation work is started in the background; the201response returns right away. Results may still be missing in that payload - pollGET /runs/{run_id}until evaluations complete. Prefer this for large datasets to avoid holding the HTTP request open.true— The API waits until every evaluation for this run has been executed. The response body normally includes filled-in scores,aggregate_results, andfinished_at, and cost totals for the run are finalized before you receive201.
Error responses
401,402— Auth or insufficient balance.422— Body/query validation failed.404—model_slugis not a documented model slug.400— Dataset id not found; metric IDs from the dataset config invalid; inactive metrics; duplicate run alias (unique per user).404— No records on the dataset; no metrics configured for the dataset.500— Failure creating the run.
Responses
201— same run object shape asGET /runs/{run_id}.
Example response (201)
{
"id": 991,
"user": "analyst@acme.com",
"author_email": "analyst@acme.com",
"run_type": "Dataset",
"run_source": "API Call",
"dataset": "Support QA - March",
"data_collection": "Customer Support",
"org_id": null,
"number_of_metrics": 2,
"result": 83.5,
"threshold": 70,
"model_slug": "gpt-4o",
"alias": "support-march",
"aggregate_results": {
"ans_corr": 86,
"faith": 81
},
"total_cost": null,
"started_at": "2026-04-01T09:12:00Z",
"finished_at": null,
"is_gte_threshold": null,
"evaluations": []
}
curl
curl -X POST "https://api.aegisevals.ai/api/v1/runs/dataset" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-H "Content-Type: application/json" \
-d '{"dataset_id":1,"threshold":70,"model_slug":"gpt-4o","is_blocking":false}'
Override metric args for this run only
Dataset 42 selected metrics 1 (json_equal) and 2 (exact_match). The example below overrides json_equal's args for this run while reusing whatever defaults are stored on the dataset for the other metric.
curl -X POST "https://api.aegisevals.ai/api/v1/runs/dataset" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-H "Content-Type: application/json" \
-d '{
"dataset_id": 42,
"threshold": 70,
"model_slug": "gpt-4o",
"is_blocking": false,
"metric_args": {
"1": { "ignore_extra_keys": true, "ignore_order": false }
}
}'