Skip to main content

Create run from a dataset

Endpoint: POST /runs/dataset

Description Starts a run using an existing dataset: the server takes the dataset’s records and selected metrics.

Sharing rules

The new run is always owned by you (your account is the individual owner). Sharing isn't accepted directly on this endpoint — the run inherits visibility from the dataset (and its data collection). Runs created here are never org-only (org-only resources have no individual owner).

  • The run's data_collection_id is copied from the dataset.
  • The run's org_id is set in this order:
    1. The dataset's org_id, if any.
    2. Otherwise, the org_id of the dataset's data collection, if any.
    3. Otherwise, if the dataset belongs to a different user and you're in an organization, your own org_id (so the dataset owner can see the run via the same organization).
    4. Otherwise the run is created private.

Parameters

  • Bodyapplication/json:
{
"dataset_id": 1,
"threshold": "integer | null",
"model_slug": "string | null",
"metric_args": "object | null",
"is_blocking": false,
"alias": "string | null"
}

alias

  • Optional display name for the run (max 255 characters). Must be unique per account.
  • If omitted, the server generates one from the dataset name and creation timestamp.

metric_args

  • Optional per-run override of metric arguments, applied on top of the dataset's stored selected_metrics defaults.
  • Shape: { "<metric_id>": { "<arg_name>": <value>, ... }, ... }. Keys are metric IDs as strings ("1", not 1). Non-integer keys return 400.
  • Only metric IDs already in the dataset's selected_metrics are honored — entries for other metrics are ignored.
  • Unknown argument names for a metric are rejected; missing required args (with no dataset-level default) are rejected. Each value's type must match the type declared by the metric.
  • Omit the field (or pass null) to use the dataset-level defaults unchanged.

is_blocking

  • false (default)— The run and evaluation records are saved, then evaluation work is started in the background; the 201 response returns right away. Results may still be missing in that payload - poll GET /runs/{run_id} until evaluations complete. Prefer this for large datasets to avoid holding the HTTP request open.
  • true — The API waits until every evaluation for this run has been executed. The response body normally includes filled-in scores, aggregate_results, and finished_at, and cost totals for the run are finalized before you receive 201.

Error responses

  • 401, 402 — Auth or insufficient balance.
  • 422 — Body/query validation failed.
  • 404model_slug is not a documented model slug.
  • 400 — Dataset id not found; metric IDs from the dataset config invalid; inactive metrics; duplicate run alias (unique per user).
  • 404 — No records on the dataset; no metrics configured for the dataset.
  • 500 — Failure creating the run.

Responses

Example response (201)

{
"id": 991,
"user": "analyst@acme.com",
"author_email": "analyst@acme.com",
"run_type": "Dataset",
"run_source": "API Call",
"dataset": "Support QA - March",
"data_collection": "Customer Support",
"org_id": null,
"number_of_metrics": 2,
"result": 83.5,
"threshold": 70,
"model_slug": "gpt-4o",
"alias": "support-march",
"aggregate_results": {
"ans_corr": 86,
"faith": 81
},
"total_cost": null,
"started_at": "2026-04-01T09:12:00Z",
"finished_at": null,
"is_gte_threshold": null,
"evaluations": []
}

curl

curl -X POST "https://api.aegisevals.ai/api/v1/runs/dataset" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-H "Content-Type: application/json" \
-d '{"dataset_id":1,"threshold":70,"model_slug":"gpt-4o","is_blocking":false}'

Override metric args for this run only

Dataset 42 selected metrics 1 (json_equal) and 2 (exact_match). The example below overrides json_equal's args for this run while reusing whatever defaults are stored on the dataset for the other metric.

curl -X POST "https://api.aegisevals.ai/api/v1/runs/dataset" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-H "Content-Type: application/json" \
-d '{
"dataset_id": 42,
"threshold": 70,
"model_slug": "gpt-4o",
"is_blocking": false,
"metric_args": {
"1": { "ignore_extra_keys": true, "ignore_order": false }
}
}'