Create dataset

Endpoint: POST /datasets

Description

Creates a custom dataset from a CSV file. The server parses the file, stores the records, and attaches the selected metrics (metric id → { threshold, metric_args }). You can optionally set data_collection_id to the id of a data collection you can access (same rules as listing collections: yours or shared with your organization).

Sharing rules

The new dataset is always owned by you (your account is the individual owner). This endpoint does not accept sharing settings directly; visibility is derived from the linked data collection, if any. Datasets created here are never org-only (org-only resources have no individual owner and can only be produced through admin flows).

  • Without data_collection_id → the dataset is created private (org_id = null).
  • With data_collection_id:
    • The linked collection is org-shared → the dataset is created shared with the collection's org_id. You must still be a member of that organization.
    • The linked collection is private → the dataset is created private.
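
The sharing rules above reduce to a small decision function. The following is a client-side sketch for predicting the outcome, not the server's implementation; the `Collection` type and the `expected_org_id` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collection:
    """Hypothetical stand-in for a data collection; only org_id matters here."""
    id: int
    org_id: Optional[int]  # None means the collection is private


def expected_org_id(collection: Optional[Collection]) -> Optional[int]:
    """Return the org_id the new dataset is expected to receive.

    A return value of None means the dataset is created private.
    """
    if collection is None:
        # No data_collection_id supplied: the dataset is always private.
        return None
    # Org-shared collection: the dataset inherits the collection's org_id.
    # Private collection: org_id is None, so the dataset is private too.
    return collection.org_id
```

The sketch does not model the membership check: for an org-shared collection, the server still requires you to belong to that organization.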

Parameters

  • Body (multipart/form-data):
{
  "file": "binary (.csv file)",
  "selected_metrics": "string (JSON: see shape below)",
  "column_mappings": "string | null (JSON: Aegis field → CSV column name)",
  "name": "string | null",
  "data_collection_id": "integer | null"
}

selected_metrics is a JSON-encoded string. Decoded, it is a map keyed by metric id (as a string, e.g. "1"), where each value is an object:

{
  "<metric_id>": {
    "threshold": "integer (0–100)",
    "metric_args": "object | null"
  }
}
  • threshold is required and must be between 0 and 100.
  • metric_args is optional (null or omitted means "use the metric's defaults"). Keys are argument names declared by the metric and values must match the argument's declared type. Unknown argument names are rejected; required args with no default must be supplied.
  • Metric id keys must be integers encoded as strings (e.g. "1"). Non-integer keys return 400.
  • Per-run overrides can later be supplied via the metric_args field of POST /runs/dataset.
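
Because selected_metrics travels as a JSON-encoded string, it can help to validate the map on the client before sending it. The sketch below checks only the rules stated above (integer metric ids, an integer threshold from 0 to 100, optional metric_args); `encode_selected_metrics` is a hypothetical helper, and the server remains the authority on which metrics exist and which arguments they accept.

```python
import json
from typing import Any, Dict


def encode_selected_metrics(metrics: Dict[int, Dict[str, Any]]) -> str:
    """Validate a metric map and return it as a JSON string for the form field."""
    payload: Dict[str, Dict[str, Any]] = {}
    for metric_id, config in metrics.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(metric_id, bool) or not isinstance(metric_id, int):
            raise ValueError(f"metric id must be an integer, got {metric_id!r}")
        threshold = config.get("threshold")
        if (
            isinstance(threshold, bool)
            or not isinstance(threshold, int)
            or not 0 <= threshold <= 100
        ):
            raise ValueError(f"threshold for metric {metric_id} must be an integer in 0-100")
        payload[str(metric_id)] = {
            "threshold": threshold,
            # None (or an omitted key) means "use the metric's defaults".
            "metric_args": config.get("metric_args"),
        }
    return json.dumps(payload)
```

For example, `encode_selected_metrics({1: {"threshold": 70}})` yields a string that decodes to `{"1": {"threshold": 70, "metric_args": null}}`.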

Error responses

  • 401 — Authentication failed.
  • 422 — Missing required form parts or invalid multipart fields.
  • 400 — Invalid CSV, empty file, bad encoding, invalid selected_metrics / column_mappings, unknown or inactive metrics, duplicate dataset name for your account or organization, etc.
  • 404 — Dataset type not found; or data collection not found / not accessible for data_collection_id.
  • 500 — Server error.

Responses

  • 201 — dataset object with the same JSON shape as Get dataset by id. On create, runs and evaluations are usually empty arrays until you start a dataset run.

Example response (201)

{
  "id": 42,
  "user": { "id": 7, "email": "analyst@acme.com" },
  "dataset_type_id": 1,
  "author_email": "analyst@acme.com",
  "name": "My evaluation set",
  "selected_metrics": {
    "1": { "threshold": 70, "metric_args": null },
    "2": { "threshold": 80, "metric_args": { "ignore_extra_keys": true } }
  },
  "structure": ["prompt", "output", "golden_answer"],
  "column_mappings": null,
  "data_collection_id": null,
  "org_id": null,
  "created_at": "2026-04-02T12:00:00Z",
  "updated_at": null,
  "dataset_type": {
    "id": 1,
    "name": "CUSTOM",
    "label": "Custom",
    "description": null
  },
  "records": [
    {
      "id": 1001,
      "user_id": 7,
      "dataset_id": 42,
      "prompt": "What is the refund policy?",
      "input": null,
      "context": null,
      "output": "Refunds are available within 30 days.",
      "golden_answer": "30-day refund window.",
      "created_at": "2026-04-02T12:00:00Z",
      "updated_at": null
    }
  ],
  "runs": [],
  "evaluations": []
}

curl

curl -X POST "https://api.aegisevals.ai/api/v1/datasets" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-F "file=@/path/to/data.csv" \
-F 'selected_metrics={"1":{"threshold":70,"metric_args":null},"2":{"threshold":80,"metric_args":{"ignore_extra_keys":true}}}' \
-F 'name=My evaluation set'
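
A rough Python equivalent of the curl call, for readers who prefer to script the upload. This is a sketch under stated assumptions: it uses the third-party `requests` library, the helper names `build_form_fields` and `create_dataset` are hypothetical, and the `text/csv` content type for the file part is a guess rather than a documented requirement.

```python
import json
import os
from typing import Any, Dict, Optional

# URL taken from the curl example above.
API_URL = "https://api.aegisevals.ai/api/v1/datasets"


def build_form_fields(
    selected_metrics: Dict[str, Any],
    name: Optional[str] = None,
    column_mappings: Optional[Dict[str, str]] = None,
    data_collection_id: Optional[int] = None,
) -> Dict[str, str]:
    """Build the non-file form fields; JSON-valued fields are sent as strings."""
    fields = {"selected_metrics": json.dumps(selected_metrics)}
    if name is not None:
        fields["name"] = name
    if column_mappings is not None:
        # Maps Aegis field -> CSV column name.
        fields["column_mappings"] = json.dumps(column_mappings)
    if data_collection_id is not None:
        fields["data_collection_id"] = str(data_collection_id)
    return fields


def create_dataset(api_key: str, csv_path: str, fields: Dict[str, str]) -> Dict[str, Any]:
    """POST the CSV and form fields; returns the decoded dataset object."""
    import requests  # third-party; imported lazily so build_form_fields works without it

    with open(csv_path, "rb") as fh:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (os.path.basename(csv_path), fh, "text/csv")},  # content type assumed
            data=fields,
            timeout=30,
        )
    response.raise_for_status()  # raises on 4xx/5xx, e.g. the 400/404/422 cases listed above
    return response.json()
```

To map CSV columns explicitly, pass something like `column_mappings={"prompt": "question"}`, where `question` is a hypothetical column name in your file.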