Create dataset
Endpoint: POST /datasets
Description
Creates a custom dataset from a CSV file. The server parses the file, stores records, and attaches selected metrics (metric id → { threshold, metric_args }). You can optionally set data_collection_id to a data collection id you can access (same rules as listing collections: yours or shared with your organization).
Sharing rules
The new dataset is always owned by you (your account is the individual owner). Sharing isn't accepted directly on this endpoint — visibility is derived from the data collection (if any). Datasets created here are never org-only (org-only resources have no individual owner and can only be produced through admin flows).
- Without data_collection_id → the dataset is created private (org_id = null).
- With data_collection_id:
  - The linked collection is org-shared → the dataset is created shared with the collection's org_id. You must still be a member of that organization.
  - The linked collection is private → the dataset is created private.
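The sharing rules above amount to a small decision function. The following is a minimal sketch for illustration only: the Collection type, its field names, and the member_org_ids set are assumed names, not part of the API, and PermissionError is a placeholder (the endpoint itself reports an inaccessible collection as 404).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Collection:
    """Hypothetical stand-in for a data collection; field names are assumed."""
    id: int
    org_id: Optional[int]  # None means the collection is private


def derive_dataset_org_id(
    collection: Optional[Collection],
    member_org_ids: Set[int],
) -> Optional[int]:
    """Return the org_id a new dataset would get under the sharing rules.

    None means the dataset is created private.
    """
    # No data_collection_id supplied: the dataset is private.
    if collection is None:
        return None
    # Linked collection is private: the dataset is private too.
    if collection.org_id is None:
        return None
    # Linked collection is org-shared: the caller must belong to that org.
    if collection.org_id not in member_org_ids:
        raise PermissionError("not a member of the collection's organization")
    return collection.org_id
```

In every branch the caller remains the individual owner; only org_id changes.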
Parameters
- Body (multipart/form-data):
{
"file": "binary (.csv file)",
"selected_metrics": "string (JSON: see shape below)",
"column_mappings": "string | null (JSON: Aegis field → CSV column name)",
"name": "string | null",
"data_collection_id": "integer | null"
}
selected_metrics is a JSON-encoded string. Decoded, it is a map keyed by metric id (as a string, e.g. "1"), where each value is an object:
{
"<metric_id>": {
"threshold": "integer (0–100)",
"metric_args": "object | null"
}
}
- threshold is required and must be between 0 and 100.
- metric_args is optional (null or omitted means "use the metric's defaults"). Keys are argument names declared by the metric, and values must match the argument's declared type. Unknown argument names are rejected; required args with no default must be supplied.
- Metric id keys must be integers encoded as strings (e.g. "1"). Non-integer keys return 400.
- Per-run overrides can later be supplied via POST /runs/dataset's metric_args.
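Because selected_metrics travels as a JSON-encoded string, it can help to check the shape on the client before uploading. The sketch below is one way to do that, assuming nothing beyond the rules listed above. The function name encode_selected_metrics is hypothetical, and it cannot check argument names or types, since those are declared by each metric on the server.

```python
import json
from typing import Any, Dict


def encode_selected_metrics(metrics: Dict[Any, Dict[str, Any]]) -> str:
    """Check the documented shape and return the JSON string for the form field."""
    encoded: Dict[str, Dict[str, Any]] = {}
    for metric_id, config in metrics.items():
        key = str(metric_id)
        # Keys must be integer metric ids serialized as strings, e.g. "1".
        if not (key.isascii() and key.isdigit()):
            raise ValueError(f"metric id {metric_id!r} is not an integer")

        threshold = config.get("threshold")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(threshold, bool) or not isinstance(threshold, int):
            raise ValueError(f"metric {key}: threshold must be an integer")
        if not 0 <= threshold <= 100:
            raise ValueError(f"metric {key}: threshold must be between 0 and 100")

        # Omitted or None both mean "use the metric's defaults".
        metric_args = config.get("metric_args")
        if metric_args is not None and not isinstance(metric_args, dict):
            raise ValueError(f"metric {key}: metric_args must be an object or null")

        encoded[key] = {"threshold": threshold, "metric_args": metric_args}
    return json.dumps(encoded)
```

The returned string can be passed directly as the selected_metrics form field.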
Error responses
- 401: Authentication failed.
- 422: Missing required form parts or invalid multipart fields.
- 400: Invalid CSV, empty file, bad encoding, invalid selected_metrics / column_mappings, unknown or inactive metrics, duplicate dataset name for your account or organization, etc.
- 404: Dataset type not found; or data collection not found / not accessible for data_collection_id.
- 500: Server error.
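A client may want to turn these status codes into actionable messages. The following sketch shows one possible mapping. The status codes come from the list above, but the hint wording and the describe_error helper are this example's own and do not reflect documented server behavior.

```python
from typing import Dict

# Status codes as documented for POST /datasets; the hints are
# suggestions made by this sketch, not messages returned by the server.
ERROR_HINTS: Dict[int, str] = {
    400: "check the CSV, selected_metrics, column_mappings, and dataset name",
    401: "check the API key in the Authorization header",
    404: "check the dataset type and data_collection_id",
    422: "check that all required form parts are present and well-formed",
    500: "server error",
}


def describe_error(status: int) -> str:
    """Return a short, human-readable hint for an error status."""
    hint = ERROR_HINTS.get(status)
    if hint is None:
        return f"{status}: unexpected status"
    return f"{status}: {hint}"
```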
Responses
- 201: dataset object with the same JSON shape as Get dataset by id. On create, runs and evaluations are usually empty arrays until you start a dataset run.
Example response (201)
{
"id": 42,
"user": { "id": 7, "email": "analyst@acme.com" },
"dataset_type_id": 1,
"author_email": "analyst@acme.com",
"name": "My evaluation set",
"selected_metrics": {
"1": { "threshold": 70, "metric_args": null },
"2": { "threshold": 80, "metric_args": { "ignore_extra_keys": true } }
},
"structure": ["prompt", "output", "golden_answer"],
"column_mappings": null,
"data_collection_id": null,
"org_id": null,
"created_at": "2026-04-02T12:00:00Z",
"updated_at": null,
"dataset_type": {
"id": 1,
"name": "CUSTOM",
"label": "Custom",
"description": null
},
"records": [
{
"id": 1001,
"user_id": 7,
"dataset_id": 42,
"prompt": "What is the refund policy?",
"input": null,
"context": null,
"output": "Refunds are available within 30 days.",
"golden_answer": "30-day refund window.",
"created_at": "2026-04-02T12:00:00Z",
"updated_at": null
}
],
"runs": [],
"evaluations": []
}
curl
curl -X POST "https://api.aegisevals.ai/api/v1/datasets" \
-H "Authorization: Bearer sk_00000000000000000000000000000000" \
-F "file=@/path/to/data.csv" \
-F 'selected_metrics={"1":{"threshold":70,"metric_args":null},"2":{"threshold":80,"metric_args":{"ignore_extra_keys":true}}}' \
-F 'name=My evaluation set'
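Python

A rough Python equivalent of the curl call, as a sketch. It assumes the third-party requests library is installed. The helper names build_form_fields and create_dataset are hypothetical, and the URL is the same one used in the curl example.

```python
import json
import os
from typing import Any, Dict, Optional

API_URL = "https://api.aegisevals.ai/api/v1/datasets"  # same URL as the curl example


def build_form_fields(
    selected_metrics: Dict[str, Dict[str, Any]],
    name: Optional[str] = None,
    column_mappings: Optional[Dict[str, str]] = None,
    data_collection_id: Optional[int] = None,
) -> Dict[str, str]:
    """Build the non-file form fields; optional fields are left out when None."""
    # selected_metrics and column_mappings are sent as JSON-encoded strings.
    fields = {"selected_metrics": json.dumps(selected_metrics)}
    if name is not None:
        fields["name"] = name
    if column_mappings is not None:
        fields["column_mappings"] = json.dumps(column_mappings)
    if data_collection_id is not None:
        fields["data_collection_id"] = str(data_collection_id)
    return fields


def create_dataset(api_key: str, csv_path: str, fields: Dict[str, str]) -> Dict[str, Any]:
    """POST the CSV and form fields as multipart/form-data; return the parsed body."""
    import requests  # third-party; imported lazily so build_form_fields needs no dependency

    with open(csv_path, "rb") as handle:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=fields,
            files={"file": (os.path.basename(csv_path), handle, "text/csv")},
            timeout=30,
        )
    # Raises requests.HTTPError on 4xx/5xx responses.
    response.raise_for_status()
    return response.json()
```

For example, fields = build_form_fields({"1": {"threshold": 70, "metric_args": None}}, name="My evaluation set") followed by create_dataset(api_key, "/path/to/data.csv", fields) sends a request similar to the curl command above.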