Introduction
The Aegis API Server is the HTTP API for programmatic access to Aegis. This section is organized into Introduction, Evaluations, and Data so you can quickly find endpoint behavior, request payloads, and response examples.
Base URL
All documented routes live under a single prefix:
https://api.aegisevals.ai/api/v1
Core concepts
Before calling endpoints, these are the key terms used across the API:
- Evaluation: one scoring task where Aegis checks model output against one or more metrics (for example: correctness, safety, formatting).
- Run: a stored execution of an evaluation job. A run usually contains many evaluations and their metric scores.
- Dataset: uploaded CSV data that Aegis stores as rows, with chosen metrics and thresholds, so you can run evaluations repeatedly without resending the file.
- Dataset type: a lookup that classifies a dataset (for example, custom user uploads versus proprietary catalog data). Types have integer ids used when filtering dataset lists.
- Model (catalog): a registered LLM in Aegis with an id, slug, name, and supplier. See Models for the list of valid slug values to use in runs and evaluations.
- Dataset run: a run created from a saved dataset that already exists in Aegis.
- Custom run: a run created by sending rows directly in the request body, without needing a pre-saved dataset.
- Data collection: a container that groups datasets and related runs so teams can organize and review evaluation work in one place.
- Organization-only resource: a record with no individual owner (user/user_id is null) and an org_id set. The organization owns it, typically through admin flows. Organization-only records cannot be unshared or moved to another organization through the public API, and deleting them usually requires an organization administrator. Endpoints that create resources via API keys still attach you as the individual owner unless documented otherwise.
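The ownership rule above can be expressed as a small predicate over a resource returned by the API. This is a sketch, assuming the resource has been decoded from JSON into a dict that may carry user and/or user_id alongside org_id; is_org_only is a hypothetical helper name, not part of any Aegis client.

```python
def is_org_only(resource):
    """Return True if a resource has no individual owner but belongs to an org.

    Mirrors the rule above: user / user_id is null and org_id is set.
    `resource` is assumed to be a decoded JSON object (a dict).
    """
    # Either field being present and non-null means an individual owns it.
    has_owner = resource.get("user") is not None or resource.get("user_id") is not None
    return not has_owner and resource.get("org_id") is not None
```

A client might run this check before attempting to unshare, move, or delete a resource, since those operations are restricted for organization-only records.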
Endpoint guide
Evaluate
POST /evaluate (Single Evaluation)
- Runs an evaluation request and returns the computed scoring output.
- Use this for quick or direct evaluation execution from your app/backend.
Runs
POST /runs/dataset (Create run from dataset)
- Starts a new run using a dataset already stored in Aegis.
- Use this for repeatable evaluations over curated data.
POST /runs/custom (Create custom run)
- Starts a new run by sending metric config and rows directly in the request.
- Use this when data is generated on the fly and not saved as a dataset first.
GET /runs/{run_id} (Get run)
- Retrieves one existing run with its summary and row-level evaluation data.
- Use this to inspect a specific run's summary and per-row evaluation results.
PUT /runs/{run_id} (Update run)
- Updates run metadata.
- Use this when run attributes need to change after creation.
GET /runs/{run_id}/download (Download run)
- Exports run results as CSV for analysis.
- Use this for reporting, sharing, and offline analysis.
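Because the download is CSV, the response body can be loaded with Python's standard csv module. This is a sketch, assuming the body is UTF-8 (possibly with a byte-order mark) and starts with a header row; the actual column names depend on the run and are not assumed here. parse_run_csv is a hypothetical helper.

```python
import csv
import io


def parse_run_csv(body):
    """Parse the bytes of a GET /runs/{run_id}/download response into row dicts.

    Column names are taken from the CSV header row; no specific columns are assumed.
    """
    # "utf-8-sig" strips a leading byte-order mark if one is present.
    text = body.decode("utf-8-sig")
    # newline="" lets the csv module handle \r\n line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

Every value comes back as a string, so convert scores to numbers yourself. The same helper should also work for GET /datasets/{dataset_id}/download, assuming that export uses the same encoding.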
DELETE /runs/{run_id} (Delete run)
- Permanently removes a run you own, or an org-only run when you are an org administrator.
- Use this to clean up completed or obsolete evaluation runs.
Dataset types
GET /dataset-types + GET /dataset-types/{dataset_type_id} (Get dataset types)
- Lists all dataset types or returns one type by id (name, label, description).
- Use this to discover type ids for the dataset_type_id filter on Get datasets.
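Finding a type id and then filtering the dataset list might look like the sketch below. It assumes that GET /dataset-types returns a JSON list of objects with id and name, and that dataset_type_id, page, and page_size are passed as query parameters on GET /datasets. The helper names and the sample type names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.aegisevals.ai/api/v1"


def type_id_by_name(dataset_types, name):
    """Return the id of the dataset type whose name matches, from a GET /dataset-types list."""
    for dataset_type in dataset_types:
        if dataset_type["name"] == name:
            return dataset_type["id"]
    raise KeyError(name)


def datasets_url(dataset_type_id=None, page=None, page_size=None):
    """Build a GET /datasets URL that includes only the query parameters that are set."""
    params = {"dataset_type_id": dataset_type_id, "page": page, "page_size": page_size}
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}/datasets" + (f"?{query}" if query else "")


# Hypothetical usage with a made-up GET /dataset-types response.
types = [{"id": 1, "name": "custom"}, {"id": 2, "name": "proprietary"}]
url = datasets_url(dataset_type_id=type_id_by_name(types, "custom"), page=1)
```

Omitting unset parameters keeps the server's defaults in effect instead of sending empty values.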
Models
Models (reference): the supported LLM slugs, display names, and flags (thinking, latest) are listed in the docs; the API server does not expose a models HTTP endpoint. Use a documented slug in model_slug when you create runs or evaluations.
Datasets
POST /datasets (Create dataset)
- Uploads a CSV and creates a custom dataset with metric thresholds.
- Use this to persist evaluation data for dataset runs and data collections.
GET /datasets/all-partial, GET /datasets, GET /datasets/{dataset_id} (Get datasets)
- Lists dataset ids/names, paginated summaries, or one full dataset (records, runs, evaluations).
- Use this to discover datasets and inspect stored rows.
PUT /datasets/{dataset_id} (Update dataset)
- Updates name, metrics, column mappings, or data collection membership.
- Use this to keep dataset information accurate over time.
DELETE /datasets/{dataset_id} (Delete dataset)
- Deletes a custom dataset you own.
GET /datasets/{dataset_id}/download (Download dataset)
- Exports dataset rows as CSV.
Data collections
POST /data-collections (Create data collection)
- Creates a new data collection container.
- Use this when grouping datasets or runs is necessary.
GET /data-collections + GET /data-collections/{data_collection_id} (Get data collections)
- Lists collections (paginated) or fetches one collection by id. The paginated list includes summaries of each collection's nested datasets and runs; fetching by id returns the nested datasets and runs themselves rather than summaries.
- Use this to discover collections or inspect one collection in depth.
PUT /data-collections/{data_collection_id} (Update data collection)
- Updates the optional fields name, dataset_ids, and/or org_id.
- Use this to rename a collection, attach datasets, or adjust organization scope.
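Since all three fields are optional, a client can send only the fields it wants to change. This is a minimal sketch. Two points are assumptions that this section does not settle: that omitting a field leaves it unchanged, and whether dataset_ids replaces or extends the collection's current datasets. collection_update_body is a hypothetical helper.

```python
def collection_update_body(name=None, dataset_ids=None, org_id=None):
    """Build a PUT /data-collections/{data_collection_id} body with only the fields to change.

    Assumption: fields left out of the body are not modified by the server.
    """
    body = {}
    if name is not None:
        body["name"] = name
    if dataset_ids is not None:
        # Accept any iterable of ids but send a JSON array.
        body["dataset_ids"] = list(dataset_ids)
    if org_id is not None:
        body["org_id"] = org_id
    return body
```

Skipping None values, rather than sending them as JSON null, avoids accidentally clearing a field the caller did not mean to touch.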
DELETE /data-collections/{data_collection_id} (Delete data collection)
- Permanently removes a data collection.
- Use this when a collection is no longer needed.
Organizations
GET /organizations/me (Get my organization)
- Returns the organization the authenticated API key belongs to, including org admins and members.
- Returns null if the key has no associated organization.
Getting started
- Use your organization’s Aegis web app to sign in and create an API key intended for integrations.
- Send the key on every request to the routes in this section, as described below.
Authentication
Every route under /runs, /evaluate, /data-collections, /dataset-types, /datasets, and /organizations requires this header:
Authorization: Bearer <token>
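A minimal Python sketch of attaching the header, using only the standard library's urllib.request. It builds the request without sending it. build_request is a hypothetical helper, and the JSON Content-Type on request bodies is an assumption, since this section does not state how bodies are encoded.

```python
import json
import urllib.request

# Base URL from the "Base URL" section.
BASE_URL = "https://api.aegisevals.ai/api/v1"


def build_request(method, path, token, payload=None):
    """Build (but do not send) an authenticated request to the Aegis API.

    `token` is the API key created in the Aegis web app. If `payload` is
    given, it is serialized as a JSON body (assumed encoding).
    """
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


req = build_request("GET", "/organizations/me", token="YOUR_API_KEY")
# To send it: urllib.request.urlopen(req)
```

Keep the key out of source control, for example by reading it from an environment variable with os.environ.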
Billing and balance
Operations that enqueue real model work generally check your account balance before proceeding. That includes starting a dataset run, a custom run, and calling POST /evaluate. If your balance is insufficient, the server responds with 402 Payment Required (see HTTP status codes and error responses). Costs are tracked in USD — each evaluation records an evaluation_cost (the dollar amount charged for that evaluation).
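Because evaluation_cost is a dollar amount, adding many values as binary floats can drift by fractions of a cent. This sketch uses Python's decimal module. It assumes you already have a list of evaluation records, each carrying a numeric evaluation_cost. Where those records appear in a run response is not specified in this section, and total_cost_usd is a hypothetical helper.

```python
from decimal import Decimal


def total_cost_usd(evaluations):
    """Sum evaluation_cost (USD) across evaluation records as an exact Decimal.

    Each value goes through str() first so that a float such as 0.1 becomes
    Decimal("0.1") rather than its binary approximation.
    """
    return sum((Decimal(str(e["evaluation_cost"])) for e in evaluations), Decimal("0"))
```

As an alternative, decoding the response with json.loads(body, parse_float=Decimal) turns every JSON float into a Decimal from the start.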
HTTP status codes and error responses
Error responses are JSON objects with a detail field.
- 200: Success for GET/PUT/DELETE where a body is returned.
- 201: Resource created (POST /runs/dataset, POST /runs/custom, POST /evaluate, POST /data-collections, POST /datasets).
- 204: Success with no body (DELETE run, DELETE data collection, DELETE dataset).
- 400: Bad request: invalid payload, inactive metrics, missing or inconsistent IDs, duplicate alias/name, database constraint violation, etc.
- 401: Missing or invalid Authorization header, or API key not accepted.
- 402: Insufficient balance. Returned by POST /runs/dataset, POST /runs/custom, or POST /evaluate before any work is enqueued if your balance is too low.
- 404: Resource not found (run, metric, model, dataset, dataset type, data collection, no rows/CSV data, etc.).
- 422: Validation failed: invalid JSON body or field types on a POST/PUT (e.g. POST /runs/dataset, POST /runs/custom), or out-of-range query params (e.g. page/page_size).
- 500: Unexpected server error.
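Every error carries a detail field, so a client can branch on the status code and surface detail directly. The sketch below gives 402 its own exception type so that callers can handle insufficient balance separately. The class and function names are hypothetical, and detail is passed through unchanged rather than assumed to be a string.

```python
import json


class AegisAPIError(Exception):
    """Raised for non-2xx responses; keeps the status code and the `detail` value."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


class InsufficientBalanceError(AegisAPIError):
    """402 Payment Required: the balance is too low to start the work."""


def check_response(status, body):
    """Return the parsed JSON body on success, or raise for error statuses."""
    if status == 204:
        return None  # success with no body
    if 200 <= status < 300:
        return json.loads(body) if body else None
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        detail = body  # body was empty, not JSON, or not a JSON object
    if status == 402:
        raise InsufficientBalanceError(status, detail)
    raise AegisAPIError(status, detail)
```

With urllib, a non-2xx response raises urllib.error.HTTPError, whose code attribute and read() method supply the status and body for check_response.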