Introduction

The Aegis API Server is the HTTP API for programmatic access to Aegis. This section is organized into Introduction, Evaluations, and Data so you can quickly find endpoint behavior, request payloads, and response examples.

Base URL

All documented routes live under a single prefix:

https://api.aegisevals.ai/api/v1
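
As a minimal sketch, a route template from this page can be joined with the prefix like this. The helper name `build_url` is hypothetical and not part of Aegis; only the base URL and the route shapes come from this page.

```python
from urllib.parse import quote

# Prefix documented above.
BASE_URL = "https://api.aegisevals.ai/api/v1"


def build_url(route: str, **path_params: object) -> str:
    """Fill a route template such as '/runs/{run_id}' and prepend the base URL.

    Path parameters are percent-encoded so an id can never change the route.
    """
    encoded = {name: quote(str(value), safe="") for name, value in path_params.items()}
    filled = route.format(**encoded)
    return BASE_URL + "/" + filled.lstrip("/")


print(build_url("/runs/{run_id}", run_id=42))
print(build_url("dataset-types"))
```

Accepting routes with or without a leading slash keeps the helper tolerant of both spellings.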

Core concepts

Before calling endpoints, these are the key terms used across the API:

Evaluation: one scoring task where Aegis checks model output against one or more metrics (for example: correctness, safety, formatting).
Run: a stored execution of an evaluation job. A run usually contains many evaluations and their metric scores.
Dataset: uploaded CSV data that Aegis stores as rows, with chosen metrics and thresholds, so you can run evaluations repeatedly without resending the file.
Dataset type: a lookup value that classifies a dataset (for example, custom user uploads versus proprietary catalog data). Types have integer ids, which you use when filtering dataset lists.
Model (catalog): a registered LLM in Aegis with id, slug, name, and a supplier. See Models for the list of valid slug values to use in runs and evaluations.
Dataset run: a run created from a saved dataset that already exists in Aegis.
Custom run: a run created by sending rows directly in the request body, without needing a pre-saved dataset.
Data collection: a container that groups datasets and related runs so teams can organize and review evaluation work in one place.

Endpoint guide

Evaluate

POST /evaluate (Single Evaluation)

Runs an evaluation request and returns the computed scoring output.
Use this for quick or direct evaluation execution from your app/backend.
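
The following sketch shows one way to prepare a JSON POST to /evaluate with Python's standard library. The request schema is not described in this section, so the payload contains only `model_slug` (mentioned under Models) with a placeholder value; any other fields are left out rather than guessed. The helper name `build_json_post` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.aegisevals.ai/api/v1"


def build_json_post(route: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(BASE_URL + route, data=body, method="POST")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Content-Type", "application/json")
    return request


# Placeholder values: replace them with a real API key and a documented slug.
# The remaining evaluation fields are intentionally omitted (schema not shown here).
request = build_json_post(
    "/evaluate",
    token="<token>",
    payload={"model_slug": "<documented-slug>"},
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a real key and consumes credits:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```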

Runs

POST /runs/dataset (Create run from dataset)

Starts a new run using a dataset already stored in Aegis.
Use this for repeatable evaluations over curated data.

POST /runs/custom (Create custom run)

Starts a new run by sending metric config and rows directly in the request.
Use this when data is generated on the fly and not saved as a dataset first.

GET /runs/{run_id} (Get run)

Retrieves one existing run with its summary and row-level evaluation data.
Use this to inspect the results of a specific run.

GET /runs/{run_id}/download (Download run)

Exports run results as CSV for analysis.
Use this for reporting, sharing, and offline analysis.
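
A minimal sketch of working with the two read routes above: one helper builds the URL for a run or its CSV export, and another turns downloaded CSV text into rows. The helper names are hypothetical, the type of `run_id` is not specified on this page, and the CSV column names depend on the run, so none are assumed.

```python
import csv
import io
from urllib.parse import quote

BASE_URL = "https://api.aegisevals.ai/api/v1"


def run_url(run_id: object, download: bool = False) -> str:
    """Return the URL for GET /runs/{run_id} or GET /runs/{run_id}/download."""
    url = f"{BASE_URL}/runs/{quote(str(run_id), safe='')}"
    return url + "/download" if download else url


def rows_from_csv(text: str) -> list:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


print(run_url(7))
print(run_url(7, download=True))
```

Parsing with `csv.DictReader` keeps the code independent of column order, which is useful when the exported columns vary with the metrics in a run.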

Dataset types

GET /dataset-types + GET /dataset-types/{dataset_type_id} (Get dataset types)

Lists all dataset types or returns one type by id (name, label, description).
Use this to discover type ids for dataset_type_id on Get datasets.

Models

Models (reference): the supported LLM slugs, display names, and flags (thinking, latest) are listed in the docs. The API server does not expose a models HTTP endpoint, so use a documented slug in model_slug when you create runs or evaluations.

Datasets

POST /datasets (Create dataset)

Uploads a CSV and creates a custom dataset with metric thresholds.
Use this to persist evaluation data for dataset runs and data collections.

GET /datasets/all-partial, GET /datasets, GET /datasets/{dataset_id} (Get datasets)

Lists dataset ids/names, paginated summaries, or one full dataset (records, runs, evaluations).
Use this to discover datasets and inspect stored rows.
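
A sketch of building the paginated listing URL, under stated assumptions: `page` and `page_size` are the query parameter names mentioned in the status code list, and `dataset_type_id` is assumed to be passed as a query parameter on GET /datasets. The default values and the helper name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.aegisevals.ai/api/v1"


def datasets_list_url(
    page: int = 1,
    page_size: int = 20,  # hypothetical default; the server's limits are not stated here
    dataset_type_id: Optional[int] = None,
) -> str:
    """Build the URL for GET /datasets with pagination and an optional type filter."""
    params = {"page": page, "page_size": page_size}
    if dataset_type_id is not None:
        # Assumption: the type filter is a query parameter with this exact name.
        params["dataset_type_id"] = dataset_type_id
    return f"{BASE_URL}/datasets?{urlencode(params)}"


print(datasets_list_url(page=2, page_size=50, dataset_type_id=3))
```

Out-of-range values for `page` or `page_size` are rejected by the server with 422, so client-side checks are optional.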

PUT /datasets/{dataset_id} (Update dataset)

Updates name, metrics, column mappings, or data collection membership.
Use this to keep dataset information accurate over time.

DELETE /datasets/{dataset_id} (Delete dataset)

Deletes a custom dataset you own.

GET /datasets/{dataset_id}/download (Download dataset)

Exports dataset rows as CSV.

Data collections

POST /data-collections (Create data collection)

Creates a new data collection container.
Use this when you need to group datasets or runs.

GET /data-collections + GET /data-collections/{data_collection_id} (Get data collections)

Lists collections (paginated) or fetches one collection by id.
Use this to browse collections or inspect a specific one.

PUT /data-collections/{data_collection_id} (Update data collection)

Updates collection metadata (for example name, aliases, linked references).
Use this to keep collection information accurate over time.

DELETE /data-collections/{data_collection_id} (Delete data collection)

Permanently removes a data collection.
Use this when a collection is no longer needed.

Getting started

Use your organization’s Aegis web app to sign in and create an API key intended for integrations.
Send the key on every request to the routes in this section, as described below.

Authentication

Every route under /runs, /evaluate, /data-collections, /dataset-types, and /datasets requires:

Authorization: Bearer <token>
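
For illustration, the header can be attached with Python's standard library as sketched below. The helper name `authed_request` and the environment variable `AEGIS_API_KEY` are assumptions for this example, not names defined by Aegis.

```python
import os
import urllib.request

BASE_URL = "https://api.aegisevals.ai/api/v1"


def authed_request(route: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Prepare (but do not send) a request carrying the bearer token."""
    request = urllib.request.Request(BASE_URL + route, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


# AEGIS_API_KEY is a hypothetical variable name; store the key wherever suits you,
# but keep it out of source control.
token = os.environ.get("AEGIS_API_KEY", "<token>")
request = authed_request("/dataset-types", token)
print(request.get_method(), request.full_url)

# To send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```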

Credits and billable work

Operations that enqueue real model work generally check your credit balance before proceeding. That includes starting a dataset run, a custom run, and calling POST /evaluate. If you lack credits, the server responds with 402 Payment Required (see HTTP status codes and error responses).

HTTP status codes and error responses

Error responses are JSON objects with a detail field that describes the problem.

200: Success for GET, PUT, or DELETE requests that return a body.
201: Resource created (POST /runs/dataset, POST /runs/custom, POST /evaluate, POST /data-collections, POST /datasets).
204: Success with no body (DELETE data collection, DELETE dataset).
400: Bad request, such as an invalid payload, inactive metrics, missing or inconsistent IDs, a duplicate alias or name, or a database constraint violation.
401: Missing or invalid Authorization header, or API key not accepted.
402: Insufficient credits. Returned before POST /runs/dataset, POST /runs/custom, or POST /evaluate starts any work if your balance is too low.
404: Resource not found (run, metric, model, dataset, dataset type, data collection, no rows or CSV data, and so on).
422: Validation failed. Invalid JSON body or field types on a POST or PUT (for example POST /runs/dataset or POST /runs/custom), or out-of-range query parameters (for example page or page_size).
500: Unexpected server error.
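
One possible way to surface these errors in client code is sketched below. `AegisAPIError` and `parse_error` are hypothetical names, and the retry rule is an assumption for illustration. The type of `detail` is not specified on this page, so it is passed through unchanged.

```python
import json


class AegisAPIError(Exception):
    """Hypothetical exception type; not part of any official Aegis client."""

    def __init__(self, status: int, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail

    @property
    def retryable(self) -> bool:
        # Assumption: only unexpected server errors (5xx) are worth retrying;
        # 400, 401, 402, 404, and 422 need a change on the caller's side.
        return self.status >= 500


def parse_error(status: int, body: bytes) -> AegisAPIError:
    """Build an exception from an error response, reading its detail field."""
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # The body was not JSON, or not a JSON object.
        detail = None
    return AegisAPIError(status, detail)


error = parse_error(402, b'{"detail": "Insufficient credits"}')
print(error.status, error.detail, error.retryable)
```

With `urllib`, a non-2xx response raises `urllib.error.HTTPError`; its `code` attribute and `read()` result can be passed to `parse_error`.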
