Eval sweeps that fit in a batch window.
Evals are embarrassingly parallel — yet teams still run them one model at a time on realtime endpoints. Bundle five models and ten thousand prompts into a single batch, get reproducible outputs tied to model versions, and hand the report to your compliance team.
You're comparing a new fine-tune to the last one, or to three candidate base models, across a benchmark your domain actually cares about. That's a product of {models} × {prompts} × {seeds} — a few tens of thousands of completions, all non-interactive, all needed before the next deploy decision.
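In code, that matrix is a one-liner to enumerate. A minimal sketch, assuming illustrative model names, a local benchmark file, and two seeds (none of these are our API):

import itertools, json

models = ["ft_a@v7", "ft_a@v6", "mistral-7b", "gemma-2-9b"]  # illustrative names
prompts = [json.loads(line) for line in open("benchmark.jsonl")]  # ~10K items
seeds = [0, 1]

# 4 models x 10,000 prompts x 2 seeds = 80,000 completions to request.
sweep = [
    {"id": p["id"], "model": m, "prompt": p["prompt"], "seed": s}
    for m, p, s in itertools.product(models, prompts, seeds)
]
print(len(sweep))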
The whole sweep in one submission.
Submit the full set of (model, prompt, seed) tuples as one batch with a 24h window. Shards run in parallel on spot capacity. No orchestration code for you to babysit.
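What a submission looks like, as a minimal sketch: the endpoint URL, auth header, and "window" field below are assumptions for illustration, not a published API.

import requests

with open("requests.jsonl", "rb") as f:
    resp = requests.post(
        "https://api.example.com/v1/batches",  # hypothetical endpoint
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"input": f},
        data={"window": "24h"},  # results due within 24 hours
    )
print(resp.json()["batch_id"])  # hypothetical response field, e.g. "bch_e42f"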
Your fine-tune, same pipeline.
Evaluate your latest fine-tune alongside Qwen, Mistral, and Gemma on identical infrastructure — not across three different APIs with three different rate limits.
Outputs are pinned, not drifting.
Catalog models are versioned; BYOM models are pinned to the weights you uploaded. Rerun the same batch a month later and the outputs will match. Regulators and reviewers accept this.
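You can verify that claim yourself in a few lines. A sketch with illustrative filenames; the output schema follows the example record below:

import json

def completions_by_key(path):
    # Index completions by (prompt id, model) so line ordering doesn't matter.
    return {
        (r["id"], r["model"]): r["completion"]
        for r in map(json.loads, open(path))
    }

first_run = completions_by_key("sweep_march.jsonl")
rerun = completions_by_key("sweep_april.jsonl")
assert first_run == rerun, "pinned models should reproduce identical completions"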
Example — sweep 4 models across a 10K-prompt benchmark.
{ "id": "bench_0042", "prompt": "Summarise the following clinical note…", "reference": "…" }{
"id": "bench_0042",
"model": "ft_a@v7",
"completion": "…",
"metrics": { "rouge_l": 0.48, "latency_ms": 612 },
"_sference": {
"region": "eu-de-fra",
"batch": "bch_e42f"
}
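Because every completion lands in one flat file, the per-model comparison is mechanical. A sketch: "results.jsonl" is an assumed local filename, and the metric key follows the record above.

import json
from collections import defaultdict

sums = defaultdict(lambda: [0.0, 0])  # model -> [rouge_l total, count]
for record in map(json.loads, open("results.jsonl")):
    entry = sums[record["model"]]
    entry[0] += record["metrics"]["rouge_l"]
    entry[1] += 1

for model, (total, n) in sorted(sums.items()):
    print(f"{model}  mean ROUGE-L {total / n:.3f}  ({n} completions)")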
Pricing will be announced at launch. Eval sweeps are a natural 24h job; a 48h window is fine when the next deploy is Monday.
Window tiers:
- Smoke sweep on a small benchmark.
- Same-day regression check.
- Canonical eval window.
- Full matrix sweeps, weekend runs.
Model lineup:
- Flagship general catalog model: pinned version, reproducible across reruns.
- Dense multimodal alternative: useful as a judge or a side-by-side baseline.
- Bring your own model: upload weights, pin a version, run side-by-side with catalog models.
Export the report as an Annex IV artefact.
Every completion is tied to a pinned model version, region, and batch ID. Export the report as JSONL or HTML and attach it to your EU AI Act technical documentation. Same artefact works for customers' vendor-risk reviews.
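Pulling that audit trail out of the report is equally mechanical. A sketch: the filename is illustrative, and the _inference block follows the example record above.

import json

# Every distinct (model version, region, batch) triple behind the report.
trail = {
    (r["model"], r["_inference"]["region"], r["_inference"]["batch"])
    for r in map(json.loads, open("results.jsonl"))
}
for model, region, batch in sorted(trail):
    print(model, region, batch)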
Stop paying realtime prices for work that can wait.
We're in early access. Drop your email — if your workload fits, we'll send you API credentials and you're good to go.