Generate a million samples overnight — with receipts.

Synthetic data is naturally batch. Long windows on spot capacity get you teacher-quality samples at a price that makes regeneration a reflex, not a decision. Every sample carries a model version and batch ID.

1M samples · 24h window · pricing at launch
We'll tag your request as synthetic data generation and pair you with a design-partner slot.
The workload

You're bootstrapping a fine-tune, distilling a frontier model into a smaller one, stress-testing a classifier with adversarial prompts, or just filling in a training-set gap. The jobs are embarrassingly parallel and fundamentally non-interactive — yet realtime APIs charge realtime prices and leave you piecing together provenance after the fact.

Why it's different on sference
Async fit

Generation runs that regenerate cheaply.

Pick a 24h or 48h window and generate at spot-capacity cost. Partial-run checkpointing means a reclaim never loses more than the current shard — the rest resumes automatically.

Teacher choice

Qwen3.6 Plus, Mistral-Medium 3 — or your own teacher.

Open-weight frontier models served on demand, so your teacher only costs you while it's running. BYOM if you've licensed or trained a specific teacher.

Provenance

Model version + batch ID on every sample.

Exportable as JSONL ready for Annex IV documentation. Reproducible regenerations. No silent model swaps between runs.

Example

Example — generate 1M instruction-tuning samples.

sference — synthetic
$ sference batch ./seeds.jsonl --model qwen3.6-plus --template ./prompt.tpl --window 24h
→ uploading 100,000 seeds (84 MB)
→ batch bch_7c4f queued · eta 19h 10m · sla 24h
→ fanout 10x per seed → 1,000,000 samples
▸ 32 shards · 4 EU providers · streaming output
▸ shard 14/32 · de-fra · preempted → rescheduled
✓ completed 999,412/1,000,000 · 19h 02m · manifest exported
input · seed
{ "id": "seed_0042", "topic": "procurement clauses", "n": 10 }
output · sample
{
  "id": "seed_0042.03",
  "prompt": "Rewrite the following procurement clause…",
  "completion": "…",
  "_sference": {
    "model": "qwen3.6-plus@2026-04",
    "temperature": 0.8,
    "region": "eu-fi-hel",
    "batch": "bch_7c4f"
  }
}
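A minimal sketch of preparing the input file above. The seed fields (`id`, `topic`, `n`) and the `seeds.jsonl` filename follow the example; your own seed schema may carry different fields.

```python
import json

# Build a seeds file in the format shown above: one JSON object per line,
# each with an id, a topic, and a fan-out count n.
seeds = [
    {"id": f"seed_{i:04d}", "topic": "procurement clauses", "n": 10}
    for i in range(3)
]

with open("seeds.jsonl", "w", encoding="utf-8") as f:
    for seed in seeds:
        f.write(json.dumps(seed) + "\n")

# Read it back and compute the expected fan-out: n samples per seed.
with open("seeds.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

expected_samples = sum(s["n"] for s in loaded)
print(expected_samples)  # 3 seeds x 10 completions each
```

At real scale this is 100,000 seeds with n=10, which is where the 1,000,000-sample fan-out in the run log comes from.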
SLA & cost

Pricing is announced at launch. Long windows are where synthetic generation wins — the cost curve makes regeneration cheap.

1h · Tight iteration, small sample counts · Baseline
6h · Same-day data refresh · Cheaper
24h · Canonical synthetic-data window · Much cheaper
48h · Full-corpus distillation runs · Cheapest
Recommended models
Qwen3.6 Plus · Catalog

Flagship MoE teacher; hybrid attention, multilingual, strong reasoning.

Mistral-Medium 3 · Catalog

European teacher with quality-tier output; solid English + EU languages.

Your teacher · BYOM

Licensed or fine-tuned — served under the same audit trail.

Compliance

Annex IV-ready provenance.

Every sample records teacher model version, sampling params, region, and batch ID. Export the full batch manifest as a signed JSONL and drop it straight into your EU AI Act technical documentation.
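The "no silent model swaps" guarantee above can be audited from the exported JSONL itself: every batch should report exactly one teacher model version. A sketch, assuming the `_sference` provenance block shown in the example; the helper name is ours, not part of any sference SDK.

```python
import json

def check_batch_provenance(lines):
    """Group samples by batch ID and return any batch whose samples
    report more than one teacher model version (a silent swap)."""
    models_by_batch = {}
    for line in lines:
        sample = json.loads(line)
        prov = sample["_sference"]  # provenance block from the example schema
        models_by_batch.setdefault(prov["batch"], set()).add(prov["model"])
    return {b: m for b, m in models_by_batch.items() if len(m) > 1}

# Two samples from the same batch, same pinned model version: no swaps.
manifest = [
    json.dumps({"id": "seed_0042.03",
                "_sference": {"model": "qwen3.6-plus@2026-04", "batch": "bch_7c4f"}}),
    json.dumps({"id": "seed_0042.04",
                "_sference": {"model": "qwen3.6-plus@2026-04", "batch": "bch_7c4f"}}),
]
print(check_batch_provenance(manifest))  # empty dict: versions consistent
```

The same pass can collect sampling params and regions per batch for the Annex IV write-up.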

Early access

Stop paying realtime prices for work that can wait.

We're in early access. Drop your email — if your workload fits, we'll send you API credentials and you're good to go.