Rebuild your RAG index — without the realtime markup.

Embedding 10 million chunks on a hosted API is a budget item. Embedding them on sference is a batch job. Run any open-weight embedder or your own, on European infrastructure.

10M chunks · 6h window · pricing at launch
We'll tag your request as embeddings-at-scale and pair you with a design-partner slot.
The workload

Building a RAG corpus, re-embedding after a model swap, or maintaining semantic search over a document store — the per-vector math is trivial, the volume isn't. Realtime embedding APIs bill for latency you don't need and route through US infrastructure that your compliance team doesn't want.
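
If you're starting from raw documents, the input is one JSON object per chunk, in the {id, text} shape shown in the example below. A minimal Python sketch, assuming a corpus/ directory of plain-text files and naive paragraph splitting; the splitter and ID scheme are placeholders for your own:

import json
from pathlib import Path

def chunk_document(doc_id: str, text: str) -> list[dict]:
    # Naive paragraph-level chunking; swap in your own splitter.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"id": f"{doc_id}#p{i:02d}", "text": p}
        for i, p in enumerate(paragraphs, start=1)
    ]

with open("chunks.jsonl", "w", encoding="utf-8") as out:
    for path in Path("corpus").glob("*.txt"):
        for chunk in chunk_document(path.stem, path.read_text(encoding="utf-8")):
            out.write(json.dumps(chunk, ensure_ascii=False) + "\n")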

Why it's different on sference
Async fit

Index builds are the textbook batch workload.

No user is waiting. You want the index by morning. A 6h or 24h window on spot capacity gets the same vectors at a fraction of the hosted-API cost.

Open weights

Qwen3-Embedding, BGE — and BYOM.

Pick an embedder that matches your domain and language mix. We serve anything vLLM or SGLang hosts, including your own fine-tunes. Model version pinned per batch, so your index stays consistent.

Residency

Source data never leaves Europe.

Chunks are embedded on EU GPUs. Source text is processed under your retention policy and deleted on completion if you choose. Vectors and metadata stream back to your store.

Example

Example — embed 10M chunks into your vector store.

sference — embeddings
$ sference embed ./chunks.jsonl --model bge-m3 --window 6h --out ./vectors.jsonl
→ uploading 10,000,000 chunks (12.4 GB)
→ batch bch_e9d1 queued · eta 4h 30m · sla 6h
→ dim 1024 · fp16 · normalized
▸ 8 shards · 3 EU providers · streaming output
✓ completed 10,000,000/10,000,000 · 4h 18m
✓ vectors.jsonl · 40.2 GB · batch manifest exported
input · chunk
{ "id": "doc_0042#p07", "text": "Section 4.2 — In the event of early termination…" }
output · vector
{
  "id": "doc_0042#p07",
  "vector": [0.0183, -0.0412, …],
  "_sference": {
    "model": "bge-m3@2026-04",
    "dim": 1024,
    "region": "eu-nl-ams",
    "batch": "bch_e9d1"
  }
}
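
Consuming the output is plain JSONL streaming. A minimal sketch, assuming you stage the vectors in memory with numpy before handing them to your actual store; because the batch output is normalized, cosine similarity reduces to a dot product. Query vectors must come from the same pinned model version (bge-m3@2026-04 above).

import json
import numpy as np

ids, rows = [], []
with open("vectors.jsonl", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        ids.append(rec["id"])
        rows.append(rec["vector"])

# 10M x 1024 fp16 is roughly 20 GB; a real deployment upserts into a
# vector store instead of holding the matrix in RAM.
index = np.asarray(rows, dtype=np.float16)

def top_k(query_vec: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
    # Normalized vectors: dot product == cosine similarity.
    scores = index @ query_vec.astype(np.float16)
    best = np.argsort(scores)[-k:][::-1]
    return [(ids[i], float(scores[i])) for i in best]
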
SLA & cost

Pricing is announced at launch. Longer windows map to cheaper spot capacity — index builds are the canonical fit.

Window · Typical use · Relative price
1h · Rush re-embed before a release. · Baseline
6h · Same-day index refresh. · Cheaper
24h · Overnight full rebuild. · Much cheaper
48h · Corpus-wide re-embed after a model swap. · Cheapest
Recommended models
Qwen3-Embedding-8B · Catalog

Current MTEB multilingual leader; 100+ languages; flexible output dims.

BGE-M3 · Catalog

Strong multilingual default; 1024-dim; dense + sparse + multi-vector.

Your fine-tune · BYOM

Domain-adapted embedder, same batch pipeline.

Compliance

EU residency, end to end.

Source text is processed on European GPUs, never routed through US infrastructure. Inputs can be deleted on completion; batch manifests carry model version, region, and dim so the index is reproducible and auditable.
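
The manifest is what lets you gate ingestion. A minimal sketch, assuming a manifest.json whose model, dim, and region fields mirror the _sference block in each output record; the manifest filename and schema here are illustrative:

import json

with open("manifest.json", encoding="utf-8") as f:
    manifest = json.load(f)  # e.g. {"model": "bge-m3@2026-04", "dim": 1024, "region": "eu-nl-ams"}

n = 0
with open("vectors.jsonl", encoding="utf-8") as f:
    for n, line in enumerate(f, start=1):
        meta = json.loads(line)["_sference"]
        for key in ("model", "dim", "region"):
            # Refuse mixed batches: one index, one pinned model version.
            assert meta[key] == manifest[key], f"line {n}: {key} mismatch"

print(f"ok: {n} vectors consistent with manifest")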

Early access

Stop paying realtime prices for work that can wait.

We're in early access. Drop your email — if your workload fits, we'll send you API credentials and you're good to go.