Rebuild your RAG index — without the realtime markup.

Embedding 10 million chunks on a hosted API is a budget item. Embedding them on sference is a batch job. Run any open-weight embedder or your own, on European infrastructure.

10M chunks · 6h window · pricing at launch
We'll tag your request as embeddings-at-scale and pair you with a design-partner slot.
The workload

Building a RAG corpus, re-embedding after a model swap, or maintaining semantic search over a document store — the per-vector math is trivial, the volume isn't. Realtime embedding APIs bill for latency you don't need and route through US infrastructure that your compliance team doesn't want.
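
If you're starting from raw documents, the input is one JSON object per chunk, in the {id, text} shape shown in the example below. A minimal Python sketch, assuming a corpus/ directory of plain-text files and naive paragraph splitting; the splitter and ID scheme are placeholders for your own:

import json
from pathlib import Path

def chunk_document(doc_id: str, text: str) -> list[dict]:
    # Naive paragraph-level chunking; swap in your own splitter.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"id": f"{doc_id}#p{i:02d}", "text": p}
        for i, p in enumerate(paragraphs, start=1)
    ]

with open("chunks.jsonl", "w", encoding="utf-8") as out:
    for path in Path("corpus").glob("*.txt"):
        for chunk in chunk_document(path.stem, path.read_text(encoding="utf-8")):
            out.write(json.dumps(chunk, ensure_ascii=False) + "\n")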

Why it's different on sference
Async fit

Index builds are the textbook batch workload.

No user is waiting. You want the index by morning. A 6h or 24h window on spot capacity gets the same vectors at a fraction of the hosted-API cost.

Open weights

Qwen3-Embedding, BGE — and BYOM.

Pick an embedder that matches your domain and language mix. We serve anything vLLM or SGLang hosts, including your own fine-tunes. Model version pinned per batch, so your index stays consistent.

Residency

Source data never leaves Europe.

Chunks are embedded on EU GPUs. Source text is processed under your retention policy and deleted on completion if you choose. Vectors and metadata stream back to your store.

Example

Example — embed 10M chunks into your vector store.

sference — embeddings
$ sference embed ./chunks.jsonl --model bge-m3 --window 6h --out ./vectors.jsonl
→ uploading 10,000,000 chunks (12.4 GB)
→ batch bch_e9d1 queued · eta 4h 30m · sla 6h
→ dim 1024 · fp16 · normalized
▸ 8 shards · 3 EU providers · streaming output
✓ completed 10,000,000/10,000,000 · 4h 18m
✓ vectors.jsonl · 40.2 GB · batch manifest exported
input · chunk
{ "id": "doc_0042#p07", "text": "Section 4.2 — In the event of early termination…" }
output · vector
{
  "id": "doc_0042#p07",
  "vector": [0.0183, -0.0412, …],
  "_sference": {
    "model": "bge-m3@2026-04",
    "dim": 1024,
    "region": "eu-nl-ams",
    "batch": "bch_e9d1"
  }
}
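
Consuming the output is plain JSONL streaming. A minimal sketch, assuming you stage the vectors in memory with numpy before handing them to your actual store; because the batch output is normalized, cosine similarity reduces to a dot product. Query vectors must come from the same pinned model version (bge-m3@2026-04 above).

import json
import numpy as np

ids, rows = [], []
with open("vectors.jsonl", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        ids.append(rec["id"])
        rows.append(rec["vector"])

# 10M x 1024 fp16 is roughly 20 GB; a real deployment upserts into a
# vector store instead of holding the matrix in RAM.
index = np.asarray(rows, dtype=np.float16)

def top_k(query_vec: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
    # Normalized vectors: dot product == cosine similarity.
    scores = index @ query_vec.astype(np.float16)
    best = np.argsort(scores)[-k:][::-1]
    return [(ids[i], float(scores[i])) for i in best]
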
SLA & cost

Pricing is announced at launch. Longer windows map to cheaper spot capacity — index builds are the canonical fit.

Window · Typical use · Relative price
1h · Rush re-embed before a release. · Baseline
6h · Same-day index refresh. · Cheaper
24h · Overnight full rebuild. · Much cheaper
48h · Corpus-wide re-embed after a model swap. · Cheapest
Recommended models
Qwen3-Embedding-8B · Catalog

Current MTEB multilingual leader; 100+ languages; flexible output dims.

BGE-M3 · Catalog

Strong multilingual default; 1024-dim; dense + sparse + multi-vector.

Your fine-tune · BYOM

Domain-adapted embedder, same batch pipeline.

Compliance

EU residency, end to end.

Source text is processed on European GPUs, never routed through US infrastructure. Inputs can be deleted on completion; batch manifests carry model version, region, and dim so the index is reproducible and auditable.
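
The manifest is what lets you gate ingestion. A minimal sketch, assuming a manifest.json whose model, dim, and region fields mirror the _sference block in each output record; the manifest filename and schema here are illustrative:

import json

with open("manifest.json", encoding="utf-8") as f:
    manifest = json.load(f)  # e.g. {"model": "bge-m3@2026-04", "dim": 1024, "region": "eu-nl-ams"}

n = 0
with open("vectors.jsonl", encoding="utf-8") as f:
    for n, line in enumerate(f, start=1):
        meta = json.loads(line)["_sference"]
        for key in ("model", "dim", "region"):
            # Refuse mixed batches: one index, one pinned model version.
            assert meta[key] == manifest[key], f"line {n}: {key} mismatch"

print(f"ok: {n} vectors consistent with manifest")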

Early access

Stop paying realtime prices for work that can wait.

We're in early access. Drop your email — if your workload fits, we'll send you API credentials and you're good to go.