Rebuild your RAG index — without the realtime markup.
Embedding 10 million chunks on a hosted API is a budget item. Embedding them on sference is a batch job. Run any open-weight embedder or your own, on European infrastructure.
Whether you are building a RAG corpus, re-embedding after a model swap, or maintaining semantic search over a document store, the per-vector math is trivial; the volume isn't. Realtime embedding APIs bill for latency you don't need and route through US infrastructure that your compliance team wants to avoid.
Index builds are the textbook batch workload.
No user is waiting. You want the index by morning. A 6h or 24h window on spot capacity gets the same vectors at a fraction of the hosted-API cost.
Qwen3-Embedding, BGE — and BYOM.
Pick an embedder that matches your domain and language mix. We serve anything vLLM or SGLang hosts, including your own fine-tunes. Model version pinned per batch, so your index stays consistent.
Source data never leaves Europe.
Chunks are embedded on EU GPUs. Source text is processed under your retention policy and deleted on completion if you choose. Vectors and metadata stream back to your store.
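As a rough illustration of preparing a corpus for a batch like this, chunks can be serialized as one JSON object per line. This is a minimal sketch, assuming the `id`/`text` input shape and the `doc_0042#p07` id style of the example record on this page. The helpers `chunk_records` and `to_jsonl` are hypothetical names for illustration, not part of any sference API, and the actual upload format may differ.

```python
import json
from typing import Iterable, Iterator


def chunk_records(doc_id: str, paragraphs: Iterable[str]) -> Iterator[dict]:
    """Yield one {"id", "text"} record per non-empty paragraph.

    The paragraph number in the id reflects the position in the source
    document, so skipped empty paragraphs leave gaps in the numbering.
    """
    for n, text in enumerate(paragraphs, start=1):
        text = text.strip()
        if text:
            yield {"id": f"{doc_id}#p{n:02d}", "text": text}


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical three-paragraph document; the empty paragraph is dropped.
lines = to_jsonl(chunk_records("doc_0042", ["Intro.", "", "Section 4.2 text."]))
print(lines)
```

Stable, position-based ids make it possible to join returned vectors back to their source chunks without keeping the text alongside them.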
Example — embed 10M chunks into your vector store.
{ "id": "doc_0042#p07", "text": "Section 4.2 — In the event of early termination…" }
{
  "id": "doc_0042#p07",
  "vector": [0.0183, -0.0412, …],
  "_sference": {
    "model": "bge-m3@2026-04",
    "dim": 1024,
    "region": "eu-nl-ams",
    "batch": "bch_e9d1"
  }
}
Pricing is announced at launch. Longer windows map to cheaper spot capacity; index builds are the canonical fit.
Rush re-embed before a release.
Same-day index refresh.
Overnight full rebuild.
Corpus-wide re-embed after a model swap.
Current MTEB multilingual leader; 100+ languages; flexible output dims.
Strong multilingual default; 1024-dim; dense + sparse + multi-vector.
Domain-adapted embedder, same batch pipeline.
EU residency, end to end.
Source text is processed on European GPUs, never routed through US infrastructure. Inputs can be deleted on completion; batch manifests carry model version, region, and dim so the index is reproducible and auditable.
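One way to use that metadata on the receiving side is to check every returned record against the model and dimension you expect before writing it to your store. The sketch below assumes each result line carries the `_sference` block shown in the example record on this page. `check_record` is a hypothetical helper, and the toy vector has 4 dimensions only to keep the sample short.

```python
import json


def check_record(line: str, expected_model: str, expected_dim: int) -> dict:
    """Parse one result line and verify it matches the pinned model and dim."""
    rec = json.loads(line)
    meta = rec["_sference"]
    if meta["model"] != expected_model:
        raise ValueError(
            f"{rec['id']}: model {meta['model']!r} != {expected_model!r}"
        )
    # Both the declared dim and the actual vector length must agree.
    if meta["dim"] != expected_dim or len(rec["vector"]) != expected_dim:
        raise ValueError(f"{rec['id']}: unexpected vector dimension")
    return rec


# Toy record for illustration; real records would carry the full vector.
sample = json.dumps({
    "id": "doc_0042#p07",
    "vector": [0.0183, -0.0412, 0.0, 0.1],
    "_sference": {"model": "bge-m3@2026-04", "dim": 4,
                  "region": "eu-nl-ams", "batch": "bch_e9d1"},
})
rec = check_record(sample, expected_model="bge-m3@2026-04", expected_dim=4)
print(rec["id"], rec["_sference"]["batch"])
```

Rejecting mismatched records at ingest keeps vectors from two model versions out of the same index, where their distances would not be comparable.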
Stop paying realtime prices for work that can wait.
We're in early access. Drop your email — if your workload fits, we'll send you API credentials and you're good to go.