Async AI inference on excess GPU capacity

AI inference for async pipelines.

$ sference batch ./workload.jsonl --model qwen3.5-35b --window overnight

Trade latency for massive savings. We aggregate spot and preemptible GPU capacity across EU providers into a single compute pool — run any open-weight model or bring your own fine-tune. Sovereign EU infrastructure with compliance built in.
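For illustration, a workload like the one in the command above is one request per line of JSONL. The field names below (custom_id, prompt, max_tokens) are assumptions for the sketch, not the documented schema:

```python
import json

# Hypothetical request schema: the real workload.jsonl fields may differ.
requests = [
    {"custom_id": f"doc-{i}", "prompt": f"Summarize document {i}.", "max_tokens": 256}
    for i in range(3)
]

with open("workload.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

One request per line means a batch can be split, interrupted, and resumed at line granularity.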

Federated spot GPU capacity

We aggregate excess and preemptible GPU capacity across multiple EU providers. You get volume pricing without volume commitments and no single-vendor dependency.

Trade latency for cost

Priority (~1hr) and overnight (~24hr) delivery windows. Batch workloads are naturally interruptible and resumable — the structural cost advantage of non-realtime processing. Up to 75% off.

Any model. Including yours.

Open-weight models from the Qwen, Mistral, and Llama families — or bring your own fine-tune. If it runs on vLLM/SGLang, we serve it. No model lock-in.

Sovereign EU infrastructure

Every request processed on EU GPUs. No US CLOUD Act exposure. Full compliance audit trail, configurable retention, and exportable reports. DPA included.

Why we're cheaper

Most batch APIs give you a 50% discount and a 24-hour window. Our architecture is built from the ground up for async workloads on excess GPU capacity.

Hardware-agnostic GPU federation

We abstract across multiple EU GPU providers — different hardware generations, different pricing. Workloads route to the best available capacity. No single-vendor dependency.

Spot instance economics

Non-realtime processing lets us use preemptible and spot capacity at significant discounts. Batch workloads are naturally interruptible and resumable — if a spot instance is reclaimed, the orchestrator reschedules remaining chunks.

On-demand model loading

Without millisecond latency requirements, we cold-start models per batch job rather than keeping them resident in GPU memory. This enables BYOM — upload your fine-tuned weights, and we load them for the job, process, and release.

Smart routing and chunking

Each batch decomposes into chunks distributed across available GPUs. The orchestrator handles scheduling, fault tolerance, checkpoint resumption, and provider selection.
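The chunk-and-checkpoint loop can be sketched as follows. This is a minimal illustration, not the production orchestrator: chunk size, checkpoint format, and scheduling policy are all simplified assumptions.

```python
import json
import os
import tempfile

def chunk(requests, size):
    """Split a batch into fixed-size chunks that can be scheduled independently."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]

def run_batch(requests, checkpoint, process, chunk_size=2):
    """Process chunks, recording finished chunk indices so a preempted
    job resumes from the checkpoint instead of starting over."""
    done = set()
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            done = set(json.load(f))
    results = {}
    for idx, part in enumerate(chunk(requests, chunk_size)):
        if idx in done:
            continue  # finished before the interruption; skip on resume
        results[idx] = [process(r) for r in part]
        done.add(idx)
        with open(checkpoint, "w") as f:
            json.dump(sorted(done), f)  # checkpoint after every chunk
    return results

# Simulate a run: five requests become three chunks of at most two.
ckpt = os.path.join(tempfile.mkdtemp(), "batch.ckpt")
out = run_batch(["a", "b", "c", "d", "e"], ckpt, str.upper)
```

If a spot instance is reclaimed mid-batch, rerunning the same job against the same checkpoint file skips every completed chunk and processes only what remains.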

Compliance built into the runtime

Full request traceability, configurable retention, exportable reports, transparent model provenance. Built into the infrastructure, not bolted on after the fact.

US batch APIs offer discounts but no EU sovereignty or BYOM. Realtime inference platforms can't use spot capacity. EU datacenters sell raw GPU hours with no batch optimization. We combine all five: async batch processing, federated spot GPU capacity, any model including your own, EU sovereignty, and compliance traceability. Nobody else does.

Built for regulated verticals

For SaaS companies whose customers demand compliance. One integration brings thousands of end-users through your API — with audit trails their compliance teams can verify.

FinTech

Batch KYC extraction, transaction classification, statement processing. Overnight processing with full audit trail for regulated financial data.

LegalTech

Contract corpus analysis, document review, embedding generation for legal RAG. Sovereign processing for sensitive legal data.

HealthTech

Medical record digitization, prescription extraction, clinical data processing. Full compliance traceability for patient data.

InsureTech

Claims processing, policy document analysis, underwriting data extraction. Structured output with configurable retention.

AI/ML Teams

Model evals, synthetic data generation, fine-tuning data prep on sensitive datasets. Run thousands of evaluations in hours, not days.

Document Processing

Invoices, contracts, forms at scale. Any open-weight model or your own fine-tune. Cost-optimized batch processing with full governance.

Pricing

Pick a delivery window. We use spot and preemptible GPU capacity — the longer you can wait, the deeper the discount.

Dev Mode
Realtime
Full price

Prompt iteration and testing pipelines.

Priority
~1 hour
Up to 50% off

Background agents and production workflows.

Overnight
~24 hours
Up to 75% off

Large batch jobs and bulk processing.

No credit card required. No minimum spend. Pay only for tokens used.
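The discount math is simple. The base rate below is a made-up placeholder; only the window multipliers mirror the tiers above:

```python
# Placeholder base rate for illustration; actual pricing varies by model.
BASE_PER_M_TOKENS = 1.00  # dollars per 1M tokens at realtime full price

# Maximum discount per delivery window, mirroring the pricing tiers.
DISCOUNT = {"dev": 0.00, "priority": 0.50, "overnight": 0.75}

def cost(tokens, window):
    """Effective cost after the window discount is applied."""
    return tokens / 1_000_000 * BASE_PER_M_TOKENS * (1 - DISCOUNT[window])

# 100M tokens overnight: 100 * $1.00 * 0.25 = $25 instead of $100.
```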

For your compliance team

The section your engineer can forward to their CTO — and their customer's compliance officer. Our compliance dashboard serves both layers: operational for your team, audit-ready for your customer.

Your customers keep asking where their data goes.

Now you have an answer. Sovereign infrastructure, full audit trail, exportable compliance reports, DPA included. Give your customer's compliance team a dashboard link — not a "we take security seriously" PDF. EU-only processing guaranteed architecturally, not by policy.

Regulatory deadlines

DORA is in enforcement. AI Act begins August 2026.

DORA already requires financial institutions to assess third-party AI risk. The EU AI Act's deployer obligations begin August 2026 — transparency and traceability for any AI touching regulated data. We build it into the infrastructure so you don't have to.

Bring your own model, keep compliance.

Fine-tuned on proprietary data? Run it on our infrastructure with the same compliance guarantees as any catalog model. Same audit trail, same dashboard, same exportable reports. Transparent model provenance — you know exactly what processed your data.

Built by engineers from

Sentry · Adobe · Facebook · Celtra
Jernej Strasner
CEO

Founded Specto (acquired by Sentry). Director of Engineering at Sentry, leading teams processing billions of events/day. Former Tech Lead at Facebook.

Aleksander Pejcic
CTO

Sr ML Engineering Manager at Adobe, leading 20K+ GPU AI Platform for Adobe Firefly. Former VP of Engineering & Product at Celtra.

Benjamin Dobnikar
Business Development

CEO of Iryo (healthcare tech). Deep network in Slovenian and EU tech ecosystems.

AI Act Ready · GDPR Compliant · EU Data Residency · DORA Ready

FAQ