

Best for serverless LLM APIs, fast image generation, voice AI
DeepInfra provides exceptionally fast and highly scalable serverless inference for open-source foundation models. Their custom inference engine dramatically reduces token latency, making it the preferred API provider for developers building snappy generative AI applications using Llama, Mistral, and Stable Diffusion.
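As a sketch of what "serverless inference" looks like in practice, the snippet below builds an OpenAI-compatible chat completion payload of the kind DeepInfra's API accepts. The endpoint URL and model name are illustrative assumptions; check DeepInfra's documentation for current values before use.

```python
import json

# Assumed endpoint and model name -- verify against DeepInfra's current docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

Sending this payload with an HTTP client and a bearer token is all a deployment requires; there is no cluster or instance to provision, which is the appeal of the serverless model.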
| GPU Models | H100, A100, RTX A6000 |
| Headquarters | San Francisco, CA, USA |
| Founded | 2022 |
| Availability | Available Now |
| Website | deepinfra.com ↗ |
💡 Pricing note: Rates shown are indicative. Final pricing depends on GPU model, reservation type (spot vs. on-demand), contract length, and region. Get an exact quote →
DeepInfra GPU cloud pricing starts from $0.15/hr depending on GPU type, reservation model (on-demand vs. spot vs. reserved), and region. Use the quote form to get exact pricing for your specific workload.
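To turn an hourly rate into a budget figure, a back-of-the-envelope estimate is enough. The helper below is a generic sketch (not a DeepInfra tool) using the $0.15/hr floor quoted above; actual rates vary by GPU, reservation model, and region.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float = 24, days: int = 30) -> float:
    """Estimate monthly GPU spend from an hourly rate."""
    return round(hourly_rate * hours_per_day * days, 2)

# A single always-on instance at the $0.15/hr starting rate:
print(monthly_cost(0.15))  # 108.0
```

Adjusting `hours_per_day` models spot or batch workloads that only run part of the day.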
DeepInfra offers H100, A100, and RTX A6000 GPU instances. Availability varies by region and configuration. Contact the provider through ComputeStacker for current availability.
DeepInfra operates data centers in US East and US West. Choosing a region close to your users minimizes latency and can help with data residency compliance requirements.
Use the "Get a Quote" button on this page to submit your GPU requirements. ComputeStacker will forward your request to DeepInfra and other matching providers. You'll receive proposals within 24 hours — no commitment required.
DeepInfra offers high-performance GPU infrastructure suitable for large language model training and fine-tuning workloads. For large-scale distributed training, check the Specs tab for NVLink and InfiniBand interconnect availability.
