
Best for Serverless Image Generation, LLM API Inference, Open-Source Model Hosting
Replicate is an API-first machine learning cloud that lets developers run open-source models through thousands of pre-configured endpoints. When you need to integrate Llama 3 or Flux into an app quickly, Replicate manages the underlying GPU infrastructure for you.
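As a rough illustration of that API-first workflow, the sketch below builds (but does not send) the JSON body for a prediction request against Replicate's HTTP API. The model identifier and input fields are assumptions for illustration; check Replicate's own documentation for current endpoints and schemas.

```python
import json

# Hedged sketch: constructing a prediction request body for Replicate's
# HTTP API. The model identifier and "prompt" input field below are
# illustrative assumptions, not guaranteed to match the current API.
def build_prediction_request(model: str, prompt: str) -> str:
    body = {
        "version": model,             # model/version identifier (assumed format)
        "input": {"prompt": prompt},  # per-model input schema (assumed)
    }
    return json.dumps(body)

payload = build_prediction_request("meta/meta-llama-3-8b-instruct", "Say hello")
print(payload)
```

In practice you would POST a body like this to the predictions endpoint with your API token in the Authorization header, or use one of Replicate's official client libraries instead of hand-rolling the request.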
| GPU Models | H100, A100 80GB, A100 40GB, A40 |
| Headquarters | San Francisco, CA, USA |
| Founded | 2019 |
| Availability | Available Now |
| Website | replicate.com ↗ |
💡 Pricing note: Rates shown are indicative. Final pricing depends on GPU model, reservation type (spot vs. on-demand), contract length, and region. Get an exact quote →
Replicate's GPU cloud pricing starts at $0.36/hr; the exact rate depends on GPU type, reservation model (on-demand vs. spot vs. reserved), and region. Use the quote form to get exact pricing for your specific workload.
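To turn an hourly rate into a budget figure, a quick back-of-the-envelope calculation helps. The rate below is the indicative starting price quoted above; actual quotes vary by GPU model, reservation type, and region.

```python
# Hedged sketch: estimating monthly GPU spend from an hourly rate.
# The $0.36/hr figure is the indicative floor quoted on this page;
# real pricing depends on GPU model, reservation type, and region.
def estimate_monthly_cost(rate_per_hour: float, hours_per_month: float) -> float:
    return rate_per_hour * hours_per_month

# e.g. one instance at the floor rate, running 200 hours in a month
print(f"${estimate_monthly_cost(0.36, 200):.2f}")  # $72.00
```

For always-on workloads (~730 hours/month), reserved pricing usually undercuts the on-demand rate, which is why the reservation model matters as much as the GPU type.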
Replicate offers H100, A100 80GB, A100 40GB, and A40 GPU instances. Availability varies by region and configuration. Contact the provider through ComputeStacker for current availability.
Replicate operates data centers in EU West, US East, and US West. Choosing a region close to your users minimizes latency and can help with data residency compliance requirements.
Use the "Get a Quote" button on this page to submit your GPU requirements. ComputeStacker will forward your request to Replicate and other matching providers. You'll receive proposals within 24 hours — no commitment required.
Replicate offers high-performance GPU infrastructure suitable for large language model training and fine-tuning workloads. For large-scale distributed training, check the Specs tab for NVLink and InfiniBand interconnect availability.
