Replicate
ML model hosting · Cheap
Run and deploy any open-source ML model via API; no GPU to manage.
What it is
Replicate runs, fine-tunes and deploys thousands of open-source ML models (image, video, audio and LLMs) through simple API calls, so there are no servers to manage. Usage is billed per second of compute or, for some models, per output. It hosts the FLUX image family and lets you package your own models with the open-source Cog tool; models autoscale down to zero when idle.
What it can do
- One API for thousands of community models
- FLUX image models (schnell / dev / pro)
- LLMs and audio/video models
- Deploy custom models via Cog (open source)
- Fine-tune models on your data
- GPUs up to 8× H100, autoscale to zero
- Per-second or per-output billing
- Client libraries and webhooks
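To make "one API for thousands of models" and webhooks concrete, here is a minimal Python sketch that builds a prediction request against Replicate's HTTP API using only the standard library. It assumes the official-model endpoint `POST /v1/models/{owner}/{name}/predictions`, a bearer token in the `REPLICATE_API_TOKEN` environment variable, and the `webhook` / `webhook_events_filter` body fields; verify these against the current HTTP API reference. The helper names `build_prediction_request` and `is_finished` are hypothetical, and the request is only sent when a token is present.

```python
import json
import os
import urllib.request
from typing import Optional

API_BASE = "https://api.replicate.com/v1"
# Prediction statuses assumed to be terminal (no further updates expected).
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def build_prediction_request(model: str, model_input: dict, token: str,
                             webhook: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST that creates a prediction."""
    body = {"input": model_input}
    if webhook is not None:
        # Assumed fields: ask Replicate to POST the prediction to this URL when it completes.
        body["webhook"] = webhook
        body["webhook_events_filter"] = ["completed"]
    return urllib.request.Request(
        f"{API_BASE}/models/{model}/predictions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def is_finished(prediction: dict) -> bool:
    """Hypothetical helper: True once a prediction has reached a terminal status."""
    return prediction.get("status") in TERMINAL_STATUSES


if __name__ == "__main__":
    token = os.environ.get("REPLICATE_API_TOKEN")
    req = build_prediction_request(
        "black-forest-labs/flux-schnell",
        {"prompt": "a lighthouse at dusk"},
        token or "YOUR_TOKEN",
        webhook="https://example.com/hooks/replicate",  # placeholder URL
    )
    print(req.get_method(), req.full_url)
    if token:
        # Only hits the network when a real token is configured.
        with urllib.request.urlopen(req) as resp:
            prediction = json.load(resp)
        print(prediction["id"], prediction["status"], is_finished(prediction))
```

Replicate's official Python client wraps the same operation as `replicate.run(...)`; the raw request is shown here so the endpoint, auth header and webhook fields are visible.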
Pricing
Free tier
No standing free tier; small trial credit, then pay-as-you-go
Paid from
~$0.09/hr (CPU); FLUX schnell ~$0.003 per image
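The per-second and per-output figures above can be turned into quick estimates. This sketch uses the approximate prices quoted on this card (they may change) and hypothetical workloads; it ignores setup/boot time, and the scale-to-zero comparison assumes idle time is not billed once an instance has scaled down, which depends on the model type and deployment settings.

```python
# Approximate prices quoted on this card; check Replicate's pricing page before relying on them.
CPU_USD_PER_HOUR = 0.09
FLUX_SCHNELL_USD_PER_IMAGE = 0.003


def per_second_rate(usd_per_hour: float) -> float:
    """Convert an hourly hardware price into a per-second rate."""
    return usd_per_hour / 3600


def time_billed_cost(seconds: float, usd_per_hour: float) -> float:
    """Cost of a run billed by compute time."""
    return seconds * per_second_rate(usd_per_hour)


def output_billed_cost(n_outputs: int, usd_per_output: float) -> float:
    """Cost of a model billed per output (e.g. per generated image)."""
    return n_outputs * usd_per_output


if __name__ == "__main__":
    print(f"CPU per second: ${per_second_rate(CPU_USD_PER_HOUR):.6f}")  # $0.000025
    print(f"10-minute CPU job: ${time_billed_cost(600, CPU_USD_PER_HOUR):.3f}")  # $0.015
    print(f"1,000 FLUX schnell images: ${output_billed_cost(1000, FLUX_SCHNELL_USD_PER_IMAGE):.2f}")  # $3.00
    # Hypothetical 30-day month: always-on instance vs. one busy 2 h/day that scales to zero otherwise.
    always_on = time_billed_cost(30 * 24 * 3600, CPU_USD_PER_HOUR)
    busy_only = time_billed_cost(30 * 2 * 3600, CPU_USD_PER_HOUR)
    print(f"always on: ${always_on:.2f}, busy only: ${busy_only:.2f}")  # $64.80 vs $5.40
```

The gap between the last two numbers is the practical payoff of autoscaling to zero for bursty workloads.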
Similar services
| Service | How it differs |
|---|---|
| Replicate | This service: run and deploy any open-source ML model via API with no GPU to manage; from ~$0.09/hr CPU, FLUX schnell ~$0.003/img. |
| fal.ai | Inference tuned for low latency, focused on image and video diffusion models. |
| Hugging Face | Model hub with Inference Endpoints; broadest open-model ecosystem. |
| Modal | Serverless GPU platform for custom Python ML code at scale. |