Replicate

ML model hosting · Cheap

Run and deploy any open-source ML model via API — no GPU to manage.

What it is

Replicate runs, fine-tunes and deploys thousands of open-source ML models (image, video, audio, LLMs) through simple API calls, billed per second of compute or, for some models, per output, with no servers to manage. It hosts the FLUX image family, and you can package your own models with the open-source Cog tool; deployments autoscale down to zero when idle.
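A minimal sketch of what one of those API calls looks like, using only Python's standard library. It assumes the endpoint shape `POST https://api.replicate.com/v1/models/{owner}/{name}/predictions`, a Bearer token, a JSON body with an `input` object, and optional `webhook` / `webhook_events_filter` fields; verify these against Replicate's current API reference. The function name, token and webhook URL are hypothetical placeholders, and the request is built but never sent.

```python
import json
import urllib.request

# Assumed base URL of Replicate's HTTP API; check the current API reference.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(owner, name, model_input, token,
                             webhook=None, events=("completed",)):
    """Build (but do not send) a POST that asks Replicate to run owner/name.

    Assumes the "run an official model" endpoint shape
    POST /v1/models/{owner}/{name}/predictions with a JSON {"input": ...} body.
    """
    body = {"input": model_input}
    if webhook is not None:
        # Ask Replicate to POST results to this URL instead of polling.
        body["webhook"] = webhook
        body["webhook_events_filter"] = list(events)
    return urllib.request.Request(
        f"{API_BASE}/models/{owner}/{name}/predictions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_prediction_request(
    "black-forest-labs", "flux-schnell",
    {"prompt": "a lighthouse at dusk"},
    token="r8_placeholder",                         # hypothetical token
    webhook="https://example.com/hooks/replicate",  # hypothetical endpoint
)
# To actually send it: urllib.request.urlopen(req)
```

Replicate also publishes official client libraries (for example the `replicate` Python package) that wrap this kind of request; the raw form is shown only to make the wire format visible. Each model defines its own input fields, so `prompt` here applies to this model and is not universal.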

What it can do

One API for thousands of community models
FLUX image models (schnell / dev / pro)
LLMs and audio/video models
Deploy custom models via Cog (open source)
Fine-tune models on your data
GPUs up to 8× H100, autoscale to zero
Per-second or per-output billing
Client libraries and webhooks
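Webhooks let Replicate push a prediction's result to your server instead of you polling for it, so the receiving endpoint should check that a delivery is genuine before trusting it. The sketch below assumes Replicate signs deliveries following the Standard Webhooks convention (`webhook-id`, `webhook-timestamp` and `webhook-signature` headers, HMAC-SHA256, a `whsec_`-prefixed base64 secret); confirm the exact scheme in Replicate's webhook documentation. `verify_webhook` and its parameters are hypothetical names.

```python
import base64
import hashlib
import hmac
import time


def verify_webhook(secret, webhook_id, timestamp, signature_header, body,
                   tolerance_s=300, now=None):
    """Return True if body carries a valid HMAC-SHA256 signature.

    Assumed scheme (Standard Webhooks): the signed content is
    "{webhook-id}.{webhook-timestamp}.{raw body}", the secret is base64 after
    a "whsec_" prefix, and the signature header holds space-separated
    "v1,<base64 signature>" entries.
    """
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    raw_secret = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(raw_secret)
    signed = f"{webhook_id}.{timestamp}.".encode("utf-8") + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode("ascii")
    for entry in signature_header.split():
        version, _, sig = entry.partition(",")
        # Constant-time comparison avoids leaking how many characters matched.
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

In a web handler, pass the raw request bytes rather than re-serialized JSON, since any change to the body invalidates the signature.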

Pricing

Free tier

No standing free tier; small trial credit, then pay-as-you-go

Paid from

from ~$0.09/hr (CPU); FLUX schnell ~$0.003 per image
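To see how the two billing modes add up, here is a back-of-the-envelope estimator that uses the approximate rates above as placeholders. It assumes you pay only for the seconds a model is actually running (which is what scaling to zero buys you) and ignores cold-start time and any other charges; real rates vary by hardware and model. The function names are hypothetical.

```python
# Approximate rates quoted on this page, used here as placeholders.
CPU_RATE_PER_HOUR = 0.09        # ~$0.09/hr for CPU hardware
FLUX_SCHNELL_PER_IMAGE = 0.003  # ~$0.003 per FLUX schnell image


def per_second_cost(busy_seconds, rate_per_hour):
    """Per-second billing: pay only for the seconds the model is running."""
    return busy_seconds * rate_per_hour / 3600


def always_on_cost(hours, rate_per_hour):
    """Comparison point: the same hardware kept running around the clock."""
    return hours * rate_per_hour


def per_output_cost(n_outputs, price_per_output):
    """Per-output billing: a flat price per generated item (e.g. per image)."""
    return n_outputs * price_per_output


busy = 10 * 60 * 30  # 10 busy minutes a day for 30 days = 18,000 s
print(f"per-second: ${per_second_cost(busy, CPU_RATE_PER_HOUR):.2f}")
print(f"always-on:  ${always_on_cost(24 * 30, CPU_RATE_PER_HOUR):.2f}")
print(f"1,000 images: ${per_output_cost(1000, FLUX_SCHNELL_PER_IMAGE):.2f}")
```

With these placeholder rates, 18,000 busy seconds cost about $0.45, versus about $64.80 for keeping the same CPU running for 720 hours, and 1,000 FLUX schnell images cost about $3.00.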

Similar services

Service	How it differs
Replicate (from ~$0.09/hr CPU; FLUX schnell ~$0.003/img)	Run and deploy any open-source ML model via API, no GPU to manage.
fal.ai	Faster, latency-optimized inference for image/video diffusion models.
Hugging Face	Model hub with Inference Endpoints; broadest open-model ecosystem.
Modal	Serverless GPU platform for custom Python ML code at scale.

Skills that use this service

content-gen