Replicate

ML model hosting · Cheap

Run and deploy any open-source ML model via API — no GPU to manage.

What it is

Replicate runs, fine-tunes and deploys thousands of open-source ML models (image, video, audio, LLMs) through simple API calls. Compute is billed by the second, and there are no servers to manage. It hosts the FLUX image family and lets you package your own models with the open-source Cog tool; deployments autoscale and drop to zero when idle.
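To make "simple API calls" concrete, here is a minimal sketch of building one prediction request with only the Python standard library. The endpoint path, the `black-forest-labs/flux-schnell` model slug, and the `prompt` input field are assumptions based on Replicate's public HTTP API and should be checked against the current documentation; `build_prediction_request` is a hypothetical helper, not part of any official client.

```python
import json
import urllib.request

# Assumed base URL of Replicate's HTTP API; verify against the official docs.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks a hosted model for a prediction.

    `model` is an "owner/name" slug, `token` is an API token from your account.
    The {"input": {"prompt": ...}} body shape is an assumption that fits
    text-to-image models such as FLUX; other models take different inputs.
    """
    url = f"{API_BASE}/models/{model}/predictions"
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request is then a call to `urllib.request.urlopen(req)` with a valid token, followed by parsing the JSON response. The official client libraries wrap this same step and add polling and webhook support.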

What it can do

  • One API for thousands of community models
  • FLUX image models (schnell / dev / pro)
  • LLMs and audio/video models
  • Deploy custom models via Cog (open source)
  • Fine-tune models on your data
  • GPUs up to 8× H100, autoscale to zero
  • Per-second or per-output billing
  • Client libraries and webhooks

Pricing

Free tier

No standing free tier; small trial credit, then pay-as-you-go

Paid from

~$0.09/hr for CPU compute; FLUX schnell ~$0.003 per image
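As a rough illustration of how per-second and per-output billing add up, the sketch below turns the approximate prices quoted above into cost estimates. The rates are the approximate figures from this page, not authoritative pricing, and the function names are hypothetical.

```python
# Approximate rates quoted on this page; real prices vary by hardware and model.
CPU_RATE_PER_HOUR = 0.09        # USD per hour of CPU compute
FLUX_SCHNELL_PER_IMAGE = 0.003  # USD per generated image


def compute_cost(seconds: float, rate_per_hour: float = CPU_RATE_PER_HOUR) -> float:
    """Cost of a run billed by the second at the given hourly rate."""
    return seconds * rate_per_hour / 3600


def image_cost(n_images: int, price_per_image: float = FLUX_SCHNELL_PER_IMAGE) -> float:
    """Cost of a batch billed per output image."""
    return n_images * price_per_image
```

Under these assumed rates, a 90-second CPU run comes to roughly $0.00225, and 1,000 FLUX schnell images come to roughly $3.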

Similar services

Service | How it differs
Replicate | From ~$0.09/hr CPU; FLUX schnell ~$0.003/img. Run and deploy any open-source ML model via API, with no GPU to manage.
fal.ai | Faster, latency-optimized inference for image/video diffusion models.
Hugging Face | Model hub with Inference Endpoints; broadest open-model ecosystem.
Modal | Serverless GPU platform for custom Python ML code at scale.

Skills that use this service