Replicate API

★ Only Publicly Available OpenAPI Document · AI/ML · ML Inference · Bearer · 20 Endpoints · REST

For Agents

Run predictions on thousands of open-source ML models, train custom versions, and deploy dedicated infrastructure. Supports image, text, audio, and video models with automatic scaling.

Quickstart

Get started with Replicate API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"run a prediction on an ML model"

# → Jentic returns the POST /v1/predictions tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with the Replicate API.

Run predictions on any public or private model with automatic GPU provisioning

Train custom model versions on your own datasets with configurable hardware

Deploy models to dedicated always-on infrastructure for low-latency production traffic

Browse curated model collections organized by task like text-to-image or speech synthesis

Jentic publishes the only available OpenAPI specification for Replicate API, keeping it validated and agent-ready. The API runs open-source ML models in the cloud without requiring you to manage infrastructure, and it exposes 20 endpoints covering predictions, model versioning, training, deployments, and collections. It supports thousands of community-contributed models for image generation, language processing, audio synthesis, and video creation, with automatic GPU scaling and pay-per-second billing.

Use Cases

Patterns agents use the Replicate API for, with concrete tasks.

★ AI Agent Model Inference via Jentic

AI agents discover and invoke ML models on Replicate through Jentic's intent-based search. Agents specify what they need (e.g., 'generate an image from text') and Jentic returns matching Replicate operations with input schemas for the specific model version. No SDK setup or model hosting required — agents call POST /v1/predictions with a model version ID and inputs, then poll for results.

Search Jentic for 'run an image generation model', load the POST /v1/predictions schema, and execute it with a Stable Diffusion version ID and a text prompt as input
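The request an agent ends up sending can be sketched directly. This is a minimal illustration, not Jentic's or Replicate's own client: the base URL `https://api.replicate.com` and the helper name `build_prediction_request` are assumptions, and the body shape (a `version` ID plus an `input` object) follows the description on this page. The function only builds the request, so it can be inspected without network access.

```python
import json
import urllib.request

# Assumed base URL; it is not stated on this page.
API_BASE = "https://api.replicate.com"


def build_prediction_request(token: str, version: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/predictions request.

    The helper name is hypothetical. The body carries a model version ID
    and an input object matching that model's schema.
    """
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/predictions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # "<version-id>" is a placeholder; substitute a real model version hash.
    req = build_prediction_request(
        "r8_example", "<version-id>", {"prompt": "a mountain landscape at sunset"}
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to log (with the token redacted) or unit-test before any GPU time is billed.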

On-Demand Image Generation

Generate images from text prompts by running predictions against community models like Stable Diffusion, FLUX, and SDXL. Create a prediction via POST /v1/predictions with the model version and prompt, then poll until the output URL is available. Replicate handles GPU provisioning, scaling to zero when idle, and pay-per-second billing so you only pay for actual compute time.

Create a prediction on POST /v1/predictions with a Stable Diffusion XL version ID, input prompt 'a mountain landscape at sunset', and poll GET /v1/predictions/{prediction_id} until status is 'succeeded'
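The polling half of this task might look like the sketch below. It assumes a caller-supplied `fetch` callable (hypothetical) that performs GET /v1/predictions/{prediction_id} and returns the decoded JSON. The page only names the 'succeeded' status, so treating 'failed' and 'canceled' as terminal is an assumption as well.

```python
import time
from typing import Callable

# Assumed terminal states; the page itself only names 'succeeded'.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_prediction(fetch: Callable[[], dict], interval: float = 1.0, max_attempts: int = 60) -> dict:
    """Call fetch() until the prediction reports a terminal status.

    fetch is a hypothetical callable wrapping GET /v1/predictions/{prediction_id}.
    Raises TimeoutError if no terminal status is seen within max_attempts polls.
    """
    for attempt in range(max_attempts):
        prediction = fetch()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before the next poll
    raise TimeoutError(f"prediction not finished after {max_attempts} polls")


if __name__ == "__main__":
    # Simulated responses stand in for real API calls.
    responses = iter([
        {"status": "starting"},
        {"status": "processing"},
        {"status": "succeeded", "output": ["https://example.com/out.png"]},
    ])
    result = poll_prediction(lambda: next(responses), interval=0.0)
    print(result["status"], result["output"])
```

Injecting `fetch` keeps the loop independent of any particular HTTP client, so the same function works whether the call goes through Jentic or straight to Replicate.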

Custom Model Training

Fine-tune open-source models on custom datasets using Replicate's training endpoints. Create a training run via POST /v1/models/{model_owner}/{model_name}/versions/{version_id}/trainings with your training data and hyperparameters. Monitor progress via GET /v1/trainings/{training_id}. The resulting model version can be used for predictions immediately or deployed to dedicated hardware.

Create a training run via POST /v1/models/stability-ai/sdxl/versions/{version_id}/trainings with a dataset URL and 2000 training steps, then poll for completion
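As a rough sketch of that task, the helpers below assemble the training path and a JSON body. The path template comes from this page. The body fields are assumptions: `destination` (the model that receives the trained version), and the input keys `input_images` and `max_train_steps`, which vary by model and should be checked against the model's own training schema.

```python
import json
from urllib.parse import quote


def training_path(model_owner: str, model_name: str, version_id: str) -> str:
    """Fill in the training path template given on this page."""
    owner, name, version = (quote(part, safe="") for part in (model_owner, model_name, version_id))
    return f"/v1/models/{owner}/{name}/versions/{version}/trainings"


def training_body(destination: str, dataset_url: str, steps: int) -> str:
    """Serialize a training request body.

    All field names here are assumptions for illustration; real input keys
    depend on the model being fine-tuned.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    return json.dumps({
        "destination": destination,
        "input": {"input_images": dataset_url, "max_train_steps": steps},
    })


if __name__ == "__main__":
    # "<version-id>" and "my-org/my-sdxl" are placeholders.
    print(training_path("stability-ai", "sdxl", "<version-id>"))
    print(training_body("my-org/my-sdxl", "https://example.com/data.zip", 2000))
```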

Production Model Deployment

Deploy models to dedicated always-on infrastructure for consistent low-latency responses via POST /v1/deployments. Unlike on-demand predictions that cold-start from zero, deployments keep models warm on reserved GPUs. Run predictions against deployments via POST /v1/deployments/{deployment_owner}/{deployment_name}/predictions for predictable latency in production applications.

Create a deployment via POST /v1/deployments for a text generation model with min_instances=1, then run a prediction via POST /v1/deployments/{owner}/{name}/predictions
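A deployment request could be modelled as below. Only `min_instances` is mentioned on this page. The other fields (`name`, `model`, `version`, `hardware`, `max_instances`) and the class name `DeploymentSpec` are assumptions for illustration, so confirm them against the OpenAPI document before relying on them.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DeploymentSpec:
    """Hypothetical payload for POST /v1/deployments (field names assumed)."""
    name: str
    model: str
    version: str
    hardware: str
    min_instances: int = 1
    max_instances: int = 1

    def __post_init__(self) -> None:
        # Reject instance ranges that cannot be satisfied.
        if self.min_instances < 0 or self.max_instances < self.min_instances:
            raise ValueError("require 0 <= min_instances <= max_instances")

    def to_payload(self) -> dict:
        return asdict(self)


def deployment_prediction_path(owner: str, name: str) -> str:
    """Path for running a prediction against a deployment, as given on this page."""
    return f"/v1/deployments/{owner}/{name}/predictions"


if __name__ == "__main__":
    # All values below are placeholders.
    spec = DeploymentSpec("my-llm", "my-org/text-model", "<version-id>", "<hardware-sku>")
    print(spec.to_payload())
    print(deployment_prediction_path("my-org", spec.name))
```

Setting `min_instances` to at least 1 is what keeps the model warm; a value of 0 would presumably reintroduce the cold starts that deployments are meant to avoid.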

Key Endpoints

20 endpoints. Jentic publishes the only available OpenAPI specification for Replicate API, keeping it validated and agent-ready.

METHOD  PATH                                            DESCRIPTION
POST    /v1/predictions                                 Run a prediction on a model version
GET     /v1/predictions/{prediction_id}                 Get prediction status and output
POST    /v1/predictions/{prediction_id}/cancel          Cancel a running prediction
GET     /v1/models                                      List available models
GET     /v1/models/{model_owner}/{model_name}/versions  List versions of a model
POST    /v1/deployments                                 Create a dedicated model deployment
GET     /v1/collections/{collection_slug}               Get models in a curated collection
GET     /v1/hardware                                    List available hardware options

Why through Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Replicate Bearer tokens are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw API tokens (r8_...) never enter the agent's context or logs.

Intent-based discovery

Agents search by intent (e.g., 'run an image generation model') and Jentic returns matching Replicate operations with input schemas, model version requirements, and hardware options, so the agent can launch predictions without browsing the model catalog.

Time to first call

Direct Replicate integration: 1-2 days for auth, model discovery, async polling, and output handling. Through Jentic: under 1 hour — search for the operation, load schema, execute and poll.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

Hugging Face API

Model hub with inference API and broader ecosystem of datasets and spaces

Choose Hugging Face when you need access to the largest model repository, dataset hosting, or prefer their Inference Endpoints for dedicated hosting

Alternative

Stability AI API

Official Stable Diffusion API with direct vendor support and optimizations

Choose Stability AI when you specifically need Stable Diffusion models with official vendor optimization, upscaling, and inpainting features

Complementary

OpenAI API

Proprietary LLMs and DALL-E for tasks not covered by open-source models

Use OpenAI alongside Replicate when you need GPT-4o for reasoning tasks or DALL-E 3 for image generation that complements open-source model outputs

Complementary

Pinecone API

Vector database for storing embeddings generated by Replicate models

Use Pinecone alongside Replicate to store embeddings from open-source embedding models and build retrieval systems

FAQs

Specific to using the Replicate API through Jentic.

Why is there no official OpenAPI spec for Replicate API?

Replicate does not publish an OpenAPI specification. Jentic generates and maintains this spec so that AI agents and developers can call Replicate API via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.

What authentication does the Replicate API use?

The Replicate API uses Bearer token authentication. Pass your API token in the Authorization header as 'Bearer r8_...'. Through Jentic, your Replicate token is stored encrypted in the MAXsystem vault and agents receive scoped access without the raw token entering their context.
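For direct (non-Jentic) calls, the header can be built from an environment variable so the token never appears in source code. This is a minimal sketch: the variable name `REPLICATE_API_TOKEN` and both helper names are assumptions, not something this page specifies.

```python
import os
from typing import Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the Authorization header for a direct Replicate call.

    Reads the token from REPLICATE_API_TOKEN (an assumed variable name).
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}


def redact(header_value: str) -> str:
    """Mask the token so the header value is safe to write to logs."""
    scheme, _, _token = header_value.partition(" ")
    return f"{scheme} ***"


if __name__ == "__main__":
    headers = auth_headers({"REPLICATE_API_TOKEN": "r8_example"})
    print(redact(headers["Authorization"]))
```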

Can I run any open-source model on Replicate?

Yes. Replicate hosts thousands of community-contributed models accessible via POST /v1/predictions. Specify the model version ID and input parameters. Popular models include Stable Diffusion XL, FLUX, LLaMA, and Whisper. You can also push your own models using Cog packaging and run them through the same predictions API.

What are the rate limits for the Replicate API?

Replicate does not enforce strict per-minute rate limits. Instead, concurrency is limited by your plan: free accounts get 1 concurrent prediction, paid accounts scale based on GPU availability. The API returns 429 status codes if you exceed concurrent prediction limits. Deployment endpoints have separate concurrency based on configured instances.
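One common way to handle those 429 responses is to retry with exponential backoff. The sketch below assumes a caller-supplied `send` callable (hypothetical) that returns a `(status_code, body)` tuple. The retry count and delays are illustrative defaults, not values recommended by Replicate.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry send() while it returns HTTP 429, doubling the wait each time.

    send is a hypothetical callable that issues the request and returns
    (status_code, decoded_body). The last response is returned if every
    attempt is rejected with 429.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    return status, body


if __name__ == "__main__":
    # Simulated responses: two rejections, then the prediction is accepted.
    replies = iter([(429, {}), (429, {}), (201, {"status": "starting"})])
    print(send_with_backoff(lambda: next(replies), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter lets tests run instantly; production code would leave the default in place.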

How do I run a prediction on Replicate through Jentic?

Search Jentic for 'run a model prediction on Replicate' to discover the POST /v1/predictions operation. The schema requires a version ID (model version hash) and an input object matching the model's schema. Execute through Jentic's SDK (pip install jentic) and poll the returned prediction URL until status shows 'succeeded'. The output field contains your results.

What is the difference between predictions and deployments?

POST /v1/predictions runs inference on shared, auto-scaling infrastructure that cold-starts from zero — ideal for variable traffic and cost efficiency. POST /v1/deployments creates dedicated always-on GPU instances that stay warm — ideal for production workloads needing consistent sub-second latency. Deployments cost more but eliminate cold-start delays.