ElevenLabs API

★ Only Publicly Available OpenAPI Document | AI/ML | Speech | apiKey | 151 Endpoints | REST

For Agents

Synthesize speech from text with custom or cloned voices, convert speech between voices, and generate sound effects. Supports 29+ languages with streaming output for real-time audio.

Quickstart

Get started with ElevenLabs API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"convert text to natural-sounding speech"

# → Jentic returns the POST /v1/text-to-speech/{voice_id} tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with the ElevenLabs API.

Synthesize natural speech from text with 29+ language support and emotion control

Clone custom voices from uploaded audio samples for personalized synthesis

Convert speech from one voice to another while preserving content and timing

Generate sound effects and ambient audio from text descriptions

Use Cases

Patterns agents use the ElevenLabs API for, with concrete tasks.

★ AI Agent Voice Synthesis via Jentic

AI agents discover and invoke ElevenLabs text-to-speech through Jentic's intent-based search to add voice output to conversational interfaces. Agents search for the synthesis operation, receive the input schema including voice_id and model_id parameters, and execute requests that return audio streams. No SDK setup required — agents specify text, voice, and output format and receive MP3 or PCM audio directly.

Search Jentic for 'convert text to speech', load the POST /v1/text-to-speech/{voice_id} schema, and execute with a voice ID and 200-word text input to receive MP3 audio
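For orientation, the direct HTTP request behind this task can be sketched roughly as follows. This is a minimal sketch, not the Jentic execution path: the base URL https://api.elevenlabs.io, the helper name build_tts_request, and the placeholder voice ID and key are assumptions for illustration. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.elevenlabs.io"  # assumed base URL, not stated on this page


def build_tts_request(voice_id, text, api_key, model_id=None,
                      output_format="mp3_44100_128"):
    """Build (but do not send) a POST /v1/text-to-speech/{voice_id} request."""
    query = urlencode({"output_format": output_format})
    url = f"{BASE_URL}/v1/text-to-speech/{quote(voice_id, safe='')}?{query}"
    body = {"text": text}
    if model_id is not None:
        body["model_id"] = model_id  # optional, e.g. for multilingual synthesis
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; a real call needs a valid voice ID and API key.
req = build_tts_request("YOUR_VOICE_ID", "Hello from an agent.", "YOUR_API_KEY")
```

Passing req to urllib.request.urlopen would send it and return the audio bytes. When the call goes through Jentic, the agent never handles the xi-api-key value itself.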

Custom Voice Cloning

Create custom voices from audio samples using the voice creation endpoints. Upload reference audio files, configure voice characteristics, and use the resulting voice_id for all subsequent text-to-speech calls. Supports both instant voice cloning from a single sample and professional voice cloning with higher fidelity from multiple samples.

Upload an audio sample to create a voice clone via POST /v1/voices/add, then synthesize text using the returned voice_id via POST /v1/text-to-speech/{voice_id}

Real-Time Streaming Speech

Stream synthesized audio in real-time using the POST /v1/text-to-speech/{voice_id}/stream endpoint for low-latency applications. Returns chunked audio as it is generated, enabling playback to begin before the full synthesis completes. Ideal for voice assistants, live narration, and interactive dialogue systems where response time matters more than batch processing efficiency.

Stream a 500-word article as speech via POST /v1/text-to-speech/{voice_id}/stream with output_format 'mp3_44100_128' and begin playback from the first audio chunk
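The "begin playback from the first chunk" idea can be sketched as a simple chunked copy loop. This is an illustrative sketch only: copy_audio_chunks is a hypothetical helper, and an in-memory buffer stands in for the HTTP response (any file-like object with read(n), such as the object returned by urllib.request.urlopen, would fit the same loop).

```python
import io


def copy_audio_chunks(response, sink, chunk_size=4096):
    """Copy audio from a file-like response to sink, one chunk at a time.

    Each chunk is written as soon as it is read, so a player attached to
    sink can start before the full synthesis has arrived.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Stand-in for a streaming response: 10,000 bytes of fake audio data.
fake_stream = io.BytesIO(b"\x00" * 10000)
player_buffer = io.BytesIO()
copied = copy_audio_chunks(fake_stream, player_buffer)
```

A smaller chunk_size lowers the delay before the first write at the cost of more loop iterations; the right value depends on the player.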

Audio Isolation and Enhancement

Separate vocals from background noise in audio recordings using POST /v1/audio-isolation. Upload mixed audio and receive a clean vocal track with background sounds removed. Useful for podcast editing, meeting transcription preprocessing, and cleaning noisy recordings before speech-to-text or voice conversion processing.

Submit a noisy audio recording to POST /v1/audio-isolation and retrieve the cleaned vocal track for downstream transcription

Sound Effect Generation

Generate custom sound effects from text descriptions using POST /v1/sound-generation. Describe the desired sound in natural language and receive synthesized audio matching the description. Supports ambient sounds, foley effects, and musical elements for game development, video production, and interactive media without requiring a sound library.

Generate a 'thunderstorm with heavy rain' sound effect via POST /v1/sound-generation with duration of 10 seconds
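A rough sketch of the JSON body for this task is shown here. The field names text and duration_seconds are assumptions about the request schema and should be checked against the OpenAPI document; sound_effect_body is a hypothetical helper.

```python
import json


def sound_effect_body(description, duration_seconds=None):
    """Build a JSON body for POST /v1/sound-generation (field names assumed)."""
    body = {"text": description}
    if duration_seconds is not None:
        body["duration_seconds"] = duration_seconds
    return json.dumps(body)


payload = sound_effect_body("thunderstorm with heavy rain", 10)
```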

Key Endpoints

151 endpoints for converting text to natural-sounding speech, cloning voices, and processing audio, covering text-to-speech, speech-to-speech, voice generation, sound effects, dubbing, and audio isolation.

METHOD  PATH                                   DESCRIPTION
POST    /v1/text-to-speech/{voice_id}          Convert text to speech with a specific voice
POST    /v1/text-to-speech/{voice_id}/stream   Stream synthesized speech in real-time
POST    /v1/speech-to-speech/{voice_id}        Convert speech from one voice to another
POST    /v1/sound-generation                   Generate sound effects from text descriptions
POST    /v1/audio-isolation                    Isolate vocals from background audio
GET     /v1/voices                             List all available voices
POST    /v1/voice-generation/create-voice      Create a voice from generated parameters
GET     /v1/history                            Retrieve speech generation history

Why through Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

ElevenLabs API keys (xi-api-key header) are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw API keys never enter the agent's context or logs.

Intent-based discovery

Agents search by intent (e.g., 'convert text to speech with a custom voice') and Jentic returns matching ElevenLabs operations with input schemas including voice_id options, model selections, and output format parameters.

Time to first call

Direct ElevenLabs integration: 1-3 days for auth, voice selection, streaming setup, and format handling. Through Jentic: under 1 hour — search for the operation, load schema, execute.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

OpenAI API

TTS and Whisper transcription as part of a broader AI model ecosystem

Choose OpenAI when you need basic TTS alongside chat completions and transcription in a single API, but prefer ElevenLabs for higher-quality voices, cloning, and streaming

Complementary

AssemblyAI API

Speech-to-text transcription and audio intelligence to complement voice synthesis

Use AssemblyAI alongside ElevenLabs when you need accurate transcription, speaker diarization, or content moderation for audio that ElevenLabs generates or processes

Complementary

Deepgram API

Real-time speech-to-text with streaming transcription for conversational AI

Use Deepgram alongside ElevenLabs to build full-duplex voice interfaces — Deepgram handles speech recognition while ElevenLabs handles synthesis

Complementary

Stability AI API

Image and video generation to pair with audio for multimedia content creation

Use Stability AI alongside ElevenLabs when building multimedia content pipelines that need both visual assets and voiceover narration

FAQs

Specific to using the ElevenLabs API through Jentic.

What authentication does the ElevenLabs API use?

The ElevenLabs API uses an API key passed in the xi-api-key header with every request. You can find your key in the Profile tab at elevenlabs.io. Through Jentic, your xi-api-key is stored encrypted in the MAXsystem vault and agents receive scoped access without the raw key entering their context.

Can I clone a custom voice with the ElevenLabs API?

Yes. Upload audio samples via the voice creation endpoints to create a voice clone. Instant voice cloning works from a single short sample via POST /v1/voices/add. Professional cloning requires multiple samples for higher fidelity. The resulting voice_id can be used in all text-to-speech and streaming endpoints.

What are the rate limits for the ElevenLabs API?

Usage quotas and rate limits are tied to your subscription tier. Free accounts get 10,000 characters per month, Starter plans provide 30,000 characters, and higher tiers scale up to millions. Concurrent request limits vary by plan, and the API returns a 429 status code when they are exceeded. Character usage can be checked via GET /v1/user/subscription.
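One common way for a client to handle 429 responses is to retry with exponential backoff. The sketch below is a generic pattern, not behaviour documented by ElevenLabs or Jentic: call_with_backoff is a hypothetical helper, and send stands for any callable that performs the request and returns a (status_code, payload) pair.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    The wait doubles after every 429 (base_delay, 2x, 4x, ...).
    Returns the last (status_code, payload) pair seen.
    """
    status, payload = None, None
    for attempt in range(max_attempts):
        status, payload = send()
        if status != 429:
            return status, payload
        if attempt < max_attempts - 1:  # no point sleeping after the final try
            sleep(base_delay * 2 ** attempt)
    return status, payload
```

The sleep parameter is injectable so the retry logic can be exercised without real delays, for example by passing a list's append method and inspecting the recorded waits.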

How do I synthesize speech through Jentic?

Search Jentic for 'convert text to speech with a custom voice' to discover the POST /v1/text-to-speech/{voice_id} operation. The schema shows required parameters: voice_id in the path and text in the body. Optionally set model_id for multilingual synthesis. Execute through Jentic's SDK (pip install jentic) and receive the audio file directly.

What audio formats does the ElevenLabs API support?

The text-to-speech endpoints support MP3 at several bitrates (output_format values such as mp3_44100_128 and mp3_44100_64), PCM at 16000, 22050, 24000, and 44100 Hz sample rates, and mu-law formats. Specify the desired format via the output_format query parameter on POST /v1/text-to-speech/{voice_id}. Streaming endpoints support the same formats for chunked delivery.
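The output_format value used elsewhere on this page, 'mp3_44100_128', appears to follow a codec_samplerate_bitrate pattern. Assuming that pattern holds (with the bitrate part absent for uncompressed formats), a hypothetical helper can split such a value into its parts, for example to configure a decoder or player:

```python
def parse_output_format(value):
    """Split a value like 'mp3_44100_128' into (codec, sample_rate_hz, bitrate_kbps).

    Assumes the pattern codec_samplerate[_bitrate]; bitrate is None when absent.
    """
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate = int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return codec, sample_rate, bitrate


mp3_info = parse_output_format("mp3_44100_128")  # ("mp3", 44100, 128)
pcm_info = parse_output_format("pcm_16000")      # ("pcm", 16000, None)
```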

Can I generate sound effects with the ElevenLabs API?

Yes. The POST /v1/sound-generation endpoint accepts a text description of the desired sound and generates matching audio. Describe sounds in natural language like 'gentle rain on a tin roof' or 'car engine starting'. The endpoint returns the generated audio file directly. Useful for game audio, video production, and interactive media.

What languages does ElevenLabs text-to-speech support?

ElevenLabs supports 29+ languages including English, Spanish, French, German, Italian, Portuguese, Polish, Hindi, Arabic, Japanese, Korean, and Chinese. For non-English synthesis, set model_id to the multilingual v2 model (eleven_multilingual_v2). Language is auto-detected from input text, or you can specify it explicitly in the request body for more accurate pronunciation.