For Agents
Synthesize speech from text with custom or cloned voices, convert speech between voices, and generate sound effects. Supports 29+ languages with streaming output for real-time audio.
Get started with ElevenLabs API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}
# Then ask your agent:
"convert text to natural-sounding speech"
# → Jentic returns the POST /v1/text-to-speech/{voice_id} tool with parameter schema, agent executes.
What an agent can do with ElevenLabs API.
Synthesize natural speech from text with 29+ language support and emotion control
Clone custom voices from uploaded audio samples for personalized synthesis
Convert speech from one voice to another while preserving content and timing
Generate sound effects and ambient audio from text descriptions
Use for: "I need to convert this text to speech with a specific voice"; "I want to clone a voice from an audio sample"; "Generate a sound effect from a text description"; "List all available voices and their characteristics".
Not supported: Does not handle speech-to-text transcription, natural language understanding, or text generation — use for voice synthesis, audio processing, and sound generation only.
Jentic publishes the only available OpenAPI document for ElevenLabs API, keeping it validated and agent-ready.
Convert text to natural-sounding speech, clone voices, and process audio across 151 endpoints covering text-to-speech, speech-to-speech, voice generation, sound effects, dubbing, and audio isolation. Supports 29+ languages, custom voice creation from audio samples, streaming audio output, and pronunciation dictionaries. Authentication uses the xi-api-key header for all requests.
Stream audio output in real-time for low-latency playback applications
Isolate vocals from background noise in audio recordings
Dub video content into multiple languages with voice matching
Patterns agents use ElevenLabs API for, with concrete tasks.
★ AI Agent Voice Synthesis via Jentic
AI agents discover and invoke ElevenLabs text-to-speech through Jentic's intent-based search to add voice output to conversational interfaces. Agents search for the synthesis operation, receive the input schema including voice_id and model_id parameters, and execute requests that return audio streams. No SDK setup required — agents specify text, voice, and output format and receive MP3 or PCM audio directly.
Search Jentic for 'convert text to speech', load the POST /v1/text-to-speech/{voice_id} schema, and execute with a voice ID and 200-word text input to receive MP3 audio
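For readers who want to see what the underlying call looks like, here is a minimal sketch of the request an agent ends up executing. It uses only the Python standard library. The base URL `https://api.elevenlabs.io` and the helper names `build_tts_request` and `synthesize` are assumptions for illustration; the path, the `xi-api-key` header, and the `text`, `model_id`, and `output_format` parameters come from this page.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL, not stated on this page


def build_tts_request(api_key, voice_id, text, model_id=None,
                      output_format="mp3_44100_128"):
    """Build (but do not send) a POST /v1/text-to-speech/{voice_id} request."""
    url = f"{API_BASE}/v1/text-to-speech/{voice_id}?output_format={output_format}"
    body = {"text": text}
    if model_id is not None:
        body["model_id"] = model_id  # optional, e.g. for multilingual synthesis
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Keeping request construction separate from sending makes the request easy to inspect or log (minus the key) before any network call is made.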
Custom Voice Cloning
Create custom voices from audio samples using the voice creation endpoints. Upload reference audio files as voice samples, configure voice characteristics, and use the resulting voice_id for all subsequent text-to-speech calls. Supports both instant voice cloning from a single sample and professional voice cloning with higher fidelity from multiple samples.
Upload an audio sample to create a voice clone via POST /v1/voices/add, then synthesize text using the returned voice_id via POST /v1/text-to-speech/{voice_id}
Real-Time Streaming Speech
Stream synthesized audio in real-time using the POST /v1/text-to-speech/{voice_id}/stream endpoint for low-latency applications. Returns chunked audio as it is generated, enabling playback to begin before the full synthesis completes. Ideal for voice assistants, live narration, and interactive dialogue systems where response time matters more than batch processing efficiency.
Stream a 500-word article as speech via POST /v1/text-to-speech/{voice_id}/stream with output_format 'mp3_44100_128' and begin playback from the first audio chunk
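The chunked playback described above can be sketched as a small copy loop: read whatever has arrived, hand it to the player, repeat. The helper names `stream_url` and `pump_chunks` and the base URL are assumptions for illustration. `response` is any object with a `read(n)` method (such as the object returned by `urllib.request.urlopen`), and `sink` is any object with `write`, for example a file or an audio player's input pipe.

```python
import io
import urllib.parse


def stream_url(voice_id, output_format="mp3_44100_128",
               base="https://api.elevenlabs.io"):  # assumed base URL
    """Build the URL for POST /v1/text-to-speech/{voice_id}/stream."""
    query = urllib.parse.urlencode({"output_format": output_format})
    voice = urllib.parse.quote(voice_id, safe="")
    return f"{base}/v1/text-to-speech/{voice}/stream?{query}"


def pump_chunks(response, sink, chunk_size=4096):
    """Copy audio from response to sink chunk by chunk; return total bytes."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty read signals end of stream
            break
        sink.write(chunk)  # the sink can start playback on the first chunk
        total += len(chunk)
    return total
```

A smaller `chunk_size` generally shortens the wait before the first write, at the cost of more read calls.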
Audio Isolation and Enhancement
Separate vocals from background noise in audio recordings using POST /v1/audio-isolation. Upload mixed audio and receive a clean vocal track with background sounds removed. Useful for podcast editing, meeting transcription preprocessing, and cleaning noisy recordings before speech-to-text or voice conversion processing.
Submit a noisy audio recording to POST /v1/audio-isolation and retrieve the cleaned vocal track for downstream transcription
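Uploading a recording means sending a multipart form body rather than JSON. The sketch below builds such a request with the standard library only. The form field name `audio`, the base URL, and the helper name `build_isolation_request` are assumptions for illustration, so check the operation schema before relying on them.

```python
import uuid
import urllib.request


def build_isolation_request(api_key, audio_bytes, filename="input.mp3",
                            field_name="audio",  # assumed form field name
                            base="https://api.elevenlabs.io"):  # assumed base URL
    """Build (but do not send) a multipart POST /v1/audio-isolation request."""
    boundary = uuid.uuid4().hex  # random delimiter between form parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/audio-isolation",
        data=head + audio_bytes + tail,
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

The response body would then be the cleaned vocal track, which can be saved or passed on to a transcription step.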
Sound Effect Generation
Generate custom sound effects from text descriptions using POST /v1/sound-generation. Describe the desired sound in natural language and receive synthesized audio matching the description. Supports ambient sounds, foley effects, and musical elements for game development, video production, and interactive media without requiring a sound library.
Generate a 'thunderstorm with heavy rain' sound effect via POST /v1/sound-generation with duration of 10 seconds
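As a rough illustration of the task above, the request body could be assembled like this. The JSON field names `text` and `duration_seconds` and the helper name `sound_generation_payload` are assumptions for illustration; the schema Jentic returns for POST /v1/sound-generation is the source of truth.

```python
import json


def sound_generation_payload(description, duration_seconds=None):
    """Return a JSON body for POST /v1/sound-generation (field names assumed)."""
    if not description.strip():
        raise ValueError("description must not be empty")
    payload = {"text": description}
    if duration_seconds is not None:
        payload["duration_seconds"] = float(duration_seconds)
    return json.dumps(payload)


body = sound_generation_payload("thunderstorm with heavy rain", 10)
```

The resulting string would be sent as the POST body with the `xi-api-key` header and a JSON content type.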
151 endpoints covering text-to-speech, speech-to-speech, voice generation, sound effects, dubbing, and audio isolation.
METHOD  PATH                                   DESCRIPTION
POST    /v1/text-to-speech/{voice_id}          Convert text to speech with a specific voice
POST    /v1/text-to-speech/{voice_id}/stream   Stream synthesized speech in real-time
POST    /v1/speech-to-speech/{voice_id}        Convert speech from one voice to another
POST    /v1/sound-generation                   Generate sound effects from text descriptions
POST    /v1/audio-isolation                    Isolate vocals from background audio
GET     /v1/voices                             List all available voices
POST    /v1/voice-generation/create-voice      Create a voice from generated parameters
GET     /v1/history                            Retrieve speech generation history
Three things that make agents converge on Jentic-routed access.
Credential isolation
ElevenLabs API keys (xi-api-key header) are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw API keys never enter the agent's context or logs.
Intent-based discovery
Agents search by intent (e.g., 'convert text to speech with a custom voice') and Jentic returns matching ElevenLabs operations with input schemas including voice_id options, model selections, and output format parameters.
Time to first call
Direct ElevenLabs integration: 1-3 days for auth, voice selection, streaming setup, and format handling. Through Jentic: under 1 hour — search for the operation, load schema, execute.
Alternatives and complements available in the Jentic catalogue.
OpenAI API
TTS and Whisper transcription as part of a broader AI model ecosystem
Choose OpenAI when you need basic TTS alongside chat completions and transcription in a single API, but prefer ElevenLabs for higher-quality voices, cloning, and streaming
AssemblyAI API
Speech-to-text transcription and audio intelligence to complement voice synthesis
Use AssemblyAI alongside ElevenLabs when you need accurate transcription, speaker diarization, or content moderation for audio that ElevenLabs generates or processes
Deepgram API
Real-time speech-to-text with streaming transcription for conversational AI
Use Deepgram alongside ElevenLabs to build full-duplex voice interfaces — Deepgram handles speech recognition while ElevenLabs handles synthesis
Stability AI API
Image and video generation to pair with audio for multimedia content creation
Use Stability AI alongside ElevenLabs when building multimedia content pipelines that need both visual assets and voiceover narration
Specific to using ElevenLabs API through Jentic.
What authentication does the ElevenLabs API use?
The ElevenLabs API uses an API key passed in the xi-api-key header with every request. You can find your key in the Profile tab at elevenlabs.io. Through Jentic, your xi-api-key is stored encrypted in the MAXsystem vault and agents receive scoped access without the raw key entering their context.
Can I clone a custom voice with the ElevenLabs API?
Yes. Upload audio samples via the voice creation endpoints to create a voice clone. Instant voice cloning works from a single short sample via POST /v1/voices/add. Professional cloning requires multiple samples for higher fidelity. The resulting voice_id can be used in all text-to-speech and streaming endpoints.
What are the rate limits for the ElevenLabs API?
Rate limits are tied to your subscription tier. Free accounts get 10,000 characters per month. Starter plans provide 30,000 characters, and higher tiers scale up to millions. Concurrent request limits vary by plan — the API returns 429 status codes when exceeded. Character usage can be checked via GET /v1/user/subscription.
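Because the API answers with 429 when a limit is exceeded, callers usually wrap requests in a retry loop. Below is a minimal sketch of exponential backoff, assuming the request is sent with `urllib` (which raises `urllib.error.HTTPError` on a 429). The helper name `call_with_backoff` and the delay values are illustrative choices, not ElevenLabs recommendations.

```python
import time
import urllib.error


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429 wait base_delay * 2**attempt and retry.

    Any other HTTP error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

A typical use would be `call_with_backoff(lambda: urllib.request.urlopen(request).read())`. The `sleep` parameter exists so the wait can be replaced in tests.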
How do I synthesize speech through Jentic?
Search Jentic for 'convert text to speech with a custom voice' to discover the POST /v1/text-to-speech/{voice_id} operation. The schema shows required parameters: voice_id in the path and text in the body. Optionally set model_id for multilingual synthesis. Execute through Jentic's SDK (pip install jentic) and receive the audio file directly.
What audio formats does the ElevenLabs API support?
The text-to-speech endpoints support MP3 (for example mp3_44100_128 and mp3_44100_64, where the first number is the sample rate in Hz and the second the bitrate in kbps), PCM (pcm_16000, pcm_22050, pcm_24000, pcm_44100), and mu-law formats. Specify the desired format via the output_format query parameter on POST /v1/text-to-speech/{voice_id}. Streaming endpoints support the same formats for chunked delivery.
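The format identifiers pack several settings into one string, as in the 'mp3_44100_128' value used earlier on this page. The sketch below splits such a value into its parts, assuming the pattern is codec, sample rate in Hz, and an optional bitrate in kbps. The helper name `parse_output_format` and the PCM identifier shown are assumptions for illustration.

```python
def parse_output_format(value):
    """Split an output_format string such as 'mp3_44100_128' into its parts."""
    parts = value.split("_")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unrecognized output_format: {value!r}")
    return {
        "codec": parts[0],
        "sample_rate_hz": int(parts[1]),
        # PCM-style values carry no bitrate component
        "bitrate_kbps": int(parts[2]) if len(parts) == 3 else None,
    }


mp3 = parse_output_format("mp3_44100_128")
pcm = parse_output_format("pcm_16000")  # assumed identifier for 16 kHz PCM
```

Knowing the sample rate up front matters most for PCM output, since raw PCM carries no header that a player could read it from.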
Can I generate sound effects with the ElevenLabs API?
Yes. The POST /v1/sound-generation endpoint accepts a text description of the desired sound and generates matching audio. Describe sounds in natural language like 'gentle rain on a tin roof' or 'car engine starting'. The endpoint returns the generated audio file directly. Useful for game audio, video production, and interactive media.
What languages does ElevenLabs text-to-speech support?
ElevenLabs supports 29+ languages including English, Spanish, French, German, Italian, Portuguese, Polish, Hindi, Arabic, Japanese, Korean, and Chinese. Use the eleven_multilingual_v2 model (set via model_id) for non-English synthesis. Language is auto-detected from input text, or you can specify it explicitly in the request body for more accurate pronunciation.