For Agents
Transcribe audio to text, synthesize speech from text, and analyze language content. Supports 30+ languages with speaker diarization, topic detection, and custom vocabulary for domain-specific accuracy.
Get started with Deepgram API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}
# Then ask your agent:
"transcribe audio to text"
# → Jentic returns the POST /v1/listen tool with its parameter schema, and the agent executes it.
What an agent can do with the Deepgram API.
Transcribe pre-recorded audio files with speaker diarization, punctuation, and paragraph segmentation
Convert text to natural-sounding speech with multiple voice models and output formats
Analyze text for intent, sentiment, topic detection, and summarization via the Read endpoint
Select from multiple transcription models optimized for different domains (general, meeting, phonecall, voicemail)
GET STARTED
Use for: "I need to transcribe an audio file to text"; "I want to convert text into spoken audio"; "Get a summary of a long audio recording"; "Retrieve the usage breakdown for my Deepgram project this month"
Not supported: Does not handle audio editing, music generation, or real-time voice calling — use for speech-to-text transcription, text-to-speech synthesis, and text understanding only.
Transcribe audio to text, synthesize speech from text, and understand language through 39 endpoints covering Deepgram's AI speech platform. The Listen endpoint handles pre-recorded and streaming speech-to-text with support for 30+ languages, speaker diarization, punctuation, and topic detection. The Speak endpoint converts text to natural-sounding speech. The API also provides project management, API key administration, usage tracking, and model selection for fine-tuned domain-specific transcription.
Track transcription usage, billing breakdown, and project-level consumption across API keys
Configure AI agent think-model settings for conversational voice agent orchestration
Patterns agents use the Deepgram API for, with concrete tasks.
★ AI Agent Audio Transcription Pipeline
AI agents transcribe audio content (meetings, calls, podcasts, voicemails) into structured text through Jentic. The agent posts audio to the Listen endpoint with parameters for diarization, punctuation, and paragraphs, and receives timestamped text with speaker labels. Jentic handles API key injection and model selection, so the agent focuses on processing the transcript output for downstream tasks like summarization or search indexing.
POST an audio file URL to /v1/listen with parameters diarize=true, punctuate=true, and model=general, then extract the transcript text with speaker labels from the response
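As a rough illustration of the task above, the following sketch builds (but does not send) the request using only the Python standard library. The base URL `https://api.deepgram.com`, the JSON body shape `{"url": ...}`, and the helper name `build_listen_request` are assumptions for illustration; the endpoint, query parameters, and `Authorization: Token` header come from this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

DEEPGRAM_BASE = "https://api.deepgram.com"  # assumed base URL


def build_listen_request(audio_url: str, api_key: str, **options) -> Request:
    """Build (but do not send) a POST /v1/listen request for a hosted audio file."""
    # Booleans are serialized as lowercase strings, e.g. diarize=true.
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in options.items()}
    )
    return Request(
        url=f"{DEEPGRAM_BASE}/v1/listen?{query}",
        data=json.dumps({"url": audio_url}).encode("utf-8"),  # assumed body shape
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_listen_request(
    "https://example.com/meeting.wav",  # placeholder audio URL
    "YOUR_API_KEY",                     # placeholder; Jentic injects the real key
    model="general",
    diarize=True,
    punctuate=True,
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. When the call goes through Jentic, the agent would not construct the Authorization header itself.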
Text-to-Speech Generation
Generate natural-sounding speech audio from text input for voice assistants, accessibility features, or content narration. The Speak endpoint accepts plain text or SSML and returns audio in multiple formats (WAV, MP3, OGG). Multiple voice models provide different speaking styles and tones. Response streaming enables real-time audio playback as generation progresses.
POST text content to /v1/speak with a specified voice model and receive the generated audio file in MP3 format
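A minimal sketch of that request, again built without being sent. The query parameter names `model` and `encoding`, the JSON body `{"text": ...}`, the base URL, and the voice model name `example-voice-model` are assumptions or placeholders, not values confirmed by this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_speak_request(
    text: str, api_key: str, voice_model: str, encoding: str = "mp3"
) -> Request:
    """Build (but do not send) a POST /v1/speak request."""
    # Parameter names "model" and "encoding" are assumed for illustration.
    query = urlencode({"model": voice_model, "encoding": encoding})
    return Request(
        url=f"https://api.deepgram.com/v1/speak?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),  # assumed body shape
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speak_request(
    "Your appointment is confirmed for Monday.",
    "YOUR_API_KEY",          # placeholder key
    "example-voice-model",   # hypothetical voice model name
)
print(req.full_url)
```

The response body would be the audio bytes, which the caller can write to a file opened in binary mode.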
Meeting and Call Analytics
Extract actionable intelligence from meeting recordings and phone calls with speaker-attributed transcription, topic detection, and summarization. Deepgram's meeting-optimized model handles overlapping speech, filler word removal, and domain terminology. The output includes word-level timestamps enabling precise navigation to specific moments in recordings and automated meeting minutes generation.
Transcribe a meeting recording via POST /v1/listen with model=meeting, diarize=true, summarize=true, and topics=true, then parse the response for speaker turns and topic segments
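Parsing speaker turns from a diarized response can be sketched as follows. The word fields `word`, `start`, `end`, and `speaker` are assumed field names, and the sample data is invented for illustration; a real response would nest the word list inside a larger structure.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append(
            {
                "speaker": speaker,
                "start": group[0]["start"],   # timestamp of the first word
                "end": group[-1]["end"],      # timestamp of the last word
                "text": " ".join(w["word"] for w in group),
            }
        )
    return turns


# Invented sample: a word list in the assumed diarized shape.
sample_words = [
    {"word": "hello", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "everyone", "start": 0.4, "end": 0.9, "speaker": 0},
    {"word": "hi", "start": 1.2, "end": 1.4, "speaker": 1},
    {"word": "let's", "start": 2.0, "end": 2.3, "speaker": 0},
    {"word": "begin", "start": 2.3, "end": 2.7, "speaker": 0},
]

turns = speaker_turns(sample_words)
for turn in turns:
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}] speaker {turn['speaker']}: {turn['text']}")
```

Grouping only consecutive words keeps the turns in chronological order, which is what meeting minutes usually need.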
Document and Text Understanding
Analyze written text for intent, sentiment, topics, and summaries through the Read endpoint. This enables processing of transcripts, emails, support tickets, and documents without audio input. The API returns structured analysis including detected topics with confidence scores, overall sentiment, and concise summaries suitable for search indexing or automated routing.
POST a text document to /v1/read with intents=true and summarize=true, then extract the detected intents and summary from the response
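The request for this task might look like the sketch below. The `language=en` parameter, the JSON body `{"text": ...}`, and the base URL are assumptions for illustration; only the endpoint and the `intents` and `summarize` flags come from this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_read_request(text: str, api_key: str, **features: bool) -> Request:
    """Build (but do not send) a POST /v1/read request for a text document."""
    params = {"language": "en"}  # assumed parameter, not stated on this page
    # Only enabled features are added to the query string.
    params.update({name: "true" for name, enabled in features.items() if enabled})
    return Request(
        url="https://api.deepgram.com/v1/read?" + urlencode(params),
        data=json.dumps({"text": text}).encode("utf-8"),  # assumed body shape
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_read_request(
    "My order arrived damaged and I would like a refund.",
    "YOUR_API_KEY",  # placeholder key
    intents=True,
    summarize=True,
)
print(req.full_url)
```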
39 endpoints covering Deepgram's AI speech platform: transcribe audio to text, synthesize speech from text, and understand language.
METHOD  PATH                              DESCRIPTION
POST    /v1/listen                        Transcribe audio to text with model and language options
POST    /v1/speak                         Synthesize text to speech audio
POST    /v1/read                          Analyze text for intent, sentiment, and topics
GET     /v1/models                        List available transcription models
GET     /v1/projects                      List all projects for the account
GET     /v1/projects/{project_id}/usage   Get usage statistics for a project
POST    /v1/projects/{project_id}/keys    Create a new API key for a project
POST    /v1/auth/grant                    Generate a temporary JWT for scoped access
Three things that make agents converge on Jentic-routed access.
Credential isolation
Deepgram API keys are stored encrypted in the Jentic vault (MAXsystem). Agents receive pre-authenticated Authorization: Token headers — the raw API key never enters the agent's execution context.
Intent-based discovery
Agents search by intent (e.g., 'transcribe an audio recording') and Jentic returns the matching Deepgram Listen, Speak, or Read operation with parameter schemas, so the agent selects the right model and options without navigating documentation.
Time to first call
Direct Deepgram integration: 1-2 days for auth setup, model selection, parameter tuning, and response parsing. Through Jentic: under 1 hour — search, load schema, execute.
Alternatives and complements available in the Jentic catalogue.
AssemblyAI API
Speech-to-text API with additional LLM-powered features like auto chapters, entity detection, and content moderation
Choose AssemblyAI when you need built-in LLM post-processing features like auto-chapters, entity detection, or content safety labels alongside transcription
Rev.ai API
Speech-to-text service with human-in-the-loop options for higher accuracy on difficult audio
Choose Rev.ai when you need human transcription fallback for high-accuracy requirements or when dealing with heavy accents and noisy audio
OpenAI API
LLM platform for summarizing, analyzing, and extracting structured data from Deepgram transcripts
Choose OpenAI when you need to summarize, classify, or extract structured information from transcripts produced by Deepgram's speech-to-text
Spotify Web API
Music and podcast catalog for sourcing audio content that Deepgram can transcribe
Choose Spotify when you need to discover podcast episodes or audio content to feed into Deepgram for transcription and analysis
Specific to using the Deepgram API through Jentic.
What authentication does the Deepgram API use?
The Deepgram API uses API key authentication passed in the Authorization header with a 'Token' prefix (Authorization: Token <API_KEY>). It also supports JWT bearer tokens for temporary scoped access. API keys are scoped to projects and can have different permission levels. Through Jentic, API keys are stored encrypted in the MAXsystem vault — agents receive pre-authenticated headers without the raw key entering their context.
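The two header forms described above can be sketched as a small helper. The function name `auth_header` and the `kind` labels are hypothetical; the `Token` prefix for API keys and bearer scheme for JWTs follow the description above.

```python
def auth_header(credential: str, kind: str = "api_key") -> dict:
    """Return the Authorization header for an API key or a temporary JWT."""
    schemes = {"api_key": "Token", "jwt": "Bearer"}
    if kind not in schemes:
        raise ValueError(f"unknown credential kind: {kind!r}")
    return {"Authorization": f"{schemes[kind]} {credential}"}


print(auth_header("YOUR_API_KEY"))          # long-lived project API key
print(auth_header("YOUR_JWT", kind="jwt"))  # temporary scoped token
```

When requests are routed through Jentic, this step happens on the Jentic side, so agent code would not need such a helper.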
Can I transcribe audio with speaker identification?
Yes. Add diarize=true to your POST /v1/listen request and Deepgram returns speaker labels for each word and utterance in the transcript. The diarization model distinguishes between speakers in multi-person recordings like meetings and interviews. Each word in the response includes a speaker field with a numeric speaker identifier.
What are the rate limits for the Deepgram API?
Deepgram's rate limits depend on your plan tier and concurrency level. The API supports concurrent transcription requests with limits based on your subscription. Pay-as-you-go plans start with a set concurrency level that scales with usage. The API returns HTTP 429 with rate limit headers when the concurrency ceiling is reached.
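One common way for a client to cope with HTTP 429 is to retry with exponential backoff. The sketch below is a generic pattern, not Deepgram's documented guidance; the attempt count and delays are arbitrary, and the sender is simulated so that no network call is made.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429, wait and retry with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failed attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Simulated sender: rejects the first two calls with 429, then succeeds.
attempts = []
delays = []


def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise HTTPError(
            "https://api.deepgram.com/v1/listen", 429, "Too Many Requests", None, None
        )
    return "ok"


# Recording delays instead of sleeping keeps the demonstration instant.
result = call_with_backoff(flaky_send, sleep=delays.append)
print(result, delays)
```

In real use, `send` would be a function that performs the request, for example a lambda wrapping `urllib.request.urlopen`.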
How do I transcribe an audio file through Jentic with an AI agent?
Install the Jentic SDK with pip install jentic, then search for 'transcribe audio to text'. Jentic returns the POST /v1/listen operation schema with supported parameters (model, language, diarize, punctuate, paragraphs, summarize). The agent posts the audio URL or binary, and Jentic injects the Authorization: Token header — the API key never appears in the agent's context.
Which transcription models does Deepgram offer?
Deepgram offers domain-optimized models including 'general' for broad use, 'meeting' for multi-speaker meetings, 'phonecall' for telephony audio, and 'voicemail' for short-form messages. Each model is tuned for its domain's acoustic characteristics and vocabulary. You can list all available models via GET /v1/models and list the models available to a specific project via GET /v1/projects/{project_id}/models.
Does Deepgram support text-to-speech generation?
Yes. The POST /v1/speak endpoint accepts text input and returns synthesized audio in formats like WAV, MP3, and OGG. You select a voice model in the request to control the speaking style and tone. The endpoint supports streaming responses for real-time playback applications.