Deepgram API

✓ Official Vendor Spec · AI/ML · Speech · apiKey, bearer · 39 Endpoints · REST

For Agents

Transcribe audio to text, synthesize speech from text, and analyze language content. Supports 30+ languages with speaker diarization, topic detection, and custom vocabulary for domain-specific accuracy.

Quickstart

Get started with Deepgram API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"transcribe audio to text"

# → Jentic returns the POST /v1/listen tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with the Deepgram API.

Transcribe pre-recorded audio files with speaker diarization, punctuation, and paragraph segmentation

Convert text to natural-sounding speech with multiple voice models and output formats

Analyze text for intent, sentiment, topic detection, and summarization via the Read endpoint

Select from multiple transcription models optimized for different domains (general, meeting, phonecall, voicemail)

Use Cases

Patterns agents use the Deepgram API for, with concrete tasks.

★ AI Agent Audio Transcription Pipeline

AI agents transcribe audio content (meetings, calls, podcasts, voicemails) into structured text through Jentic. The agent posts audio to the Listen endpoint with parameters for diarization, punctuation, and paragraphs, and receives timestamped text with speaker labels. Jentic handles API key injection and model selection, so the agent focuses on processing the transcript output for downstream tasks like summarization or search indexing.

POST an audio file URL to /v1/listen with parameters diarize=true, punctuate=true, and model=general, then extract the transcript text with speaker labels from the response
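
The task above can be sketched as a direct HTTP call. This is a minimal illustration rather than Deepgram's or Jentic's official client: the base URL https://api.deepgram.com, the JSON body of the form {"url": ...}, and the response path used in extract_transcript are assumptions, and build_listen_request and extract_transcript are hypothetical helper names. When the call is routed through Jentic, the Authorization header is injected for you and the key argument is not needed.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for direct (non-Jentic) access.
DEEPGRAM_BASE = "https://api.deepgram.com"


def build_listen_request(audio_url, api_key, **options):
    """Build (but do not send) a POST /v1/listen request for hosted audio."""
    # Booleans are serialized as lowercase strings, e.g. diarize=true.
    params = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in options.items()
    }
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{DEEPGRAM_BASE}/v1/listen?{query}",
        # Assumption: a hosted file is referenced with a JSON {"url": ...} body.
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response):
    """Return the transcript text from a parsed JSON response.

    Assumption: the transcript sits under
    results -> channels[0] -> alternatives[0] -> transcript.
    """
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


request = build_listen_request(
    "https://example.com/call.wav",  # placeholder audio URL
    "YOUR_API_KEY",                  # placeholder credential
    diarize=True,
    punctuate=True,
    model="general",
)
```

Sending the request with urllib.request.urlopen(request) and passing the decoded JSON to extract_transcript completes the round trip.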

Text-to-Speech Generation

Generate natural-sounding speech audio from text input for voice assistants, accessibility features, or content narration. The Speak endpoint accepts plain text or SSML and returns audio in multiple formats (WAV, MP3, OGG). Multiple voice models provide different speaking styles and tones. Response streaming enables real-time audio playback as generation progresses.

POST text content to /v1/speak with a specified voice model and receive the generated audio file in MP3 format
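
A rough sketch of that request follows. It assumes the voice model and output encoding are passed as query parameters named model and encoding, and that the text travels in a JSON {"text": ...} body; the model name "example-voice-model" is a placeholder, not a real voice. build_speak_request and save_audio are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def build_speak_request(text, api_key, model, encoding="mp3"):
    """Build (but do not send) a POST /v1/speak request."""
    # Assumption: model and encoding are query parameters.
    query = urllib.parse.urlencode({"model": model, "encoding": encoding})
    return urllib.request.Request(
        f"https://api.deepgram.com/v1/speak?{query}",
        # Assumption: the text to synthesize is sent as JSON.
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_audio(request, path):
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


speak_request = build_speak_request(
    "Your order has shipped.",
    "YOUR_API_KEY",          # placeholder credential
    "example-voice-model",   # placeholder voice model name
)
```

Calling save_audio(speak_request, "message.mp3") would perform the network call and store the result.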

Meeting and Call Analytics

Extract actionable intelligence from meeting recordings and phone calls with speaker-attributed transcription, topic detection, and summarization. Deepgram's meeting-optimized model handles overlapping speech, filler word removal, and domain terminology. The output includes word-level timestamps enabling precise navigation to specific moments in recordings and automated meeting minutes generation.

Transcribe a meeting recording via POST /v1/listen with model=meeting, diarize=true, summarize=true, and topics=true, then parse the response for speaker turns and topic segments
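
Parsing speaker turns out of a diarized word list might look like the sketch below. It assumes each word is a dict with word, start, end, and speaker fields; only the numeric speaker field is described on this page, so the other field names are assumptions, and speaker_turns is a hypothetical helper.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words by the same speaker into turns."""
    turns = []
    # groupby only groups adjacent items, which is what a "turn" means here.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"] for w in group),
        })
    return turns


# Hand-written sample data in the assumed shape, not real API output.
sample_words = [
    {"word": "hello", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "everyone", "start": 0.4, "end": 0.9, "speaker": 0},
    {"word": "hi", "start": 1.1, "end": 1.3, "speaker": 1},
    {"word": "thanks", "start": 1.5, "end": 1.9, "speaker": 0},
]
turns = speaker_turns(sample_words)
```

Each turn keeps the start and end timestamps of its first and last word, which is enough to jump to that moment in the recording.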

Document and Text Understanding

Analyze written text for intent, sentiment, topics, and summaries through the Read endpoint. This enables processing of transcripts, emails, support tickets, and documents without audio input. The API returns structured analysis including detected topics with confidence scores, overall sentiment, and concise summaries suitable for search indexing or automated routing.

POST a text document to /v1/read with intents=true and summarize=true, then extract the detected intents and summary from the response
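
As a hedged sketch, the Read request could be assembled as follows. The JSON {"text": ...} body and the language query parameter are assumptions not stated on this page, and build_read_request is a hypothetical helper; the response is left unparsed because its exact shape is not described here.

```python
import json
import urllib.parse
import urllib.request


def build_read_request(text, api_key, language="en"):
    """Build (but do not send) a POST /v1/read request for text analysis."""
    query = urllib.parse.urlencode({
        "intents": "true",
        "summarize": "true",
        "language": language,  # assumption: a language parameter is accepted
    })
    return urllib.request.Request(
        f"https://api.deepgram.com/v1/read?{query}",
        # Assumption: the document is sent as a JSON {"text": ...} body.
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


read_request = build_read_request(
    "I would like to cancel my subscription and get a refund.",
    "YOUR_API_KEY",  # placeholder credential
)
```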

Key Endpoints

39 endpoints covering Deepgram's AI speech platform: transcribe audio to text, synthesize speech from text, and analyze language content. The most commonly used endpoints are listed below.

METHOD  PATH                              DESCRIPTION
POST    /v1/listen                        Transcribe audio to text with model and language options
POST    /v1/speak                         Synthesize text to speech audio
POST    /v1/read                          Analyze text for intent, sentiment, and topics
GET     /v1/models                        List available transcription models
GET     /v1/projects                      List all projects for the account
GET     /v1/projects/{project_id}/usage   Get usage statistics for a project
POST    /v1/projects/{project_id}/keys    Create a new API key for a project
POST    /v1/auth/grant                    Generate a temporary JWT for scoped access


Why through Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Deepgram API keys are stored encrypted in the Jentic vault (MAXsystem). Agents receive pre-authenticated Authorization: Token headers — the raw API key never enters the agent's execution context.

Intent-based discovery

Agents search by intent (e.g., 'transcribe an audio recording') and Jentic returns the matching Deepgram Listen, Speak, or Read operation with parameter schemas, so the agent selects the right model and options without navigating documentation.

Time to first call

Direct Deepgram integration: 1-2 days for auth setup, model selection, parameter tuning, and response parsing. Through Jentic: under 1 hour — search, load schema, execute.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

AssemblyAI API

Speech-to-text API with additional LLM-powered features like auto chapters, entity detection, and content moderation

Choose AssemblyAI when you need built-in LLM post-processing features like auto-chapters, entity detection, or content safety labels alongside transcription

Alternative

Rev.ai API

Speech-to-text service with human-in-the-loop options for higher accuracy on difficult audio

Choose Rev.ai when you need human transcription fallback for high-accuracy requirements or when dealing with heavy accents and noisy audio

Complementary

OpenAI API

LLM platform for summarizing, analyzing, and extracting structured data from Deepgram transcripts

Choose OpenAI when you need to summarize, classify, or extract structured information from transcripts produced by Deepgram's speech-to-text

Complementary

Spotify Web API

Music and podcast catalog for sourcing audio content that Deepgram can transcribe

Choose Spotify when you need to discover podcast episodes or audio content to feed into Deepgram for transcription and analysis

FAQs

Specific to using the Deepgram API through Jentic.

What authentication does the Deepgram API use?

The Deepgram API uses API key authentication passed in the Authorization header with a 'Token' prefix (Authorization: Token <API_KEY>). It also supports JWT bearer tokens for temporary scoped access. API keys are scoped to projects and can have different permission levels. Through Jentic, API keys are stored encrypted in the MAXsystem vault — agents receive pre-authenticated headers without the raw key entering their context.
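
The two header forms described above can be illustrated with a small helper. auth_header is a hypothetical function written for this example, and the kind values "api_key" and "jwt" are labels chosen here, not API terms.

```python
def auth_header(credential, kind="api_key"):
    """Return the Authorization header for an API key or a temporary JWT."""
    if kind == "api_key":
        # Long-lived project API keys use the 'Token' prefix.
        return {"Authorization": f"Token {credential}"}
    if kind == "jwt":
        # Temporary scoped tokens are sent as bearer tokens.
        return {"Authorization": f"Bearer {credential}"}
    raise ValueError(f"unknown credential kind: {kind!r}")


key_header = auth_header("YOUR_API_KEY")
jwt_header = auth_header("YOUR_TEMPORARY_JWT", kind="jwt")
```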

Can I transcribe audio with speaker identification?

Yes. Add diarize=true to your POST /v1/listen request and Deepgram returns speaker labels for each word and utterance in the transcript. The diarization model distinguishes between speakers in multi-person recordings like meetings and interviews. Each word in the response includes a speaker field with a numeric speaker identifier.

What are the rate limits for the Deepgram API?

Deepgram's rate limits depend on your plan tier and concurrency level. The API supports concurrent transcription requests with limits based on your subscription. Pay-as-you-go plans start with a set concurrency level that scales with usage. The API returns HTTP 429 with rate limit headers when the concurrency ceiling is reached.
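
One common way to cope with HTTP 429 is to retry with exponential backoff. The sketch below is a generic pattern, not a Deepgram-recommended policy: call_with_backoff is a hypothetical helper, the attempt count and delays are arbitrary defaults, and it ignores any rate limit headers because their names are not given here.

```python
import time
import urllib.error


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429, doubling the wait each time.

    send is any zero-argument callable that performs the request and
    raises urllib.error.HTTPError on failure, such as a function that
    wraps urllib.request.urlopen.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Only 429 is retried; other errors and the final attempt re-raise.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists so the waiting behavior can be swapped out in tests or replaced with a scheduler-friendly delay.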

How do I transcribe an audio file through Jentic with an AI agent?

Install the Jentic SDK with pip install jentic, then search for 'transcribe audio to text'. Jentic returns the POST /v1/listen operation schema with supported parameters (model, language, diarize, punctuate, paragraphs, summarize). The agent posts the audio URL or binary, and Jentic injects the Authorization: Token header — the API key never appears in the agent's context.

Which transcription models does Deepgram offer?

Deepgram offers domain-optimized models including 'general' for broad use, 'meeting' for multi-speaker meetings, 'phonecall' for telephony audio, and 'voicemail' for short-form messages. Each model is tuned for its domain's acoustic characteristics and vocabulary. You can list all available models via GET /v1/models and list the models available to a specific project via GET /v1/projects/{project_id}/models.

Does Deepgram support text-to-speech generation?

Yes. The POST /v1/speak endpoint accepts text input and returns synthesized audio in formats like WAV, MP3, and OGG. You select a voice model in the request to control the speaking style and tone. The endpoint supports streaming responses for real-time playback applications.