For Agents
Transcribe audio, stream realtime speech-to-text, and run LLM chat completions over the resulting transcripts.
Get started with the AssemblyAI API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}
# Then ask your agent:
"transcribe an audio file with AssemblyAI"
# → Jentic returns the POST /v2/upload and POST /v2/transcript tools with their parameter schemas; the agent executes them.
What an agent can do with the AssemblyAI API.
Upload an audio file to AssemblyAI for transcription
Create a transcript from a hosted audio URL or uploaded file
Retrieve a completed transcript with word-level timestamps
Generate SRT or VTT subtitle files from a transcript
Search for specific words across a transcript
Use for: "I need to transcribe a podcast audio file"; "Get the SRT subtitles for a video"; "Search a transcript for every mention of a specific word"; "List all transcripts created in my account"
Not supported: Does not handle text-to-speech, voice cloning, or audio editing — use for speech-to-text transcription, transcript-derived data, and transcript-grounded LLM completions only.
Jentic publishes the only available OpenAPI specification for AssemblyAI API, keeping it validated and agent-ready. AssemblyAI is a speech-to-text and audio intelligence provider offering high-accuracy transcription, real-time streaming, and an LLM that operates over transcripts. The API exposes 12 endpoints across file upload, transcripts, transcript-derived data (sentences, paragraphs, subtitles, redacted audio, word search), realtime streaming tokens, and LLM chat completions. Authentication is via the authorization header.
Mint short-lived tokens for realtime streaming transcription
Run an LLM chat completion grounded in a transcript
Patterns agents use the AssemblyAI API for, with concrete tasks.
★ Podcast and Video Transcription
Upload media files via POST /v2/upload, kick off transcription via POST /v2/transcript, and retrieve the completed transcript via GET /v2/transcript/{transcript_id}. Generate SRT or VTT subtitles from the same transcript with GET /v2/transcript/{transcript_id}/subtitles. Suitable for podcast networks and video publishers needing accurate captions at scale.
Upload episode-42.mp3 to AssemblyAI, create a transcript, poll for completion, and download the SRT subtitles
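The upload, create, poll, and download steps above can be sketched in Python as follows. This is a minimal sketch, not a reference client. The base URL, the `upload_url`, `audio_url`, `id`, and `status` field names, and the `error` status are assumptions drawn from AssemblyAI's public documentation rather than from this page. `http` is a hypothetical callable you supply that performs the request (with the authorization header attached) and returns the decoded response. How the SRT or VTT format is selected is not described here, so the sketch leaves it out.

```python
import time
from typing import Any, Callable, Optional

BASE_URL = "https://api.assemblyai.com"  # assumption: not stated on this page

# Hypothetical transport: http(method, url, body) -> decoded response.
Http = Callable[[str, str, Optional[Any]], Any]


def transcribe(http: Http, audio: bytes, poll_seconds: float = 3.0,
               max_polls: int = 200,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Upload audio, create a transcript, and poll until it finishes."""
    # Step 1: POST /v2/upload returns a URL for the uploaded media.
    upload = http("POST", f"{BASE_URL}/v2/upload", audio)
    # Step 2: POST /v2/transcript starts the transcription job.
    job = http("POST", f"{BASE_URL}/v2/transcript",
               {"audio_url": upload["upload_url"]})  # assumed field names
    # Step 3: poll GET /v2/transcript/{transcript_id} until a terminal status.
    url = f"{BASE_URL}/v2/transcript/{job['id']}"
    for _ in range(max_polls):
        transcript = http("GET", url, None)
        if transcript["status"] == "completed":
            return transcript
        if transcript["status"] == "error":  # assumed failure status
            raise RuntimeError(transcript.get("error", "transcription failed"))
        sleep(poll_seconds)
    raise TimeoutError(f"transcript {job['id']} did not complete in time")


def subtitles(http: Http, transcript_id: str) -> Any:
    """Step 4: fetch subtitles for a completed transcript."""
    return http("GET", f"{BASE_URL}/v2/transcript/{transcript_id}/subtitles", None)
```

Injecting `http` and `sleep` keeps the polling logic testable without network access. When routed through Jentic, the transport would be the Jentic execution call rather than a direct HTTP client.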
Realtime Streaming Transcription
Mint a short-lived realtime token via POST /v2/realtime/token and use it from a browser or call platform to stream audio for live transcription. Suitable for contact centres, live captions, and meeting-assistant agents where latency matters.
Create a realtime streaming token via POST /v2/realtime/token and return it for the client to open a websocket connection
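A server-side helper for the token request might look like the sketch below. It only builds the request, so the API key stays on the server and the client receives nothing but the minted token. The host name and the `expires_in` body field are assumptions based on AssemblyAI's public documentation, not on this page.

```python
import json
import urllib.request

# Assumption: host name is not stated on this page.
TOKEN_URL = "https://api.assemblyai.com/v2/realtime/token"


def build_token_request(api_key: str, expires_in: int = 60) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/realtime/token request."""
    if expires_in <= 0:
        raise ValueError("expires_in must be a positive number of seconds")
    # Assumed field name for the token lifetime in seconds.
    body = json.dumps({"expires_in": expires_in}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        # Raw key in the authorization header, with no "Bearer " prefix.
        headers={"authorization": api_key, "content-type": "application/json"},
    )
```

Sending the request with `urllib.request.urlopen(build_token_request(key))` and forwarding only the returned token to the browser keeps the long-lived key out of client code.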
Compliance-Aware Audio Redaction
Create a transcript with PII redaction enabled, then download the redacted audio via GET /v2/transcript/{transcript_id}/redacted-audio. Useful where call recordings need to retain content for review while removing names, card numbers, and other sensitive entities.
Create a transcript with PII redaction for a customer-support call, then retrieve the redacted audio file via GET /v2/transcript/{transcript_id}/redacted-audio
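As an illustration, the request body for a redacted transcript could be assembled as below. The field names `redact_pii`, `redact_pii_audio`, and `redact_pii_policies`, and the policy names, are assumptions taken from AssemblyAI's public documentation. This page does not list them, so check them against the loaded schema before relying on them.

```python
from typing import Iterable


def redaction_request(audio_url: str,
                      policies: Iterable[str] = ("person_name", "credit_card_number")) -> dict:
    """Build a POST /v2/transcript body with PII redaction enabled."""
    policy_list = list(policies)
    if not policy_list:
        raise ValueError("at least one redaction policy is required")
    return {
        "audio_url": audio_url,              # assumed field name
        "redact_pii": True,                  # assumed: redact the transcript text
        "redact_pii_audio": True,            # assumed: also produce redacted audio
        "redact_pii_policies": policy_list,  # assumed: entity types to remove
    }


def redacted_audio_path(transcript_id: str) -> str:
    """Path of the redacted-audio endpoint for a finished transcript."""
    return f"/v2/transcript/{transcript_id}/redacted-audio"
```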
LLM Q&A Over Transcripts
Use POST /v2/llm/chat-completions to run an LLM grounded in a transcript — useful for meeting summarisation, action-item extraction, and ad-hoc Q&A. Pair with GET /v2/transcript/{transcript_id}/word-search when a specific term needs to be located before the LLM call.
Submit a chat completion asking 'What were the action items?' grounded in transcript 'tx_123' via POST /v2/llm/chat-completions
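For the word-search step that precedes the LLM call, a small URL builder is enough to show the shape of the request. The `words` query parameter (a comma-separated list) and the host name are assumptions based on AssemblyAI's public documentation. The chat-completions body is not sketched because its schema is not given on this page.

```python
from typing import Iterable
from urllib.parse import quote, urlencode


def word_search_url(transcript_id: str, words: Iterable[str],
                    base_url: str = "https://api.assemblyai.com") -> str:
    """Build the GET /v2/transcript/{transcript_id}/word-search URL."""
    # Drop empty entries and surrounding whitespace before joining.
    terms = [w.strip() for w in words if w.strip()]
    if not terms:
        raise ValueError("provide at least one word to search for")
    query = urlencode({"words": ",".join(terms)})  # assumed parameter name
    return f"{base_url}/v2/transcript/{quote(transcript_id, safe='')}/word-search?{query}"
```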
AI Agent for Audio Operations
An agent integrated through Jentic can manage the full upload-transcribe-summarise pipeline, polling for transcript completion and feeding results into downstream tools — without holding the AssemblyAI API key. Jentic stores the key in its vault and uses intent search to navigate the 12 AssemblyAI operations.
Through Jentic, upload a meeting recording, transcribe it, summarise via the LLM endpoint, and post the summary to a Slack channel
12 endpoints
METHOD  PATH                                        DESCRIPTION
POST    /v2/upload                                  Upload media file
POST    /v2/transcript                              Create a transcript
GET     /v2/transcript/{transcript_id}              Get transcript
GET     /v2/transcript/{transcript_id}/subtitles    Get SRT/VTT subtitles
GET     /v2/transcript/{transcript_id}/sentences    Get transcript sentences
GET     /v2/transcript/{transcript_id}/word-search  Search for words in a transcript
POST    /v2/realtime/token                          Create realtime streaming token
POST    /v2/llm/chat-completions                    Run LLM chat completion over a transcript
Three things that make agents converge on Jentic-routed access.
Credential isolation
AssemblyAI API keys are stored encrypted in the Jentic vault. Agents receive scoped execution access — the authorization header is injected at execution time and the raw key never enters prompts, logs, or agent memory.
Intent-based discovery
Agents search Jentic by intent (e.g. 'transcribe audio' or 'get realtime streaming token') and Jentic returns the matching AssemblyAI operation with its input schema, so the agent does not have to read the AssemblyAI docs to find the right endpoint.
Time to first call
Direct AssemblyAI integration: 1-3 days to wire upload, transcript creation, polling, and subtitle retrieval. Through Jentic: under 1 hour — search by intent, load schema, execute the full pipeline.
Alternatives and complements available in the Jentic catalogue.
Deepgram
Deepgram offers speech-to-text with strong realtime latency; AssemblyAI emphasises accuracy and audio intelligence features.
Choose Deepgram for ultra-low-latency live transcription; choose AssemblyAI when richer transcript-derived features (subtitles, summarisation, redaction) matter.
Rev.ai
Rev.ai provides speech-to-text plus an option for human-verified transcripts.
Use Rev.ai when human-grade accuracy is required; pick AssemblyAI for fully automated pipelines with LLM features.
OpenAI
OpenAI's audio endpoints transcribe via Whisper; AssemblyAI offers more transcript-tooling around the result.
Choose OpenAI Whisper for simple transcription within an existing OpenAI workflow; pick AssemblyAI when subtitles, redaction, and word-search features are needed.
ElevenLabs
ElevenLabs handles text-to-speech generation, complementing AssemblyAI's speech-to-text direction.
Pair with AssemblyAI for full audio loops — transcribe inbound audio with AssemblyAI, generate spoken responses with ElevenLabs.
Specific to using the AssemblyAI API through Jentic.
Why is there no official OpenAPI spec for the AssemblyAI API?
AssemblyAI does not publish a maintained OpenAPI specification covering all endpoints. Jentic generates and maintains this spec so that AI agents and developers can call AssemblyAI API via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.
What authentication does the AssemblyAI API use?
The API uses an API key passed in the authorization header (note: lowercase 'authorization', as a raw value, not 'Bearer {key}'). Jentic stores the key in its credential vault and injects the header at execution time so the raw key never enters the agent's context.
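For direct calls made outside Jentic, the header rule above can be captured in a small helper. This is an illustrative sketch, and the function name `auth_headers` is hypothetical.

```python
def auth_headers(api_key: str) -> dict:
    """Return the header AssemblyAI expects: the raw key, no 'Bearer ' prefix."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    if key.lower().startswith("bearer "):
        # A common mistake when porting code from OAuth-style APIs.
        raise ValueError("pass the raw key, without a 'Bearer ' prefix")
    return {"authorization": key}
```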
Can I get SRT subtitles from an AssemblyAI transcript?
Yes. GET /v2/transcript/{transcript_id}/subtitles returns SRT or VTT subtitle data for a completed transcript. Pair it with GET /v2/transcript/{transcript_id}/paragraphs for paragraph-level breakdowns or GET /v2/transcript/{transcript_id}/sentences for sentence boundaries.
Does AssemblyAI support realtime streaming transcription?
Yes. POST /v2/realtime/token mints a short-lived token that a client uses to open a websocket connection for live transcription. The websocket itself is outside the REST API surface but the token endpoint is part of this OpenAPI spec.
What are the rate limits for the AssemblyAI API?
The OpenAPI specification does not document explicit rate limits. AssemblyAI publishes plan-tier limits on their pricing page — implement exponential backoff on HTTP 429 responses and batch transcript creation rather than firing requests in parallel.
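One way to implement the backoff advice is sketched below. `send` is a hypothetical zero-argument callable that performs one request and returns a `(status_code, body)` pair. The delay values are illustrative defaults, not limits published by AssemblyAI.

```python
import random
import time
from typing import Any, Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, Any]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[], float] = random.random) -> Tuple[int, Any]:
    """Retry `send` on HTTP 429 with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # The delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Jitter scales the delay to between 50% and 100% of its nominal value.
        sleep(delay * (0.5 + jitter() / 2))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Submitting transcript jobs one after another through this wrapper, rather than in parallel, keeps a burst of uploads from tripping the limit in the first place.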
How do I transcribe a file through Jentic?
Search Jentic for 'transcribe an audio file with AssemblyAI' — POST /v2/upload and POST /v2/transcript will be returned. Load each schema, upload the file, create the transcript, and poll GET /v2/transcript/{transcript_id} until status is 'completed'. Jentic handles the authorization header at every step.