AssemblyAI API

Name: AssemblyAI API API
Brand: AssemblyAI API
Availability: InStock

★ Only Publicly Available OpenAPI DocumentAI/MLSpeechapiKey12 EndpointsREST

For Agents

Transcribe audio, stream realtime speech-to-text, and run LLM chat completions over the resulting transcripts.

Quickstart

Get started with AssemblyAI API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"transcribe an audio file with AssemblyAI"

# → Jentic returns the GET /events tool with parameter schema, agent executes.

Capabilities

What an agent can do with AssemblyAI API API.

Upload an audio file to AssemblyAI for transcription

Create a transcript from a hosted audio URL or uploaded file

Retrieve a completed transcript with word-level timestamps

Generate SRT or VTT subtitle files from a transcript

Search for specific words across a transcript

GET STARTED

Start building with AssemblyAI API API

Explore with Jentic

View OpenAPI Document

Jentic publishes the only available OpenAPI document for AssemblyAI API, keeping it validated and agent-ready.

Jentic publishes the only available OpenAPI specification for AssemblyAI API, keeping it validated and agent-ready. AssemblyAI is a speech-to-text and audio intelligence provider offering high-accuracy transcription, real-time streaming, and an LLM that operates over transcripts. The API exposes 12 endpoints across file upload, transcripts, transcript-derived data (sentences, paragraphs, subtitles, redacted audio, word search), realtime streaming tokens, and LLM chat completions. Authentication is via the authorization header.

Use Cases

Patterns agents use AssemblyAI API API for, with concrete tasks.

★ Podcast and Video Transcription

Upload media files via POST /v2/upload, kick off transcription via POST /v2/transcript, and retrieve the completed transcript via GET /v2/transcript/{transcript_id}. Generate SRT or VTT subtitles from the same transcript with GET /v2/transcript/{transcript_id}/subtitles. Suitable for podcast networks and video publishers needing accurate captions at scale.

Upload episode-42.mp3 to AssemblyAI, create a transcript, poll for completion, and download the SRT subtitles

Realtime Streaming Transcription

Mint a short-lived realtime token via POST /v2/realtime/token and use it from a browser or call platform to stream audio for live transcription. Suitable for contact centres, live captions, and meeting-assistant agents where latency matters.

Create a realtime streaming token via POST /v2/realtime/token and return it for the client to open a websocket connection

Compliance-Aware Audio Redaction

Create a transcript with PII redaction enabled, then download the redacted audio via GET /v2/transcript/{transcript_id}/redacted-audio. Useful where call recordings need to retain content for review while removing names, card numbers, and other sensitive entities.

Create a transcript with PII redaction for a customer-support call, then retrieve the redacted audio file via GET /v2/transcript/{transcript_id}/redacted-audio

LLM Q&A Over Transcripts

Use POST /v2/llm/chat-completions to run an LLM grounded in a transcript — useful for meeting summarisation, action-item extraction, and ad-hoc Q&A. Pair with GET /v2/transcript/{transcript_id}/word-search when a specific term needs to be located before the LLM call.

Submit a chat completion asking 'What were the action items?' grounded in transcript 'tx_123' via POST /v2/llm/chat-completions

AI Agent for Audio Operations

An agent integrated through Jentic can manage the full upload-transcribe-summarise pipeline, polling for transcript completion and feeding results into downstream tools — without holding the AssemblyAI API key. Jentic stores the key in its vault and uses intent search to navigate the 12 AssemblyAI operations.

Through Jentic, upload a meeting recording, transcribe it, summarise via the LLM endpoint, and post the summary to a Slack channel

Key Endpoints

12 endpoints — jentic publishes the only available openapi specification for assemblyai api, keeping it validated and agent-ready.

METHOD

PATH

DESCRIPTION

POST

/v2/upload

Upload media file

POST

/v2/transcript

Create a transcript

GET

/v2/transcript/{transcript_id}

Get transcript

GET

/v2/transcript/{transcript_id}/subtitles

Get SRT/VTT subtitles

GET

/v2/transcript/{transcript_id}/sentences

Get transcript sentences

GET

/v2/transcript/{transcript_id}/word-search

Search for words in a transcript

POST

/v2/realtime/token

Create realtime streaming token

POST

/v2/llm/chat-completions

Run LLM chat completion over a transcript

POST

/v2/upload

Upload media file

POST

/v2/transcript

Create a transcript

GET

/v2/transcript/{transcript_id}

Get transcript

GET

/v2/transcript/{transcript_id}/subtitles

Get SRT/VTT subtitles

GET

/v2/transcript/{transcript_id}/sentences

Get transcript sentences

Why though Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

AssemblyAI API keys are stored encrypted in the Jentic vault. Agents receive scoped execution access — the authorization header is injected at execution time and the raw key never enters prompts, logs, or agent memory.

Intent-based discovery

Agents search Jentic by intent (e.g. 'transcribe audio' or 'get realtime streaming token') and Jentic returns the matching AssemblyAI operation with its input schema, so the agent does not have to read the AssemblyAI docs to find the right endpoint.

Time to first call

Direct AssemblyAI integration: 1-3 days to wire upload, transcript creation, polling, and subtitle retrieval. Through Jentic: under 1 hour — search by intent, load schema, execute the full pipeline.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

Deepgram

Deepgram offers speech-to-text with strong realtime latency; AssemblyAI emphasises accuracy and audio intelligence features.

Choose Deepgram for ultra-low-latency live transcription; choose AssemblyAI when richer transcript-derived features (subtitles, summarisation, redaction) matter.

Alternative

Rev.ai

Rev.ai provides speech-to-text plus an option for human-verified transcripts.

Use Rev.ai when human-grade accuracy is required; pick AssemblyAI for fully automated pipelines with LLM features.

Alternative

OpenAI

OpenAI's audio endpoints transcribe via Whisper; AssemblyAI offers more transcript-tooling around the result.

Choose OpenAI Whisper for simple transcription within an existing OpenAI workflow; pick AssemblyAI when subtitles, redaction, and word-search features are needed.

Complementary

ElevenLabs

ElevenLabs handles text-to-speech generation, complementing AssemblyAI's speech-to-text direction.

Pair with AssemblyAI for full audio loops — transcribe inbound audio with AssemblyAI, generate spoken responses with ElevenLabs.

FAQs

Specific to using AssemblyAI API API through Jentic.

Why is there no official OpenAPI spec for AssemblyAI API?

AssemblyAI does not publish a maintained OpenAPI specification covering all endpoints. Jentic generates and maintains this spec so that AI agents and developers can call AssemblyAI API via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.

What authentication does the AssemblyAI API use?

The API uses an API key passed in the authorization header (note: lowercase 'authorization', as a raw value, not 'Bearer {key}'). Jentic stores the key in its credential vault and injects the header at execution time so the raw key never enters the agent's context.

Can I get SRT subtitles from an AssemblyAI transcript?

Yes. GET /v2/transcript/{transcript_id}/subtitles returns SRT or VTT subtitle data for a completed transcript. Pair it with GET /v2/transcript/{transcript_id}/paragraphs for paragraph-level breakdowns or GET /v2/transcript/{transcript_id}/sentences for sentence boundaries.

Does AssemblyAI support realtime streaming transcription?

Yes. POST /v2/realtime/token mints a short-lived token that a client uses to open a websocket connection for live transcription. The websocket itself is outside the REST API surface but the token endpoint is part of this OpenAPI spec.

What are the rate limits for the AssemblyAI API?

The OpenAPI specification does not document explicit rate limits. AssemblyAI publishes plan-tier limits on their pricing page — implement exponential backoff on HTTP 429 responses and batch transcript creation rather than firing requests in parallel.

How do I transcribe a file through Jentic?

Search Jentic for 'transcribe an audio file with AssemblyAI' — POST /v2/upload and POST /v2/transcript will be returned. Load each schema, upload the file, create the transcript, and poll GET /v2/transcript/{transcript_id} until status is 'completed'. Jentic handles the authorization header at every step.