Cloud Speech-to-Text API

✓ Official Vendor Spec · AI/ML · Speech · OAuth 2.0 · 11 Endpoints · REST

For Agents

Transcribe speech audio to text with optional phrase-set bias. Supports short synchronous calls and long-running jobs for multi-minute recordings.

Quickstart

Get started with Cloud Speech-to-Text API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"transcribe an audio file"

# → Jentic returns the POST /v1/speech:recognize tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with the Cloud Speech-to-Text API.

Transcribe a short audio clip synchronously and return the recognised text and word-level confidence

Submit a long-running recognition job for audio stored in Cloud Storage and poll for results

Bias the recogniser toward domain vocabulary by attaching a phrase set or custom class

Manage phrase sets — collections of weighted phrases — at the project level

Use Cases

Patterns agents use the Cloud Speech-to-Text API for, with concrete tasks.

★ Call Centre Transcription

Transcribe recorded customer support calls into searchable text for QA review. Calls are uploaded to Cloud Storage and submitted via POST /v1/speech:longrunningrecognize, which returns an operation handle. The job runs asynchronously and the resulting transcript is fetched via the operations endpoint, typically within minutes.

POST /v1/speech:longrunningrecognize with audio.uri=gs://calls/abc.flac and config.languageCode=en-US, then poll GET /v1/operations/{name} until done.
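The submit-then-poll flow above can be sketched in Python as follows. This is a minimal sketch, not a definitive client: the base URL `https://speech.googleapis.com` is an assumption (it does not appear on this page), and `build_long_running_request`, `poll_operation`, and the injected `fetch` callable are hypothetical names introduced for illustration. The `done`, `error`, and `response` fields follow the usual long-running operation shape and should be checked against the OpenAPI document.

```python
import time
from typing import Callable

API_ROOT = "https://speech.googleapis.com"  # assumed base URL, not stated in this page


def build_long_running_request(gcs_uri: str, language_code: str) -> dict:
    """Build the JSON body for POST /v1/speech:longrunningrecognize."""
    return {
        "config": {"languageCode": language_code},
        "audio": {"uri": gcs_uri},
    }


def poll_operation(
    name: str,
    fetch: Callable[[str], dict],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GET /v1/operations/{name} until the operation reports done.

    `fetch` is a hypothetical callable that performs an authorised GET and
    returns the decoded JSON body; injecting it keeps the sketch testable.
    """
    url = f"{API_ROOT}/v1/operations/{name}"
    for _ in range(max_attempts):
        op = fetch(url)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"recognition failed: {op['error']}")
            return op.get("response", {})
        sleep(interval_s)
    raise TimeoutError(f"operation {name} not done after {max_attempts} polls")
```

A fixed polling interval keeps the example short; a production caller would probably prefer exponential backoff with an overall deadline.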

Voice Note Capture in a Field App

A mobile field service app records short voice notes and sends them to the Speech-to-Text API for synchronous transcription. POST /v1/speech:recognize accepts audio inline as base64 or by Cloud Storage URI and returns the transcript in the same response, suitable for clips up to about a minute long.

POST /v1/speech:recognize with audio.content=<base64 LINEAR16> and config.languageCode=en-GB; return results[0].alternatives[0].transcript.
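As a rough illustration of the synchronous call, the sketch below builds the request body from raw audio bytes and extracts the transcript from a response. The helper names `build_recognize_request` and `top_transcript` are hypothetical, the 16 kHz default sample rate is an assumption, and the `results[].alternatives[]` response shape should be confirmed against the OpenAPI document.

```python
import base64


def build_recognize_request(pcm_bytes: bytes, language_code: str,
                            sample_rate_hz: int = 16000) -> dict:
    """Build the JSON body for POST /v1/speech:recognize with inline audio."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,  # assumed default; match the recording
            "languageCode": language_code,
        },
        # Inline audio is sent as base64 text, not raw bytes.
        "audio": {"content": base64.b64encode(pcm_bytes).decode("ascii")},
    }


def top_transcript(response: dict) -> str:
    """Concatenate the first alternative of every result in the response."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", ""))
    return "".join(parts).strip()
```

A response may contain several results for one clip, so the helper joins them instead of reading only the first; an empty response (no speech detected) yields an empty string.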

Domain-Adapted Medical Dictation

Improve recognition accuracy for clinicians by creating a phrase set containing common drug names, procedures, and anatomical terms. The phrase set is then referenced in each recognise call's adaptation config so the model is biased toward those terms during decoding.

POST /v1/{+parent}/phraseSets with phrases=[{value:'amoxicillin',boost:15},...], then call recognize with config.adaptation.phraseSetReferences=['projects/p/locations/global/phraseSets/meds'].
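The two steps above (create a phrase set, then reference it) might look like this in Python. Treat it as a sketch under assumptions: the `phraseSetId` / `phraseSet` wrapper in the create body and the `phraseSetReferences` field in the adaptation config reflect my understanding of the v1 request shape and should be verified against the OpenAPI document. The helper names are hypothetical.

```python
from typing import Dict, List


def build_phrase_set_request(phrase_set_id: str, boosts: Dict[str, float]) -> dict:
    """Body for POST /v1/{+parent}/phraseSets (assumed wrapper fields)."""
    return {
        "phraseSetId": phrase_set_id,
        "phraseSet": {
            "phrases": [{"value": v, "boost": b} for v, b in boosts.items()],
        },
    }


def phrase_set_name(project: str, location: str, phrase_set_id: str) -> str:
    """Full resource name used to reference the phrase set later."""
    return f"projects/{project}/locations/{location}/phraseSets/{phrase_set_id}"


def with_adaptation(config: dict, phrase_set_names: List[str]) -> dict:
    """Return a copy of a recognition config with phrase-set references attached."""
    adapted = dict(config)
    adapted["adaptation"] = {"phraseSetReferences": list(phrase_set_names)}
    return adapted
```

For example, `with_adaptation({"languageCode": "en-US"}, [phrase_set_name("p", "global", "meds")])` produces a config that biases decoding toward the stored drug names without mutating the original config.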

AI Agent Voice-Driven Workflow

An AI agent receives a voice message from a user, transcribes it via the Speech-to-Text API through Jentic, and then routes the transcript to its downstream reasoning step. The agent searches for the recognise operation, loads the schema, and executes — Jentic handles auth so the agent never sees the underlying credentials.

Search Jentic for 'transcribe an audio file', execute POST /v1/speech:recognize with the user's audio content and languageCode='en-US', then pass results[0].alternatives[0].transcript to the next reasoning step.

Key Endpoints

11 endpoints. The Cloud Speech-to-Text API converts spoken audio into text using Google's speech recognition models.

METHOD  PATH                             DESCRIPTION
POST    /v1/speech:recognize             Synchronously transcribe a short audio clip
POST    /v1/speech:longrunningrecognize  Submit a long audio file for asynchronous transcription
GET     /v1/operations                   List long-running recognition operations
GET     /v1/operations/{+name}           Get the status and result of a long-running recognition job
GET     /v1/{+parent}/phraseSets         List phrase sets in a project location
GET     /v1/{+parent}/customClasses      List custom classes in a project location

Why through Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Speech-to-Text service-account credentials are stored in the Jentic vault (MAXsystem) and exchanged for scoped, short-lived access tokens on each call. Long-lived JSON keys never enter the agent context.

Intent-based discovery

Agents search Jentic with intents like 'transcribe an audio file' or 'submit a long audio for transcription', and Jentic returns the matching speech.recognize or longrunningrecognize operation with its request schema.

Time to first call

Direct Speech-to-Text integration: 1-2 days to wire OAuth, audio encoding choices, and long-running operation polling. Through Jentic: under 30 minutes to discover, load, and execute.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

Deepgram API

Deepgram offers fast, streaming-first speech recognition with on-prem options.

Choose Deepgram when low-latency streaming or self-hosted deployment is the priority. Choose Cloud Speech-to-Text when staying inside the Google Cloud trust boundary matters.

Alternative

AssemblyAI API

AssemblyAI bundles transcription with summarisation, topic detection, and PII redaction.

Choose AssemblyAI when downstream NLP features (entity detection, summaries) are needed in the same call. Choose Cloud Speech-to-Text for raw transcription with phrase-set bias.

Complementary

Cloud Text-to-Speech API

Text-to-Speech generates audio from text; Speech-to-Text does the reverse.

Use Text-to-Speech to produce a spoken response after Speech-to-Text has parsed the user's voice input — e.g. in a voice agent loop.

Complementary

Cloud Translation API

Translate the transcribed text into another language for multilingual workflows.

Use Cloud Translation when the spoken input language differs from the language the downstream system expects.

FAQs

Specific to using the Cloud Speech-to-Text API through Jentic.

What authentication does the Cloud Speech-to-Text API use?

The Cloud Speech-to-Text API uses OAuth 2.0 with the cloud-platform scope. Through Jentic, OAuth credentials are stored in the Jentic vault (MAXsystem) and exchanged for short-lived access tokens, so service-account JSON keys never enter the agent context.
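For a direct (non-Jentic) call, the short-lived access token is normally sent as a bearer token. The sketch below only constructs the HTTP request using the Python standard library; how the token is obtained is out of scope, and both the `build_authorized_request` helper and the example host `speech.googleapis.com` are assumptions not stated on this page.

```python
import json
import urllib.request


def build_authorized_request(url: str, access_token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying an OAuth 2.0 bearer token."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Usage (assumed host; the token value is a placeholder):
# req = build_authorized_request(
#     "https://speech.googleapis.com/v1/speech:recognize", token, body)
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to confirm that the token only ever appears in the Authorization header and never in logs or the request body.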

Can I transcribe long audio files with the Speech-to-Text API?

Yes. POST /v1/speech:longrunningrecognize accepts a Cloud Storage URI and returns an operation that can be polled via GET /v1/operations/{name}. This is the recommended path for any audio longer than about 60 seconds, which is beyond what the synchronous recognise endpoint is designed to handle.
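One way to encode that routing rule in an agent or client is a small helper like the one below. It is a sketch: `choose_endpoint` is a hypothetical name, the 60-second threshold is the approximate figure from the answer above and not an exact limit, and requiring a `gs://` URI for long audio simply mirrors the description here.

```python
from typing import Optional

SYNC_LIMIT_S = 60.0  # approximate threshold taken from the answer above


def choose_endpoint(duration_s: float, gcs_uri: Optional[str] = None) -> str:
    """Pick the recognition endpoint path for a clip of the given duration."""
    if duration_s <= SYNC_LIMIT_S:
        return "/v1/speech:recognize"
    # Long audio is submitted by reference, so a Cloud Storage URI is required.
    if gcs_uri is None or not gcs_uri.startswith("gs://"):
        raise ValueError("long audio must be referenced by a gs:// URI")
    return "/v1/speech:longrunningrecognize"
```

Failing early on a missing URI avoids uploading a large inline payload only to have the request rejected.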

What are the rate limits for the Cloud Speech-to-Text API?

Default project quotas are 900 requests per minute and 480 hours of audio processing per day, with stricter limits on long-running submissions. Quotas are visible in the Google Cloud Console under IAM & Admin > Quotas and can be raised on request.

How do I improve recognition of brand or technical terms?

Create a phrase set via POST /v1/{+parent}/phraseSets with the target terms and a boost value, then reference the phrase set in the adaptation field of the recognise request. This biases the recogniser toward the supplied vocabulary without retraining a model.

How do I run transcription through Jentic?

Search Jentic for 'transcribe an audio file', load the speech.recognize or speech.longrunningrecognize schema, and execute. Jentic returns the operation result for sync calls and the long-running operation handle for async jobs, which the agent can poll via the operations endpoint.

Is the Cloud Speech-to-Text API free?

Google offers 60 minutes of free transcription per month, after which usage is billed per 15-second increment, with different rates for standard, video, and medical models. Phrase set storage is free; data adaptation usage is billed at standard rates.
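The rounding described above can be worked through with a small calculation. This is only an illustrative sketch: the per-increment rate is a caller-supplied placeholder (no actual price appears on this page), and applying the free allowance after rounding each request up is an assumption about how the two rules combine.

```python
import math
from typing import Iterable

INCREMENT_S = 15                   # billing increment from the answer above
FREE_SECONDS_PER_MONTH = 60 * 60   # 60 free minutes per month


def billable_increments(duration_s: float) -> int:
    """Round one request's audio duration up to whole 15-second increments."""
    return math.ceil(duration_s / INCREMENT_S)


def monthly_cost(durations_s: Iterable[float], rate_per_increment: float) -> float:
    """Estimate a month's charge for a list of per-request durations.

    Assumption: the free allowance is deducted after each request has been
    rounded up. `rate_per_increment` is hypothetical; look up the real rate
    for the model in use.
    """
    billed_s = sum(billable_increments(d) * INCREMENT_S for d in durations_s)
    chargeable_s = max(0, billed_s - FREE_SECONDS_PER_MONTH)
    return (chargeable_s // INCREMENT_S) * rate_per_increment
```

Under these assumptions a 16-second clip is billed as two increments (30 seconds), so many very short requests can cost noticeably more than the same audio sent in fewer, longer requests.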