Cloud Video Intelligence API

✓ Official Vendor Spec · AI/ML · Vision · oauth2 · 8 Endpoints · REST

For Agents

Annotate videos for labels, shots, text, speech, and explicit content so an agent can build searchable metadata or moderate content without running its own ML.

Quickstart

Get started with Cloud Video Intelligence API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"annotate a video for labels and explicit content"

# → Jentic returns the POST /v1/videos:annotate tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with the Cloud Video Intelligence API.

Annotate a video for object tracking, label detection, and shot change detection

Run optical character recognition on text that appears on screen across a video

Transcribe spoken audio in supported languages with timestamps

Flag explicit content frame-by-frame for moderation pipelines

Use Cases

Patterns agents use the Cloud Video Intelligence API for, with concrete tasks.

★ User-Generated Content Moderation

Platforms hosting user uploads run every video through the Video Intelligence API to flag explicit content, violence-adjacent labels, and profanity in transcribed speech before the asset goes live. The annotate endpoint accepts a Cloud Storage URI and returns timestamped results for each enabled feature, including a per-frame likelihood rating for explicit content, which the moderation pipeline thresholds before approving the upload or routing it to human review. This avoids training and hosting an in-house vision model.

Annotate a Cloud Storage video for EXPLICIT_CONTENT_DETECTION and reject uploads with any frame rated LIKELY or VERY_LIKELY

Video Search and Metadata Indexing

Media libraries enrich every uploaded video with label, shot, and OCR annotations so editors can search across footage by subject, scene, or on-screen text. The API returns timestamped annotations that the indexer maps into a search engine document, turning raw video into queryable metadata. Annotation is asynchronous and free of model maintenance overhead.

Submit a videos:annotate request with LABEL_DETECTION, SHOT_CHANGE_DETECTION, and TEXT_DETECTION on a Cloud Storage URI
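
A minimal sketch of the indexing step, assuming the finished operation's response is already available as a Python dict. The field names (annotationResults, shotLabelAnnotations, shotAnnotations, textAnnotations, and offsets written as strings such as "12.500s") reflect the v1 REST response shape as commonly documented and should be checked against the OpenAPI document; the function names and the layout of the output document are hypothetical.

```python
def parse_offset(offset: str) -> float:
    """Convert a duration string such as '12.500s' into seconds."""
    return float(offset.rstrip("s")) if offset else 0.0


def _span(segment: dict) -> dict:
    """Extract start/end seconds from a segment-like dict."""
    return {
        "start": parse_offset(segment.get("startTimeOffset", "")),
        "end": parse_offset(segment.get("endTimeOffset", "")),
    }


def to_search_document(video_uri: str, response: dict) -> dict:
    """Flatten label, shot, and OCR annotations into one indexable document."""
    doc = {"video_uri": video_uri, "labels": [], "shots": [], "on_screen_text": []}
    for result in response.get("annotationResults", []):
        # Labels: one entry per (label, segment) pair.
        for ann in result.get("shotLabelAnnotations", []):
            name = ann.get("entity", {}).get("description", "")
            for seg in ann.get("segments", []):
                doc["labels"].append(
                    {
                        "label": name,
                        **_span(seg.get("segment", {})),
                        "confidence": seg.get("confidence"),
                    }
                )
        # Shots: plain start/end boundaries.
        for shot in result.get("shotAnnotations", []):
            doc["shots"].append(_span(shot))
        # On-screen text: one entry per (text, segment) pair.
        for ann in result.get("textAnnotations", []):
            for seg in ann.get("segments", []):
                doc["on_screen_text"].append(
                    {"text": ann.get("text", ""), **_span(seg.get("segment", {}))}
                )
    return doc
```

The resulting dict can be passed to whichever search engine client the indexer uses; keeping start and end as numeric seconds makes range queries over footage straightforward.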

Automated Captioning and Transcripts

Publishers generate first-pass captions and searchable transcripts by enabling SPEECH_TRANSCRIPTION on the annotate request. The API returns word-level timing that the captioning pipeline converts into WebVTT or SRT files for upload to a video player. Editors then correct a draft rather than transcribing from scratch, which saves the most time on large back catalogs.

Annotate a video with SPEECH_TRANSCRIPTION and convert the response into a WebVTT caption file
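
One way the conversion could look, as a sketch rather than a reference implementation. It assumes the response dict exposes speechTranscriptions with an alternatives list whose words carry word, startTime, and endTime (times as strings like "1.200s"); that shape should be verified against the OpenAPI document. The helper names and the fixed words_per_cue grouping are illustrative choices, not part of the API.

```python
def parse_offset(offset: str) -> float:
    """Convert a duration string such as '1.200s' into seconds."""
    return float(offset.rstrip("s")) if offset else 0.0


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(response: dict, words_per_cue: int = 7) -> str:
    """Build a WebVTT document from word-level timing, one cue per word group."""
    lines = ["WEBVTT", ""]
    for result in response.get("annotationResults", []):
        for transcription in result.get("speechTranscriptions", []):
            alternatives = transcription.get("alternatives", [])
            if not alternatives:
                continue
            # Use the first alternative (assumed to be the top-ranked one).
            words = alternatives[0].get("words", [])
            for i in range(0, len(words), words_per_cue):
                chunk = words[i : i + words_per_cue]
                start = format_timestamp(parse_offset(chunk[0]["startTime"]))
                end = format_timestamp(parse_offset(chunk[-1]["endTime"]))
                text = " ".join(w["word"] for w in chunk)
                lines += [f"{start} --> {end}", text, ""]
    return "\n".join(lines)
```

Grouping by a fixed word count keeps the sketch short; a production captioner would more likely break cues on pauses or punctuation.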

AI Agent Video Understanding

An AI agent integrated through Jentic answers prompts like 'summarize this clip' by submitting a Video Intelligence annotation, polling the long-running operation, and assembling the labels, shots, and transcripts into a structured summary. Because the API uses OAuth 2.0 with the cloud-platform scope, Jentic isolates the token in the MAXsystem vault and exposes only a scoped reference to the agent.

Search Jentic for "annotate a video", submit the annotation, and poll until the operation completes

Key Endpoints

8 endpoints. The Cloud Video Intelligence API analyzes video stored in Cloud Storage or supplied inline to detect objects, label content, identify shot changes, recognize on-screen text, transcribe speech, and flag explicit content.

METHOD  PATH                           DESCRIPTION
POST    /v1/videos:annotate            Submit an annotation request for one or more features
GET     /v1/{+name}/operations         List annotation operations
POST    /v1/operations/{+name}:cancel  Cancel a running annotation operation
DELETE  /v1/operations/{+name}         Delete a completed operation

Why through Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Video Intelligence OAuth tokens with the cloud-platform scope are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw OAuth tokens never enter the agent's context, which matters because the cloud-platform scope grants broad project-level access.

Intent-based discovery

Agents search Jentic with intents like 'transcribe a video' or 'detect explicit content' and Jentic returns videos:annotate with its full input schema, so the agent can submit a valid request without browsing Google's discovery doc.

Time to first call

Direct Video Intelligence integration: 2-3 days for OAuth setup, async operation polling, and feature selection. Through Jentic: under 1 hour — search, load schema, execute and poll.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

Cloud Vision API

Image-level annotation; Video Intelligence is the per-frame, per-shot equivalent for video

Choose Vision when the input is a still image; choose Video Intelligence when the input is a video file or stream.

Alternative

Cloud Speech-to-Text API

Audio-only transcription without the rest of the video annotation feature set

Choose Speech-to-Text when only transcription is needed; choose Video Intelligence when transcription must be aligned with shots, labels, or explicit content detection.

Complementary

Cloud Translation API

Translate transcripts produced by Video Intelligence into other languages

Choose Translation when an agent needs to localize the transcript that Video Intelligence returns.

Complementary

Cloud Storage API

Stage video files in a bucket so Video Intelligence can read them by URI

Choose Cloud Storage to upload and manage the source videos; Video Intelligence reads them via gs:// URIs.

FAQs

Specific to using the Cloud Video Intelligence API through Jentic.

What authentication does the Cloud Video Intelligence API use?

The Cloud Video Intelligence API uses OAuth 2.0 with the https://www.googleapis.com/auth/cloud-platform scope. Through Jentic, the OAuth token is stored encrypted in the MAXsystem vault and only a scoped reference is exposed to the agent at execution time.

Can I detect explicit content in videos with the Video Intelligence API?

Yes. Submit a POST /v1/videos:annotate with features set to EXPLICIT_CONTENT_DETECTION. The response (delivered via the long-running operation) contains per-frame likelihood values from VERY_UNLIKELY to VERY_LIKELY that a moderation pipeline can threshold.
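
A sketch of how a moderation pipeline might threshold those likelihood values. It assumes the response dict nests frames under explicitAnnotation, each with a timeOffset and a pornographyLikelihood string; that field layout and the LIKELIHOOD_UNSPECIFIED value are assumptions to check against the OpenAPI document. The function names and the default LIKELY cutoff are hypothetical.

```python
# Likelihood values ordered from least to most likely.
LIKELIHOOD_ORDER = [
    "LIKELIHOOD_UNSPECIFIED",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def flagged_frames(response: dict, threshold: str = "LIKELY") -> list:
    """Return (timeOffset, likelihood) for every frame at or above the threshold."""
    cutoff = LIKELIHOOD_ORDER.index(threshold)
    flagged = []
    for result in response.get("annotationResults", []):
        frames = result.get("explicitAnnotation", {}).get("frames", [])
        for frame in frames:
            likelihood = frame.get("pornographyLikelihood", "LIKELIHOOD_UNSPECIFIED")
            if LIKELIHOOD_ORDER.index(likelihood) >= cutoff:
                flagged.append((frame.get("timeOffset", "0s"), likelihood))
    return flagged


def should_reject(response: dict, threshold: str = "LIKELY") -> bool:
    """Reject the upload if any frame meets the threshold."""
    return bool(flagged_frames(response, threshold))
```

Returning the flagged frames alongside the decision lets a human reviewer jump straight to the relevant timestamps.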

What are the rate limits for the Cloud Video Intelligence API?

Default project quotas allow 5 concurrent annotation operations and 1,000 annotation requests per day, with file-size and length caps documented per feature. Higher quotas can be requested in the Google Cloud Console for production workloads.

How do I annotate a video through Jentic with the Video Intelligence API?

Install Jentic with pip install jentic, search for "annotate a video", load the schema for POST /v1/videos:annotate, then call it with the Cloud Storage URI and the features array. Poll the returned operation name until done is true to retrieve the annotations.
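
The polling step can be sketched independently of any particular client. In the example below, get_operation is a hypothetical callable supplied by the caller (for instance, a thin wrapper around whatever Jentic or HTTP client is in use) that takes an operation name and returns the operation as a dict; the done, error, and response keys follow the usual long-running-operation shape. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_operation(
    get_operation: Callable[[str], dict],
    name: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an operation until done is true, then return its response payload."""
    deadline = time.monotonic() + timeout_s
    while True:
        op = get_operation(name)
        if op.get("done"):
            # A finished operation carries either an error or a response.
            if "error" in op:
                raise RuntimeError(f"Operation {name} failed: {op['error']}")
            return op.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation {name} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the function easy to exercise without real delays; callers can leave it at the default.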

Does the Video Intelligence API support inline video bytes or only Cloud Storage URIs?

Both. The annotate request accepts inputUri for a Cloud Storage object or inputContent for base64-encoded video bytes up to a per-feature size limit. Cloud Storage is recommended for files over a few megabytes to avoid request timeouts.
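
As an illustration of the two input modes, the following sketch builds the JSON body for the annotate request. The inputUri, inputContent, and features keys come from the answer above; the function name and its validation rules are hypothetical, and no size limit is enforced because the per-feature limits are not stated here.

```python
import base64
from typing import Optional, Sequence


def build_annotate_request(
    features: Sequence[str],
    input_uri: Optional[str] = None,
    video_bytes: Optional[bytes] = None,
) -> dict:
    """Build a videos:annotate body from either a gs:// URI or raw video bytes."""
    if (input_uri is None) == (video_bytes is None):
        raise ValueError("Provide exactly one of input_uri or video_bytes")
    body = {"features": list(features)}
    if input_uri is not None:
        if not input_uri.startswith("gs://"):
            raise ValueError("input_uri must be a Cloud Storage URI (gs://...)")
        body["inputUri"] = input_uri
    else:
        # Inline content is sent as base64-encoded text.
        body["inputContent"] = base64.b64encode(video_bytes).decode("ascii")
    return body
```

For example, build_annotate_request(["LABEL_DETECTION"], input_uri="gs://my-bucket/clip.mp4") yields a body that references the stored object, where the bucket and object names are placeholders.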

Why is my annotation operation taking so long to complete?

Annotation runtime scales with video length and the number of enabled features. SPEECH_TRANSCRIPTION and OBJECT_TRACKING are the slowest; expect minutes for a 10-minute clip with multiple features enabled. Use GET /v1/{+name}/operations to list operations and monitor their progress, and POST /v1/operations/{+name}:cancel to cancel runs that exceed your budget.