For Agents
Annotate videos for labels, shots, text, speech, and explicit content so an agent can build searchable metadata or moderate content without running its own ML.
Get started with Cloud Video Intelligence API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}
# Then ask your agent:
"annotate a video for labels and explicit content"
# → Jentic returns the POST /v1/videos:annotate tool with its parameter schema, and the agent executes it.
What an agent can do with Cloud Video Intelligence API.
Annotate a video for object tracking, label detection, and shot change detection
Run optical character recognition on text that appears on screen across a video
Transcribe spoken audio in supported languages with timestamps
Flag explicit content frame-by-frame for moderation pipelines
Use for: "Annotate a video for label detection", "Transcribe the spoken audio in a marketing video stored in Cloud Storage", "Detect explicit content in user-uploaded videos before publishing", "Find every shot change in a long-form documentary"
Not supported: video editing, transcoding, or live streaming. Use this API only for asynchronous video annotation (labels, shots, text, speech, explicit content).
The Cloud Video Intelligence API analyzes video stored in Cloud Storage or supplied inline to detect objects, label content, identify shot changes, recognize on-screen text, transcribe speech, and flag explicit content. Annotation is performed asynchronously via long-running operations, with results returned per shot, per frame, or per segment depending on the requested feature. It powers content moderation, video search, and automated metadata pipelines.
Track long-running annotation operations from submission through completion
Cancel or delete an annotation operation that is no longer needed
Patterns agents use Cloud Video Intelligence API for, with concrete tasks.
★ User-Generated Content Moderation
Platforms hosting user uploads run every video through the Video Intelligence API to flag explicit content, violence-adjacent labels, and profanity in transcribed speech before the asset goes live. The annotate endpoint accepts a Cloud Storage URI and returns timestamped results for each enabled feature, including a per-frame likelihood rating for explicit content, which the moderation pipeline thresholds before approving or routing to human review. This avoids building and operating an in-house vision model.
Annotate a Cloud Storage video for EXPLICIT_CONTENT_DETECTION and reject uploads where any frame is rated LIKELY or VERY_LIKELY
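The moderation check described above can be sketched as a small function over the completed operation's response. This is a minimal sketch, not a definitive implementation: it assumes the response is the parsed JSON of the operation result, with frames under `annotationResults[].explicitAnnotation.frames[]` and a `pornographyLikelihood` rating on each frame. The helper names `flagged_frames` and `should_reject` are hypothetical, and the LIKELY cutoff is an illustrative policy choice.

```python
# Likelihood ratings ranked from least to most likely (assumed enum order).
LIKELIHOOD_RANK = {
    "LIKELIHOOD_UNSPECIFIED": 0,
    "VERY_UNLIKELY": 1,
    "UNLIKELY": 2,
    "POSSIBLE": 3,
    "LIKELY": 4,
    "VERY_LIKELY": 5,
}


def flagged_frames(operation_response, threshold="LIKELY"):
    """Return (timeOffset, likelihood) pairs for frames at or above threshold."""
    minimum = LIKELIHOOD_RANK[threshold]
    flagged = []
    for result in operation_response.get("annotationResults", []):
        frames = result.get("explicitAnnotation", {}).get("frames", [])
        for frame in frames:
            level = frame.get("pornographyLikelihood", "LIKELIHOOD_UNSPECIFIED")
            # Unknown ratings rank as 0 so they never trigger a rejection.
            if LIKELIHOOD_RANK.get(level, 0) >= minimum:
                flagged.append((frame.get("timeOffset"), level))
    return flagged


def should_reject(operation_response, threshold="LIKELY"):
    """Hypothetical policy: reject the upload if any frame is flagged."""
    return bool(flagged_frames(operation_response, threshold))
```

A pipeline that prefers human review over outright rejection could call `flagged_frames` with `threshold="POSSIBLE"` and route non-empty results to a review queue.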
Video Search and Metadata Indexing
Media libraries enrich every uploaded video with label, shot, and OCR annotations so editors can search across footage by subject, scene, or on-screen text. The API returns timestamped annotations that the indexer maps into a search engine document, turning raw video into queryable metadata. Annotation is asynchronous and free of model maintenance overhead.
Submit a videos:annotate request with LABEL_DETECTION, SHOT_CHANGE_DETECTION, and TEXT_DETECTION on a Cloud Storage URI
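As an illustration of the request in this task, the sketch below builds the JSON body for POST /v1/videos:annotate. It assumes the body carries an `inputUri` and a `features` array, as the FAQ on this page describes. The function name `build_annotate_request` and the bucket path are hypothetical, and the validation set only lists the features named on this page, not the API's full enum.

```python
import json

# Features mentioned on this page; the API may define others.
KNOWN_FEATURES = {
    "LABEL_DETECTION",
    "SHOT_CHANGE_DETECTION",
    "TEXT_DETECTION",
    "SPEECH_TRANSCRIPTION",
    "EXPLICIT_CONTENT_DETECTION",
    "OBJECT_TRACKING",
}


def build_annotate_request(input_uri, features):
    """Build a videos:annotate body for a Cloud Storage video."""
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must be a gs:// Cloud Storage URI")
    unknown = [f for f in features if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"unrecognized features: {unknown}")
    if not features:
        raise ValueError("at least one feature is required")
    return {"inputUri": input_uri, "features": list(features)}


if __name__ == "__main__":
    body = build_annotate_request(
        "gs://example-bucket/footage/clip.mp4",  # hypothetical object
        ["LABEL_DETECTION", "SHOT_CHANGE_DETECTION", "TEXT_DETECTION"],
    )
    print(json.dumps(body, indent=2))
```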
Automated Captioning and Transcripts
Publishers generate first-pass captions and searchable transcripts by enabling SPEECH_TRANSCRIPTION on the annotate request. The API returns word-level timing that the captioning pipeline converts into WebVTT or SRT files for upload to a video player. On large back-catalogs this reduces editor time, because editors correct a draft instead of captioning from scratch.
Annotate a video with SPEECH_TRANSCRIPTION and convert the response into a WebVTT caption file
AI Agent Video Understanding
An AI agent integrated through Jentic answers prompts like 'summarize this clip' by submitting a Video Intelligence annotation, polling the long-running operation, and assembling the labels, shots, and transcripts into a structured summary. Because the API uses OAuth 2.0 with the cloud-platform scope, Jentic isolates the token in the MAXsystem vault and exposes only a scoped reference to the agent.
Search Jentic for 'annotate a video', submit the annotation, and poll until the operation completes
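The polling step in this pattern can be sketched independently of any particular client. The example below assumes the operation is a JSON object with `done`, `response`, and `error` fields. `get_operation` is a hypothetical callable supplied by the caller (for instance, a wrapper around whichever tool or HTTP client fetches the operation by name), and the interval and timeout defaults are arbitrary.

```python
import time


def wait_for_operation(get_operation, name, interval_s=5.0, timeout_s=600.0,
                       sleep=time.sleep):
    """Poll an operation until done; return its response or raise on failure.

    get_operation: hypothetical callable taking the operation name and
    returning the operation as a dict.
    """
    waited = 0.0
    while True:
        operation = get_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"annotation failed: {operation['error']}")
            return operation.get("response", {})
        if waited >= timeout_s:
            raise TimeoutError(f"{name} not done after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays; an agent could also swap in a backoff schedule for long videos.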
8 endpoints. The Cloud Video Intelligence API analyzes video stored in Cloud Storage or supplied inline to detect objects, label content, identify shot changes, recognize on-screen text, transcribe speech, and flag explicit content.
METHOD   PATH                            DESCRIPTION
POST     /v1/videos:annotate             Submit an annotation request for one or more features
GET      /v1/{+name}/operations          List annotation operations
POST     /v1/operations/{+name}:cancel   Cancel a running annotation operation
DELETE   /v1/operations/{+name}          Delete a completed operation
Three things that make agents converge on Jentic-routed access.
Credential isolation
Video Intelligence OAuth tokens with the cloud-platform scope are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw OAuth tokens never enter the agent's context, which matters because the cloud-platform scope grants broad project-level access.
Intent-based discovery
Agents search Jentic with intents like 'transcribe a video' or 'detect explicit content' and Jentic returns videos:annotate with its full input schema, so the agent can submit a valid request without browsing Google's discovery doc.
Time to first call
Direct Video Intelligence integration: 2-3 days for OAuth setup, async operation polling, and feature selection. Through Jentic: under 1 hour — search, load schema, execute and poll.
Alternatives and complements available in the Jentic catalogue.
Cloud Vision API
Image-level annotation; Video Intelligence is the per-frame, per-shot equivalent for video
Choose Vision when the input is a still image; choose Video Intelligence when the input is a video file.
Cloud Speech-to-Text API
Audio-only transcription without the rest of the video annotation feature set
Choose Speech-to-Text when only transcription is needed; choose Video Intelligence when transcription must be aligned with shots, labels, or explicit content detection.
Cloud Translation API
Translate transcripts produced by Video Intelligence into other languages
Choose Translation when an agent needs to localize the transcript that Video Intelligence returns.
Cloud Storage API
Stage video files in a bucket so Video Intelligence can read them by URI
Choose Cloud Storage to upload and manage the source videos; Video Intelligence reads them via gs:// URIs.
Specific to using Cloud Video Intelligence API through Jentic.
What authentication does the Cloud Video Intelligence API use?
The Cloud Video Intelligence API uses OAuth 2.0 with the https://www.googleapis.com/auth/cloud-platform scope. Through Jentic, the OAuth token is stored encrypted in the MAXsystem vault and only a scoped reference is exposed to the agent at execution time.
Can I detect explicit content in videos with the Video Intelligence API?
Yes. Submit a POST /v1/videos:annotate with features set to EXPLICIT_CONTENT_DETECTION. The response (delivered via the long-running operation) contains per-frame likelihood values from VERY_UNLIKELY to VERY_LIKELY that a moderation pipeline can threshold.
What are the rate limits for the Cloud Video Intelligence API?
Default project quotas allow 5 concurrent annotation operations and 1,000 annotation requests per day, with file-size and length caps documented per feature. Higher quotas can be requested in the Google Cloud Console for production workloads.
How do I annotate a video through Jentic with the Video Intelligence API?
Install Jentic with pip install jentic, search for 'annotate a video', load the schema for POST /v1/videos:annotate, then call it with the Cloud Storage URI and the features array. Poll the returned operation name until done is true, then read the annotations from the operation's response.
Does the Video Intelligence API support inline video bytes or only Cloud Storage URIs?
Both. The annotate request accepts inputUri for a Cloud Storage object or inputContent for base64-encoded video bytes up to a per-feature size limit. Cloud Storage is recommended for files over a few megabytes to avoid request timeouts.
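The choice between the two input fields can be sketched as below. This assumes, per the answer above, that `inputUri` takes a gs:// URI and `inputContent` takes base64-encoded bytes. The helper name `video_input_fields` is hypothetical, and the sketch does not enforce any size limit because the exact per-feature limits are not stated here.

```python
import base64


def video_input_fields(gcs_uri=None, video_bytes=None):
    """Return the input portion of an annotate request body.

    Exactly one of gcs_uri or video_bytes must be provided.
    """
    if (gcs_uri is None) == (video_bytes is None):
        raise ValueError("provide exactly one of gcs_uri or video_bytes")
    if gcs_uri is not None:
        if not gcs_uri.startswith("gs://"):
            raise ValueError("gcs_uri must start with gs://")
        return {"inputUri": gcs_uri}
    # Inline bytes are sent as base64 text inside the JSON body.
    return {"inputContent": base64.b64encode(video_bytes).decode("ascii")}
```

The returned dict can be merged with a `features` array to form the full request body, for example `{**video_input_fields(gcs_uri=uri), "features": ["LABEL_DETECTION"]}`.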
Why is my annotation operation taking so long to complete?
Annotation runtime scales with video length and the number of enabled features. SPEECH_TRANSCRIPTION and OBJECT_TRACKING are the slowest; expect minutes for a 10-minute clip with multiple features enabled. Use GET /v1/{+name}/operations to monitor progress, and cancel runs that exceed your budget with /v1/operations/{+name}:cancel.
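To make progress monitoring concrete, the sketch below summarizes a still-running operation. It assumes the operation's `metadata` contains an `annotationProgress` list whose entries carry a `progressPercent` number; treat that shape as an assumption to verify against the actual response. The function name `overall_progress` is hypothetical, and reporting the slowest entry is an illustrative choice.

```python
def overall_progress(operation):
    """Return a conservative 0-100 progress estimate for an operation."""
    if operation.get("done"):
        return 100
    entries = operation.get("metadata", {}).get("annotationProgress", [])
    if not entries:
        return 0
    # The operation finishes only when every entry does, so report the minimum.
    return min(int(entry.get("progressPercent", 0)) for entry in entries)
```

An agent with a time budget could compare this value against elapsed time and decide whether to keep waiting or cancel the run.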