For Agents
Run OCR, label detection, face detection, landmark recognition, and explicit content checks on images and PDFs so an agent can extract structured data from visual content.
Get started with Cloud Vision API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
"jentic": {
"url": "https://api.jentic.com/mcp",
"auth": "oauth"
}
}
# Then ask your agent:
"extract text from an image with OCR"
# → Jentic returns the POST /v1/images:annotate tool with its parameter schema, and the agent executes it.
What an agent can do with Cloud Vision API.
Extract typed and handwritten text from images and PDFs with full OCR layout
Detect object labels and bounding boxes for inventory and content tagging
Recognize landmarks, logos, and well-known products in user-supplied images
Score images for adult, violent, racy, medical, and spoof content via SafeSearch
Use for: "I need to extract text from a scanned invoice"; "Detect labels in a user-uploaded image for tagging"; "Check whether an image contains explicit or violent content"; "Recognize the brand logo in a marketing photo"
Not supported: video annotation, image generation, or image editing. Use it only for image and PDF analysis (OCR, labels, faces, logos, SafeSearch, product search).
The Cloud Vision API performs image and PDF analysis including label detection, OCR, face detection, landmark and logo recognition, explicit content (SafeSearch) detection, object localization, and product search. Requests can analyze images inline or stored in Cloud Storage and can be batched synchronously or run asynchronously for large PDF and TIFF documents. It powers content moderation, document digitization, and visual search workloads.
Run async batch annotation on large PDF and TIFF documents in Cloud Storage
Match a query image against a custom product catalog for visual search
Patterns agents use Cloud Vision API for, with concrete tasks.
★ Document Digitization and OCR
Operations and back-office teams convert scanned invoices, contracts, and forms into searchable text using the Vision API's DOCUMENT_TEXT_DETECTION feature. The API returns the full hierarchical layout of pages, blocks, paragraphs, words, and symbols with confidence scores, which the digitization pipeline maps into structured records. Asynchronous batch endpoints handle large multi-page PDFs without blocking caller threads.
Submit a files:asyncBatchAnnotate request with DOCUMENT_TEXT_DETECTION on a 200-page PDF in gs://invoices/q3.pdf
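The task above can be sketched as a request body for POST /v1/files:asyncBatchAnnotate. This is a minimal illustration, not Jentic's or Google's reference code: the helper name and the output prefix gs://invoices/ocr-output/ are assumptions, and sending the body (with an OAuth token) is left to whatever HTTP client or Jentic tool the agent uses.

```python
import json


def build_async_pdf_ocr_request(source_uri: str, output_prefix: str, batch_size: int = 20) -> dict:
    """Build a files:asyncBatchAnnotate body that OCRs one PDF stored in Cloud Storage."""
    return {
        "requests": [
            {
                # The source document is read from Cloud Storage by URI.
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # Results are written as JSON files under this prefix;
                # batchSize is the number of pages grouped into each output file.
                "outputConfig": {
                    "gcsDestination": {"uri": output_prefix},
                    "batchSize": batch_size,
                },
            }
        ]
    }


# The output prefix below is a hypothetical location chosen for illustration.
body = build_async_pdf_ocr_request("gs://invoices/q3.pdf", "gs://invoices/ocr-output/")
print(json.dumps(body, indent=2))
```

The call returns a long-running operation name rather than the text itself, so the pipeline reads the OCR output from the destination prefix once the operation completes.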
User-Generated Image Moderation
Marketplaces and social platforms screen every user-uploaded photo with SafeSearch and label detection before publishing. The images:annotate endpoint returns likelihood ratings for adult, violent, racy, medical, and spoof content alongside detected labels, letting the moderation pipeline auto-block clear violations and route ambiguous cases to human review. Synchronous mode keeps response times suitable for upload flows.
Annotate an image with SAFE_SEARCH_DETECTION and reject if adult or violent likelihood is LIKELY or VERY_LIKELY
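The rejection rule in this task might look like the sketch below. It assumes the JSON response shape of images:annotate, where the SafeSearch result sits under safeSearchAnnotation and the violent-content rating is carried in a field named violence. The function name and the sample response are illustrative only.

```python
# Likelihood values that trigger an automatic block in this sketch.
BLOCKING_LIKELIHOODS = {"LIKELY", "VERY_LIKELY"}


def should_reject(annotate_response: dict) -> bool:
    """Return True if the first image is rated LIKELY or VERY_LIKELY for adult or violent content."""
    first = annotate_response["responses"][0]
    safe = first.get("safeSearchAnnotation", {})
    return (
        safe.get("adult") in BLOCKING_LIKELIHOODS
        or safe.get("violence") in BLOCKING_LIKELIHOODS
    )


# Hand-written sample response, not real API output.
sample = {
    "responses": [
        {
            "safeSearchAnnotation": {
                "adult": "VERY_UNLIKELY",
                "spoof": "UNLIKELY",
                "medical": "UNLIKELY",
                "violence": "LIKELY",
                "racy": "POSSIBLE",
            }
        }
    ]
}
print(should_reject(sample))
```

A POSSIBLE rating is deliberately not blocked here; a moderation pipeline would typically route that case to human review.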
Retail Visual Search
Retailers index their product catalog into a Vision Product Search corpus, then accept query images at runtime to return visually similar SKUs ranked by similarity. The productSearch annotate path matches against the configured product set and returns matching product IDs with bounding boxes for each detected object in the query image. This powers in-app 'find similar products' features without training a custom model.
Annotate a customer photo with PRODUCT_SEARCH and return the top three matching SKUs from the home goods product set
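A rough sketch of this task follows: build the PRODUCT_SEARCH request, then pick the three highest-scoring matches from the response. The project, location, and product set IDs are placeholders, the helper names are hypothetical, and the sample response is hand-written to mirror the productSearchResults shape.

```python
import base64


def build_product_search_request(image_bytes: bytes, product_set: str, categories: list) -> dict:
    """Build an images:annotate body that matches one image against a product set."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "PRODUCT_SEARCH"}],
                "imageContext": {
                    "productSearchParams": {
                        "productSet": product_set,
                        "productCategories": categories,
                    }
                },
            }
        ]
    }


def top_matches(annotate_response: dict, limit: int = 3) -> list:
    """Return (product name, score) pairs for the best matches, highest score first."""
    first = annotate_response["responses"][0]
    results = first.get("productSearchResults", {}).get("results", [])
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    return [(r["product"]["name"], r.get("score", 0.0)) for r in ranked[:limit]]


# Placeholder resource name; substitute a real project, location, and product set.
PRODUCT_SET = "projects/my-project/locations/us-west1/productSets/home-goods"
request_body = build_product_search_request(b"fake-image-bytes", PRODUCT_SET, ["homegoods-v2"])

# Hand-written sample response, not real API output.
sample_response = {
    "responses": [
        {
            "productSearchResults": {
                "results": [
                    {"product": {"name": "sku-lamp"}, "score": 0.41},
                    {"product": {"name": "sku-vase"}, "score": 0.87},
                    {"product": {"name": "sku-rug"}, "score": 0.63},
                    {"product": {"name": "sku-mug"}, "score": 0.12},
                ]
            }
        }
    ]
}
print(top_matches(sample_response))
```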
AI Agent Image Understanding
An AI agent integrated through Jentic answers prompts like 'what is in this photo?' or 'is this image safe to publish?' by discovering the Vision API by intent search, calling images:annotate with the relevant feature set, and returning the structured response. Because the API uses OAuth 2.0 with the cloud-platform scope, Jentic isolates the token in the MAXsystem vault and exposes only a scoped reference.
Search Jentic for 'analyze an image', load the schema, and call images:annotate with LABEL_DETECTION and SAFE_SEARCH_DETECTION
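Once the schema is loaded, the images:annotate call in this task reduces to a request with two features and a small step that condenses the response for the agent. The sketch below assumes the documented JSON field names; the bucket URI, helper names, and sample response are hypothetical.

```python
def build_annotate_request(image_uri: str, max_labels: int = 5) -> dict:
    """Build an images:annotate body requesting labels and SafeSearch for one image."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_labels},
                    {"type": "SAFE_SEARCH_DETECTION"},
                ],
            }
        ]
    }


def summarize(annotate_response: dict) -> dict:
    """Condense the first image's annotations into a short answer for the agent."""
    first = annotate_response["responses"][0]
    labels = [label["description"] for label in first.get("labelAnnotations", [])]
    return {"labels": labels, "safe_search": first.get("safeSearchAnnotation", {})}


# Hypothetical image location.
request_body = build_annotate_request("gs://example-bucket/photo.jpg")

# Hand-written sample response, not real API output.
sample_response = {
    "responses": [
        {
            "labelAnnotations": [
                {"description": "Dog", "score": 0.97},
                {"description": "Grass", "score": 0.91},
            ],
            "safeSearchAnnotation": {"adult": "VERY_UNLIKELY", "violence": "VERY_UNLIKELY"},
        }
    ]
}
print(summarize(sample_response))
```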
23 endpoints. The Cloud Vision API performs image and PDF analysis including label detection, OCR, face detection, landmark and logo recognition, explicit content (SafeSearch) detection, object localization, and product search.
METHOD | PATH | DESCRIPTION
POST | /v1/images:annotate | Synchronous batch annotation of one or more images
POST | /v1/images:asyncBatchAnnotate | Asynchronous batch annotation for large image sets
POST | /v1/files:annotate | Synchronous annotation of multi-page PDF and TIFF files
POST | /v1/files:asyncBatchAnnotate | Asynchronous annotation of large PDF and TIFF documents in Cloud Storage
Three things that make agents converge on Jentic-routed access.
Credential isolation
Cloud Vision OAuth tokens are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw OAuth tokens never enter the agent's context, which matters because the cloud-platform scope grants broad project-level access.
Intent-based discovery
Agents search Jentic with intents like 'extract text from an image' or 'detect labels' and Jentic returns the images:annotate operation with its full input schema, including the feature enum and image source options, so the agent can construct a valid request without reading Google's discovery doc.
Time to first call
Direct Cloud Vision integration: 2-3 days for OAuth setup, request batching, and feature-by-feature response parsing. Through Jentic: under 1 hour — search, load schema, execute.
Alternatives and complements available in the Jentic catalogue.
Cloud Video Intelligence API
Per-frame and per-shot annotation for video; Vision is the still-image equivalent
Choose Video Intelligence when the input is a video file; choose Vision when the input is an image or PDF page.
Cloud Translation API
Translate text extracted by Vision OCR into other languages
Choose Translation when an agent needs to localize OCR output that Vision returns.
Cloud Storage API
Stage images and PDFs in a bucket so Vision can read them by URI
Choose Cloud Storage to upload and manage the source files; Vision reads them via gs:// URIs.
Sensitive Data Protection (DLP) API
Scan OCR output for PII and PHI before storing or surfacing it
Choose DLP when an agent needs to redact sensitive content discovered in Vision OCR results.
Specific to using Cloud Vision API through Jentic.
What authentication does the Cloud Vision API use?
The Cloud Vision API uses OAuth 2.0 with either the https://www.googleapis.com/auth/cloud-platform scope or the narrower https://www.googleapis.com/auth/cloud-vision scope. Through Jentic, the OAuth token is stored encrypted in the MAXsystem vault and only a scoped reference is exposed to the agent at execution time.
Can I run OCR on a multi-page PDF with the Cloud Vision API?
Yes. Use POST /v1/files:asyncBatchAnnotate with DOCUMENT_TEXT_DETECTION to OCR a PDF or TIFF stored in Cloud Storage. The async endpoint returns an operation name; the final annotation result is written to a Cloud Storage destination you specify in the request.
What are the rate limits for the Cloud Vision API?
Default project quotas allow 1,800 requests per minute and 16 images per request, with feature-specific image-size and PDF-page caps. Higher quotas can be requested in the Google Cloud Console; pricing is per feature per image.
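To stay within the 16-images-per-request cap quoted above, a caller can split a larger list of images into several images:annotate bodies. The sketch below shows one way to do that; the helper name and bucket URIs are hypothetical, and it does not address the per-minute request quota.

```python
from typing import Iterator, List

# Per-request image cap quoted in the answer above.
MAX_IMAGES_PER_REQUEST = 16


def batch_requests(image_uris: List[str], features: List[dict]) -> Iterator[dict]:
    """Yield images:annotate bodies, each carrying at most 16 images."""
    for start in range(0, len(image_uris), MAX_IMAGES_PER_REQUEST):
        chunk = image_uris[start:start + MAX_IMAGES_PER_REQUEST]
        yield {
            "requests": [
                {"image": {"source": {"imageUri": uri}}, "features": features}
                for uri in chunk
            ]
        }


# Hypothetical bucket and object names.
uris = [f"gs://example-bucket/img-{i}.jpg" for i in range(40)]
bodies = list(batch_requests(uris, [{"type": "LABEL_DETECTION"}]))
print([len(b["requests"]) for b in bodies])  # prints [16, 16, 8]
```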
How do I detect explicit content in an image through Jentic with the Cloud Vision API?
Install Jentic with pip install jentic, search for 'detect explicit content in image', load the schema for POST /v1/images:annotate, then call it with features set to SAFE_SEARCH_DETECTION and the image source as either a Cloud Storage URI or base64 content. The response includes adult, violent, racy, medical, and spoof likelihood values.
Does the Cloud Vision API support handwriting recognition?
Yes. Use DOCUMENT_TEXT_DETECTION rather than the simpler TEXT_DETECTION feature; DOCUMENT_TEXT_DETECTION is tuned for dense text and handwriting and returns full document layout. Accuracy depends on legibility, contrast, and language.
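The document layout returned by DOCUMENT_TEXT_DETECTION can be walked as nested lists of pages, blocks, paragraphs, words, and symbols. The sketch below rebuilds paragraph text from the fullTextAnnotation object; it joins words with single spaces and ignores the detected-break hints the API also returns, so it is an approximation of the original spacing. The function name and sample data are illustrative.

```python
def paragraphs_from_annotation(full_text_annotation: dict) -> list:
    """Flatten pages > blocks > paragraphs into a list of {text, confidence} records."""
    paragraphs = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for para in block.get("paragraphs", []):
                # Each word is a list of symbols; each symbol carries one character of text.
                words = [
                    "".join(symbol["text"] for symbol in word.get("symbols", []))
                    for word in para.get("words", [])
                ]
                paragraphs.append(
                    {"text": " ".join(words), "confidence": para.get("confidence")}
                )
    return paragraphs


# Hand-written sample in the shape of fullTextAnnotation, not real API output.
sample_annotation = {
    "pages": [
        {
            "blocks": [
                {
                    "paragraphs": [
                        {
                            "confidence": 0.98,
                            "words": [
                                {"symbols": [{"text": c} for c in "Total"]},
                                {"symbols": [{"text": c} for c in "due"]},
                            ],
                        }
                    ]
                }
            ]
        }
    ]
}
print(paragraphs_from_annotation(sample_annotation))
```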
Why does my Vision API request return INVALID_ARGUMENT for an inline image?
Inline image content must be valid base64-encoded bytes under the 10 MB request limit, and the image format must be one of JPEG, PNG, GIF, BMP, WEBP, RAW, ICO, PDF, or TIFF. For larger files, upload to Cloud Storage and pass the gs:// URI in image.source.imageUri instead.
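A client-side check along these lines can catch oversized or malformed image inputs before the request is sent. It is a conservative sketch: it compares only the base64 payload against the 10 MB figure above, while the rest of the JSON body also counts toward the limit. Both helper names are hypothetical.

```python
import base64

# Request size limit quoted in the answer above.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def inline_image(data: bytes) -> dict:
    """Return an inline image field, or raise if the encoded payload alone exceeds the limit."""
    encoded = base64.b64encode(data).decode("ascii")
    if len(encoded) > MAX_REQUEST_BYTES:
        raise ValueError("image too large to send inline; upload it to Cloud Storage instead")
    return {"content": encoded}


def gcs_image(uri: str) -> dict:
    """Return an image field that points at an object in Cloud Storage."""
    if not uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    return {"source": {"imageUri": uri}}


small = inline_image(b"\x89PNG-not-a-real-image")
print(sorted(small.keys()))
print(gcs_image("gs://example-bucket/large-scan.tiff"))
```

Base64 encoding grows the payload by roughly a third, so a file of about 7.5 MB on disk already approaches the inline limit.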