For Agents
Train, deploy, and call ML models on Google Cloud — including Gemini foundation models — via 202 endpoints covering datasets, pipelines, endpoints, and predictions.
Get started with Vertex AI API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
"jentic": {
"url": "https://api.jentic.com/mcp",
"auth": "oauth"
}
}
# Then ask your agent:
"generate text with a Vertex AI Gemini model"
# → Jentic returns the GET /events tool with parameter schema, agent executes.What an agent can do with Vertex AI API API.
Run online predictions against a deployed Vertex AI endpoint, including Gemini and custom models
Submit and monitor training pipelines that produce versioned Vertex AI model resources
Deploy a model to a Vertex AI endpoint and split traffic between deployed model versions
Run batch prediction jobs over BigQuery or Cloud Storage inputs and write results back
GET STARTED
Use for: Generate text from a Gemini model deployed on Vertex AI, Submit a custom training pipeline with my own container image, Deploy a fine-tuned model to a Vertex AI endpoint, List all Vertex AI endpoints and their deployed model versions
Not supported: Does not handle BigQuery analytics queries, Workspace data, or non-ML Cloud resource provisioning — use for Vertex AI model training, deployment, and prediction only.
The Vertex AI API is Google Cloud's unified surface for training, tuning, deploying, and serving machine learning models, including Google's foundation models such as Gemini and PaLM and customer-trained models. It exposes operations on datasets, training pipelines, models, endpoints, batch prediction jobs, feature stores, indexes, and model lineage. With 202 endpoints, it covers the full MLOps lifecycle from data ingestion through online and batch inference. It is the right tool for teams building production ML systems on Google Cloud rather than just calling a hosted LLM endpoint.
Manage datasets, dataset items, and annotation specs that feed AutoML and custom training
Search artifact and context lineage subgraphs to trace how a deployed model was produced
Manage feature stores, feature views, and indexes that back online inference and vector search
Patterns agents use Vertex AI API API for, with concrete tasks.
★ Foundation Model Inference at Scale
Teams that want to call Gemini or other Vertex foundation models from production code use the predict and streamGenerateContent endpoints rather than the consumer Gemini API, because Vertex bills against a Google Cloud project and supports IAM, VPC Service Controls, and regional residency. Vertex AI exposes both online prediction for low-latency inference and batch prediction for high-throughput jobs over BigQuery or GCS inputs.
POST to projects/{project}/locations/{location}/publishers/google/models/gemini-1.5-pro:generateContent with the prompt and read the response in JSON.
Custom Training Pipeline Orchestration
Data science teams use Vertex AI training pipelines to package their training code in a container, run it on managed compute, and produce a versioned Vertex AI Model resource that can be deployed to an endpoint. The API covers creating, listing, cancelling, and inspecting CustomJobs, TrainingPipelines, and HyperparameterTuningJobs, replacing bespoke Kubernetes setups for ML training.
Create a TrainingPipeline under projects/{project}/locations/{location}/trainingPipelines pointing at a container image and Cloud Storage training inputs, then poll its state until it produces a Model resource.
Vector Search for Retrieval-Augmented Generation
Applications using RAG store document embeddings in a Vertex AI Index and query the deployed IndexEndpoint to find nearest neighbours at request time. Vertex AI handles index updates, sharding, and serving, so the application only needs to upsert vectors and call findNeighbors. Indexes integrate with the same IAM and VPC controls as the rest of Vertex AI.
Call findNeighbors on a deployed IndexEndpoint with a query embedding and use the returned datapoint IDs to fetch source documents for the LLM context window.
Model Lineage and Governance
Regulated teams need to answer how a deployed model was produced — which dataset, which training run, which evaluation. Vertex AI's metadata store exposes Artifacts, Executions, and Contexts, and the lineage subgraph endpoints walk the graph from a deployed model back to the data that trained it. This produces the audit trail required for ML model risk management.
Call queryArtifactLineageSubgraph on the deployed model's artifact resource name and walk the returned graph to surface the training pipeline run and source dataset.
Agent-Built ML Workflow
An agent integrating Vertex AI through Jentic can search for the predict operation, load its schema, and call Gemini or a custom endpoint without writing the OAuth and project-routing boilerplate by hand. Jentic isolates the Google Cloud service account credential and exposes only the operation's inputs, so an agent can chain dataset creation, training, and prediction in one workflow.
Use the Jentic search query 'generate text with a Vertex AI Gemini model' to discover the operation, then call generateContent on the chosen publisher model with the prompt and parameters.
202 endpoints — the vertex ai api is google cloud's unified surface for training, tuning, deploying, and serving machine learning models, including google's foundation models such as gemini and palm and customer-trained models.
METHOD
PATH
DESCRIPTION
/v1/{endpoint}:predict
Run online prediction against a deployed Vertex AI endpoint
/v1/{endpoint}:streamGenerateContent
Stream generation from a deployed publisher or custom model
/v1/datasets
List Vertex AI datasets in a project and location
/v1/{+context}:queryContextLineageSubgraph
Walk the lineage subgraph for a metadata context
/v1/{+dataset}:searchDataItems
Search dataset items with filters
/v1/{endpoint}:predict
Run online prediction against a deployed Vertex AI endpoint
/v1/{endpoint}:streamGenerateContent
Stream generation from a deployed publisher or custom model
/v1/datasets
List Vertex AI datasets in a project and location
/v1/{+context}:queryContextLineageSubgraph
Walk the lineage subgraph for a metadata context
/v1/{+dataset}:searchDataItems
Search dataset items with filters
Three things that make agents converge on Jentic-routed access.
Credential isolation
Vertex AI is authenticated via OAuth 2.0 access tokens minted from a Google Cloud service account. Jentic stores the service account key encrypted in the MAXsystem vault and rotates short-lived access tokens to the agent, so keys never appear in agent context or logs.
Intent-based discovery
Agents search Jentic with intents like 'generate text with a Vertex AI Gemini model' or 'run a batch prediction job' and Jentic returns the matching operation across the 202-endpoint surface with its input schema.
Time to first call
Direct Vertex AI integration takes 3-7 days to handle OAuth, project and location routing, long-running operation polling, and quota errors. Through Jentic, calling a foundation model or starting a training job is under an hour.
Alternatives and complements available in the Jentic catalogue.
Firebase ML API
Firebase ML serves models to mobile apps with Firebase auth; Vertex AI is the broader Google Cloud ML control plane.
Choose Firebase ML when delivering models to mobile clients via Firebase; choose Vertex AI for full MLOps, training, and Gemini access on Google Cloud.
AI Platform Training and Prediction API
The legacy AI Platform API predates Vertex AI and is being retired in favour of Vertex AI's unified surface.
Use Vertex AI for any new work; AI Platform exists only for migration of pre-existing models.
BigQuery API
BigQuery is the standard source and destination for Vertex AI training data and batch prediction outputs.
Use BigQuery alongside Vertex AI when training inputs or batch prediction outputs live in warehouse tables.
Cloud Storage API
Cloud Storage holds the model artifacts, training data, and prediction inputs/outputs that Vertex AI reads and writes.
Pair with Vertex AI whenever the agent needs to upload datasets, model artifacts, or batch prediction inputs.
Specific to using Vertex AI API API through Jentic.
What authentication does the Vertex AI API use?
Vertex AI uses OAuth 2.0 access tokens minted from a Google Cloud service account or user credential, with the cloud-platform scope. Jentic stores the source credential in its MAXsystem vault and gives the agent only short-lived access tokens, so service account keys never enter agent context.
Can I call Gemini models through the Vertex AI API rather than the consumer Gemini API?
Yes. Use the publisher model path projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent or :streamGenerateContent to call Gemini billed against your Google Cloud project, with project-level IAM and VPC controls.
What are the rate limits for the Vertex AI API?
Vertex AI enforces per-region, per-model quotas for online prediction (queries per minute) and concurrent training jobs. Quotas are listed under the Vertex AI service in IAM and Admin, Quotas in the Google Cloud Console and can be raised via quota requests.
How do I run a Gemini prompt on Vertex AI through Jentic?
Search Jentic for 'generate text with a Vertex AI Gemini model', load the schema for the publishers/google/models/{model}:generateContent operation, and execute it. Run pip install jentic and use the async search, load, execute pattern with your project and location.
Can I deploy a custom-trained model to a Vertex AI endpoint via the API?
Yes. Upload or register the model in the Vertex AI Model Registry, then call deployModel on a Vertex AI Endpoint with traffic split percentages to route inference traffic between deployed model versions.
Is Vertex AI free?
Vertex AI is paid: you pay per online prediction request, per training-hour for compute, per node-hour for endpoint serving, and per stored vector for index serving. Pricing varies by model, machine type, and region — see Vertex AI pricing in the Google Cloud Console.