Vertex AI API

Name: Vertex AI API API
Brand: Vertex AI API
Availability: InStock

✓ Official Vendor SpecAI/MLMl Inferenceoauth2202 EndpointsREST

For Agents

Train, deploy, and call ML models on Google Cloud — including Gemini foundation models — via 202 endpoints covering datasets, pipelines, endpoints, and predictions.

Quickstart

Get started with Vertex AI API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"generate text with a Vertex AI Gemini model"

# → Jentic returns the GET /events tool with parameter schema, agent executes.

Capabilities

What an agent can do with Vertex AI API API.

Run online predictions against a deployed Vertex AI endpoint, including Gemini and custom models

Submit and monitor training pipelines that produce versioned Vertex AI model resources

Deploy a model to a Vertex AI endpoint and split traffic between deployed model versions

Run batch prediction jobs over BigQuery or Cloud Storage inputs and write results back

GET STARTED

Start building with Vertex AI API API

Explore with Jentic

View OpenAPI Document

Use for: Generate text from a Gemini model deployed on Vertex AI, Submit a custom training pipeline with my own container image, Deploy a fine-tuned model to a Vertex AI endpoint, List all Vertex AI endpoints and their deployed model versions

Not supported: Does not handle BigQuery analytics queries, Workspace data, or non-ML Cloud resource provisioning — use for Vertex AI model training, deployment, and prediction only.

The Vertex AI API is Google Cloud's unified surface for training, tuning, deploying, and serving machine learning models, including Google's foundation models such as Gemini and PaLM and customer-trained models. It exposes operations on datasets, training pipelines, models, endpoints, batch prediction jobs, feature stores, indexes, and model lineage. With 202 endpoints, it covers the full MLOps lifecycle from data ingestion through online and batch inference. It is the right tool for teams building production ML systems on Google Cloud rather than just calling a hosted LLM endpoint.

Use Cases

Patterns agents use Vertex AI API API for, with concrete tasks.

★ Foundation Model Inference at Scale

Teams that want to call Gemini or other Vertex foundation models from production code use the predict and streamGenerateContent endpoints rather than the consumer Gemini API, because Vertex bills against a Google Cloud project and supports IAM, VPC Service Controls, and regional residency. Vertex AI exposes both online prediction for low-latency inference and batch prediction for high-throughput jobs over BigQuery or GCS inputs.

POST to projects/{project}/locations/{location}/publishers/google/models/gemini-1.5-pro:generateContent with the prompt and read the response in JSON.

Custom Training Pipeline Orchestration

Data science teams use Vertex AI training pipelines to package their training code in a container, run it on managed compute, and produce a versioned Vertex AI Model resource that can be deployed to an endpoint. The API covers creating, listing, cancelling, and inspecting CustomJobs, TrainingPipelines, and HyperparameterTuningJobs, replacing bespoke Kubernetes setups for ML training.

Create a TrainingPipeline under projects/{project}/locations/{location}/trainingPipelines pointing at a container image and Cloud Storage training inputs, then poll its state until it produces a Model resource.

Vector Search for Retrieval-Augmented Generation

Applications using RAG store document embeddings in a Vertex AI Index and query the deployed IndexEndpoint to find nearest neighbours at request time. Vertex AI handles index updates, sharding, and serving, so the application only needs to upsert vectors and call findNeighbors. Indexes integrate with the same IAM and VPC controls as the rest of Vertex AI.

Call findNeighbors on a deployed IndexEndpoint with a query embedding and use the returned datapoint IDs to fetch source documents for the LLM context window.

Model Lineage and Governance

Regulated teams need to answer how a deployed model was produced — which dataset, which training run, which evaluation. Vertex AI's metadata store exposes Artifacts, Executions, and Contexts, and the lineage subgraph endpoints walk the graph from a deployed model back to the data that trained it. This produces the audit trail required for ML model risk management.

Call queryArtifactLineageSubgraph on the deployed model's artifact resource name and walk the returned graph to surface the training pipeline run and source dataset.

Agent-Built ML Workflow

An agent integrating Vertex AI through Jentic can search for the predict operation, load its schema, and call Gemini or a custom endpoint without writing the OAuth and project-routing boilerplate by hand. Jentic isolates the Google Cloud service account credential and exposes only the operation's inputs, so an agent can chain dataset creation, training, and prediction in one workflow.

Use the Jentic search query 'generate text with a Vertex AI Gemini model' to discover the operation, then call generateContent on the chosen publisher model with the prompt and parameters.

Key Endpoints

202 endpoints — the vertex ai api is google cloud's unified surface for training, tuning, deploying, and serving machine learning models, including google's foundation models such as gemini and palm and customer-trained models.

METHOD

PATH

DESCRIPTION

POST

/v1/{endpoint}:predict

Run online prediction against a deployed Vertex AI endpoint

POST

/v1/{endpoint}:streamGenerateContent

Stream generation from a deployed publisher or custom model

GET

/v1/datasets

List Vertex AI datasets in a project and location

POST

/v1/{+context}:queryContextLineageSubgraph

Walk the lineage subgraph for a metadata context

POST

/v1/{+dataset}:searchDataItems

Search dataset items with filters

POST

/v1/{endpoint}:predict

Run online prediction against a deployed Vertex AI endpoint

POST

/v1/{endpoint}:streamGenerateContent

Stream generation from a deployed publisher or custom model

GET

/v1/datasets

List Vertex AI datasets in a project and location

POST

/v1/{+context}:queryContextLineageSubgraph

Walk the lineage subgraph for a metadata context

POST

/v1/{+dataset}:searchDataItems

Search dataset items with filters

Why though Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Vertex AI is authenticated via OAuth 2.0 access tokens minted from a Google Cloud service account. Jentic stores the service account key encrypted in the MAXsystem vault and rotates short-lived access tokens to the agent, so keys never appear in agent context or logs.

Intent-based discovery

Agents search Jentic with intents like 'generate text with a Vertex AI Gemini model' or 'run a batch prediction job' and Jentic returns the matching operation across the 202-endpoint surface with its input schema.

Time to first call

Direct Vertex AI integration takes 3-7 days to handle OAuth, project and location routing, long-running operation polling, and quota errors. Through Jentic, calling a foundation model or starting a training job is under an hour.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

Firebase ML API

Firebase ML serves models to mobile apps with Firebase auth; Vertex AI is the broader Google Cloud ML control plane.

Choose Firebase ML when delivering models to mobile clients via Firebase; choose Vertex AI for full MLOps, training, and Gemini access on Google Cloud.

Alternative

AI Platform Training and Prediction API

The legacy AI Platform API predates Vertex AI and is being retired in favour of Vertex AI's unified surface.

Use Vertex AI for any new work; AI Platform exists only for migration of pre-existing models.

Complementary

BigQuery API

BigQuery is the standard source and destination for Vertex AI training data and batch prediction outputs.

Use BigQuery alongside Vertex AI when training inputs or batch prediction outputs live in warehouse tables.

Complementary

Cloud Storage API

Cloud Storage holds the model artifacts, training data, and prediction inputs/outputs that Vertex AI reads and writes.

Pair with Vertex AI whenever the agent needs to upload datasets, model artifacts, or batch prediction inputs.

FAQs

Specific to using Vertex AI API API through Jentic.

What authentication does the Vertex AI API use?

Vertex AI uses OAuth 2.0 access tokens minted from a Google Cloud service account or user credential, with the cloud-platform scope. Jentic stores the source credential in its MAXsystem vault and gives the agent only short-lived access tokens, so service account keys never enter agent context.

Can I call Gemini models through the Vertex AI API rather than the consumer Gemini API?

Yes. Use the publisher model path projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent or :streamGenerateContent to call Gemini billed against your Google Cloud project, with project-level IAM and VPC controls.

What are the rate limits for the Vertex AI API?

Vertex AI enforces per-region, per-model quotas for online prediction (queries per minute) and concurrent training jobs. Quotas are listed under the Vertex AI service in IAM and Admin, Quotas in the Google Cloud Console and can be raised via quota requests.

How do I run a Gemini prompt on Vertex AI through Jentic?

Search Jentic for 'generate text with a Vertex AI Gemini model', load the schema for the publishers/google/models/{model}:generateContent operation, and execute it. Run pip install jentic and use the async search, load, execute pattern with your project and location.

Can I deploy a custom-trained model to a Vertex AI endpoint via the API?

Yes. Upload or register the model in the Vertex AI Model Registry, then call deployModel on a Vertex AI Endpoint with traffic split percentages to route inference traffic between deployed model versions.

Is Vertex AI free?

Vertex AI is paid: you pay per online prediction request, per training-hour for compute, per node-hour for endpoint serving, and per stored vector for index serving. Pricing varies by model, machine type, and region — see Vertex AI pricing in the Google Cloud Console.