Dataflow API


✓ Official Vendor Spec · Analytics · Data Pipelines · oauth2 · 41 Endpoints · REST

For Agents

Launch Apache Beam pipelines from templates, monitor job state and worker metrics, and drain streaming jobs so an agent can run and govern Dataflow workloads.

Quickstart

Get started with Dataflow API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"launch a google cloud dataflow flex template"

# → Jentic returns the POST /v1b3/projects/{projectId}/locations/{location}/flexTemplates:launch tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with the Dataflow API.

Launch a Dataflow job from a Flex Template or classic template with custom parameters

List jobs in a project and region with filters by state and creation time

Retrieve job graph, current state, and per-stage metrics

Drain or cancel a running streaming job to release resources gracefully


Use for: Launch a Dataflow job from a Flex Template; List all currently running streaming jobs in europe-west1; Retrieve the state and metrics of a specific job; Drain a streaming job before redeploying it

Not supported: Does not author Apache Beam code, store data, or schedule recurring runs. Use it only to launch, monitor, drain, and snapshot Dataflow jobs.

Google Cloud Dataflow API is the control plane for streaming and batch Apache Beam pipelines on Google Cloud. It exposes endpoints to launch jobs from Flex or classic templates, list and inspect jobs and their workers, drain or cancel running streaming jobs, and update the parameters of in-flight pipelines. The API also surfaces job metrics, debug snapshots, and template metadata so platform teams can build dashboards and self-service launchers without bespoke Beam code. Jobs run on managed worker VMs in the chosen region, with autoscaling controlled per job.

Use Cases

Patterns agents use the Dataflow API for, with concrete tasks.

★ Self-Service Pipeline Launch

A data platform team publishes Flex Templates for common ETL shapes and a portal calls the Dataflow API to launch them with user-supplied parameters. Each launch returns a job ID and immediate state, then the portal polls the job for progress. End users start a pipeline without touching gcloud or Beam code.

Launch the Flex Template at gs://templates/etl-template-spec.json in us-central1 with parameters input='gs://raw/2026-06-10' and output='bq:proj.ds.facts'.
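As a rough sketch of the request behind this task, the snippet below builds the URL and JSON body for flexTemplates:launch without sending anything. The project ID `my-project` and the job name are hypothetical, and the body fields (`launchParameter`, `jobName`, `containerSpecGcsPath`, `parameters`) are assumed from the public Flex Template launch request shape, so verify them against the OpenAPI document before use.

```python
import json

BASE_URL = "https://dataflow.googleapis.com"


def build_flex_launch_request(project_id, location, job_name, spec_path, parameters):
    """Return (url, body) for a Flex Template launch; nothing is sent."""
    url = (
        f"{BASE_URL}/v1b3/projects/{project_id}"
        f"/locations/{location}/flexTemplates:launch"
    )
    body = {
        "launchParameter": {
            "jobName": job_name,
            "containerSpecGcsPath": spec_path,
            # Template parameters travel as a string-to-string map.
            "parameters": dict(parameters),
        }
    }
    return url, body


launch_url, launch_body = build_flex_launch_request(
    project_id="my-project",          # hypothetical project ID
    location="us-central1",
    job_name="etl-2026-06-10",        # hypothetical job name
    spec_path="gs://templates/etl-template-spec.json",
    parameters={"input": "gs://raw/2026-06-10", "output": "bq:proj.ds.facts"},
)
print(launch_url)
print(json.dumps(launch_body, indent=2))
```

Sending the request still requires an OAuth access token in the Authorization header, which is the part Jentic handles on the agent's behalf.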

Streaming Job Lifecycle Management

An SRE runbook drains a streaming job, snapshots its state for rollback, and deploys a new version. The Dataflow API exposes drain and snapshot operations so the runbook completes without manual Console steps. Drains preserve in-flight messages instead of dropping them.

Drain streaming job 'jid-2026-06-10-abc' in us-central1 and create a snapshot named 'pre-deploy-snap' before launching the new version.

Failed Job Triage Bot

A Slack bot monitors Dataflow for jobs entering JOB_STATE_FAILED and posts the job ID, error log link, and last successful checkpoint. The Dataflow API supplies job state, current workers, and error messages so the bot can surface high-signal context. Engineers click through to logs only when needed.

List jobs in us-central1 with filter=TERMINATED created in the last 24 hours, and return any whose currentState is JOB_STATE_FAILED with their error messages.
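The selection step of this task can be sketched as a pure function over the job objects returned by a list call. The sample jobs below are invented for illustration, and the sketch assumes each job carries `id`, `name`, `currentState`, and an RFC 3339 `createTime` without unusual fractional-second precision.

```python
from datetime import datetime, timedelta, timezone


def parse_time(ts):
    """Parse an RFC 3339 timestamp such as '2026-06-10T08:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def failed_jobs_since(jobs, now, window=timedelta(hours=24)):
    """Keep jobs that failed and were created within the window."""
    cutoff = now - window
    return [
        {"id": job["id"], "name": job.get("name", "")}
        for job in jobs
        if job.get("currentState") == "JOB_STATE_FAILED"
        and parse_time(job["createTime"]) >= cutoff
    ]


# Invented sample data standing in for a jobs.list response.
sample_jobs = [
    {"id": "a", "name": "etl-a", "currentState": "JOB_STATE_FAILED",
     "createTime": "2026-06-10T08:00:00Z"},
    {"id": "b", "name": "etl-b", "currentState": "JOB_STATE_DONE",
     "createTime": "2026-06-10T09:00:00Z"},
    {"id": "c", "name": "etl-c", "currentState": "JOB_STATE_FAILED",
     "createTime": "2026-06-08T09:00:00Z"},
]
now = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
recent_failures = failed_jobs_since(sample_jobs, now)
print(recent_failures)
```

Passing `now` in explicitly keeps the function deterministic, which makes the bot's filtering logic easy to unit test.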

Capacity and Cost Telemetry

A FinOps dashboard collects per-job worker counts, vCPU hours, and shuffled data from the Dataflow API metrics endpoint and aggregates them per team. Teams see cost drivers without opening individual job pages. Decisions to right-size workers are grounded in actual Dataflow telemetry.

Get the metrics for job 'jid-2026-06-10-xyz' and return totalVcpuTime and currentNumWorkers.
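A minimal sketch of pulling named scalar values out of a job metrics response follows. It assumes the response has a `metrics` list whose entries carry a nested `name.name` and a `scalar` value; the metric names are taken from the task above and the sample payload is invented, so confirm both against a real response.

```python
def scalar_metrics(job_metrics, wanted):
    """Return {metric name: scalar} for the requested metric names."""
    found = {}
    for update in job_metrics.get("metrics", []):
        name = update.get("name", {}).get("name")
        if name in wanted and "scalar" in update:
            # A later entry with the same name overwrites an earlier one.
            found[name] = update["scalar"]
    return found


# Invented payload standing in for a job metrics response.
sample_metrics = {
    "metrics": [
        {"name": {"origin": "dataflow/v1b3", "name": "totalVcpuTime"}, "scalar": 7200},
        {"name": {"origin": "dataflow/v1b3", "name": "currentNumWorkers"}, "scalar": 4},
        {"name": {"origin": "user", "name": "rowsRead"}, "scalar": 1000},
    ]
}
telemetry = scalar_metrics(sample_metrics, {"totalVcpuTime", "currentNumWorkers"})
print(telemetry)
```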

AI Agent Pipeline Operator

An on-call AI agent gets a 'job lagging' alert, asks Jentic for the Dataflow API operations needed, retrieves current watermark and worker count, and drains/relaunches the job with more workers. Jentic isolates the Google service account credential so raw keys never enter the agent context.

For job 'jid-2026-06-10-zzz', get its watermark and worker count, drain if watermark lag exceeds 5 minutes, and relaunch the same Flex Template with maxWorkers doubled.
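The decision rule in this task (drain when watermark lag exceeds 5 minutes, then relaunch with maxWorkers doubled) can be sketched independently of any API call. `RelaunchPlan` and `plan_relaunch` are hypothetical names introduced for illustration, and the watermark is assumed to be available as a timezone-aware timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RelaunchPlan:
    drain: bool
    max_workers: int


def plan_relaunch(watermark, now, current_max_workers,
                  lag_threshold=timedelta(minutes=5)):
    """Drain and double maxWorkers only when watermark lag exceeds the threshold."""
    lag = now - watermark
    if lag > lag_threshold:
        return RelaunchPlan(drain=True, max_workers=current_max_workers * 2)
    return RelaunchPlan(drain=False, max_workers=current_max_workers)


now = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
lagging = plan_relaunch(now - timedelta(minutes=8), now, current_max_workers=10)
healthy = plan_relaunch(now - timedelta(minutes=2), now, current_max_workers=10)
print(lagging, healthy)
```

Keeping the rule separate from the drain and launch calls lets an operator review or adjust the threshold without touching the orchestration code.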

Key Endpoints

41 endpoints. Google Cloud Dataflow API is the control plane for streaming and batch Apache Beam pipelines on Google Cloud.

METHOD  PATH                                                                    DESCRIPTION
GET     /v1b3/projects/{projectId}/jobs                                         List jobs in a project
GET     /v1b3/projects/{projectId}/jobs/{jobId}                                 Get a specific job's state and graph
POST    /v1b3/projects/{projectId}/locations/{location}/flexTemplates:launch    Launch a job from a Flex Template
POST    /v1b3/projects/{projectId}/jobs/{jobId}/debug/sendCapture               Send a debug capture for a job
POST    /v1b3/projects/{projectId}/jobs/{jobId}/debug/getConfig                 Get the debug config for a job

Why through Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Google service account credentials are stored encrypted in the Jentic vault. Agents call dataflow.googleapis.com using short-lived OAuth access tokens that carry the cloud-platform scope, and never see the underlying JSON key.

Intent-based discovery

Agents search Jentic for 'launch dataflow pipeline' and Jentic returns the flexTemplates.launch and jobs.update operations with their full request schemas, including parameter maps and requestedState values.

Time to first call

Direct integration: 2-3 days to handle OAuth, regional endpoints, Flex Template parameter validation, and long-running operation polling. Through Jentic: under 1 hour for the same scope of pipeline orchestration.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Complementary

Google BigQuery API

Common sink and source for Dataflow pipelines

Pair with Dataflow when the pipeline reads from or writes to BigQuery tables

Complementary

Google Cloud Pub/Sub API

Streaming source for low-latency Dataflow jobs

Use Pub/Sub topics as the input to streaming Dataflow pipelines

Alternative

Google Cloud Dataproc API

Managed Spark and Hadoop alternative for batch data processing

Choose Dataproc when teams already use Spark/Hadoop; choose Dataflow for Beam unified streaming and batch

Alternative

Google Cloud Data Fusion API

Visual pipeline builder that compiles to Dataflow under the hood

Choose Data Fusion when low-code authoring is preferred; choose Dataflow when full Beam control is required

FAQs

Specific to using the Dataflow API through Jentic.

What authentication does the Google Cloud Dataflow API use?

The Dataflow API uses Google OAuth 2.0 with the cloud-platform scope. Through Jentic the service account credentials are stored encrypted in the Jentic vault and the agent receives scoped, short-lived access tokens per request.

Can I launch a Dataflow job from a Flex Template using only the Dataflow API?

Yes. Call /v1b3/projects/{projectId}/locations/{location}/flexTemplates:launch with the template GCS spec URI and a parameters map. The response contains the new job's ID and current state, and you can poll the job endpoint until it reaches JOB_STATE_RUNNING.
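The polling step described here might look like the sketch below. `wait_for_running` is a hypothetical helper, and `get_state` is an injected callable standing in for whatever fetches the job's currentState, so the loop can be exercised without network access. The set of terminal states is an assumption based on the documented job state names.

```python
import time

# States after which the job will not reach JOB_STATE_RUNNING (assumed set).
TERMINAL_STATES = {
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_DONE",
    "JOB_STATE_DRAINED",
    "JOB_STATE_UPDATED",
}


def wait_for_running(get_state, timeout_s=600.0, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until the job runs, ends, or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "JOB_STATE_RUNNING":
            return state
        if state in TERMINAL_STATES:
            raise RuntimeError(f"job ended in {state} before running")
        if clock() >= deadline:
            raise TimeoutError(f"job still in {state} after {timeout_s}s")
        sleep(interval_s)


# Simulated state sequence in place of real calls to the job endpoint.
states = iter(["JOB_STATE_QUEUED", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"])
final_state = wait_for_running(lambda: next(states), sleep=lambda _: None)
print(final_state)
```

A short batch job can legitimately reach JOB_STATE_DONE between polls; callers that expect this may prefer to treat that state as success rather than an error.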

What are the rate limits for the Google Cloud Dataflow API?

Dataflow applies per-project quotas on jobs.create and jobs.list, plus per-region worker and vCPU quotas that limit how many concurrent jobs can run. Inspect the Cloud Console Quotas page for the precise limits in your project.

How do I drain a streaming job through Jentic?

Run pip install jentic, search Jentic for 'drain dataflow streaming job', load the schema for jobs.update on dataflow.googleapis.com, and execute it with the job ID and requestedState=JOB_STATE_DRAINED.
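For orientation, the underlying jobs.update request can be sketched as below. This shows the raw REST shape rather than the Jentic SDK, whose calls are not reproduced here. The regional path and the PUT method are assumptions based on the public Dataflow REST reference, and `my-project` is a hypothetical project ID.

```python
BASE_URL = "https://dataflow.googleapis.com"


def build_drain_request(project_id, location, job_id):
    """Return (method, url, body) asking Dataflow to drain a streaming job."""
    url = (
        f"{BASE_URL}/v1b3/projects/{project_id}"
        f"/locations/{location}/jobs/{job_id}"
    )
    # Draining is requested by setting requestedState on the job resource.
    body = {"requestedState": "JOB_STATE_DRAINED"}
    return "PUT", url, body


drain_method, drain_url, drain_body = build_drain_request(
    project_id="my-project",  # hypothetical project ID
    location="us-central1",
    job_id="jid-2026-06-10-abc",
)
print(drain_method, drain_url, drain_body)
```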

Does the Dataflow API let me write or compile Apache Beam code?

No. This API is a control plane: launching, monitoring, draining, and snapshotting jobs whose Beam graph is supplied as a template or compiled artifact. Beam pipeline development happens locally with the Beam SDK before publishing a template.