Groq API

Name: Groq API API
Brand: Groq API
Availability: InStock

✓ Official Vendor SpecAI/MLLanguage Modelsbearer6 EndpointsREST

For Agents

Run chat completions, audio transcription, translation, and embeddings on Groq's low-latency LPU inference using OpenAI-compatible endpoints.

Quickstart

Get started with Groq API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"run a chat completion on Groq"

# → Jentic returns the GET /events tool with parameter schema, agent executes.

Capabilities

What an agent can do with Groq API API.

Run chat completions with POST /openai/v1/chat/completions

List the models available on Groq via GET /openai/v1/models

Inspect a single model with GET /openai/v1/models/{model}

Transcribe audio to text with POST /openai/v1/audio/transcriptions

Translate audio into English text with POST /openai/v1/audio/translations

GET STARTED

Start building with Groq API API

Explore with Jentic

View OpenAPI Document

Use Cases

Patterns agents use Groq API API for, with concrete tasks.

★ Low-Latency Chat Inference

Run high-throughput chat completions when latency matters more than the largest possible context window. Agents call POST /openai/v1/chat/completions with the same payload shape they would send to OpenAI; Groq's LPU hardware typically returns tokens at far higher tokens-per-second than GPU inference, useful for live agent loops.

Call POST /openai/v1/chat/completions with model=llama-3.3-70b-versatile and the user's messages, then return the assistant message content.

Audio Transcription Pipelines

Transcribe meeting recordings or voice notes with Groq-hosted Whisper. POST /openai/v1/audio/transcriptions accepts an audio file and returns the transcript, while /audio/translations returns an English translation. Useful for batch transcription where speed and cost both matter.

Upload the audio file to POST /openai/v1/audio/transcriptions with model=whisper-large-v3 and persist the returned transcript to the case record.

Embeddings for Semantic Search

Generate vector embeddings for documents and queries using Groq's embeddings endpoint, then store them in a vector database for retrieval. POST /openai/v1/embeddings returns the same response shape as OpenAI's embeddings endpoint, so the consuming code is largely identical.

Call POST /openai/v1/embeddings for each chunk of a document with the chosen embedding model, then upsert the vectors into the project's vector store.

AI Agent Model Routing

An agent that routes tasks to the cheapest viable model uses Jentic to call Groq for low-latency steps and falls back to other providers when needed. Jentic's intent search returns the right Groq operation by description, so the agent does not need to hard-code OpenAI-compatible paths in multiple code paths.

Search Jentic for 'run a chat completion on Groq', load the POST /openai/v1/chat/completions schema, and execute it with the chosen Groq model name.

Key Endpoints

6 endpoints — groq runs open-weight large language models on its custom lpu inference hardware, exposing an openai-compatible api surface.

METHOD

PATH

DESCRIPTION

POST

/openai/v1/chat/completions

Run a chat completion

GET

/openai/v1/models

List available models

GET

/openai/v1/models/{model}

Get a single model's metadata

POST

/openai/v1/audio/transcriptions

Transcribe audio to text

POST

/openai/v1/audio/translations

Translate audio to English text

POST

/openai/v1/embeddings

Create text embeddings

POST

/openai/v1/chat/completions

Run a chat completion

GET

/openai/v1/models

List available models

GET

/openai/v1/models/{model}

Get a single model's metadata

POST

/openai/v1/audio/transcriptions

Transcribe audio to text

POST

/openai/v1/audio/translations

Translate audio to English text

Why though Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

The Groq bearer API key is stored encrypted in the Jentic MAXsystem vault. The executor injects the Authorization: Bearer header at call time, so the raw key never enters the agent's prompt or logs — important when many agent operations share the same Groq key.

Intent-based discovery

Agents search by intent — 'run a chat completion', 'transcribe audio', 'create embeddings' — and Jentic returns the matching Groq operation with its parameter schema, so the agent calls the right OpenAI-compatible path without hard-coding URLs.

Time to first call

Direct integration: a few hours since the API mirrors OpenAI's shape. Through Jentic: minutes — search, load schema, execute — with the upside of reusing the same agent code across other LLM providers.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

OpenAI API

Closed-weight frontier models on the same OpenAI-compatible surface Groq mirrors.

Choose OpenAI when the workload needs frontier proprietary models; choose Groq for low-latency open-weight inference at typically lower cost.

Alternative

Anthropic Messages API

Claude family of models with a different request shape than OpenAI's.

Use Anthropic for Claude-specific behaviour or longer-context tasks; use Groq for high-throughput Llama or Mixtral calls.

Alternative

Mistral API

Mistral's hosted models on its own API surface.

Pick Mistral for first-party access to Mistral models with their full feature set; pick Groq when LPU-class latency on Llama/Mixtral matters more.

FAQs

Specific to using Groq API API through Jentic.

What authentication does the Groq API use?

Groq uses an HTTP bearer token issued from the Groq console. Through Jentic, the key is stored encrypted in the MAXsystem vault and the executor injects the Authorization: Bearer header at call time, so the raw key never enters the agent's context window.

Is the Groq API OpenAI-compatible?

Yes. The paths are mounted under /openai/v1 (e.g. /openai/v1/chat/completions, /openai/v1/embeddings) and the request and response bodies match OpenAI's. Existing OpenAI client code can usually target Groq with only a base URL change.

What are the rate limits for the Groq API?

The OpenAPI spec does not publish explicit rate limits — Groq enforces them per API key based on the account tier. Production agents should retry on 429 with exponential backoff and respect any Retry-After header.

How do I run a chat completion on Groq through Jentic?

Search Jentic for 'run a chat completion on Groq', load the POST /openai/v1/chat/completions schema, and execute it with a model name and messages array. Jentic injects the bearer token from the vault.

Can I transcribe audio with the Groq API?

Yes. POST /openai/v1/audio/transcriptions accepts an audio file and a model name (Groq hosts Whisper variants) and returns the transcript. POST /openai/v1/audio/translations returns an English translation of non-English audio.

Which models can I call on Groq?

Call GET /openai/v1/models to retrieve the current list — Groq updates the available models periodically and the spec does not pin specific names. GET /openai/v1/models/{model} returns the metadata for a single model.