For Agents
Run chat completions, audio transcription, translation, and embeddings on Groq's low-latency LPU inference using OpenAI-compatible endpoints.
Get started with Groq API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
"jentic": {
"url": "https://api.jentic.com/mcp",
"auth": "oauth"
}
}
# Then ask your agent:
"run a chat completion on Groq"
# → Jentic returns the GET /events tool with parameter schema, agent executes.What an agent can do with Groq API API.
Run chat completions with POST /openai/v1/chat/completions
List the models available on Groq via GET /openai/v1/models
Inspect a single model with GET /openai/v1/models/{model}
Transcribe audio to text with POST /openai/v1/audio/transcriptions
Translate audio into English text with POST /openai/v1/audio/translations
GET STARTED
Use for: Run a chat completion on Llama 3 via Groq, List the models I can call on Groq, Transcribe an audio file using Groq, Translate a non-English audio clip to English text
Not supported: Does not handle image generation, fine-tuning, or assistants/threads — use for chat completions, audio transcription/translation, and embeddings only.
Groq runs open-weight large language models on its custom LPU inference hardware, exposing an OpenAI-compatible API surface. Agents can run chat completions, list available models, transcribe and translate audio, and create text embeddings — all through the same OpenAI v1 paths so existing OpenAI client code can target Groq with a base URL change. The API authenticates with a bearer token issued from the Groq console.
Create text embeddings with POST /openai/v1/embeddings
Patterns agents use Groq API API for, with concrete tasks.
★ Low-Latency Chat Inference
Run high-throughput chat completions when latency matters more than the largest possible context window. Agents call POST /openai/v1/chat/completions with the same payload shape they would send to OpenAI; Groq's LPU hardware typically returns tokens at far higher tokens-per-second than GPU inference, useful for live agent loops.
Call POST /openai/v1/chat/completions with model=llama-3.3-70b-versatile and the user's messages, then return the assistant message content.
Audio Transcription Pipelines
Transcribe meeting recordings or voice notes with Groq-hosted Whisper. POST /openai/v1/audio/transcriptions accepts an audio file and returns the transcript, while /audio/translations returns an English translation. Useful for batch transcription where speed and cost both matter.
Upload the audio file to POST /openai/v1/audio/transcriptions with model=whisper-large-v3 and persist the returned transcript to the case record.
Embeddings for Semantic Search
Generate vector embeddings for documents and queries using Groq's embeddings endpoint, then store them in a vector database for retrieval. POST /openai/v1/embeddings returns the same response shape as OpenAI's embeddings endpoint, so the consuming code is largely identical.
Call POST /openai/v1/embeddings for each chunk of a document with the chosen embedding model, then upsert the vectors into the project's vector store.
AI Agent Model Routing
An agent that routes tasks to the cheapest viable model uses Jentic to call Groq for low-latency steps and falls back to other providers when needed. Jentic's intent search returns the right Groq operation by description, so the agent does not need to hard-code OpenAI-compatible paths in multiple code paths.
Search Jentic for 'run a chat completion on Groq', load the POST /openai/v1/chat/completions schema, and execute it with the chosen Groq model name.
6 endpoints — groq runs open-weight large language models on its custom lpu inference hardware, exposing an openai-compatible api surface.
METHOD
PATH
DESCRIPTION
/openai/v1/chat/completions
Run a chat completion
/openai/v1/models
List available models
/openai/v1/models/{model}
Get a single model's metadata
/openai/v1/audio/transcriptions
Transcribe audio to text
/openai/v1/audio/translations
Translate audio to English text
/openai/v1/embeddings
Create text embeddings
/openai/v1/chat/completions
Run a chat completion
/openai/v1/models
List available models
/openai/v1/models/{model}
Get a single model's metadata
/openai/v1/audio/transcriptions
Transcribe audio to text
/openai/v1/audio/translations
Translate audio to English text
Three things that make agents converge on Jentic-routed access.
Credential isolation
The Groq bearer API key is stored encrypted in the Jentic MAXsystem vault. The executor injects the Authorization: Bearer header at call time, so the raw key never enters the agent's prompt or logs — important when many agent operations share the same Groq key.
Intent-based discovery
Agents search by intent — 'run a chat completion', 'transcribe audio', 'create embeddings' — and Jentic returns the matching Groq operation with its parameter schema, so the agent calls the right OpenAI-compatible path without hard-coding URLs.
Time to first call
Direct integration: a few hours since the API mirrors OpenAI's shape. Through Jentic: minutes — search, load schema, execute — with the upside of reusing the same agent code across other LLM providers.
Alternatives and complements available in the Jentic catalogue.
OpenAI API
Closed-weight frontier models on the same OpenAI-compatible surface Groq mirrors.
Choose OpenAI when the workload needs frontier proprietary models; choose Groq for low-latency open-weight inference at typically lower cost.
Anthropic Messages API
Claude family of models with a different request shape than OpenAI's.
Use Anthropic for Claude-specific behaviour or longer-context tasks; use Groq for high-throughput Llama or Mixtral calls.
Mistral API
Mistral's hosted models on its own API surface.
Pick Mistral for first-party access to Mistral models with their full feature set; pick Groq when LPU-class latency on Llama/Mixtral matters more.
Specific to using Groq API API through Jentic.
What authentication does the Groq API use?
Groq uses an HTTP bearer token issued from the Groq console. Through Jentic, the key is stored encrypted in the MAXsystem vault and the executor injects the Authorization: Bearer header at call time, so the raw key never enters the agent's context window.
Is the Groq API OpenAI-compatible?
Yes. The paths are mounted under /openai/v1 (e.g. /openai/v1/chat/completions, /openai/v1/embeddings) and the request and response bodies match OpenAI's. Existing OpenAI client code can usually target Groq with only a base URL change.
What are the rate limits for the Groq API?
The OpenAPI spec does not publish explicit rate limits — Groq enforces them per API key based on the account tier. Production agents should retry on 429 with exponential backoff and respect any Retry-After header.
How do I run a chat completion on Groq through Jentic?
Search Jentic for 'run a chat completion on Groq', load the POST /openai/v1/chat/completions schema, and execute it with a model name and messages array. Jentic injects the bearer token from the vault.
Can I transcribe audio with the Groq API?
Yes. POST /openai/v1/audio/transcriptions accepts an audio file and a model name (Groq hosts Whisper variants) and returns the transcript. POST /openai/v1/audio/translations returns an English translation of non-English audio.
Which models can I call on Groq?
Call GET /openai/v1/models to retrieve the current list — Groq updates the available models periodically and the spec does not pin specific names. GET /openai/v1/models/{model} returns the metadata for a single model.
/openai/v1/embeddings
Create text embeddings