For Agents
Generate text completions, create embeddings, transcribe audio, and produce images using OpenAI models. Supports function calling, multi-modal inputs, and structured JSON output for agent workflows.
Get started with OpenAI API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
"jentic": {
"url": "https://api.jentic.com/mcp",
"auth": "oauth"
}
}
# Then ask your agent:
"generate a chat completion with GPT-4o"
# → Jentic returns the GET /events tool with parameter schema, agent executes.What an agent can do with OpenAI API API.
Generate multi-turn chat completions with function calling and tool use
Create vector embeddings for semantic search and retrieval-augmented generation
Transcribe and translate audio files using Whisper models
Synthesize natural-sounding speech from text with multiple voice options
GET STARTED
Use for: I need to generate a chat completion with GPT-4o, I want to create embeddings for a set of documents, Transcribe this audio recording to text, Generate an image from a text prompt using DALL-E 3
Not supported: Does not handle vector storage, fine-tuned model hosting infrastructure, or real-time streaming WebSocket connections — use for AI model inference and training only.
Generate text, images, embeddings, and audio with 126 endpoints spanning chat completions, assistants, fine-tuning, batch processing, and moderation. Supports GPT-4o, GPT-4, DALL-E 3, Whisper, and TTS models through a single unified REST interface with bearer-token authentication. Handles multi-turn conversations, function calling, vision inputs, and JSON mode for structured outputs.
Generate and edit images using DALL-E 3
Fine-tune base models on custom training datasets with checkpointing
Run batch inference jobs for high-volume asynchronous processing
Patterns agents use OpenAI API API for, with concrete tasks.
★ AI Agent Tool Calling via Jentic
AI agents discover and invoke OpenAI chat completions through Jentic's intent-based search, enabling multi-step reasoning with function calling. Agents search for the operation by intent, receive the input schema, and execute completions without manual SDK setup. Supports tool_choice parameters for deterministic function invocation and parallel tool calls for complex agent workflows.
Search Jentic for 'generate a chat completion with function calling', load the /chat/completions schema, and execute a request with model gpt-4o and a weather tool definition
Retrieval-Augmented Generation Pipeline
Build RAG pipelines by generating vector embeddings with the /embeddings endpoint and combining them with chat completions for grounded answers. The text-embedding-3-small model produces 1536-dimensional vectors at low cost, while text-embedding-3-large offers higher accuracy for production retrieval systems. Agents can index thousands of documents and query relevant chunks in a single orchestration flow.
Generate embeddings for 10 document chunks using text-embedding-3-small via POST /embeddings and store the resulting vectors
Audio Transcription and Translation
Transcribe audio files in 57 languages using the Whisper model through the /audio/transcriptions endpoint, or translate non-English audio directly to English text via /audio/translations. Accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm formats up to 25 MB. Returns timestamped segments for subtitle generation or plain text for downstream processing.
Transcribe a 5-minute WAV audio file using POST /audio/transcriptions with model whisper-1 and return timestamped JSON segments
Batch Inference for High-Volume Processing
Process thousands of requests asynchronously at 50% reduced cost using the Batch API. Submit JSONL files containing chat completion, embedding, or moderation requests via POST /batches, then poll for completion. Results are returned as downloadable output files. Ideal for nightly report generation, bulk content moderation, or large-scale embedding jobs that do not require real-time responses.
Upload a JSONL file with 500 chat completion requests via POST /files, create a batch job via POST /batches, and poll GET /batches/{batch_id} until completion
Custom Model Fine-Tuning
Fine-tune GPT-4o-mini or GPT-3.5-turbo on domain-specific training data to improve accuracy for specialized tasks like classification, extraction, or tone matching. Upload training JSONL via /files, launch a fine-tuning job via POST /fine_tuning/jobs, monitor with checkpoints, and deploy the resulting model for inference. Supports validation files and hyperparameter configuration.
Upload a training JSONL file via POST /files with purpose 'fine-tune', then create a fine-tuning job via POST /fine_tuning/jobs with model gpt-4o-mini-2024-07-18
126 endpoints — generate text, images, embeddings, and audio with 126 endpoints spanning chat completions, assistants, fine-tuning, batch processing, and moderation.
METHOD
PATH
DESCRIPTION
/chat/completions
Generate a chat completion with optional function calling
/embeddings
Create vector embeddings from input text
/audio/transcriptions
Transcribe audio to text using Whisper
/audio/speech
Generate speech audio from text input
/images/generations
Generate images from text prompts
/fine_tuning/jobs
Create a model fine-tuning job
/batches
Create an asynchronous batch processing job
/models
List all available models
/chat/completions
Generate a chat completion with optional function calling
/embeddings
Create vector embeddings from input text
/audio/transcriptions
Transcribe audio to text using Whisper
/audio/speech
Generate speech audio from text input
/images/generations
Generate images from text prompts
Three things that make agents converge on Jentic-routed access.
Credential isolation
OpenAI Bearer tokens are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw API keys (sk-...) never enter the agent's context or logs.
Intent-based discovery
Agents search by intent (e.g., 'generate a chat completion with function calling') and Jentic returns matching OpenAI operations with their full input schemas, including tool definitions and response_format options, so the agent can invoke the right endpoint without parsing SDK docs.
Time to first call
Direct OpenAI integration: 1-3 days for auth setup, error handling, retry logic, and model routing. Through Jentic: under 1 hour — search for the operation, load the schema, execute.
Alternatives and complements available in the Jentic catalogue.
Anthropic API
Claude models with 200K context window, focused on safety and instruction-following
Choose Anthropic when you need longer context windows (200K tokens), stronger instruction adherence, or prefer Claude's reasoning style for complex analytical tasks
Cohere API
Enterprise-focused LLM API with native RAG support and reranking
Choose Cohere when you need built-in reranking for search results, enterprise data privacy requirements, or Command models fine-tuned for business tasks
Mistral AI API
Open-weight models with competitive performance at lower cost
Choose Mistral when you need cost-efficient inference, open-weight model flexibility, or EU data residency for GDPR compliance
Pinecone API
Vector database for storing and querying OpenAI embeddings
Use Pinecone alongside OpenAI to store generated embeddings and perform similarity search at scale in RAG pipelines
ElevenLabs API
Advanced voice synthesis and cloning beyond OpenAI's TTS capabilities
Use ElevenLabs when you need voice cloning, multilingual voice synthesis with emotion control, or higher-quality audio output than OpenAI TTS
Specific to using OpenAI API API through Jentic.
What authentication does the OpenAI API use?
The OpenAI API uses Bearer token authentication. You pass your API key in the Authorization header as 'Bearer sk-...'. Through Jentic, your OpenAI API key is stored encrypted in the MAXsystem vault and agents receive scoped access tokens, so the raw secret key never enters the agent context.
Can I generate structured JSON output with the OpenAI API?
Yes. The POST /chat/completions endpoint supports a response_format parameter set to 'json_object' which constrains the model to output valid JSON. For stricter schemas, use function calling with a JSON Schema definition to guarantee the output matches your expected structure.
What are the rate limits for the OpenAI API?
Rate limits vary by model tier and organization. GPT-4o allows up to 10,000 requests per minute on Tier 5 accounts, while lower tiers start at 500 RPM. Embedding models support higher throughput. The API returns x-ratelimit-remaining and x-ratelimit-reset headers with each response for programmatic tracking.
How do I generate embeddings for a RAG pipeline through Jentic?
Search Jentic for 'create text embeddings' to discover the POST /embeddings operation. Load the schema, then execute with model 'text-embedding-3-small' and your input texts. Jentic returns the operation schema with parameters so your agent can call it directly without browsing OpenAI documentation. Install the SDK with pip install jentic to get started.
What models are available through the OpenAI API?
The GET /models endpoint lists all available models. Current options include GPT-4o and GPT-4o-mini for chat, text-embedding-3-small and text-embedding-3-large for embeddings, whisper-1 for transcription, tts-1 and tts-1-hd for speech synthesis, and dall-e-3 for image generation. Fine-tuned models also appear in this list after training completes.
Can I use function calling to connect GPT to external tools?
Yes. Pass a tools array to POST /chat/completions with function definitions including name, description, and a JSON Schema for parameters. The model returns a tool_calls array when it decides to invoke a function. You execute the function and pass results back in a follow-up message with role 'tool'. Supports parallel tool calls and forced tool_choice.
Is the OpenAI API free to use?
OpenAI offers a free tier with limited credits for new accounts. After that, pricing is pay-per-token: GPT-4o costs $2.50 per million input tokens and $10 per million output tokens. Embedding models are significantly cheaper at $0.02 per million tokens for text-embedding-3-small. The Batch API provides a 50% discount for non-real-time workloads.
/fine_tuning/jobs
Create a model fine-tuning job
/batches
Create an asynchronous batch processing job
/models
List all available models