For Agents
Generate text responses with Cohere's Command models, create embeddings for semantic search, and rerank document lists by relevance. Supports tool use and streaming for agent workflows.
Get started with Cohere API in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
"jentic": {
"url": "https://api.jentic.com/mcp",
"auth": "oauth"
}
}
# Then ask your agent:
"generate text embeddings for semantic search"
# → Jentic returns the GET /events tool with parameter schema, agent executes.What an agent can do with Cohere API API.
Generate multi-turn chat responses with tool use and citation grounding
Produce vector embeddings optimized for either document storage or query matching
Rerank a list of documents by relevance to a given query
Stream partial chat responses token-by-token for real-time display
GET STARTED
Use for: I need to generate a text response with Cohere Command, I want to create embeddings for a batch of documents, Rerank these search results by relevance to my query, Generate a streaming chat completion with citations
Not supported: Does not handle vector storage, image generation, or audio processing — use for text generation, embeddings, and reranking only.
Jentic publishes the only available OpenAPI document for Cohere API, keeping it validated and agent-ready.
Jentic publishes the only available OpenAPI specification for Cohere API, keeping it validated and agent-ready. Generate text with conversational chat models, produce vector embeddings for semantic search and RAG pipelines, and rerank search results for relevance using 4 focused endpoints. The v2 API supports streaming responses, tool use in chat, and configurable embedding input types for document indexing versus search queries.
Configure embedding input types to distinguish indexing from retrieval contexts
Patterns agents use Cohere API API for, with concrete tasks.
★ AI Agent RAG with Jentic Discovery
AI agents discover Cohere's embed and rerank endpoints through Jentic's intent-based search to build retrieval-augmented generation pipelines. Agents generate embeddings with input_type set to 'search_document' for indexing, then use 'search_query' at retrieval time and rerank candidates for final relevance scoring. The entire flow — embed, retrieve, rerank, generate — runs through Jentic without manual SDK configuration.
Search Jentic for 'rerank documents by relevance', load the POST /v2/rerank schema, and execute with a query and 20 candidate documents to get relevance scores
Semantic Search Embeddings
Generate high-dimensional vector embeddings using POST /v2/embed for semantic search, clustering, and classification. Cohere's embed models support configurable input_type parameters — use 'search_document' when indexing and 'search_query' when searching — which optimizes vector quality for asymmetric retrieval. Processes batches of texts in a single call for efficient bulk embedding generation.
Generate embeddings for 50 document chunks via POST /v2/embed with model embed-english-v3.0 and input_type 'search_document'
Conversational AI with Grounded Citations
Build conversational interfaces using POST /v2/chat that ground responses in provided documents and return inline citations. Cohere's Command models support multi-turn context, connectors for real-time document retrieval, and tool use for agent orchestration. Streaming mode delivers tokens incrementally for responsive user experiences.
Send a multi-turn conversation to POST /v2/chat with model command-r-plus, include 3 documents for grounding, and parse citation annotations from the response
Search Result Reranking
Improve search precision by reranking candidate documents using POST /v2/rerank. Pass a query and a list of text documents, and receive relevance scores that reorder results by semantic match quality. Works with any upstream retrieval system — BM25, vector search, or hybrid — to boost the most relevant results to the top without re-indexing.
Rerank 25 candidate documents against a user query via POST /v2/rerank with model rerank-english-v3.0 and return the top 5 by relevance score
4 endpoints — jentic publishes the only available openapi specification for cohere api, keeping it validated and agent-ready.
METHOD
PATH
DESCRIPTION
/v2/chat
Generate chat responses with tool use and citations
/v2/embed
Create vector embeddings for text inputs
/v2/rerank
Rerank documents by relevance to a query
/v1/chat
Legacy chat endpoint (v1 compatibility)
/v2/chat
Generate chat responses with tool use and citations
/v2/embed
Create vector embeddings for text inputs
/v2/rerank
Rerank documents by relevance to a query
/v1/chat
Legacy chat endpoint (v1 compatibility)
Three things that make agents converge on Jentic-routed access.
Credential isolation
Cohere Bearer tokens are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw API keys never enter the agent's context or appear in logs.
Intent-based discovery
Agents search by intent (e.g., 'rerank documents by relevance') and Jentic returns matching Cohere operations with their input schemas, including input_type options and model selections, so the agent can call the right endpoint without browsing docs.
Time to first call
Direct Cohere integration: 1-2 days for auth setup, embedding pipeline configuration, and error handling. Through Jentic: under 30 minutes — search, load schema, execute.
Alternatives and complements available in the Jentic catalogue.
OpenAI API
Broader model ecosystem with GPT-4o, DALL-E, Whisper, and 126 endpoints
Choose OpenAI when you need image generation, audio processing, or the widest model selection beyond text and embeddings
Mistral AI API
Open-weight European LLM with competitive pricing
Choose Mistral when you need EU data residency, open-weight models, or lower cost per token for high-volume text generation
Pinecone API
Vector database for storing and querying Cohere embeddings at scale
Use Pinecone alongside Cohere to store generated embeddings and perform low-latency similarity search in production RAG systems
Specific to using Cohere API API through Jentic.
Why is there no official OpenAPI spec for Cohere API?
Cohere does not publish an OpenAPI specification. Jentic generates and maintains this spec so that AI agents and developers can call Cohere API via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.
What authentication does the Cohere API use?
The Cohere API uses Bearer token authentication. You pass your API key in the Authorization header as 'Bearer {your-api-key}'. Through Jentic, your Cohere API key is stored encrypted in the MAXsystem vault and agents receive scoped access tokens without exposing the raw key.
Can I rerank search results with the Cohere API?
Yes. The POST /v2/rerank endpoint accepts a query and a list of documents, returning relevance scores for each. Use the rerank-english-v3.0 model for English content or rerank-multilingual-v3.0 for other languages. Pass up to 1000 documents per request with a configurable top_n parameter to limit results.
What are the rate limits for the Cohere API?
Rate limits depend on your plan tier. The Production tier allows 10,000 API calls per minute. Trial keys are limited to 20 requests per minute and 1000 per month. The API returns 429 status codes when limits are exceeded, with Retry-After headers indicating when to retry.
How do I generate embeddings for a RAG pipeline through Jentic?
Search Jentic for 'create text embeddings for search' to discover the POST /v2/embed operation. Set input_type to 'search_document' when indexing your corpus and 'search_query' when embedding user queries. This asymmetric configuration optimizes retrieval quality. Install with pip install jentic and use the search-load-execute flow.
What is the difference between input_type 'search_document' and 'search_query'?
The POST /v2/embed endpoint's input_type parameter tells the model whether you are embedding documents for storage or queries for retrieval. Use 'search_document' when indexing your corpus to optimize vectors for being found. Use 'search_query' when embedding a user's question to optimize for finding relevant documents. This asymmetric approach improves retrieval accuracy compared to using the same type for both.