Cohere API

Name: Cohere API API
Brand: Cohere API
Availability: InStock

★ Only Publicly Available OpenAPI DocumentAI/MLLanguage Modelsbearer4 EndpointsREST

For Agents

Generate text responses with Cohere's Command models, create embeddings for semantic search, and rerank document lists by relevance. Supports tool use and streaming for agent workflows.

Quickstart

Get started with Cohere API in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"generate text embeddings for semantic search"

# → Jentic returns the GET /events tool with parameter schema, agent executes.

Capabilities

What an agent can do with Cohere API API.

Generate multi-turn chat responses with tool use and citation grounding

Produce vector embeddings optimized for either document storage or query matching

Rerank a list of documents by relevance to a given query

Stream partial chat responses token-by-token for real-time display

GET STARTED

Start building with Cohere API API

Explore with Jentic

View OpenAPI Document

Jentic publishes the only available OpenAPI document for Cohere API, keeping it validated and agent-ready.

Jentic publishes the only available OpenAPI specification for Cohere API, keeping it validated and agent-ready. Generate text with conversational chat models, produce vector embeddings for semantic search and RAG pipelines, and rerank search results for relevance using 4 focused endpoints. The v2 API supports streaming responses, tool use in chat, and configurable embedding input types for document indexing versus search queries.

Use Cases

Patterns agents use Cohere API API for, with concrete tasks.

★ AI Agent RAG with Jentic Discovery

AI agents discover Cohere's embed and rerank endpoints through Jentic's intent-based search to build retrieval-augmented generation pipelines. Agents generate embeddings with input_type set to 'search_document' for indexing, then use 'search_query' at retrieval time and rerank candidates for final relevance scoring. The entire flow — embed, retrieve, rerank, generate — runs through Jentic without manual SDK configuration.

Search Jentic for 'rerank documents by relevance', load the POST /v2/rerank schema, and execute with a query and 20 candidate documents to get relevance scores

Semantic Search Embeddings

Generate high-dimensional vector embeddings using POST /v2/embed for semantic search, clustering, and classification. Cohere's embed models support configurable input_type parameters — use 'search_document' when indexing and 'search_query' when searching — which optimizes vector quality for asymmetric retrieval. Processes batches of texts in a single call for efficient bulk embedding generation.

Generate embeddings for 50 document chunks via POST /v2/embed with model embed-english-v3.0 and input_type 'search_document'

Conversational AI with Grounded Citations

Build conversational interfaces using POST /v2/chat that ground responses in provided documents and return inline citations. Cohere's Command models support multi-turn context, connectors for real-time document retrieval, and tool use for agent orchestration. Streaming mode delivers tokens incrementally for responsive user experiences.

Send a multi-turn conversation to POST /v2/chat with model command-r-plus, include 3 documents for grounding, and parse citation annotations from the response

Search Result Reranking

Improve search precision by reranking candidate documents using POST /v2/rerank. Pass a query and a list of text documents, and receive relevance scores that reorder results by semantic match quality. Works with any upstream retrieval system — BM25, vector search, or hybrid — to boost the most relevant results to the top without re-indexing.

Rerank 25 candidate documents against a user query via POST /v2/rerank with model rerank-english-v3.0 and return the top 5 by relevance score

Key Endpoints

4 endpoints — jentic publishes the only available openapi specification for cohere api, keeping it validated and agent-ready.

METHOD

PATH

DESCRIPTION

POST

/v2/chat

Generate chat responses with tool use and citations

POST

/v2/embed

Create vector embeddings for text inputs

POST

/v2/rerank

Rerank documents by relevance to a query

POST

/v1/chat

Legacy chat endpoint (v1 compatibility)

POST

/v2/chat

Generate chat responses with tool use and citations

POST

/v2/embed

Create vector embeddings for text inputs

POST

/v2/rerank

Rerank documents by relevance to a query

POST

/v1/chat

Legacy chat endpoint (v1 compatibility)

Why though Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

Cohere Bearer tokens are stored encrypted in the Jentic vault (MAXsystem). Agents receive scoped access tokens — raw API keys never enter the agent's context or appear in logs.

Intent-based discovery

Agents search by intent (e.g., 'rerank documents by relevance') and Jentic returns matching Cohere operations with their input schemas, including input_type options and model selections, so the agent can call the right endpoint without browsing docs.

Time to first call

Direct Cohere integration: 1-2 days for auth setup, embedding pipeline configuration, and error handling. Through Jentic: under 30 minutes — search, load schema, execute.

Related APIs

Alternatives and complements available in the Jentic catalogue.

Alternative

OpenAI API

Broader model ecosystem with GPT-4o, DALL-E, Whisper, and 126 endpoints

Choose OpenAI when you need image generation, audio processing, or the widest model selection beyond text and embeddings

Alternative

Mistral AI API

Open-weight European LLM with competitive pricing

Choose Mistral when you need EU data residency, open-weight models, or lower cost per token for high-volume text generation

Complementary

Pinecone API

Vector database for storing and querying Cohere embeddings at scale

Use Pinecone alongside Cohere to store generated embeddings and perform low-latency similarity search in production RAG systems

FAQs

Specific to using Cohere API API through Jentic.

Why is there no official OpenAPI spec for Cohere API?

Cohere does not publish an OpenAPI specification. Jentic generates and maintains this spec so that AI agents and developers can call Cohere API via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.

What authentication does the Cohere API use?

The Cohere API uses Bearer token authentication. You pass your API key in the Authorization header as 'Bearer {your-api-key}'. Through Jentic, your Cohere API key is stored encrypted in the MAXsystem vault and agents receive scoped access tokens without exposing the raw key.

Can I rerank search results with the Cohere API?

Yes. The POST /v2/rerank endpoint accepts a query and a list of documents, returning relevance scores for each. Use the rerank-english-v3.0 model for English content or rerank-multilingual-v3.0 for other languages. Pass up to 1000 documents per request with a configurable top_n parameter to limit results.

What are the rate limits for the Cohere API?

Rate limits depend on your plan tier. The Production tier allows 10,000 API calls per minute. Trial keys are limited to 20 requests per minute and 1000 per month. The API returns 429 status codes when limits are exceeded, with Retry-After headers indicating when to retry.

How do I generate embeddings for a RAG pipeline through Jentic?

Search Jentic for 'create text embeddings for search' to discover the POST /v2/embed operation. Set input_type to 'search_document' when indexing your corpus and 'search_query' when embedding user queries. This asymmetric configuration optimizes retrieval quality. Install with pip install jentic and use the search-load-execute flow.

What is the difference between input_type 'search_document' and 'search_query'?

The POST /v2/embed endpoint's input_type parameter tells the model whether you are embedding documents for storage or queries for retrieval. Use 'search_document' when indexing your corpus to optimize vectors for being found. Use 'search_query' when embedding a user's question to optimize for finding relevant documents. This asymmetric approach improves retrieval accuracy compared to using the same type for both.