With Model Router, you can access the most popular models through a single endpoint and a single bill. Experiment with new models and scale your app without worrying about the underlying infrastructure.

Setup

Getting started with Model Router is simple. Generate an API key and drop it into your favorite framework.

Generate API key

API keys for Model Router are generated within your workspace. Generate a key by logging into the console and navigating to Model Router → API Keys.

Connect via framework

Model Router integrates with the most popular frameworks:
  • OpenAI SDK
  • Vercel AI SDK
  • Modus
Model Router is a drop-in replacement for OpenAI’s API.
import openai

# Configure with your Hypermode Workspace API key and the Hypermode Model Router base URL
client = openai.OpenAI(
    api_key="<YOUR_HYP_WKS_KEY>",
    base_url="https://models.hypermode.host/v1",
)

# Set up the request
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Modus?"},
    ],
    max_tokens=150,
    temperature=0.7,
)

# Print the response
print(response.choices[0].message.content)

Connect directly via API

You can also access the API directly.
  • Generation
  • Embedding
curl -X POST \
  https://models.hypermode.host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_HYP_WKS_KEY" \
  -d '{
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Dgraph?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'

Available models

Hypermode provides a variety of the most popular open source and commercial models.
We’re constantly evaluating model usage to determine which models to add to our catalog. Interested in using a model not listed here? Let us know at help@hypermode.com.

Model introspection

The full list of models is available via the API.
curl -X GET \
  https://models.hypermode.host/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_HYP_WKS_KEY>"
The most popular models are included below for your convenience.
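Assuming the models endpoint returns an OpenAI-style list payload (an object with a `data` array of model entries, each carrying its slug in `id`), extracting the slugs is a one-liner. The sample payload below is illustrative, not actual API output:

```python
# Extract model slugs from an OpenAI-style /v1/models response.
# sample_response is illustrative, not real API output.
sample_response = {
    "object": "list",
    "data": [
        {"id": "meta-llama/llama-4-scout-17b-16e-instruct", "object": "model"},
        {"id": "gpt-4o", "object": "model"},
    ],
}

def model_slugs(response: dict) -> list[str]:
    """Return the model identifiers from a /v1/models-style payload."""
    return [entry["id"] for entry in response.get("data", [])]

print(model_slugs(sample_response))
```

The returned slugs are exactly the strings you pass as the `model` parameter in generation requests.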

Generation

Large language models provide text generation and reasoning capabilities.
| Provider | Model | Slug |
| --- | --- | --- |
| Anthropic | Claude 4 Sonnet | claude-sonnet-4-20250514 |
| Anthropic | Claude 4 Opus | claude-opus-4-20250514 |
| Anthropic | Claude 3.7 Sonnet (latest) | claude-3-7-sonnet-latest |
| Anthropic | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 |
| Anthropic | Claude 3.5 Sonnet (latest) | claude-3-5-sonnet-latest |
| Anthropic | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 |
| Anthropic | Claude 3.5 Sonnet | claude-3-5-sonnet-20240620 |
| Anthropic | Claude 3.5 Haiku (latest) | claude-3-5-haiku-latest |
| Anthropic | Claude 3.5 Haiku | claude-3-5-haiku-20241022 |
| Anthropic | Claude 3 Opus (latest) | claude-3-opus-latest |
| Anthropic | Claude 3 Opus | claude-3-opus-20240229 |
| Anthropic | Claude 3 Sonnet | claude-3-sonnet-20240229 |
| Anthropic | Claude 3 Haiku | claude-3-haiku-20240307 |
| DeepSeek | DeepSeek-R1-Distill-Llama | deepseek-ai/deepseek-r1-distill-llama-70b |
| Google | Gemini 2.5 Pro | gemini-2.5-pro-exp-03-25 |
| Google | Gemini 2.5 Pro Preview | gemini-2.5-pro-preview-05-06 |
| Google | Gemini 2.5 Flash Preview | gemini-2.5-flash-preview-04-17 |
| Google | Gemini 2.0 Flash Lite | gemini-2.0-flash-lite |
| Google | Gemini 2.0 Flash Image Generation | gemini-2.0-flash-exp-image-generation |
| Google | Gemini 2.0 Flash Live | gemini-2.0-flash-live-001 |
| Google | Gemini 2.0 Flash (latest) | gemini-2.0-flash |
| Google | Gemini 2.0 Flash | gemini-2.0-flash-001 |
| Google | Gemini 1.5 Pro (latest) | gemini-1.5-pro-latest |
| Google | Gemini 1.5 Pro | gemini-1.5-pro |
| Google | Gemini 1.5 Pro | gemini-1.5-pro-002 |
| Google | Gemini 1.5 Pro | gemini-1.5-pro-001 |
| Google | Gemini 1.5 Flash (latest) | gemini-1.5-flash-latest |
| Google | Gemini 1.5 Flash | gemini-1.5-flash |
| Google | Gemini 1.5 Flash | gemini-1.5-flash-002 |
| Google | Gemini 1.5 Flash | gemini-1.5-flash-001 |
| Google | Gemini 1.5 Flash 8B (latest) | gemini-1.5-flash-8b-latest |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-exp-0924 |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-exp-0827 |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-001 |
| Meta | Llama 4 Scout | meta-llama/llama-4-scout-17b-16e-instruct |
| Meta | Llama 3.3 | meta-llama/llama-3.3-70b-versatile |
| OpenAI | o3 (latest) | o3 |
| OpenAI | o3 | o3-2025-04-16 |
| OpenAI | o4-mini (latest) | o4-mini |
| OpenAI | o4-mini | o4-mini-2025-04-16 |
| OpenAI | GPT-4.5 Preview (latest) | gpt-4.5-preview |
| OpenAI | GPT-4.5 Preview | gpt-4.5-preview-2025-02-27 |
| OpenAI | o3-mini (latest) | o3-mini |
| OpenAI | o3-mini | o3-mini-2025-01-31 |
| OpenAI | o1 (latest) | o1 |
| OpenAI | o1 | o1-2024-12-17 |
| OpenAI | o1-preview (latest) | o1-preview |
| OpenAI | o1-preview | o1-preview-2024-09-12 |
| OpenAI | o1-mini (latest) | o1-mini |
| OpenAI | o1-mini | o1-mini-2024-09-12 |
| OpenAI | GPT-4.1 (latest) | gpt-4.1 |
| OpenAI | GPT-4.1 | gpt-4.1-2025-04-14 |
| OpenAI | GPT-4.1 Mini (latest) | gpt-4.1-mini |
| OpenAI | GPT-4.1 Mini | gpt-4.1-mini-2025-04-14 |
| OpenAI | GPT-4.1 Nano (latest) | gpt-4.1-nano |
| OpenAI | GPT-4.1 Nano | gpt-4.1-nano-2025-04-14 |
| OpenAI | GPT-4o Mini Search Preview (latest) | gpt-4o-mini-search-preview |
| OpenAI | GPT-4o Mini Search Preview | gpt-4o-mini-search-preview-2025-03-11 |
| OpenAI | GPT-4o (latest) | gpt-4o |
| OpenAI | GPT-4o | gpt-4o-2024-11-20 |
| OpenAI | GPT-4o | gpt-4o-2024-08-06 |
| OpenAI | GPT-4o | gpt-4o-2024-05-13 |
| OpenAI | GPT-4o Mini (latest) | gpt-4o-mini |
| OpenAI | GPT-4o Mini | gpt-4o-mini-2024-07-18 |
| OpenAI | GPT-4o Audio Preview (latest) | gpt-4o-audio-preview |
| OpenAI | GPT-4o Audio Preview | gpt-4o-audio-preview-2024-12-17 |
| OpenAI | GPT-4o Audio Preview | gpt-4o-audio-preview-2024-10-01 |
| OpenAI | GPT-4o Search Preview (latest) | gpt-4o-search-preview |
| OpenAI | GPT-4o Search Preview | gpt-4o-search-preview-2025-03-11 |
| OpenAI | ChatGPT-4o | chatgpt-4o-latest |
| OpenAI | GPT-4 (latest) | gpt-4 |
| OpenAI | GPT-4 | gpt-4-0613 |
| OpenAI | GPT-4 Turbo | gpt-4-turbo-2024-04-09 |
| OpenAI | GPT-4 Turbo Preview | gpt-4-turbo-preview |
| OpenAI | GPT-4 Preview (latest) | gpt-4-1106-preview |
| OpenAI | GPT-4 Preview | gpt-4-0125-preview |
| OpenAI | GPT-3.5 Turbo (latest) | gpt-3.5-turbo |
| OpenAI | GPT-3.5 Turbo | gpt-3.5-turbo-1106 |
| OpenAI | GPT-3.5 Turbo | gpt-3.5-turbo-0125 |
| Mistral | Mistral Large (coming soon) | mistral-large-latest |
| Mistral | Pixtral Large (coming soon) | pixtral-large-latest |
| Mistral | Mistral Medium (coming soon) | mistral-medium-latest |
| Mistral | Mistral Moderation (coming soon) | mistral-moderation-latest |
| Mistral | Ministral 3B (coming soon) | ministral-3b-latest |
| Mistral | Ministral 8B (coming soon) | ministral-8b-latest |
| Mistral | Open Mistral Nemo (coming soon) | open-mistral-nemo |
| Mistral | Mistral Small (coming soon) | mistral-small-latest |
| Mistral | Mistral Saba (coming soon) | mistral-saba-latest |
| Mistral | Codestral (coming soon) | codestral-latest |
| xAI | Grok 3 Beta (coming soon) | grok-3-beta |
| xAI | Grok 3 Fast Beta (coming soon) | grok-3-fast-beta |
| xAI | Grok 3 Mini Beta (coming soon) | grok-3-mini-beta |
| xAI | Grok 3 Mini Fast Beta (coming soon) | grok-3-mini-fast-beta |

Embedding

Embedding models provide vector representations of text for similarity matching and other applications.
| Provider | Model | Slug |
| --- | --- | --- |
| Nomic AI | Embed Text V1.5 | nomic-ai/nomic-embed-text-v1.5 |
| OpenAI | Embedding 3 Large | text-embedding-3-large |
| OpenAI | Embedding 3 Small | text-embedding-3-small |
| OpenAI | ADA Embedding | text-embedding-ada-002 |
| Hugging Face | MiniLM-L6-v2 (coming soon) | sentence-transformers/all-MiniLM-L6-v2 |
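Once you have vectors back from an embedding model, similarity matching typically reduces to cosine similarity between the query vector and each candidate vector. A minimal sketch with hand-written toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for embedding output.
query = [0.1, 0.9, 0.2]
doc_a = [0.1, 0.8, 0.3]
doc_b = [0.9, 0.1, 0.0]

print(cosine_similarity(query, doc_a))  # close to 1.0: similar direction
print(cosine_similarity(query, doc_b))  # much lower: dissimilar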

Choosing the right model

Choosing the right model is essential to building effective agents. This section helps you evaluate trade-offs, pick the right model for your use case, and iterate quickly.

Key considerations

  • Accuracy and output quality: Advanced logic, mathematical problem-solving, and multi-step analysis may require high-capability models.
  • Domain expertise: Performance varies by domain (for example, creative writing, code, scientific analysis). Review model benchmarks or test with your own examples.
  • Context window: Long documents, extensive conversations, or large codebases require models with longer context windows.
  • Embeddings: For semantic search or similarity, consider embedding models. These aren’t for text generation.
  • Latency: Real-time apps may need low-latency responses. Smaller models (or “Mini,” “Nano,” and “Flash” variants) typically respond faster than larger models.

Models by task / use case at a glance

Task / use caseExample modelsKey strengthsConsiderations
General-purpose conversationClaude 4 Sonnet, GPT-4.1, Gemini ProBalanced, reliable, creativeMay not handle edge cases as well
Complex reasoning and researchClaude 4 Opus, O3, Gemini 2.5 ProHighest accuracy, multi-step analysisHigher cost, quality critical
Creative writing and contentClaude 4 Opus, GPT-4.1, Gemini 2.5 ProHigh-quality output, creativity, style controlHigh cost for premium content
Document analysis and summarizationClaude 4 Opus, Gemini 2.5 Pro, Llama 3.3Handles long inputs, comprehensionHigher cost, slower
Real-time appsClaude 3.5 Haiku, GPT-4o Mini, Gemini 1.5 Flash 8BLow latency, high throughputLess nuanced, shorter context
Semantic search and embeddingsOpenAI Embedding 3, Nomic AI, Hugging FaceVector search, similarity, retrievalNot for text generation
Custom model training & experimentationLlama 4 Scout, Llama 3.3, DeepSeek, MistralOpen source, customizableRequires setup, variable performance
Hypermode provides access to the most popular open source and commercial models through Hypermode Model Router documentation. We’re constantly evaluating model usage and adding new models to our catalog based on demand.

Get started

You can change models at any time in your agent settings. Start with a general-purpose model, then iterate and optimize as you learn more about your agent’s needs.
  1. Create an agent with GPT-4.1 (default).
  2. Define clear instructions and connections for the agent’s role.
  3. Test with real examples from your workflow.
  4. Refine and iterate based on results.
  5. Evaluate alternatives once you understand patterns and outcomes.
Value first, optimize second. Clarify the task requirements before tuning for specialized capabilities or cost.

Comparison of select large language models

ModelBest ForConsiderationsContext Window+SpeedCost++
Claude 4 OpusComplex reasoning, long docsHigher cost, slower than lighter modelsVery long (200K+)Moderate$$$$
Claude 4 SonnetGeneral-purpose, balanced workloadsLess capable than Opus for edge casesLong (100K+)Fast$$$
GPT-4.1Most tasks, nuanced outputHigher cost, moderate speedLong (128K)Moderate$$$
GPT-4.1 MiniHigh-volume, cost-sensitiveLess nuanced, shorter contextMedium (32K-64K)Very Fast$$
GPT o3General chat, broad compatibilityMay lack latest features/capabilitiesMedium (32K-64K)Fast$$
Gemini 2.5 ProUp-to-date infoLimited access, higher costLong (128K+)Moderate$$$
Gemini 2.5 FlashReal-time, rapid responsesShorter context, less nuancedMedium (32K-64K)Very Fast$$
Llama 4 ScoutPrivacy, customization, open sourceVariable performanceMedium-Long (varies)Fast$
+ Context window sizes are approximate and may vary by deployment/version. ++ Relative cost per 1K tokens ($ = lowest, $$$$ = highest)

Logging

By default, all model invocations are logged for future display in the console. If you’d like to opt out of model logging, please contact us.