With Model Router, you can access the most popular models through a single endpoint and a single bill. Experiment with new models and scale your app without worrying about the underlying infrastructure.

Setup

Getting started with Model Router is simple. Generate an API key and drop it into your favorite framework.

Generate API key

API keys for Model Router are generated within your workspace. Generate a key by logging into the console and navigating to Model Router → API Keys.

Connect via framework

Model Router integrates with the most popular frameworks:
  • OpenAI SDK
  • Vercel AI SDK
  • Modus
Model Router is a drop-in replacement for OpenAI’s API.
import openai

# Configure with your Hypermode Workspace API key and the Hypermode Model Router base URL
client = openai.OpenAI(
    api_key="<YOUR_HYP_WKS_KEY>",
    base_url="https://models.hypermode.host/v1",
)

# Set up the request
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Modus?"},
    ],
    max_tokens=150,
    temperature=0.7,
)

# Print the response
print(response.choices[0].message.content)

Connect directly via API

You can also access the API directly.
  • Generation
  • Embedding
curl -X POST \
  https://models.hypermode.host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_HYP_WKS_KEY" \
  -d '{
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Dgraph?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'

Available models

Hypermode provides a variety of the most popular open source and commercial models.
We’re constantly evaluating model usage to determine which models to add to our catalog. Interested in using a model not listed here? Let us know at help@hypermode.com.

Model introspection

The full list of models is available via the API.
curl -X GET \
  https://models.hypermode.host/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_HYP_WKS_KEY>"
The most popular models are included below for your convenience.
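Assuming the models endpoint returns an OpenAI-style list payload (an object with a `data` array of model entries, each carrying its slug in `id`), extracting the slugs is a one-liner. The sample payload below is illustrative, not actual API output:

```python
# Extract model slugs from an OpenAI-style /v1/models response.
# sample_response is illustrative, not real API output.
sample_response = {
    "object": "list",
    "data": [
        {"id": "meta-llama/llama-4-scout-17b-16e-instruct", "object": "model"},
        {"id": "gpt-4o", "object": "model"},
    ],
}

def model_slugs(response: dict) -> list[str]:
    """Return the model identifiers from a /v1/models-style payload."""
    return [entry["id"] for entry in response.get("data", [])]

print(model_slugs(sample_response))
```

The returned slugs are exactly the strings you pass as the `model` parameter in generation requests.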

Generation

Large language models provide text generation and reasoning capabilities.
| Provider | Model | Slug |
| --- | --- | --- |
| Anthropic | Claude 4 Sonnet | claude-sonnet-4-20250514 |
| Anthropic | Claude 4 Opus | claude-opus-4-20250514 |
| Anthropic | Claude 3.7 Sonnet (latest) | claude-3-7-sonnet-latest |
| Anthropic | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 |
| Anthropic | Claude 3.5 Sonnet (latest) | claude-3-5-sonnet-latest |
| Anthropic | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 |
| Anthropic | Claude 3.5 Sonnet | claude-3-5-sonnet-20240620 |
| Anthropic | Claude 3.5 Haiku (latest) | claude-3-5-haiku-latest |
| Anthropic | Claude 3.5 Haiku | claude-3-5-haiku-20241022 |
| Anthropic | Claude 3 Opus (latest) | claude-3-opus-latest |
| Anthropic | Claude 3 Opus | claude-3-opus-20240229 |
| Anthropic | Claude 3 Sonnet | claude-3-sonnet-20240229 |
| Anthropic | Claude 3 Haiku | claude-3-haiku-20240307 |
| DeepSeek | DeepSeek-R1-Distill-Llama | deepseek-ai/deepseek-r1-distill-llama-70b |
| Google | Gemini 2.5 Pro | gemini-2.5-pro-exp-03-25 |
| Google | Gemini 2.5 Pro Preview | gemini-2.5-pro-preview-05-06 |
| Google | Gemini 2.5 Flash Preview | gemini-2.5-flash-preview-04-17 |
| Google | Gemini 2.0 Flash Lite | gemini-2.0-flash-lite |
| Google | Gemini 2.0 Flash Image Generation | gemini-2.0-flash-exp-image-generation |
| Google | Gemini 2.0 Flash Live | gemini-2.0-flash-live-001 |
| Google | Gemini 2.0 Flash (latest) | gemini-2.0-flash |
| Google | Gemini 2.0 Flash | gemini-2.0-flash-001 |
| Google | Gemini 1.5 Pro (latest) | gemini-1.5-pro-latest |
| Google | Gemini 1.5 Pro | gemini-1.5-pro |
| Google | Gemini 1.5 Pro | gemini-1.5-pro-002 |
| Google | Gemini 1.5 Pro | gemini-1.5-pro-001 |
| Google | Gemini 1.5 Flash (latest) | gemini-1.5-flash-latest |
| Google | Gemini 1.5 Flash | gemini-1.5-flash |
| Google | Gemini 1.5 Flash | gemini-1.5-flash-002 |
| Google | Gemini 1.5 Flash | gemini-1.5-flash-001 |
| Google | Gemini 1.5 Flash 8B (latest) | gemini-1.5-flash-8b-latest |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-exp-0924 |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-exp-0827 |
| Google | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-001 |
| Meta | Llama 4 Scout | meta-llama/llama-4-scout-17b-16e-instruct |
| Meta | Llama 3.3 | meta-llama/llama-3.3-70b-versatile |
| OpenAI | o3 (latest) | o3 |
| OpenAI | o3 | o3-2025-04-16 |
| OpenAI | o4-mini (latest) | o4-mini |
| OpenAI | o4-mini | o4-mini-2025-04-16 |
| OpenAI | GPT-4.5 Preview (latest) | gpt-4.5-preview |
| OpenAI | GPT-4.5 Preview | gpt-4.5-preview-2025-02-27 |
| OpenAI | o3-mini (latest) | o3-mini |
| OpenAI | o3-mini | o3-mini-2025-01-31 |
| OpenAI | o1 (latest) | o1 |
| OpenAI | o1 | o1-2024-12-17 |
| OpenAI | o1-preview (latest) | o1-preview |
| OpenAI | o1-preview | o1-preview-2024-09-12 |
| OpenAI | o1-mini (latest) | o1-mini |
| OpenAI | o1-mini | o1-mini-2024-09-12 |
| OpenAI | GPT-4.1 (latest) | gpt-4.1 |
| OpenAI | GPT-4.1 | gpt-4.1-2025-04-14 |
| OpenAI | GPT-4.1 Mini (latest) | gpt-4.1-mini |
| OpenAI | GPT-4.1 Mini | gpt-4.1-mini-2025-04-14 |
| OpenAI | GPT-4.1 Nano (latest) | gpt-4.1-nano |
| OpenAI | GPT-4.1 Nano | gpt-4.1-nano-2025-04-14 |
| OpenAI | GPT-4o Mini Search Preview (latest) | gpt-4o-mini-search-preview |
| OpenAI | GPT-4o Mini Search Preview | gpt-4o-mini-search-preview-2025-03-11 |
| OpenAI | GPT-4o (latest) | gpt-4o |
| OpenAI | GPT-4o | gpt-4o-2024-11-20 |
| OpenAI | GPT-4o | gpt-4o-2024-08-06 |
| OpenAI | GPT-4o | gpt-4o-2024-05-13 |
| OpenAI | GPT-4o Mini (latest) | gpt-4o-mini |
| OpenAI | GPT-4o Mini | gpt-4o-mini-2024-07-18 |
| OpenAI | GPT-4o Audio Preview (latest) | gpt-4o-audio-preview |
| OpenAI | GPT-4o Audio Preview | gpt-4o-audio-preview-2024-12-17 |
| OpenAI | GPT-4o Audio Preview | gpt-4o-audio-preview-2024-10-01 |
| OpenAI | GPT-4o Search Preview (latest) | gpt-4o-search-preview |
| OpenAI | GPT-4o Search Preview | gpt-4o-search-preview-2025-03-11 |
| OpenAI | ChatGPT-4o | chatgpt-4o-latest |
| OpenAI | GPT-4 (latest) | gpt-4 |
| OpenAI | GPT-4 | gpt-4-0613 |
| OpenAI | GPT-4 Turbo | gpt-4-turbo-2024-04-09 |
| OpenAI | GPT-4 Turbo Preview | gpt-4-turbo-preview |
| OpenAI | GPT-4 Preview (latest) | gpt-4-1106-preview |
| OpenAI | GPT-4 Preview | gpt-4-0125-preview |
| OpenAI | GPT-3.5 Turbo (latest) | gpt-3.5-turbo |
| OpenAI | GPT-3.5 Turbo | gpt-3.5-turbo-1106 |
| OpenAI | GPT-3.5 Turbo | gpt-3.5-turbo-0125 |
| Mistral | Mistral Large (coming soon) | mistral-large-latest |
| Mistral | Pixtral Large (coming soon) | pixtral-large-latest |
| Mistral | Mistral Medium (coming soon) | mistral-medium-latest |
| Mistral | Mistral Moderation (coming soon) | mistral-moderation-latest |
| Mistral | Ministral 3B (coming soon) | ministral-3b-latest |
| Mistral | Ministral 8B (coming soon) | ministral-8b-latest |
| Mistral | Open Mistral Nemo (coming soon) | open-mistral-nemo |
| Mistral | Mistral Small (coming soon) | mistral-small-latest |
| Mistral | Mistral Saba (coming soon) | mistral-saba-latest |
| Mistral | Codestral (coming soon) | codestral-latest |
| xAI | Grok 3 Beta (coming soon) | grok-3-beta |
| xAI | Grok 3 Fast Beta (coming soon) | grok-3-fast-beta |
| xAI | Grok 3 Mini Beta (coming soon) | grok-3-mini-beta |
| xAI | Grok 3 Mini Fast Beta (coming soon) | grok-3-mini-fast-beta |

Embedding

Embedding models provide vector representations of text for similarity matching and other applications.
| Provider | Model | Slug |
| --- | --- | --- |
| Nomic AI | Embed Text V1.5 | nomic-ai/nomic-embed-text-v1.5 |
| OpenAI | Embedding 3 Large | text-embedding-3-large |
| OpenAI | Embedding 3 Small | text-embedding-3-small |
| OpenAI | ADA Embedding | text-embedding-ada-002 |
| Hugging Face | MiniLM-L6-v2 (coming soon) | sentence-transformers/all-MiniLM-L6-v2 |
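Once you have vectors back from an embedding model, similarity matching typically reduces to cosine similarity between the query vector and each candidate vector. A minimal sketch with hand-written toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for embedding output.
query = [0.1, 0.9, 0.2]
doc_a = [0.1, 0.8, 0.3]
doc_b = [0.9, 0.1, 0.0]

print(cosine_similarity(query, doc_a))  # close to 1.0: similar direction
print(cosine_similarity(query, doc_b))  # much lower: dissimilar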

Choosing the right model

Choosing the right model is essential to building effective agents. This section helps you evaluate trade-offs, pick the right model for your use case, and iterate quickly.

Key considerations

  • Accuracy and output quality: Advanced logic, mathematical problem-solving, and multi-step analysis may require high-capability models.
  • Domain expertise: Performance varies by domain (for example, creative writing, code, scientific analysis). Review model benchmarks or test with your own examples.
  • Context window: Long documents, extensive conversations, or large codebases require models with longer context windows.
  • Embeddings: For semantic search or similarity, consider embedding models. These aren’t for text generation.
  • Latency: Real-time apps may need low-latency responses. Smaller models (or “Mini,” “Nano,” and “Flash” variants) typically respond faster than larger models.

Models by task / use case at a glance

Task / use caseExample modelsKey strengthsConsiderations
General-purpose conversationClaude 4 Sonnet, GPT-4.1, Gemini ProBalanced, reliable, creativeMay not handle edge cases as well
Complex reasoning and researchClaude 4 Opus, O3, Gemini 2.5 ProHighest accuracy, multi-step analysisHigher cost, quality critical
Creative writing and contentClaude 4 Opus, GPT-4.1, Gemini 2.5 ProHigh-quality output, creativity, style controlHigh cost for premium content
Document analysis and summarizationClaude 4 Opus, Gemini 2.5 Pro, Llama 3.3Handles long inputs, comprehensionHigher cost, slower
Real-time appsClaude 3.5 Haiku, GPT-4o Mini, Gemini 1.5 Flash 8BLow latency, high throughputLess nuanced, shorter context
Semantic search and embeddingsOpenAI Embedding 3, Nomic AI, Hugging FaceVector search, similarity, retrievalNot for text generation
Custom model training & experimentationLlama 4 Scout, Llama 3.3, DeepSeek, MistralOpen source, customizableRequires setup, variable performance
Hypermode provides access to the most popular open source and commercial models through Hypermode Model Router documentation. We’re constantly evaluating model usage and adding new models to our catalog based on demand.

Get started

You can change models at any time in your agent settings. Start with a general-purpose model, then iterate and optimize as you learn more about your agent’s needs.
  1. Create an agent with GPT-4.1 (default).
  2. Define clear instructions and connections for the agent’s role.
  3. Test with real examples from your workflow.
  4. Refine and iterate based on results.
  5. Evaluate alternatives once you understand patterns and outcomes.
Value first, optimize second. Clarify the task requirements before tuning for specialized capabilities or cost.

Comparison of select large language models

ModelBest ForConsiderationsContext Window+SpeedCost++
Claude 4 OpusComplex reasoning, long docsHigher cost, slower than lighter modelsVery long (200K+)Moderate$$$$
Claude 4 SonnetGeneral-purpose, balanced workloadsLess capable than Opus for edge casesLong (100K+)Fast$$$
GPT-4.1Most tasks, nuanced outputHigher cost, moderate speedLong (128K)Moderate$$$
GPT-4.1 MiniHigh-volume, cost-sensitiveLess nuanced, shorter contextMedium (32K-64K)Very Fast$$
GPT o3General chat, broad compatibilityMay lack latest features/capabilitiesMedium (32K-64K)Fast$$
Gemini 2.5 ProUp-to-date infoLimited access, higher costLong (128K+)Moderate$$$
Gemini 2.5 FlashReal-time, rapid responsesShorter context, less nuancedMedium (32K-64K)Very Fast$$
Llama 4 ScoutPrivacy, customization, open sourceVariable performanceMedium-Long (varies)Fast$
+ Context window sizes are approximate and may vary by deployment/version. ++ Relative cost per 1K tokens ($ = lowest, $$$$ = highest)

Logging

By default, all model invocations are logged for future display in the console. If you’d like to opt out of model logging, please contact us.