
Route, Secure & Manage Any LLM with ngrok AI Gateway


One gateway for every AI model.

Route, secure, and manage traffic to any LLM—cloud or local—with one unified platform.

Any LLM, same API

Connect to any LLM or provider, cloud or self-hosted, with the same API.

OpenAI

Anthropic

GMI Cloud

Google Vertex

Fireworks

Open Router

Azure Foundry

z.ai

Groq

Moonshot

Always send requests to the best model

Automatically direct each request to the fastest, most reliable, or most affordable model, no manual intervention required.

Manage spend without lifting a finger

Monitor usage and costs in real time to avoid expensive models and stay within budget, keeping your costs predictable and under control.

Keep your product online

If a provider is slow or unavailable, instantly route traffic to healthy models so your users never experience downtime.

Faster responses, lower costs

Cache common prompts and responses to improve speed and reduce unnecessary calls.

Stay in the know

See exactly how requests are being routed, so you can feel confident that your system is working as intended.

Protect your users' data

Redact sensitive information and choose which providers can access your data, keeping you in control of privacy.

Stay compliant wherever you operate

Ensure requests and data are only routed to trusted models and approved regions, meeting your privacy and regulatory requirements.

Scale seamlessly, every time

Distribute requests across multiple providers and keys to avoid rate limits and maintain high performance, even as you grow.

Mitigate abuse

Prevent abuse and unexpected spikes with easy-to-set rate limits, so you can scale safely.

How does it work?


1

Configure your endpoint

on_http_request:
  - actions:
      - type: ai-router
        config: {}

2

Update your SDK

import OpenAI from "openai";

const ngrokClient = new OpenAI({
  baseURL: 'https://your_endpoint.ngrok.dev',
  apiKey: 'YOUR_PROVIDER_API_KEY',
});

3

Prompt and send traffic

const completion = await ngrokClient.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'system', content: 'Talk like a pirate.' },
    { role: 'user', content: `Are semicolons optional in JavaScript?` },
  ],
  stream: true,
});
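Because `stream: true` is set, the SDK returns an async iterable of chunks rather than a single completion, and each chunk carries a small delta of the response text. A minimal sketch of accumulating those deltas is below; the chunk shape mirrors the OpenAI SDK's streaming response, and `mockStream`, `collectStream`, and `ChatChunk` are hypothetical names standing in for a real gateway response:

```typescript
// Sketch: accumulate streamed chat-completion deltas into one string.
// The chunk shape mirrors the OpenAI SDK's streaming response; mockStream
// is a stand-in for `await ngrokClient.chat.completions.create({ stream: true, ... })`.

type ChatChunk = {
  choices: { delta: { content?: string } }[];
};

// Hypothetical stand-in for the gateway's streamed response.
async function* mockStream(): AsyncGenerator<ChatChunk> {
  for (const piece of ["Arr, ", "aye, ", "mostly!"]) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

// Concatenate each chunk's text delta as it arrives.
async function collectStream(stream: AsyncIterable<ChatChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

collectStream(mockStream()).then((answer) => {
  console.log(answer); // "Arr, aye, mostly!"
});
```

In a real app you would pass `completion` from the snippet above in place of `mockStream()` and render each delta as it arrives instead of waiting for the full string.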

Build smarter, ship faster.

Get early access, help shape the platform, and never fight AI traffic headaches again.