ai infra · dispatch 005
Switchmaxxer: An Architect's Case for Local-First AI Gateways
Switchmaxxer is now in pre-release public beta. It is a local-first LLM gateway that gives operators and agents one observable control point across multiple LLMs and AI providers.
Repo: github.com/adamkessler/switchmaxxer.
Site: switchmaxxer.com.
How it started
In March 2026, I found myself with some extra time on my hands.
Like a lot of people who were paying attention to the AI news, I spent the time going deep on local-first agentic frameworks: OpenClaw, Hermes Agents, and AutoGen on the agentic framework axis, and Paperclip AI and CrewAI on the orchestration axis. I was wiring local-first LLMs and AI tools into AI ecosystems for learning, development, and curiosity. Then token prices spiked. I remember the day it hit. That sent me running to compare options across the landscape: Anthropic, OpenAI, MiniMax, and OpenRouter. I found that LLMs and service providers have become a commodity token marketplace, which meant I needed to shop around and try each one out.
What I learned quickly was that switching providers, which is supposed to be a feature of the modern AI stack, was in practice quite ad hoc and burdensome. Every app needed its own configuration change. Every agent framework had its own way of naming models. Every SDK had opinions about base URLs, headers, and dialect. At one point I was spending eighty percent of my available energy reconfiguring clients every time I changed which provider was serving which workload.
So I did what any architect does when they hit the same problem in five different places. I extracted it.
The first version was a tight LLM gateway: a reverse proxy that spoke OpenAI on one port and Anthropic on another, with a JSON catalog defining named routes. I pointed the client SDKs at the gateway endpoint as if it were the origin. Then I let the gateway decide which upstream actually served the request. That single change gave me back most of the eighty percent.
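To make the named-route idea concrete, here is a minimal sketch of how a JSON catalog could map a route name to an upstream. Everything in it is an assumption for illustration: the catalog schema, the field names (provider, base_url, model), the placeholder model names, and the helper functions are hypothetical and are not Switchmaxxer's actual format or API.

```python
import json

# Hypothetical catalog. The schema, field names, and model names are
# placeholders for illustration, not Switchmaxxer's real format.
CATALOG_JSON = """
{
  "routes": {
    "fast-chat": {
      "provider": "openai",
      "base_url": "https://api.openai.com/v1",
      "model": "example-small-model"
    },
    "deep-think": {
      "provider": "anthropic",
      "base_url": "https://api.anthropic.com",
      "model": "example-large-model"
    }
  }
}
"""


def load_catalog(text: str) -> dict:
    """Parse the catalog and return the mapping of route name -> upstream."""
    return json.loads(text)["routes"]


def resolve_route(catalog: dict, route_name: str) -> dict:
    """Look up the upstream definition for a named route."""
    try:
        return catalog[route_name]
    except KeyError:
        raise LookupError(f"no route named {route_name!r}") from None


def rewrite_request(catalog: dict, body: dict) -> tuple[str, dict]:
    """Treat the client's 'model' field as a route name.

    Returns the upstream base URL and a copy of the request body with the
    route name swapped for the upstream's real model name.
    """
    upstream = resolve_route(catalog, body["model"])
    forwarded = {**body, "model": upstream["model"]}
    return upstream["base_url"], forwarded


catalog = load_catalog(CATALOG_JSON)
url, forwarded = rewrite_request(catalog, {"model": "fast-chat", "messages": []})
```

In this sketch the client only ever knows the route name. Changing which provider serves "fast-chat" is a catalog edit, and no client needs reconfiguring.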
Next I added a CLI so I could control the gateway from the terminal. Then I realized that if the CLI could read and mutate runtime state, an MCP surface could mirror those exact operations and give agents the same control plane I was using. But letting an agent make changes demands observability, and in any case debugging multi-provider routing without traces is impractical. Then it occurred to me that a reverse proxy with full observability is exactly the right place to do benchmarking and route optimization, whether by cost, by latency, or by what I started calling intellectual horsepower.
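As a rough sketch of what optimizing a route by cost or by latency could look like once the proxy has recorded per-request observations: the Sample record, the sample numbers, and the pick_route function below are all hypothetical, invented for illustration. They do not describe how Switchmaxxer actually benchmarks or scores routes.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Sample:
    """One observed request (hypothetical record shape)."""
    route: str
    latency_ms: float
    cost_usd: float


def summarize(samples):
    """Group samples by route; compute median latency and mean cost."""
    by_route = {}
    for s in samples:
        by_route.setdefault(s.route, []).append(s)
    return {
        route: {
            "median_latency_ms": median(x.latency_ms for x in group),
            "mean_cost_usd": sum(x.cost_usd for x in group) / len(group),
        }
        for route, group in by_route.items()
    }


def pick_route(samples, objective="cost", max_latency_ms=None):
    """Return the best route for the objective, within an optional latency budget."""
    keys = {"cost": "mean_cost_usd", "latency": "median_latency_ms"}
    if objective not in keys:
        raise ValueError(f"unknown objective {objective!r}")
    stats = summarize(samples)
    candidates = {
        route: s
        for route, s in stats.items()
        if max_latency_ms is None or s["median_latency_ms"] <= max_latency_ms
    }
    if not candidates:
        raise ValueError("no route satisfies the latency budget")
    return min(candidates, key=lambda route: candidates[route][keys[objective]])


# Made-up observations, purely for demonstration.
SAMPLES = [
    Sample("route-a", 400.0, 0.002),
    Sample("route-a", 600.0, 0.002),
    Sample("route-b", 900.0, 0.001),
    Sample("route-b", 1100.0, 0.001),
]

cheapest = pick_route(SAMPLES, objective="cost")
fastest = pick_route(SAMPLES, objective="latency")
cheapest_under_budget = pick_route(SAMPLES, objective="cost", max_latency_ms=800)
```

A capability score (the "intellectual horsepower" axis) could be a third key in the same structure, but how to measure it is a separate question that this sketch does not attempt.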
All this work culminated in Switchmaxxer. It was not designed top-down. I built it from the ground up and gave it good bones. I wanted it to solve the actual problems I encountered as an integrator building out AI ecosystems where many agents and many providers and many models are supposed to collaborate without anyone hard-coding the relationships in advance.
The architectural argument
Apps in the local-first agentic AI space look the way enterprise software did before the integration layer matured. Everyone is writing their own glue. All the APIs and configurations are bespoke.
The lesson the enterprise software industry eventually learned is that the integration layer needs to become a control plane. Routing, policy, observability, and audit belong behind one stable interface that survives vendor change. The companies that internalized this got the next decade right. The ones that didn't were refactored away.
The AI ecosystem is at the same inflection point. Models will keep changing. Providers will keep changing. Pricing and latency profiles will keep changing. The applications you build today should not require rewriting every time the landscape shifts. That requires a control plane. The question is what shape it should take.
Switchmaxxer makes two specific bets.
The first is local-first. Not because local is intrinsically superior, but because there are some very strong use cases for it. AI integrators, developers, field operators, and customer-facing execs all need the benefits that a local trusted boundary provides: privacy, control, and security.
The second is operator/agent symmetry. This is the bet I think matters most. Traditional infrastructure was built for human operators with scheduled jobs as background helpers. AI infrastructure has a genuinely new constraint: agents are first-class operators. They need to read state, mutate routes, run benchmarks, apply optimizations, and roll back changes through interfaces that are as legitimate, scoped, and auditable as anything a human does. In Switchmaxxer, the CLI is the operator's hand and MCP is the agent's glove. Both reach the same runtime through aligned contracts. Same state, same mutations, same audit ledger.
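One way to picture the symmetry is a single runtime object with two thin adapters in front of it. The sketch below is hypothetical: the Runtime class, the adapter functions, and the ledger entry shape are invented for illustration, and the agent adapter only imitates a tool call taking a dictionary of arguments. It does not use a real MCP library and does not reflect Switchmaxxer's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Shared state: both adapters below mutate this and nothing else."""
    routes: dict = field(default_factory=dict)
    ledger: list = field(default_factory=list)

    def set_route(self, actor: str, name: str, upstream: str):
        """Point a named route at an upstream and record who did it."""
        previous = self.routes.get(name)
        self.routes[name] = upstream
        self.ledger.append(
            {"actor": actor, "op": "set_route", "route": name,
             "from": previous, "to": upstream}
        )
        return previous

    def roll_back_last(self, actor: str):
        """Undo the most recent set_route, recording the rollback as well."""
        last = next(e for e in reversed(self.ledger) if e["op"] == "set_route")
        if last["from"] is None:
            self.routes.pop(last["route"], None)
        else:
            self.routes[last["route"]] = last["from"]
        self.ledger.append(
            {"actor": actor, "op": "roll_back", "route": last["route"],
             "from": last["to"], "to": last["from"]}
        )


def cli_set_route(runtime: Runtime, argv: list):
    """Operator adapter: positional arguments, e.g. ['fast-chat', 'openai']."""
    name, upstream = argv
    runtime.set_route("operator:cli", name, upstream)


def agent_set_route(runtime: Runtime, arguments: dict):
    """Agent adapter: tool-call style arguments with 'name' and 'upstream'."""
    runtime.set_route("agent:mcp", arguments["name"], arguments["upstream"])


runtime = Runtime()
cli_set_route(runtime, ["fast-chat", "openai"])
agent_set_route(runtime, {"name": "fast-chat", "upstream": "anthropic"})
runtime.roll_back_last("operator:cli")
```

The adapters carry no logic of their own. Every mutation, whether it comes from a human or an agent, goes through the same method and lands in the same ledger with the actor attached.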
I believe this symmetry is the architectural primitive the next decade of AI infrastructure gets built on.
What I'm asking for
The current build runs OpenAI and Anthropic listeners on a single port, with an observability store, audit ledger, persisted benchmarking, and persisted route optimization with apply and restore.
If you build with multiple LLM providers, run agents that should manage their own routing, or operate AI workflows that need a trusted local control point, I would like your feedback. The integration layer is going to matter more than most people currently believe, and it would be best to build in the open with serious operators contributing their knowledge.