
If you want to run open-weight LLMs such as Qwen3-Coder, MiniMax, DeepSeek, or Kimi in Claude Code, without a GPU, without vLLM, and without any serving infrastructure, you are three environment variables away from doing it right now.
There is a small, satisfying fact buried in how Claude Code talks to its backend. The CLI never hard-codes where requests go. It reads the target, the credential, and the model name out of your environment, which means pointing it at a custom LLM provider comes down to editing a shell profile. No config file, no proxy, no fork. That is why running open-weight models through Claude Code against Tensormesh's serverless LLM inference API takes about as long as opening ~/.zshrc.
Tensormesh implements the full Anthropic Messages API on the server side, so the CLI cannot tell the difference between our open-weight model API endpoint and the one it shipped with. Set three variables, name a model, and the AI agent loop you already rely on keeps running on the model you actually chose, with no GPU to provision and no weights to warm. If your team uses Codex CLI instead of Claude Code, the same approach works there too. See our Codex CLI guide.
It helps to be precise about how little actually moves. The harness stays exactly where it is, so the read, edit, run, iterate loop behaves the same, the permission prompts behave the same, and the tool calls behave the same. Your credentials shift from Anthropic's keychain-and-OAuth path to a single environment variable, which is a deliberate simplification rather than a downgrade. The model is the one genuinely new variable, and it becomes something you choose per task instead of something the CLI decides on your behalf.
For an agent or application developer, that separation is the entire appeal. The harness and the model have always been two decisions wearing one interface, and most ways of running an open model force you to give up a good harness to get model freedom. Here the trade simply does not exist, because nothing about the Claude Code experience changes except the address its requests go to and the slug it sends along with them. Nor is there a serving layer of your own to keep alive.

You will need an account and an API key generated under Profile → API keys, the CLI installed through npm, and macOS or Linux running bash or zsh. Here is the complete setup, top to bottom, ready to paste into a fresh terminal once you have dropped in your key:
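A minimal sketch of that setup, assuming the CLI's npm package is @anthropic-ai/claude-code and that the host is the serverless.tensormesh.ai address used in the troubleshooting notes. The key is a placeholder, and the model slug is an assumption to check against the model table.

```shell
# Install the CLI globally and confirm it is on PATH
npm install -g @anthropic-ai/claude-code
claude --version

# Point Claude Code at Tensormesh (host only, no /v1 suffix)
export ANTHROPIC_BASE_URL="https://serverless.tensormesh.ai"
# Placeholder: paste the key generated under Profile → API keys
export ANTHROPIC_API_KEY="ak-<env>-..."
# Assumed slug; confirm the exact spelling against the model table
export ANTHROPIC_MODEL="MiniMax-M2.5"

# Start the interactive session, authenticating only via ANTHROPIC_API_KEY
claude --bare
```

To make the variables survive new terminals, the three export lines can go into ~/.zshrc or ~/.bashrc.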


The installation drops a claude command into your global node_modules/bin, and claude --version confirms it landed. Everything here was validated against version 2.1.145, with chat and full tool use working end to end.
Each of the three variables earns its place. ANTHROPIC_BASE_URL redirects the Anthropic SDK away from api.anthropic.com and toward Tensormesh, and it wants the host on its own with no /v1 suffix, because the SDK appends the version path itself and a manual one gets doubled into a route that does not exist (a bare trailing slash is harmless). ANTHROPIC_API_KEY rides along as the x-api-key header on every request, which is the SDK's native convention, so there is no claude login step and no OAuth dance to complete. ANTHROPIC_MODEL matters more than its quiet position suggests, because without it Claude Code reaches for one of its own Claude-family models that Tensormesh does not serve; naming a Tensormesh slug here is what keeps the default session pointed at something real. The --bare flag on that last line is doing real work too, enough that it deserves the section below.
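The doubled route is easy to picture with a small stand-in. The function below is only an illustration of the joining behavior described above, not the SDK's actual code, and join_url is a hypothetical name.

```shell
# Illustration only: mimic a client that appends the versioned path itself
join_url() {
  # Drop one trailing slash so "host/" and "host" behave the same
  base="${1%/}"
  printf '%s/v1/messages\n' "$base"
}

join_url "https://serverless.tensormesh.ai"     # → https://serverless.tensormesh.ai/v1/messages
join_url "https://serverless.tensormesh.ai/"    # → https://serverless.tensormesh.ai/v1/messages
join_url "https://serverless.tensormesh.ai/v1"  # → https://serverless.tensormesh.ai/v1/v1/messages
```

The third call shows the dead route that a manual /v1 suffix produces.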

On the surface --bare looks like an afterthought, yet it is the difference between the interactive TUI authenticating and the interactive TUI insisting that you are not logged in. When a custom ANTHROPIC_BASE_URL is present, Claude Code's interactive mode still tries OAuth and the keychain before it considers your environment variable, so it will print "Not logged in" with your key sitting right there unused. The flag tells the CLI to authenticate strictly through ANTHROPIC_API_KEY, which is the path Tensormesh expects, and that single change is what makes the session connect.
Because retyping the flag for every session gets tiresome, the friendly move is to make it the default once:
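One way to do that, sketched as a shell alias on the assumption that you launch the CLI as plain claude:

```shell
# Add to ~/.zshrc or ~/.bashrc so every new interactive shell picks it up
alias claude='claude --bare'
```

After reloading the profile, typing claude starts the TUI with the flag already applied.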

There is a second reason to keep --bare close at hand. Beyond fixing auth, it strips hooks, plugins, auto-memory, keychain reads, and CLAUDE.md auto-discovery, which leaves a CLI with one explicit credential path and very few surprises. That is precisely the behavior you want from a step in a pipeline, so --bare paired with -p is the mode to reach for in scripts and CI:
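A minimal sketch of that pairing for a pipeline step. The wrapper name ci_prompt is hypothetical, and the only flags used are the two named above.

```shell
# Hypothetical CI helper: one non-interactive prompt, one explicit credential path
ci_prompt() {
  # --bare: authenticate only through ANTHROPIC_API_KEY and skip hooks, plugins,
  #         auto-memory, keychain reads, and CLAUDE.md auto-discovery
  # -p:     print the response and exit instead of opening the TUI
  claude --bare -p "$1"
}
```

A job would then call something like ci_prompt "Summarize the failures in test.log", with the three environment variables supplied by the CI secret store.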

For one-shot prompts the flag is technically optional, since the SDK already honors ANTHROPIC_API_KEY in non-interactive mode, though using it everywhere keeps behavior identical between your terminal and your automation.
Claude Code takes its model from --model when you pass one and from ANTHROPIC_MODEL otherwise, which gives you a standing default plus an override that changes nothing else. A per-invocation flag is the quickest way to try something for a single run:
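As a sketch, a small wrapper makes the override explicit. The name try_model is hypothetical; the slug passed in should be one from the model table.

```shell
# Hypothetical helper: run a single prompt on a chosen model
# without touching the ANTHROPIC_MODEL default
try_model() {
  # $1 = model slug, $2 = prompt
  claude --bare --model "$1" -p "$2"
}
```

For example, try_model "Qwen/Qwen3-Coder-30B-A3B-Instruct" "Refactor utils.py to remove the duplicate parser" runs once on that model and leaves the next plain session on the default.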

If you rotate between models regularly, a short set of aliases removes the typing entirely:
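The alias names below are made up for illustration, and the MiniMax slug is an assumption to verify against the model table; the other slugs are the ones discussed in this post.

```shell
# "command" bypasses any existing claude alias, so --bare is not passed twice
alias cc_minimax='command claude --bare --model MiniMax-M2.5'
alias cc_qwen='command claude --bare --model Qwen/Qwen3-Coder-30B-A3B-Instruct'
alias cc_qwen_big='command claude --bare --model Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8'
alias cc_deepseek='command claude --bare --model deepseek-ai/DeepSeek-V4-Flash'
```

Each alias opens a session on its own model while ANTHROPIC_MODEL continues to supply the default for everything else.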

Every model below runs the full Claude Code flow, meaning chat and tool use together, so the decision is about fit rather than raw capability.

On fit, MiniMax-M2.5 is the pick that rarely lets you down, because it carries the full agentic loop of reading, editing, running a shell command, and iterating without drifting across long multi-turn sessions. The coding-tuned Qwen/Qwen3-Coder-30B-A3B-Instruct and Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 shine on focused generation and refactoring, though they can wobble on composite prompts that chain several tools together, such as creating a file, running it, then summarizing the result; when that happens, a simpler one-step prompt or a fallback to MiniMax usually settles things. For an input that will not fit anywhere else, deepseek-ai/DeepSeek-V4-Flash and its million-token window are where it belongs.
One operational note about the table itself: GET /v1/models lists only what is currently warm, so a model can drop off for a short while when it goes cold, and the first request brings it back within roughly 30 to 60 seconds. Outside Claude Code, every one of these models also answers on /v1/messages and /v1/chat/completions for your Python SDK, curl, and application code.
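To see what is warm at a given moment, a sketch along these lines should work. It assumes the endpoint accepts the same x-api-key header as the rest of the API (whether it requires one is not stated here), and list_warm_models is a hypothetical name.

```shell
# Hypothetical helper: list the models that are currently warm
list_warm_models() {
  # ${VAR%/} drops a trailing slash so the path is not doubled
  curl -sS "${ANTHROPIC_BASE_URL%/}/v1/models" \
    -H "x-api-key: ${ANTHROPIC_API_KEY}"
}
```

If a slug you expect is missing from the output, sending it one request and listing again after 30 to 60 seconds should bring it back.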

The permission model deserves a deliberate choice rather than a default you inherit by accident. Left alone, the CLI asks before each file write, shell command, and tool call, which is the posture you want for code you have not vetted. The --dangerously-skip-permissions flag removes every one of those checks at once, and Anthropic's own guidance is to use it only inside a sandbox with no internet access, meaning an isolated container or VM where an agent that misbehaves cannot reach anything that matters. Treat the convenience and the blast radius as a single decision.
Nearly every failure in this setup is a local-configuration problem wearing a server-error costume, and the line worth trusting is the most recent one Claude Code prints. A handful of them account for almost all the support email.
The "Not logged in, please run /login" message in the TUI, arriving with your key plainly set, is the OAuth-first behavior from earlier, so the cure is claude --bare or the alias that makes it permanent. An API Error: 500 Internal Server Error, often dressed up in Claude Code's own retry suggestion, is usually an invalid or wrong-environment key, since authentication failures come back as 5xx; confirm the ak-<env>-... prefix matches the environment you mean to hit, then test it with a request that genuinely exercises auth:
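One possible shape for that request is a tiny call to /v1/messages that prints only the HTTP status. The body follows the Anthropic Messages format; the anthropic-version header is an assumption carried over from Anthropic's own API, and check_key is a hypothetical name.

```shell
# Hypothetical helper: send a minimal message and print the HTTP status code
check_key() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    "${ANTHROPIC_BASE_URL%/}/v1/messages" \
    -H "x-api-key: ${ANTHROPIC_API_KEY}" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "{\"model\": \"${ANTHROPIC_MODEL}\", \"max_tokens\": 16, \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}"
}
```

Running check_key with the three variables exported takes Claude Code out of the picture, so the status reflects only the key, the URL, and the slug.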

A 500 from that call points squarely at the key.
The note that there is an issue with the selected model, which may not exist or may be inaccessible, has three reliable causes: a mistyped slug, a model that is cold and therefore hidden from /v1/models, or that /v1 suffix on the base URL doubling into a dead route. Check the slug against the table, confirm that echo $ANTHROPIC_BASE_URL returns https://serverless.tensormesh.ai with nothing after it, and give a cold model 30 to 60 seconds to warm.
The same message with a slug that reads like claude-opus-4-7 or claude-sonnet-... is the tell that no model was named at all, so Claude Code fell back to its own family; setting ANTHROPIC_MODEL or passing --model puts it right.
A claude that hangs silently for many seconds with no error is almost always a base URL pointing at an unreachable host, which the same echo $ANTHROPIC_BASE_URL check will expose.
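Several of these checks can be folded into one local preflight. The function below is a sketch with a hypothetical name; it inspects only the environment and never contacts the server, so it cannot detect a wrong key or a cold model.

```shell
# Hypothetical preflight: catch the local misconfigurations described above
preflight() {
  rc=0
  case "${ANTHROPIC_BASE_URL:-}" in
    "")         echo "ANTHROPIC_BASE_URL is not set"; rc=1 ;;
    */v1|*/v1/) echo "ANTHROPIC_BASE_URL ends in /v1; use the bare host"; rc=1 ;;
  esac
  [ -n "${ANTHROPIC_API_KEY:-}" ] || { echo "ANTHROPIC_API_KEY is not set"; rc=1; }
  [ -n "${ANTHROPIC_MODEL:-}" ] || { echo "ANTHROPIC_MODEL is not set; Claude Code will fall back to a Claude-family model"; rc=1; }
  return "$rc"
}
```

Running preflight before claude returns a non-zero status and one line per problem, which is usually enough to rule out the local causes.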
The integration is small on purpose. Three environment variables move Claude Code from Anthropic's hosted models to open weights on Tensormesh, the --bare flag keeps auth honest in the TUI and predictable in CI, and a single ANTHROPIC_MODEL line decides which model your agent runs by default. Nothing else about the harness changes, and nothing gets left on disk to clean up afterward. To undo the whole thing, unset the three variables and Claude Code returns to Anthropic's API on its next run.
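For the current shell, the undo step is a single command. The unalias line is only relevant if you created a claude alias, and is written so that it is harmless if you did not.

```shell
# Remove the three variables from the current shell session
unset ANTHROPIC_BASE_URL ANTHROPIC_API_KEY ANTHROPIC_MODEL
# Drop the --bare alias if one was defined; ignore the error if it was not
unalias claude 2>/dev/null || true
```

To make the change stick for new terminals, the matching export and alias lines also need to come out of ~/.zshrc or ~/.bashrc.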
The full docs go further into per-environment keys, the permission modes, and the complete troubleshooting matrix. If something refuses to cooperate, email saas-support@tensormesh.ai with the meaningful error line and the output of claude --version.