Pricing

Pricing that rewards context reuse

Tensormesh does not charge for cached tokens. The more your workload reuses context, the more your cost per request can drop.

Product

Tensormesh Inference

Build faster agents, copilots, and RAG apps with Serverless Inference or Reserved GPUs. Tensormesh caching reuses repeated context across requests, helping your applications respond faster while reducing inference costs.

Serverless Inference

Run models through a simple API with no servers to manage. Pay for input and output tokens, with cached tokens at $0.
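This page does not spell out the exact billing formula, so the following is only a minimal sketch. It assumes that cached tokens are the part of a prompt served from the Tensormesh cache and billed at $0. The remaining input tokens and all output tokens are billed at the listed per-1M-token rates. The rates used are those listed for Qwen3 Coder 30B A3B Instruct ($0.15 input, $0.60 output). `request_cost` and the request shape are hypothetical.

```python
PER_MILLION = 1_000_000  # listed prices are USD per 1M tokens


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cached_price: float = 0.0) -> float:
    """Estimate the USD cost of a single request.

    Assumption (not confirmed by this page): cached_tokens is the part of
    input_tokens served from cache, so only the remaining input tokens are
    billed at input_price.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / PER_MILLION


# Qwen3 Coder 30B A3B Instruct: $0.15 input, $0.60 output per 1M tokens.
PROMPT_TOKENS, OUTPUT_TOKENS = 20_000, 500  # hypothetical request shape
for cached_share in (0.0, 0.5, 0.9):
    cached = int(PROMPT_TOKENS * cached_share)
    cost = request_cost(PROMPT_TOKENS, cached, OUTPUT_TOKENS,
                        input_price=0.15, output_price=0.60)
    print(f"{cached_share:.0%} of prompt cached: ${cost:.4f} per request")
```

Take a 20K-token prompt with 500 output tokens. Under these assumptions, the sketch gives $0.0033 per request with no cache hits and $0.0006 when 90% of the prompt is cached. Output tokens are unaffected by caching, so the savings grow with the share of each prompt that repeats.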

Prices are per 1M tokens · USD
Qwen

Qwen3 Coder 30B A3B Instruct

Input $0.15 · Cached tokens $0.00 · Output $0.60 · 262K context

Qwen

Qwen3 30B

Input $0.15 · Cached tokens $0.00 · Output $0.60 · 131K context

Qwen

Qwen3 235B

Input $0.22 · Cached tokens $0.00 · Output $0.88 · 131K context

OpenAI

OpenAI gpt-oss-20b

Input $0.07 · Cached tokens $0.00 · Output $0.28 · 131K context

OpenAI

OpenAI gpt-oss-120b

Input $0.15 · Cached tokens $0.00 · Output $0.60 · 131K context

Qwen

QWEN3.6-27B-FP8

Input $0.32 · Output $3.20 · 262K context

Qwen

QWEN3 Coder 480B A35B Instruct FP8

Input $0.45 · Cached tokens $0.00 · Output $1.80 · 262K context

Qwen

QWEN3.5-397B-A17B-FP8

Input $0.60 · Output $3.60 · 262K context

OpenAI

GPT-OSS-20B

Input
$0.07
Cached tokens
$0.00
Output
$0.28
131K context

Try the Model

OpenAI

GPT-OSS-120B

Input
$0.15
Cached tokens
$0.00
Output
$0.60
131K context

Try the Model

DeepSeek

DeepSeek V4 Flash

Input $0.14 · Output $0.28 · 1M context

Google

Gemma4-31B-it

Input $0.14 · Cached tokens $0.00 · Output $0.56 · 256K context

Moonshot AI

KIMI K2.6

Input $0.96 · Cached tokens $0.00 · Output $4.00 · 256K context

Z.ai

GLM-5.1-NVFP4

Input $1.40 · Output $4.40 · 128K context

Mistral

Devstral-2 123B Instruct

Input $0.50 · Cached tokens $0.00 · Output $2.00 · 256K context

MiniMax

MiniMax-M2.5

Input $0.30 · Cached tokens $0.00 · Output $1.20 · 196K context

Don’t see the model you need?

Request a new model

Reserved GPUs

Reserve dedicated GPU capacity for production AI workloads that need predictable performance, scale, and control. Tensormesh caching is included to help repeated context run faster and cost less.

Nebius

GPU H200 · Hourly $2.50/hr · Monthly* ~$1,825 · Auto-scale Yes (beta)

Yotta

GPU H200 · Hourly $2.50/hr · Monthly* ~$1,825 · Auto-scale Yes (beta)

*Monthly assumes 730 hours of continuous use on a single replica.
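The monthly figure in the footnote is plain arithmetic: hourly rate × hours × replicas. Below is a minimal sketch of that arithmetic. It assumes billing is strictly linear in hours and replicas, which the footnote implies for a single continuously running replica. Auto-scaled deployments may behave differently. `monthly_gpu_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # continuous use, as assumed in the footnote


def monthly_gpu_cost(hourly_rate: float, replicas: int = 1,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Estimated USD per month for reserved GPUs billed by the hour.

    Assumption: cost scales linearly with hours and replica count.
    """
    if replicas < 1 or hours < 0 or hourly_rate < 0:
        raise ValueError("replicas must be >= 1; rate and hours must be non-negative")
    return hourly_rate * hours * replicas


print(monthly_gpu_cost(2.50))              # one H200 replica, full month -> 1825.0
print(monthly_gpu_cost(2.50, replicas=2))  # two replicas -> 3650.0
print(monthly_gpu_cost(2.50, hours=200))   # partial month -> 500.0
```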

Reserve now

Calculator

See what you'll actually pay.

Estimate your monthly cost from GPU usage, token volume, and cached context.

Serverless cost estimator

Estimate token-based API costs with cached tokens priced at $0.

Savings vs. another provider

Compare your current inference spend against Tensormesh pricing and cached-token savings.
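Below is a rough sketch of this comparison for a monthly token volume. The workload and the "other provider" rates are hypothetical placeholders, not real prices. The other provider is given the same input and output rates so that only the treatment of cached tokens differs. The Tensormesh side uses the rates listed for OpenAI gpt-oss-120b ($0.15 input, $0.60 output, cached tokens $0.00). As in any estimate here, it assumes cached tokens are a subset of input tokens. `Rates` and `monthly_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in USD per 1M tokens."""
    input: float
    cached: float
    output: float


def monthly_cost(input_tokens: float, cached_tokens: float,
                 output_tokens: float, rates: Rates) -> float:
    """Assumes cached_tokens is the cached subset of input_tokens."""
    uncached = input_tokens - cached_tokens
    return (uncached * rates.input
            + cached_tokens * rates.cached
            + output_tokens * rates.output) / 1_000_000


# Hypothetical workload: 2B input tokens/month, 70% from cache, 100M output tokens.
inp, cached, out = 2_000_000_000, 1_400_000_000, 100_000_000

tensormesh = Rates(input=0.15, cached=0.0, output=0.60)  # gpt-oss-120b rates above
other = Rates(input=0.15, cached=0.075, output=0.60)     # hypothetical: cached at 50%

t_cost = monthly_cost(inp, cached, out, tensormesh)
o_cost = monthly_cost(inp, cached, out, other)
print(f"Tensormesh ${t_cost:,.2f} | other ${o_cost:,.2f} | savings ${o_cost - t_cost:,.2f}")
```

With these placeholder numbers, the sketch estimates $150 per month on Tensormesh and $255 per month elsewhere. The entire $105 difference comes from cached tokens, so real savings depend on your cache-hit rate and your current provider's cached-token price.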

Special Offer

Prove that caching saves money on your workload.

Start with $100 in free credits and see how much faster and cheaper repeated context runs with Tensormesh.

Claim $100 Credits

Have questions about our billing formula?

Read the Docs