Pricing | Pay Less for Cached AI Inference

Product

Tensormesh Inference

Build faster agents, copilots, and RAG apps with Serverless Inference or Reserved GPUs. Tensormesh caching reuses repeated context across requests, helping your applications respond faster while reducing inference costs.

Serverless Inference

Run models through a simple API with no servers to manage. Pay for input and output tokens, with cached tokens at $0.
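As a rough illustration of token-based billing with cached tokens at $0, the sketch below estimates a bill for one model. It makes two assumptions that this page does not state: that listed prices are per 1 million tokens, and that cached tokens are a subset of input tokens billed at the cached rate instead of the input rate. The names `ModelPrice` and `serverless_cost` are hypothetical; the authoritative billing formula is in the docs.

```python
from dataclasses import dataclass

# ASSUMPTION: listed prices are per 1,000,000 tokens (the page does not state the unit).
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical container for one row of the pricing list (USD)."""
    input_usd: float
    output_usd: float
    cached_usd: float = 0.0  # cached tokens are listed at $0.00


def serverless_cost(price: ModelPrice, input_tokens: int,
                    output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the bill in USD.

    ASSUMPTION: cached_tokens is the part of input_tokens served from
    cache, billed at cached_usd instead of input_usd.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (uncached * price.input_usd
             + cached_tokens * price.cached_usd
             + output_tokens * price.output_usd)
    return total / TOKENS_PER_PRICE_UNIT


# Qwen3 30B: $0.15 input, $0.60 output, $0.00 cached.
qwen3_30b = ModelPrice(input_usd=0.15, output_usd=0.60)

# Made-up workload: 100M input tokens (60M of them cached), 10M output tokens.
cost = serverless_cost(qwen3_30b, 100_000_000, 10_000_000, cached_tokens=60_000_000)
print(f"${cost:.2f}")  # $12.00
```

Under these assumptions, the same workload with no cache hits would cost $21.00, so the cached share of the input drives most of the difference.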

Provider     Model                               Input  Cached tokens  Output  Context
Qwen         Qwen3 Coder 30B A3B Instruct        $0.15  $0.00          $0.60   262K
Qwen         Qwen3 30B                           $0.15  $0.00          $0.60   131K
Qwen         Qwen3 235B                          $0.22  $0.00          $0.88   131K
OpenAI       OpenAI gpt-oss-20b                  $0.07  $0.00          $0.28   131K
OpenAI       OpenAI gpt-oss-120b                 $0.15  $0.00          $0.60   131K
Qwen         QWEN3.6-27B-FP8                     $0.32  not listed     $3.20   262K
Qwen         QWEN3 Coder 480B A35B Instruct FP8  $0.45  $0.00          $1.80   262K
Qwen         QWEN3.5-397B-A17B-FP8               $0.60  not listed     $3.60   262K
OpenAI       GPT-OSS-20B                         $0.07  $0.00          $0.28   131K
OpenAI       GPT-OSS-120B                        $0.15  $0.00          $0.60   131K
DeepSeek     DeepSeek V4 Flash                   $0.14  not listed     $0.28   1M
Google       Gemma4-31B-it                       $0.14  $0.00          $0.56   256K
Moonshot AI  KIMI K2.6                           $0.96  $0.00          $4.00   256K
Z.ai         GLM-5.1-NVFP4                       $1.40  not listed     $4.40   128K
Mistral      Devstral-2 123B Instruct            $0.50  $0.00          $2.00   256K
MiniMax      MiniMax-M2.5                        $0.30  $0.00          $1.20   196K

Don’t see the model you need?

Request a new model

Reserved GPUs

Reserve dedicated GPU capacity for production AI workloads that need predictable performance, scale, and control. Tensormesh caching is included to help repeated context run faster and cost less.

Provider  GPU   Hourly     Monthly*  Auto-scale
Nebius    H200  $2.50 /hr  ~$1,825   Yes (beta)
Yotta     H200  $2.50 /hr  ~$1,825   Yes (beta)

*Monthly assumes 730 hrs of continuous use, single replica.
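
The monthly figure follows directly from the hourly rate and the 730-hour assumption in the footnote. A minimal sketch of that arithmetic is below; the function name `reserved_monthly_cost` is hypothetical, and scaling the cost linearly with replica count is an assumption, since the page only quotes the single-replica case.

```python
HOURS_PER_MONTH = 730  # from the footnote: continuous use for one month


def reserved_monthly_cost(hourly_usd: float, replicas: int = 1,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost in USD for reserved GPUs.

    ASSUMPTION: cost scales linearly with the number of replicas.
    """
    return hourly_usd * hours * replicas


h200_monthly = reserved_monthly_cost(2.50)
print(f"${h200_monthly:,.0f}")  # $1,825
```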

Calculator

See what you'll actually pay.

Estimate your monthly cost from GPU usage, token volume, and cached context.
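
One way to combine these three inputs is to ask at what monthly token volume a serverless bill would match the cost of a reserved GPU. The sketch below is an illustration only: it assumes prices are per 1 million tokens (the page does not state the unit), that cached tokens replace input tokens at the cached rate, and it ignores whether a single GPU could actually serve that volume. The function name and the workload ratios are hypothetical.

```python
import math

# ASSUMPTION: listed prices are per 1,000,000 tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


def break_even_input_tokens(reserved_monthly_usd: float, input_usd: float,
                            output_usd: float, cache_hit_ratio: float,
                            output_per_input: float,
                            cached_usd: float = 0.0) -> float:
    """Monthly input-token volume at which serverless cost equals the reserved cost.

    cache_hit_ratio:  fraction of input tokens served from cache (0..1).
    output_per_input: output tokens generated per input token.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # Blended serverless cost (USD) per price unit of input tokens.
    usd_per_unit = ((1.0 - cache_hit_ratio) * input_usd
                    + cache_hit_ratio * cached_usd
                    + output_per_input * output_usd)
    if usd_per_unit <= 0:
        return math.inf  # serverless would be free, so there is no break-even point
    return reserved_monthly_usd / usd_per_unit * TOKENS_PER_PRICE_UNIT


# One H200 at ~$1,825/month vs. GPT-OSS-120B at $0.15 input / $0.60 output,
# with a made-up workload: 50% cache hits, 0.1 output tokens per input token.
tokens = break_even_input_tokens(1825, 0.15, 0.60,
                                 cache_hit_ratio=0.5, output_per_input=0.1)
print(f"{tokens / 1e9:.1f}B input tokens per month")  # 13.5B input tokens per month
```

Below that volume the serverless bill would be lower under these assumptions; a higher cache hit ratio pushes the break-even point further out.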

Serverless cost estimator

Estimate token-based API costs with cached tokens priced at $0.

Savings vs. another provider

Compare your current inference spend against Tensormesh pricing and cached-token savings.
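
The comparison itself is simple arithmetic. The sketch below shows one way it could be computed; the function name `monthly_savings` and both dollar amounts are hypothetical placeholders, not figures from this page.

```python
def monthly_savings(current_spend_usd: float,
                    tensormesh_estimate_usd: float) -> tuple[float, float]:
    """Return (savings in USD, savings as a percentage of current spend)."""
    saved = current_spend_usd - tensormesh_estimate_usd
    pct = saved / current_spend_usd * 100 if current_spend_usd > 0 else 0.0
    return saved, pct


# Made-up example: $22.00/month today vs. a $12.00/month estimate.
saved, pct = monthly_savings(22.00, 12.00)
print(f"${saved:.2f} ({pct:.1f}%)")  # $10.00 (45.5%)
```

A negative result means the estimate is higher than the current spend.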

Make repeated context work for you

Test your workload, measure the savings, and see how much cached-token pricing can reduce your bill.

Talk to an engineer

Have questions about our billing formula?

Read the Docs
