Tensormesh Emerges From Stealth to Slash AI Inference Costs and Latency by up to 10x

SAN FRANCISCO — October 23, 2025 — Tensormesh, the company pioneering caching-accelerated inference optimization for enterprise AI, today emerged from stealth with $4.5 million in seed funding led by Laude Ventures. Tensormesh’s technology eliminates redundant computation in AI inference, reducing latency and GPU spend by up to 10x while giving enterprises full control of their data and infrastructure.

Founded by faculty and PhD researchers from the University of Chicago, UC Berkeley, and Carnegie Mellon, Tensormesh builds on years of academic research in distributed systems and AI infrastructure. The company is led by Junchen Jiang, University of Chicago faculty member and co-creator of LMCache, the leading open-source KV caching project with 5K+ GitHub stars and 100+ contributors. LMCache is integrated with frameworks such as vLLM and NVIDIA Dynamo, and has been used across the ecosystem by organizations including Bloomberg, Red Hat, Redis, Tencent, GMI Cloud, and WEKA.

Tensormesh is the first commercial platform to productize caching for large-scale AI inference, pairing LMCache-inspired techniques with enterprise-grade usability, security, and manageability.

“Enterprises today must either send their most sensitive data to third parties or hire entire engineering teams to rebuild infrastructure from scratch,” said Junchen Jiang, Founder and CEO of Tensormesh. “Tensormesh offers a third path: run AI wherever you want, with state-of-the-art optimizations, cost savings, and performance built in.”
“Enterprises everywhere are wrestling with the huge costs of AI inference,” said Ion Stoica, advisor to Tensormesh and Co-Founder and Executive Chairman of Databricks. “Tensormesh’s approach delivers a fundamental breakthrough in efficiency and is poised to become essential infrastructure for any company betting on AI.”

Sharing the KV cache across nodes in a cluster is a key driver of throughput gains and cost savings. Tensormesh supports multiple storage backends, enabling distributed cache sharing for low-latency, high-throughput deployments.
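To make the idea concrete, here is a minimal, hypothetical sketch of cross-node KV-cache sharing. It is not Tensormesh's or LMCache's actual API: the function names are illustrative, and a plain dict stands in for a shared backend such as Redis so the sketch runs anywhere. The core pattern is that one node publishes the KV cache computed for a token prefix under a content-derived key, and any other node serving a request with the same prefix can fetch it and skip that portion of prefill.

```python
import hashlib

# Stand-in for a shared store (e.g. Redis) reachable by every serving node.
backend = {}

def prefix_key(model: str, tokens: list[int]) -> str:
    """Key the cache entry by model and the exact token prefix."""
    digest = hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()
    return f"kv:{model}:{len(tokens)}:{digest}"

def store_kv(model: str, tokens: list[int], kv_blob: bytes) -> None:
    """Publish serialized KV tensors for a token prefix."""
    backend[prefix_key(model, tokens)] = kv_blob

def fetch_kv(model: str, tokens: list[int]):
    """Return (matched_len, blob) for the longest cached prefix, or (0, None)."""
    for end in range(len(tokens), 0, -1):
        blob = backend.get(prefix_key(model, tokens[:end]))
        if blob is not None:
            return end, blob  # reuse: skip prefill for the first `end` tokens
    return 0, None

# Node A computes and publishes the KV cache for a shared system prompt...
store_kv("example-model", [1, 2, 3, 4], b"serialized-kv-tensors")

# ...and Node B, serving a longer request with the same prefix, reuses it.
hit_len, blob = fetch_kv("example-model", [1, 2, 3, 4, 5, 6])
```

A production system would add eviction, tiered storage (GPU memory, CPU memory, disk, remote store), and chunked rather than whole-prefix keys, but the lookup-longest-prefix pattern above is the essence of why shared caching cuts redundant computation.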

“We have closely collaborated with Tensormesh to deliver an impressive solution for distributed LLM KVCache sharing across multiple servers. Redis combined with Tensormesh delivers a scalable solution for low-latency, high-throughput LLM deployments. The benchmarks we ran together demonstrated remarkable improvements in both performance and efficiency, and we’re excited to see the Tensormesh product, which we believe will set a new bar for LLM hosting performance,” said Rowan Trollope, CEO of Redis.

Organizations face immense pressure to balance performance, cost, and control in their AI deployments. Tensormesh lets organizations run AI inference on the infrastructure of their choice while maintaining strong security and low cost. It is cloud-agnostic and available as SaaS or as standalone software, so teams can start small and scale across any public cloud or in-house environment.

“Our partnership with Tensormesh and integration with LMCache played a critical role in helping WEKA open-source aspects of our breakthrough Augmented Memory Grid solution, enabling the broader AI community to tackle some of the toughest challenges in inference today,” said Callan Fox, Lead Product Manager at WEKA.

As inference workloads surge and enterprises search for sustainable ways to scale, demand for new efficiency layers in the AI stack is growing quickly. With its deep research roots and wide open-source adoption, Tensormesh is positioned to meet this moment, bringing caching into the enterprise mainstream and laying a stronger foundation for AI infrastructure.

“Caching is one of the most underutilized levers in AI infrastructure, and this team has found a smart, practical way to apply it at scale,” said Pete Sonsini, Co-Founder and General Partner at Laude Ventures. “This is the moment to define a critical layer in the AI stack, and Tensormesh is well positioned to own it.”

The Tensormesh beta is available now. Sign up at tensormesh.ai.

About Tensormesh

Tensormesh is the AI infrastructure optimization company enabling up to 10x faster inference while keeping full control of data and deployment. Founded by faculty and researchers from the University of Chicago, UC Berkeley, and Carnegie Mellon, Tensormesh commercializes state-of-the-art research to eliminate GPU waste and latency. The software captures and reuses intermediate data other systems discard, delivering breakthrough performance on infrastructure customers own and control. Learn more at www.tensormesh.ai.

Media Contact
Sam Polstein
tensormesh@deeptech.agency

Recent Blog Posts

Enterprise AI Vendor Lock-In: What It Costs When Your Provider Pulls Access (April 22, 2026)

Introducing Tensormesh Beta 2.2: Serverless Inference & $0 Cached Input Tokens (April 15, 2026)

How We Optimized Redis for LLM KV Cache: 0.3 GB/s to 10 GB/s (April 8, 2026)

Introducing Tensormesh Beta 2: One-Click LLM Deployment, New UI & Real-Time Cost Savings (February 25, 2026)

Agent Skills Caching with CacheBlend: Achieving 85% Cache Hit Rates for LLM Agents (February 18, 2026)

Beyond Prefix Caching: How Non-Prefix Caching Achieves 25x Better Hit Rates for AI Agents (February 11, 2026)

The Open Source Revolution: Why Open-Weight AI Models Are Redefining the Future (February 4, 2026)

LMCache's Production-Ready P2P Architecture Powers Tensormesh's 5-10x Cost Reduction (January 28, 2026)

The Document Reprocessing Problem: How LLMs Waste 93% of Your GPU Budget (January 21, 2026)

Building Tensormesh: A Conversation with the CEO (Junchen Jiang) (January 15, 2026)

The Hidden Metric That's Destroying Your AI Agent's Performance & Budget (January 7, 2026)

LMCache ROI Calculator: When KV Cache Storage Reduces AI Inference Costs (December 17, 2025)

AI Inference Costs in 2025: The $255B Market's Energy Crisis and Path to Sustainable Scaling (December 10, 2025)

New Hugging Face Integration: Access 300,000+ AI Models with Real-Time Performance Monitoring (December 3, 2025)

The AI Inference Throughput Challenge: Scaling LLM Applications Efficiently (November 26, 2025)

Solving AI Inference Latency: How Slow Response Times Cost You Millions in Revenue (November 19, 2025)

GPU Cost Crisis: How Model Memory Caching Cuts AI Inference Costs Up to 10× (November 13, 2025)

Comparing LLM Serving Stacks: Introduction to Tensormesh Benchmark (October 21, 2025)