Frequently asked questions
What is the roadmap for the product?
Here is the list of features we intend to deliver progressively, leading up to our v1:
  • REST API + CLI parity with UI
  • SOC 2 readiness and audit logging
  • Savings-based billing engine
  • Cluster-based cache storage & sharing
  • SSO / RBAC integration + usage quotas & alerts
  • User-customized model serving + auto-config for new models
  • Model catalog & pre-configured templates
  • Enhanced monitoring and cost reporting dashboard
  • Enterprise install packages (K8s + Helm support)
  • Private cloud & on-prem deployment options
  • Automated deployment validator & health check tool
  • Unified control plane for SaaS + On-Prem hybrid management
How does pricing work?
During the beta phase, we will not charge you anything beyond the cost of the GPUs you rent through us. Once the beta ends, we intend to charge customers for our value add: billing will be based on the savings our customers realize through our caching technique. This model ensures that while you get more GPU power for the same spend, we keep an incentive to improve our techniques further. During the beta, you will see what those savings are, but you will not be charged for them.

The formula we will use for pricing once the product reaches v1 is as follows:

GPU = number of GPU hours consumed
GPH = price per GPU hour for the chosen provider
EST = estimated savings based on the reported cache hit rate

Pricing: (GPU * GPH) + (GPU * GPH * EST * 0.3)
Baseline: (GPU * GPH) + (GPU * GPH * EST)

where Baseline is the estimated cost of serving the same workload yourself, renting the GPU servers directly from the cloud.

Example:

GPU = 100h
GPH = $2
EST = 60%
Customer pays: 200 + (200 * 0.6 * 0.3) = $236
Baseline: 200 + (200 * 0.6) = $320
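For concreteness, here is a minimal sketch of that arithmetic in Python (the function and variable names are ours for illustration, not part of the product):

def customer_price(gpu_hours, gph, est):
    # What the customer pays: base GPU cost plus 30% of the estimated savings.
    base = gpu_hours * gph
    return base + base * est * 0.3

def baseline_cost(gpu_hours, gph, est):
    # Estimated cost of serving the same workload yourself on directly rented GPUs.
    base = gpu_hours * gph
    return base + base * est

# The worked example above: 100 GPU hours at $2/hour with 60% estimated savings.
print(customer_price(100, 2.0, 0.6))  # 236.0
print(baseline_cost(100, 2.0, 0.6))   # 320.0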
How do you estimate the cost savings?
We estimate the cost savings based on the cache hit rate: every time the cache is hit, that is counted as GPU time saved.
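As an illustration only (the exact accounting may differ; these names are hypothetical), such an estimate can be derived from hit and miss counts:

def estimated_savings_rate(cache_hits, cache_misses):
    # Fraction of GPU time saved, treating each hit as GPU time not spent recomputing.
    total = cache_hits + cache_misses
    return cache_hits / total if total else 0.0

print(estimated_savings_rate(600, 400))  # 0.6, i.e. the EST = 60% used in the example above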
What does Tensormesh consider a cache hit?
We count a cache hit whenever cached data stored outside the GPU VRAM is pulled back into the GPU.
How long before I can access the beta?
We’re gradually onboarding new users in batches to ensure the smoothest possible experience as we scale. You’ll receive an email as soon as it’s your turn to join.
What if my model is not offered? Can I still use it?
  • If the model is available on Huggingface, you can use it. The caveat is that it may have a longer bootstrap time, because the model needs to be downloaded from Huggingface.
  • If the model weights are “private”, e.g., post-trained and stored in your own S3 bucket, we will support them later during the beta phase.
What if I want to use another GPU provider?
Please let us know which GPU provider you would like to see added, either on Discord or using the feedback tool.
Can I use Tensormesh on my own hardware?
We plan to offer an on-prem version of Tensormesh shortly after our v1 is released.
Can I share the cache between multiple servers in a cluster?
The first beta version does not support this, but it is on our roadmap to deliver it before our v1.
Are incoming queries routed to the right node based on cache affinity?
Yes, Tensormesh includes cache-aware routing.
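As a rough sketch of the general idea only (not Tensormesh’s actual implementation; the node names and prefix length below are hypothetical), cache-affinity routing can hash a request’s prompt prefix so that repeated prefixes land on the node that already holds the matching cache:

import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def route(prompt, prefix_len=256):
    # Hash the prompt prefix so requests sharing a prefix hit the same node's cache.
    digest = hashlib.sha256(prompt[:prefix_len].encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

print(route("You are a helpful assistant. Summarize the following text..."))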
Can I use a different inference server?
No, at this time Tensormesh is a full inference stack experience. You can still use LMCache with the supported list of inference servers, but that is outside our product offering.
Can I perform the same action I can do on the Web UI through an API or a CLI?
This functionality will be added to the beta before the final release.
For security reasons, I need to host my own inference stack. Is Tensormesh available for on-prem deployment?
Tensormesh v1 will be available on-prem. Feel free to contact us for more details.
If you have any further questions or just want to reach our team, click the button below.
Contact us