In April 2026, a verified user posted publicly on X after Anthropic's Safeguards Team revoked access for his entire organisation: more than 60 accounts belonging to a self-described legitimate business, terminated without a single violated policy clause or specific output being identified. The termination email cited automated systems detecting unspecified signals and offered a Google Form as the only appeal route. The post circulated widely across engineering communities, and the response went beyond sympathy: leaders across infrastructure and platform teams noted that the scenario was one they had considered but never formally planned for.

Two other publicly documented cases reflect the same structural dynamic, each involving different providers and different stated rationales but arriving at the same outcome: access terminated at the provider's sole discretion, with no contractual liability for downstream business losses and no guaranteed restoration path available to the customer.

Every major closed-weight provider's publicly available terms of service share the same underlying architecture: the provider determines unilaterally what constitutes a violation, no specific content needs to be identified before enforcement action is taken, no appeals timeline is guaranteed, and no liability applies for the business losses that follow a termination decision.
The contrast with traditional cloud infrastructure is worth making explicit for teams conducting risk assessments. When Amazon Web Services or Google Cloud experience an outage, contractual uptime obligations apply and the provider carries strong commercial incentives to restore service quickly. When a closed-weight AI provider terminates access, none of those mechanisms exist: your SLA obligations to your own clients remain fully intact while your infrastructure provider owes you nothing in return.
Andreessen Horowitz's 2025 survey of enterprise CIOs found that 37% now run five or more models in production, up from 29% the previous year, with multi-provider strategies increasingly driven by the desire to reduce exactly this kind of single-vendor concentration risk.
When your inference runs on infrastructure your team controls, the scenarios described above become architecturally impossible. Unlike a closed-weight API provider, Tensormesh does not control the model weights your inference runs on, so no policy enforcement team can revoke your access to them, whether you run on dedicated or serverless GPU infrastructure. The provider relationship ends at the point of model weight selection rather than persisting through every inference call: no third-party enforcement system sits between your application and your endpoint, and no acceptable use policy can reclassify your use case overnight.
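To make that concrete, here is a minimal sketch of what self-hosted inference looks like from application code, assuming an OpenAI-compatible server such as vLLM running inside your own network. The endpoint URL, credential, and model name below are illustrative placeholders, not Tensormesh specifics.

```python
# A minimal sketch of "the provider relationship ends at weight selection".
# Assumes a self-hosted, OpenAI-compatible inference server (e.g. vLLM)
# inside your own network; the URL, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.internal.example.com/v1",  # your endpoint, not a vendor's
    api_key="internal-key",  # credential issued by your own team
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # open weights you chose and you host
    messages=[{"role": "user", "content": "Summarise this contract clause."}],
)
print(response.choices[0].message.content)
```

The only party that can shut this endpoint down is the team that operates it.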
The data governance case is equally direct. For teams operating under HIPAA, GDPR, or financial services compliance requirements, running inference on open-weight models removes the boundary crossing entirely: prompts, outputs, and logs stay on infrastructure you control instead of transiting a third party's systems.
The performance gap that once made closed-weight dependency feel necessary has narrowed considerably, with leading open-weight models now competitive across reasoning, code generation, and instruction-following for the majority of enterprise use cases. The cost picture is more nuanced and volume-dependent, and we cover it in detail separately, but for teams running at meaningful inference scale the economics are increasingly competitive with per-token API pricing.
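For a feel of where the break-even sits, here is a back-of-envelope sketch. Every number in it is a hypothetical assumption; substitute your own GPU rate, measured throughput, and blended API price before drawing conclusions.

```python
# Back-of-envelope break-even sketch for self-hosted vs per-token API pricing.
# All figures are hypothetical assumptions for illustration only.
GPU_COST_PER_HOUR = 4.00          # assumed hourly rate for one GPU node
TOKENS_PER_SECOND = 2_500         # assumed aggregate throughput of that node
API_PRICE_PER_1M_TOKENS = 10.00   # assumed blended per-token API price

tokens_per_hour = TOKENS_PER_SECOND * 3600
self_hosted_per_1m = GPU_COST_PER_HOUR / (tokens_per_hour / 1_000_000)

print(f"Self-hosted: ${self_hosted_per_1m:.2f} per 1M tokens")
print(f"API:         ${API_PRICE_PER_1M_TOKENS:.2f} per 1M tokens")
# Under these assumptions, self-hosting wins whenever the node stays
# reasonably utilised; at very low utilisation the per-token API is cheaper.
```

The sensitivity is almost entirely to utilisation, which is why the economics favour teams running at sustained inference scale rather than bursty, low-volume workloads.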
The honest tradeoff is that your team takes ownership of uptime, security patching, and scaling without a vendor support line for production incidents. For any organisation where a termination event would cause serious damage across client relationships and SLA commitments, that tradeoff resolves clearly in favour of open-weight infrastructure.

Tensormesh is GPU cloud infrastructure built for enterprise teams who want the control of self-hosted open-weight inference without the overhead of building and maintaining the underlying compute themselves. Standard single-model deployments typically move from initial scoping to a live endpoint within two weeks, with more complex configurations scoped individually during the infrastructure assessment.
The curated catalog is organised around two workload categories:

- Reasoning and general purpose
- Coding and agentic workloads

For teams with requirements outside the curated catalog, Tensormesh connects directly to Hugging Face, giving access to over 300,000 open-weight models that can be deployed on your infrastructure without vendor approval, usage review, or third-party negotiation of any kind.
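As a rough illustration of what the absence of an approval step means in practice, here is the generic huggingface_hub download flow. The repo id and local path are examples, and Tensormesh's actual integration mechanics are not shown here.

```python
# Generic Hugging Face weight download: no vendor review, no usage
# questionnaire, just a pull to storage you control. The repo id and
# local path are illustrative examples.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",        # an ungated open-weight repo
    local_dir="/models/qwen2.5-7b-instruct",   # storage your team controls
)
print(f"Weights downloaded to {local_path}")
```

Once the weights are on disk, serving them is a deployment decision your team makes, not a request a provider grants.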
Closed-weight APIs were the right starting architecture for most enterprise AI teams, and they remain a reasonable fit for certain use cases. But for teams where AI inference has become genuinely load-bearing infrastructure, the question worth asking is whether a commercial agreement that offers no termination protection, no appeals mechanism, and no provider liability still makes sense as a long-term foundation to build on.
The infrastructure and model quality are in place, and the migration path is more straightforward than most teams expect when they begin the assessment.
Explore Tensormesh’s model catalog
Don't see the model you need? Let us know
Try Tensormesh now with $100 in free GPU Credits