Just how artificial intelligence model providers might improve their economics is a key business model issue.
A shift to inference operations also emphasizes the importance of reducing cost per token at scale.
Where software often has marginal costs close to zero, use of AI models seems to incur costs that scale almost linearly with usage, so marginal costs are high.
A few key business issues are clear enough.
A greater shift of cost towards fixed cost rather than variable cost seems necessary. For example, model creators could move in the direction of owning their compute infrastructure rather than renting cloud capacity.
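To make the fixed-versus-variable trade-off concrete, here is a minimal sketch of the break-even arithmetic. Every number in it is a made-up assumption chosen for illustration, not reported data, and the function names are hypothetical.

```python
# Illustrative only: all prices and costs below are hypothetical assumptions.

def rented_cost(tokens: float, price_per_million: float) -> float:
    """Pure variable cost: pay a cloud supplier for every million tokens served."""
    return tokens / 1_000_000 * price_per_million


def owned_cost(tokens: float, fixed_annual: float, variable_per_million: float) -> float:
    """Mostly fixed cost: amortized infrastructure plus a small per-token cost."""
    return fixed_annual + tokens / 1_000_000 * variable_per_million


def break_even_tokens(price_per_million: float, fixed_annual: float,
                      variable_per_million: float) -> float:
    """Annual token volume above which owning is cheaper than renting."""
    saving_per_million = price_per_million - variable_per_million
    if saving_per_million <= 0:
        raise ValueError("owning never pays off if its variable cost is not lower")
    return fixed_annual / saving_per_million * 1_000_000


RENT_PRICE = 2.00       # hypothetical dollars per million tokens when renting
OWNED_FIXED = 1.5e9     # hypothetical dollars per year for owned infrastructure
OWNED_VARIABLE = 0.50   # hypothetical dollars per million tokens (power, operations)

volume = break_even_tokens(RENT_PRICE, OWNED_FIXED, OWNED_VARIABLE)
print(f"Break-even volume: {volume:.3g} tokens per year")
```

Under these assumed figures, each additional million tokens beyond the break-even volume costs the owner $0.50 rather than $2.00, which is the sense in which shifting toward fixed cost lowers marginal cost. Below that volume, renting remains cheaper.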
The problems with high variable cost are clear:
Unit economics
Cash burn and need for capital injections
Competitive pressure
Capital allocation
Right now, model queries do not show software-style marginal cost trends, where marginal costs are close to zero.
Every additional user or query drives proportional costs for GPUs, power, and data center capacity.
Reports suggest inference alone consumes 50 percent of revenue in some cases. Some suggest the problem is worse, with at least some model providers spending more than 100 percent of revenue on compute services.
By itself, that might not be existential, as model providers are in the early stages of growth, meaning incremental revenues would not be expected to cover the full costs of creating and operating the models at scale.
But observers do worry that model queries seemingly lack software-style economics, where marginal costs approach zero.
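The compute-share figures cited above can be restated as gross compute margin. The sketch below does so and also shows why scale alone does not help when costs grow in proportion to revenue. The 120 percent case is an assumed stand-in for "more than 100 percent," and the revenue figures are arbitrary.

```python
def gross_compute_margin(revenue: float, compute_cost: float) -> float:
    """Fraction of revenue left after paying for compute."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - compute_cost) / revenue


# Compute at 50 percent of revenue (the figure cited in the text).
margin_at_50 = gross_compute_margin(revenue=100.0, compute_cost=50.0)

# Compute above 100 percent of revenue (120 is an assumed example value).
margin_at_120 = gross_compute_margin(revenue=100.0, compute_cost=120.0)

# If compute cost scales linearly with usage, 10x the revenue brings
# 10x the cost, and the margin does not move.
margin_at_scale = gross_compute_margin(revenue=1000.0, compute_cost=500.0)

print(f"{margin_at_50:.0%}, {margin_at_120:.0%}, {margin_at_scale:.0%}")
```

The third case is the core of the worry: growth by itself leaves the margin unchanged unless the cost per unit of usage falls.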
Cash burn and capital intensity also are issues. Rapid revenue growth is offset by even faster cost scaling, which necessitates heavy investor funding, borrowing and equity raises.
Pricing also is an issue. When model suppliers raise prices, introduce usage-based tiers, or limit free access, they risk customer churn or slower adoption, even as they address revenue issues.
Strategic vulnerabilities also exist when model suppliers depend on external suppliers for crucial computing services, which affects both operating costs and the ability to manage surges of demand.
Capital allocation is an issue as well. One might argue that model builders should divert capital into compute infrastructure, but that is hugely expensive and detracts from the job of developing the next generation of models.
So the issue is how to fix the problem. Revenue growth with scale obviously helps, but doesn't solve the marginal cost issue. Efficiency improvements, owned infrastructure and pricing innovations all will play a role.
Expect wider use of smaller or distilled models, sparse activations, better architectures, prompt caching, batching, quantization, and routing of simpler tasks to cheaper models.
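As one illustration of how these techniques lower cost, consider routing. The sketch below computes a blended cost per million tokens when some share of queries goes to a cheaper model. The prices and the 70 percent routable share are hypothetical assumptions, not measured values.

```python
def blended_cost_per_million(share_to_small: float, cost_small: float,
                             cost_large: float) -> float:
    """Average cost per million tokens when a share of traffic uses the cheaper model."""
    if not 0.0 <= share_to_small <= 1.0:
        raise ValueError("share_to_small must be between 0 and 1")
    return share_to_small * cost_small + (1.0 - share_to_small) * cost_large


COST_LARGE = 2.00   # hypothetical dollars per million tokens, frontier model
COST_SMALL = 0.20   # hypothetical dollars per million tokens, distilled model
SHARE_SIMPLE = 0.7  # assumed share of queries simple enough for the small model

blended = blended_cost_per_million(SHARE_SIMPLE, COST_SMALL, COST_LARGE)
saving = 1.0 - blended / COST_LARGE

print(f"Blended cost: ${blended:.2f} per million tokens ({saving:.0%} lower)")
```

The calculation ignores the cost of the router itself and any quality loss from misrouted queries, both of which would reduce the saving in practice.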
Inference costs per token have dropped significantly in some cases, allowing gross compute margins to improve.
Stranded assets always are a problem, so higher GPU utilization rates help. So do custom silicon and algorithmic advances.
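The utilization point can be shown with simple arithmetic: idle hours still cost money, so the effective cost of each productive GPU-hour rises as utilization falls. The hourly cost and utilization rates below are hypothetical.

```python
def cost_per_useful_gpu_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one productive GPU-hour when the GPU is partly idle."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_cost / utilization


HOURLY_COST = 2.00  # hypothetical all-in dollars per GPU-hour, busy or idle

low_utilization = cost_per_useful_gpu_hour(HOURLY_COST, 0.40)
high_utilization = cost_per_useful_gpu_hour(HOURLY_COST, 0.80)

print(f"40% utilized: ${low_utilization:.2f} per useful hour")
print(f"80% utilized: ${high_utilization:.2f} per useful hour")
```

Under these assumptions, doubling utilization halves the effective cost per productive hour without any change in hardware.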
Owned infrastructure is a partial answer. Model builders and compute suppliers are investing heavily in their own data centers and chips (Anthropic's $50 billion commitments for custom U.S. facilities with partners like Fluidstack; OpenAI's Stargate).
Revenue models also are adjusting. Higher enterprise pricing, usage-based tiers, value-based pricing (charging relative to delivered value, not just tokens) and premium features or apps are being introduced.
Given all that, one logical path, with ample historical precedent, is mergers and acquisitions that place model building and compute functions under single ownership. On the other hand, antitrust regulation will probably restrict options for some of the most likely buyers (Alphabet, AWS, Microsoft, SpaceX, for example).
So other forms of cooperation are likely to develop.
Expect partnerships, joint ventures, dedicated capacity deals, and partial ownership rather than full-scale mergers and acquisitions, which will face antitrust opposition.
When feasible, model builders are creating their own compute infrastructure. OpenAI's Stargate project with Oracle and SoftBank (up to $500B, multi-gigawatt scale) provides an example. Investments in neocloud suppliers are another example.
But that might be the exception to the rule. Anthropic, for example, has chosen to sign big supply deals rather than build its own facilities.
Partnerships for Tensor Processing Units, Trainium, and other accelerators, which reduce reliance on expensive third-party GPUs and improve efficiency, also are growing.
But full vertical integration might not be the immediate or mid-term path forward, partly for regulatory scrutiny reasons, partly for capital intensity reasons and partly for business diversity reasons. Both model builders and compute infrastructure providers prefer a diversity of partners.
So expect joint ventures and consortia, which also offer the advantage of off-balance-sheet treatment.
In summary, tight strategic integration and partial ownership rather than blockbuster mergers are the main approaches. That avoids regulatory opposition and also is capital efficient.