One reason bigger firms are likely to have advantages as suppliers and operators of large language models is that LLMs are, at the moment, quite expensive compared to search operations. That cost difference matters for LLM business models.
Though costs should change over time, the current cost delta between a single search query and a single inference operation is substantial. It is estimated, for example, that a search engine query costs between $0.0001 and $0.001.
In comparison, a single LLM inference operation might cost between $0.01 and $0.10 per inference, depending on model size, prompt complexity, and cloud provider pricing.
Costs also vary substantially depending on whether a general-purpose LLM is used or a smaller, specialized LLM adapted for a single firm or industry. It is not unheard of for a single inference operation on a general-purpose model to cost a few dollars, though costs of a few cents per operation are likely more common.
In other words, an LLM inference operation might cost 10 to 100 times what a search query costs.
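The arithmetic behind that 10x-to-100x claim can be sketched directly from the per-operation estimates cited above; these are rough published ranges, not quoted provider prices.

```python
# Illustrative cost arithmetic using the estimates cited above
# (rough ranges, not any provider's actual price list).
SEARCH_COST = (0.0001, 0.001)  # USD per search query (low, high estimate)
INFER_COST = (0.01, 0.10)      # USD per LLM inference (low, high estimate)

# Comparing inference costs against the upper search estimate
# reproduces the 10x-to-100x range cited in the text.
low = INFER_COST[0] / SEARCH_COST[1]   # roughly 10x
high = INFER_COST[1] / SEARCH_COST[1]  # roughly 100x
print(f"inference costs roughly {low:.0f}x to {high:.0f}x a search query")
```

Comparing the low inference estimate against the low search estimate instead would put the multiple closer to 100x, which is why published comparisons vary.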
Here, for example, are recent price quotes from Google Cloud’s Vertex AI service.
But training and inference costs could well decline over time, experts argue. Smaller, more efficient models are likely to emerge, built using cost-reduction techniques such as parameter pruning, knowledge distillation, and low-rank factorization.
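To make one of those techniques concrete, here is a hypothetical sizing example (the layer dimensions and rank are illustrative, not drawn from any specific model): low-rank factorization replaces a dense weight matrix with two much smaller factors.

```python
# Hypothetical example: low-rank factorization replaces a dense
# d_out x d_in weight matrix W with two factors A (d_out x r) and
# B (r x d_in), so W is approximated by A @ B with r much smaller
# than d_out and d_in. The dimensions below are illustrative.
d_out, d_in, r = 4096, 4096, 64

dense_params = d_out * d_in           # parameters in the original matrix
factored_params = r * (d_out + d_in)  # parameters in the two factors

print(dense_params, factored_params, dense_params // factored_params)
# 16777216 524288 32 -> a 32x parameter reduction for this layer
```

Fewer parameters means less memory traffic and fewer multiply-adds per inference, which is where the cost savings come from.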
Sparse training methods, which focus computation only on the parts of the model relevant to a specific task, also will help.
Fine-tuning existing pre-trained models for specific tasks also can reduce training costs.
Dedicated hardware optimized for LLM workloads is already appearing. Similarly, optimized training algorithms; quantization and pruning (removing unnecessary parameters); automatic model optimization (tools and frameworks that tune models for specific hardware and inference requirements); and open-source models all will help lower costs.
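Quantization's effect on cost is easy to sketch with back-of-the-envelope memory figures. The sketch below assumes generic storage sizes (4 bytes per parameter at fp32, 1 byte at int8) and a hypothetical 7-billion-parameter model; it is not tied to any particular model or toolchain.

```python
# Hedged sketch: quantization stores each parameter in fewer bytes,
# shrinking the memory needed just to hold a model's weights.
# Assumptions: fp32 = 4 bytes/param, int8 = 1 byte/param; the
# 7B parameter count is a hypothetical example.
def model_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate memory (GB) required to store the weights alone."""
    return num_params * bytes_per_param / 1e9

params = 7_000_000_000  # hypothetical 7B-parameter model

fp32 = model_memory_gb(params, 4)  # 28.0 GB
int8 = model_memory_gb(params, 1)  # 7.0 GB
print(f"fp32: {fp32:.0f} GB, int8: {int8:.0f} GB ({fp32 / int8:.0f}x smaller)")
```

A 4x smaller weight footprint lets the same model fit on cheaper hardware and move less data per inference, both of which reduce per-operation cost.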