IP Carrier: To Disrupt, Generative AI has to be More Like the Internet Was

Sunday, September 22, 2024

The cost of acquiring and using a generative artificial intelligence model matters, both for model suppliers and users of such models, as is true for any technology. That might be especially important now, in the early days of deployment, as end users remain unsure about return on investment.

Strategically, one might also argue that the cost-benefit of GenAI has to eventually resemble the cost-benefit and economics of the internet to succeed. Namely, GenAI has to become a low-cost solution for high-cost problems.

In other words, the internet has proven so disruptive and useful because it provided low-cost solutions for high-cost problems. So far, the issue with generative AI has been that it often seems a high-cost solution for lower-value problems. And that is not a surefire recipe for success.

To be sure, we will move up the experience curve, and GenAI costs will drop. All that suggests we eventually will discover ways to leverage GenAI in a low-cost way to solve high-cost problems. The best precedent is the internet, as a platform.

The internet dramatically lowered the costs of communication and information sharing across distances. Tasks that previously required expensive long-distance phone calls, postal mail, or in-person meetings could now be done instantly and cheaply using text messages, app messages, email, file sharing, videoconferencing and so forth.

The low-cost infrastructure of the internet allowed new types of businesses to emerge that would not have been viable before, including wide area or global e-commerce, digital content distribution, online advertising, and software and content distributed by virtual networks rather than physical media.

Also, the internet made vast amounts of information freely accessible that was previously locked behind high-cost barriers like libraries, academic institutions, or proprietary databases. This dramatically reduced the cost of learning and research for individuals and organizations.

Many costs for creating or running businesses also were reduced.

Tools such as wikis, open source software, and cloud computing allowed large-scale collaboration and resource sharing at very low marginal costs, enabling new forms of innovation and problem-solving.

The internet also reduced the capital costs required to start and scale many types of businesses.

Online marketplaces and platforms dramatically reduced search and transaction costs for buyers and sellers across many industries as well. Many manual, labor-intensive processes could be automated.

The key insight is that by providing a standardized, open platform with very low marginal costs, the internet enabled solutions to problems and inefficiencies across many domains that were previously prohibitively expensive.

To have the expected impact, GenAI will have to move in those directions as well. It will have to attack the cost basis for lots of business processes, and do so at much lower cost.

But it is a safe prediction that the costs of acquiring, training and using a large language model to generate inferences will drop over time, as tends to be the rule for any computing-driven use case. And that matters, as generative artificial intelligence is the top AI solution deployed in organizations, according to a new survey by Gartner.

According to a Gartner survey conducted in the fourth quarter of 2023, 29% of the 644 respondents from organizations in the U.S., Germany and the U.K. said that they have deployed and are using GenAI, making GenAI the most frequently deployed AI solution. GenAI was found to be more common than other solutions like graph techniques, optimization algorithms, rule-based systems, natural language processing and other types of machine learning.

The survey also found that utilizing GenAI embedded in existing applications (such as Microsoft’s Copilot for 365 or Adobe Firefly) is the top way to fulfill GenAI use cases, with 34% of respondents saying this is their primary method of using GenAI. This was found to be more common than other options such as customizing GenAI models with prompt engineering (25%), training or fine-tuning bespoke GenAI models (21%), or using standalone GenAI tools, like ChatGPT or Gemini (19%).

Activity	2020 Cost (cents/1000 tokens)	2024 Cost (cents/1000 tokens)	Study	Date	Publisher	Key Conclusions
Creating LLMs	5,333 - 106,667	602	"Large language model"	2024	Wikipedia	Training costs have decreased significantly since 2020. In 2020, a 1.5B parameter model cost $80K-$1.6M, while in 2023, a 12B parameter model costs about $120K
Modifying (Fine-tuning)	N/A	60	"Breaking Down the Cost of Large Language Models"	2024	Qwak	Fine-tuning costs are generally lower than training from scratch, but still significant
Using (Inference) - GPT-3	60 (output)	20 (output)	"Breaking Down the Cost of AI for Organizations"	2024	TensorOps	Inference costs have decreased, with GPT-3.5 being cheaper than earlier versions
Using (Inference) - Claude	N/A	1500 (output)	"Breaking Down the Cost of Large Language Models"	2024	Qwak	More advanced models like Claude Opus have higher inference costs
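
To make the table's units concrete, here is a minimal Python sketch that takes the inference rates above at face value (cents per 1,000 output tokens) and converts them into a bill for a workload. The 1-million-token workload and the function names are illustrative assumptions, not figures from the cited studies.

```python
# Rates copied from the table above, in cents per 1,000 output tokens.
# They are taken at face value; the workload size used below is hypothetical.
RATES_CENTS_PER_1K = {
    "GPT-3 (2020)": 60,
    "GPT-3 (2024)": 20,
    "Claude (2024)": 1500,
}


def workload_cost_dollars(output_tokens, cents_per_1k):
    """Dollar cost of generating `output_tokens` at a per-1,000-token rate."""
    return output_tokens / 1000 * cents_per_1k / 100


def percent_decline(old_rate, new_rate):
    """Percentage drop from `old_rate` to `new_rate`."""
    return (old_rate - new_rate) / old_rate * 100


if __name__ == "__main__":
    tokens = 1_000_000  # hypothetical monthly output volume
    for label, rate in RATES_CENTS_PER_1K.items():
        print(f"{label}: ${workload_cost_dollars(tokens, rate):,.2f}")
    drop = percent_decline(60, 20)
    print(f"GPT-3 output price decline, 2020 to 2024: {drop:.1f}%")
```

On the table's numbers, 1 million output tokens would cost $600 at the 2020 GPT-3 rate, $200 at the 2024 rate and $15,000 at the Claude rate, so the choice of model can matter more than the year-over-year price decline.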

As one example of such cost reduction, Yandex says that in a pre-training scenario involving a model with 70 billion parameters, using YaFSDP can save the resources of approximately 150 GPUs. This translates to potential monthly savings of roughly $0.5 million to $1.5 million, depending on the virtual GPU provider or platform.
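
As a back-of-the-envelope check, the savings range can be converted into an implied price per GPU. This is only a sketch: the 150-GPU figure and the dollar range come from Yandex's claim, while the 730 hours per month and the function names are assumptions made for illustration.

```python
GPUS_SAVED = 150       # from Yandex's claim for a 70-billion-parameter model
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year divided by 12


def monthly_cost_per_gpu(monthly_savings_dollars, gpus=GPUS_SAVED):
    """Implied monthly rental cost of one GPU, in dollars."""
    return monthly_savings_dollars / gpus


def hourly_cost_per_gpu(monthly_savings_dollars, gpus=GPUS_SAVED,
                        hours=HOURS_PER_MONTH):
    """Implied hourly rental cost of one GPU, in dollars."""
    return monthly_savings_dollars / (gpus * hours)


if __name__ == "__main__":
    for savings in (500_000, 1_500_000):
        per_month = monthly_cost_per_gpu(savings)
        per_hour = hourly_cost_per_gpu(savings)
        print(f"${savings:,} saved -> ${per_month:,.2f} per GPU-month, "
              f"${per_hour:.2f} per GPU-hour")
```

Under these assumptions the claim implies roughly $4.57 to $13.70 per GPU-hour, a threefold spread that comes entirely from the provider's pricing.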

Innovations in architecture, hardware acceleration, model size, algorithms, open source models and training methods all will contribute to reducing the cost of creating and using large language models.

Innovation	Study	Date	Publisher	Key Conclusions
Efficient Training Algorithms	"Training Compute-Optimal Large Language Models" (Chinchilla)	Mar 2022	DeepMind	Smaller models trained on more data can match performance of larger models, reducing compute costs
Hardware Acceleration	"A Survey on Hardware Accelerators for Large Language Models"	Jan 2024	arXiv	Custom hardware like GPUs, FPGAs and ASICs can significantly improve LLM performance and energy efficiency
Model Compression	"LLM in a flash: Efficient Large Language Model Inference with Limited Memory"	Dec 2023	arXiv	Techniques like quantization and pruning can reduce model size and memory requirements without major performance loss
Sparse Models	"GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"	Dec 2021	Google	Sparse mixture-of-experts models can be more parameter efficient than dense models
Distributed Training	"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism"	Sep 2019	NVIDIA	Techniques for efficiently training very large models across multiple GPUs/nodes
Few-Shot Learning	"Language Models are Few-Shot Learners"	May 2020	OpenAI	Large models can perform well on new tasks with just a few examples, reducing task-specific training data needs
Open Source Models	"OPT: Open Pre-trained Transformer Language Models"	May 2022	Meta AI	Open sourcing large models enables wider research and reduces duplication of training efforts
Efficient Architectures	"Efficient Transformers: A Survey"	Dec 2020	arXiv	Architectural innovations like sparse attention can improve efficiency of transformer models
