Friday, December 29, 2023

GPU Moves Show "Channel Conflict" and "Frenemy" Behavior

“Channel conflict,” where suppliers compete with their customers, has a long history in the computing industry. These days we tend to call this a “frenemy” dynamic, where partners or suppliers become competitors or competitors become partners. 


The latest version is the move by Nvidia to get into the business of “GPUs as a service” while big cloud services and device suppliers look to build their own GPUs. 


The motivation is simple enough: a single Nvidia H100 GPU now lists for $57,000 on hardware vendor CDW’s online site, and cloud “computing as a service” firms buy thousands of such units or their equivalents. 


Google has invested significantly in its in-house Tensor Processing Unit (TPU) chips, while Amazon and Microsoft are building their own custom AI accelerators as well. OpenAI and Meta are also developing their own in-house AI chips. 


It is not clear how many Nvidia GPUs were sold to cloud computing suppliers in 2023. But based on Nvidia’s fiscal 2024 second-quarter earnings call and press reports, as many as 50,000 to 85,000 Nvidia GPUs might have been sold to those firms in 2023. 


Estimated Percentage of Total Nvidia GPU Sales

Buyer | Units | % of Nvidia GPU sales
AWS | 20,000 - 30,000 | 20% - 30%
Microsoft Azure | 15,000 - 25,000 | 15% - 25%
Google Cloud | 10,000 - 20,000 | 10% - 20%
Other cloud providers (Oracle, Alibaba, etc.) | 5,000 - 10,000 | 5% - 10%
Total estimated purchases | 50,000 - 85,000 | 50% - 85%
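
To get a rough sense of the dollars at stake, here is a back-of-envelope sketch multiplying the estimated unit ranges above by the $57,000 CDW list price cited earlier. Hyperscalers negotiate volume prices, so these figures likely overstate actual spending; they illustrate the order of magnitude only.

H100_LIST_PRICE = 57_000  # USD; CDW list price cited above

purchases = {  # estimated 2023 unit ranges from the table above
    "AWS": (20_000, 30_000),
    "Microsoft Azure": (15_000, 25_000),
    "Google Cloud": (10_000, 20_000),
    "Other cloud providers": (5_000, 10_000),
}

for buyer, (low, high) in purchases.items():
    # Convert unit ranges into implied spend, in billions of USD.
    print(f"{buyer}: ${low * H100_LIST_PRICE / 1e9:.1f}B - ${high * H100_LIST_PRICE / 1e9:.1f}B")

At list price, AWS alone would imply roughly $1.1 billion to $1.7 billion of GPU spending, which helps explain why every large buyer is exploring its own silicon.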


The "frenemy" dynamic, also known as channel conflict, has seemingly always been part of the computing industry, though muted in the pre-1970 period as the industry was virtually a monopoly held by IBM. 


But channel conflict increased between the 1970s and 1990s in the PC era, as independent software vendors developed and hardware sales channels multiplied.


During the Wintel period (1990s-2000s), channel conflict arguably declined, but the frenemy pattern seems to be increasing in the cloud era. 


Participant Taking on New Role | Former Supplier Role
Cloud provider: building custom hardware | Hardware manufacturer
Distributor: offering managed services | Software vendor
Reseller: developing their own software applications | Cloud provider
End-user: self-provisioning cloud infrastructure | IT service provider


Thursday, December 28, 2023

5G Did Not "Fail"

It is not hard to find claims that 5G “failed.” Perhaps it did not. Maybe we simply have not adjusted to the function of mobile and fixed networks in the internet era.


Keep in mind that the primary function of any fixed or mobile network originally was “voice communications.” In that context, “voice” was the app the network supported. Ever since 3G, mobile operators have been touting or searching for other key apps they could provide and, more importantly, “own” in the way they own network voice, text messaging, video services, or the internet access function. 


The fundamental problem is that modern networks are not conducive to that sort of “ownership” by access providers. 


The whole point of layered software and networks is to make user-facing apps independent of network functions. That separation of apps from network access means it is fundamentally difficult for any internet service provider to act as a gatekeeper over apps. 
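
That layering is easy to see in practice. The minimal sketch below makes an ordinary HTTP request; nothing in the code knows, or can depend on, whether the packets travel over 5G, 4G, fiber, or Wi-Fi. The URL is just a placeholder endpoint.

import urllib.request

def fetch_status(url: str) -> int:
    # The application speaks HTTP over TCP/IP. The radio or fiber access
    # layer underneath is invisible here, which is exactly the point:
    # the access provider has no hook into the application layer.
    with urllib.request.urlopen(url) as response:
        return response.status

if __name__ == "__main__":
    # Identical behavior over any access network.
    print(fetch_status("https://example.com"))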


And that, more than anything else, explains why it has been so hard for mobile operators to come up with new apps they are uniquely positioned to “own,” in the same way they are able to “own” voice, messaging, internet access or subscription video services. 


In fact, the main point of next-generation mobile networks is to increase capacity, more than anything else. 3G, 4G, and 5G have been profoundly necessary to support increased internet access capacity, in the same way that fixed networks have used fiber to the home to increase internet access capacity beyond what copper access allows. 


And that is why all next-generation mobile networks are deemed to have “failed” to create new services or apps that mobile operators “own” and “control.” 


The software architecture is designed to separate apps from network access, which makes it difficult for ISPs to own or control new apps. 


Granted, we are early in the 5G era, so it is not yet clear what new use cases and apps might develop. It is fairly safe to say, though, that those innovations are unlikely to be created, owned, and controlled by mobile operators: the software architecture is designed to prevent that sort of control by ISPs.


Wednesday, December 27, 2023

LLM Costs Should Drop Over Time: They Almost Have To Do So

One reason bigger firms are likely to have advantages as suppliers and operators of large language models is that LLMs are, at the moment, quite expensive compared to search operations. All of that matters for LLM business models.


Though costs should change over time, the current cost delta between a single search query and a single LLM inference operation is quite substantial. It is estimated, for example, that a search engine query costs between $0.0001 and $0.001 per query.


In comparison, a single LLM inference operation might cost between $0.01 and $0.10, depending on model size, prompt complexity, and cloud provider pricing. 


Costs might vary substantially if a general-purpose LLM is used, compared to a smaller LLM specialized for a single firm or industry, for example. It is not unheard of for a single inference operation using a general-purpose model to cost a few dollars, though costs of a few cents per operation are likely more common. 


In other words, an LLM inference operation might cost 10 to 100 times what a search query costs. 
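
A quick arithmetic check of that claim, using the per-operation estimates quoted above (these are estimates, not measured figures):

search_cost_low, search_cost_high = 0.0001, 0.001  # USD per search query (estimate)
llm_cost_low, llm_cost_high = 0.01, 0.10           # USD per LLM inference (estimate)

# Comparing the low-end inference cost against the search-cost range
# reproduces the 10x-100x figure; high-end inferences widen the gap further.
print(f"{llm_cost_low / search_cost_high:.0f}x to {llm_cost_low / search_cost_low:.0f}x")
# -> 10x to 100x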


Here, for example, are recent list prices for Google Cloud’s Vertex AI service. 


Model | Type | Region | Price per 1,000 characters
PaLM 2 for Text (Text Bison) | Input | Global | Online: $0.00025; Batch: $0.00020
PaLM 2 for Text (Text Bison) | Output | Global | Online: $0.0005; Batch: $0.0004
PaLM 2 for Text (Text Bison) | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
PaLM 2 for Text (Text Bison) | Reinforcement Learning from Human Feedback | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
PaLM 2 for Text 32k (Text Bison 32k) | Input | Global | Online: $0.00025; Batch: $0.00020
PaLM 2 for Text 32k (Text Bison 32k) | Output | Global | Online: $0.0005; Batch: $0.0004
PaLM 2 for Text 32k (Text Bison 32k) | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
PaLM 2 for Text (Text Unicorn) | Input | Global | Online: $0.0025; Batch: $0.0020
PaLM 2 for Text (Text Unicorn) | Output | Global | Online: $0.007; Batch: $0.0060
PaLM 2 for Chat (Chat Bison) | Input | Global | Online: $0.00025
PaLM 2 for Chat (Chat Bison) | Output | Global | Online: $0.0005
PaLM 2 for Chat (Chat Bison) | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
PaLM 2 for Chat (Chat Bison) | Reinforcement Learning from Human Feedback | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
PaLM 2 for Chat 32k (Chat Bison 32k) | Input | Global | Online: $0.00025*
PaLM 2 for Chat 32k (Chat Bison 32k) | Output | Global | Online: $0.0005*
PaLM 2 for Chat 32k (Chat Bison 32k) | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
Embeddings for Text | Input | Global | Online: $0.000025; Batch: $0.00002
Embeddings for Text | Output | Global | No charge
Codey for Code Generation | Input | Global | Online: $0.00025; Batch: $0.00020
Codey for Code Generation | Output | Global | Online: $0.0005; Batch: $0.0004
Codey for Code Generation | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
Codey for Code Generation 32k | Input | Global | Online: $0.00025
Codey for Code Generation 32k | Output | Global | Online: $0.0005
Codey for Code Generation 32k | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
Codey for Code Chat | Input | Global | Online: $0.00025
Codey for Code Chat | Output | Global | Online: $0.0005
Codey for Code Chat | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
Codey for Code Chat 32k | Input | Global | Online: $0.00025
Codey for Code Chat 32k | Output | Global | Online: $0.0005
Codey for Code Chat 32k | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing)
Codey for Code Completion | Input | Global | Online: $0.00025
Codey for Code Completion | Output | Global | Online: $0.0005

But training and inference costs could well decline over time, experts argue. Smaller, more efficient models are likely to develop, using cost-reduction techniques such as parameter pruning, knowledge distillation, and low-rank factorization. 
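
As a concrete illustration of one of those techniques, here is a minimal sketch of magnitude-based parameter pruning using NumPy. It shows the idea only; production LLM pruning is considerably more sophisticated.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))
pruned = magnitude_prune(w, sparsity=0.9)  # keep only the largest 10% of weights
print(f"nonzero weights remaining: {np.count_nonzero(pruned) / w.size:.1%}")

The zeroed weights can then be stored and served in sparse formats, which is where the storage and inference savings actually come from.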


Sparse training methods that activate only the parts of the model relevant to specific tasks will also help. 


Fine-tuning existing pre-trained models for specific tasks, rather than training from scratch, also can reduce training costs. 


Dedicated hardware optimized specifically for LLM workloads already is appearing. In similar fashion, optimized training algorithms; quantization and pruning (removing unnecessary parameters); automatic model optimization (tools and frameworks that automatically optimize models for specific hardware and inference requirements); and open-source models all will help lower costs. 
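
As one concrete example of those levers, here is a minimal sketch of post-training int8 weight quantization in NumPy: store float32 weights as int8 plus a scale factor, cutting memory roughly 4x. This illustrates the idea only, not any particular framework's implementation.

import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Map the largest-magnitude weight to 127; everything else scales linearly.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean absolute quantization error: {err:.5f}")

The trade is a small amount of precision for a large drop in memory and bandwidth, which is why quantized serving is one of the most widely used inference cost reductions.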


AI Impact on Data Centers (source: PTC)