Friday, December 29, 2023

GPU Moves Show "Channel Conflict" and "Frenemy" Behavior

“Channel conflict,” where suppliers compete with their customers, has a long history in the computing industry. These days we tend to call this a “frenemy” dynamic, where partners or suppliers become competitors or competitors become partners. 


The latest version is the move by Nvidia to get into the business of “GPUs as a service” while big cloud services and device suppliers look to build their own GPUs. 


The motivation is simple enough: a single Nvidia H100 GPU now lists for $57,000 on hardware vendor CDW's online site. And cloud computing-as-a-service firms buy thousands of such units or their equivalents.


Google has invested significantly in its in-house Tensor Processing Unit (TPU) chips, while Amazon and Microsoft are building their own custom AI accelerators as well. OpenAI and Meta also are developing their own custom AI chips.


It is not clear how many Nvidia GPUs were sold to cloud computing-as-a-service suppliers in 2023. But based on Nvidia's fiscal 2024 second-quarter earnings call and press reports, somewhere between 50,000 and 85,000 Nvidia GPUs might have been sold to such firms in 2023.


Estimated Percentage of Total Nvidia GPU Sales

| Cloud provider | Units | % of Nvidia GPU sales |
| --- | --- | --- |
| AWS | 20,000 - 30,000 | 20% - 30% |
| Microsoft Azure | 15,000 - 25,000 | 15% - 25% |
| Google Cloud | 10,000 - 20,000 | 10% - 20% |
| Other Cloud Providers (Oracle, Alibaba, etc.) | 5,000 - 10,000 | 5% - 10% |
| Total Estimated Purchases | 50,000 - 85,000 | 50% - 85% |
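
A back-of-the-envelope calculation makes the stakes concrete. Multiplying the unit estimates above by the $57,000 list price cited earlier (actual negotiated prices are surely lower) implies billions of dollars in annual spending; a minimal sketch:

```python
# Rough implied 2023 spending by cloud providers on Nvidia GPUs, using the
# unit estimates and the CDW list price cited in this post. Hyperscalers
# negotiate volume discounts, so these figures overstate actual outlays.
list_price = 57_000                      # $ per H100, list price
units_low, units_high = 50_000, 85_000   # estimated 2023 cloud purchases

low_spend = units_low * list_price / 1e9
high_spend = units_high * list_price / 1e9
print(f"implied spend: ${low_spend:.2f}B to ${high_spend:.2f}B")
# roughly $2.9B to $4.8B at list price
```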


The "frenemy" dynamic, also known as channel conflict, has seemingly always been part of the computing industry, though muted in the pre-1970 period as the industry was virtually a monopoly held by IBM. 


But channel conflict increased between the 1970s and 1990s in the PC era, as independent software vendors developed and hardware sales channels multiplied.


During the Wintel period (1990s-2000s), channel conflict arguably declined, but the frenemy pattern seems to be increasing in the cloud era. 


| Participant Taking on New Role | Former Supplier Role |
| --- | --- |
| Cloud Provider: Building custom hardware | Hardware manufacturer |
| Distributor: Offering managed services | Software vendor |
| Reseller: Developing their own software applications | Cloud provider |
| End-user: Self-provisioning cloud infrastructure | IT service provider |


Thursday, December 28, 2023

5G Did Not "Fail"

It is not hard to find examples of the belief that 5G failed. Perhaps it did not. Maybe we just have not adjusted to the function of mobile and fixed networks in the internet era.


Keep in mind that the primary function of any fixed or mobile network originally was “voice communications.” In that context, “voice” was the app the network supported. Ever since 3G, mobile operators have been touting or searching for other key apps they could provide and--more importantly--“own,” in the way they own network voice, text messaging, subscription video services or the internet access function. 


The fundamental problem is that modern networks are not conducive to that sort of “ownership” by access providers. 


The whole point of layered software and networks is to make user-facing apps independent of network functions. That separation of apps from network access means it is fundamentally difficult for any internet service provider to act as a gatekeeper over apps. 
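
As a minimal illustration of that separation, the application-layer request below is identical whether the device reaches the internet over 5G, 4G, Wi-Fi or fiber; nothing in the code references, or could reference, the access network:

```python
import urllib.request

# An application-layer request: the app rides on HTTP over TCP/IP and
# neither sees nor depends on the access network carrying the bits below.
with urllib.request.urlopen("https://example.com") as response:
    print(response.status)  # same result over 5G, 4G, Wi-Fi or fiber
```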


And that, more than anything else, explains why it has been so hard for mobile operators to come up with new apps they are uniquely positioned to “own,” in the same way they are able to “own” voice, messaging, internet access or subscription video services. 


In fact, the main point of next-generation mobile networks is to increase capacity, more than anything else. 3G, 4G and 5G have been profoundly necessary to support increased internet access capacity, in the same way that fixed networks have used fiber to the home to increase internet access capacity beyond what copper could support. 


And that is why all next-generation mobile networks are deemed to have “failed”: they did not create new services or apps that mobile operators “own” and “control.” 


The software architecture is designed to separate apps from network access, which makes it difficult for ISPs to own or control new apps. 


Granted, we are early in the 5G era, so it is not yet clear what new use cases and apps might develop. It is fairly safe to say those innovations are unlikely to be created, owned and controlled by mobile operators. 


Again, the software architecture itself is designed to prevent such control by ISPs.


Wednesday, December 27, 2023

LLM Costs Should Drop Over Time: They Almost Have To Do So

One reason bigger firms are likely to have advantages as suppliers and operators of large language models is that LLMs are quite expensive--at the moment--compared to search operations. All of that matters for LLM business models.


Though costs should change over time, the current cost delta between a single search query and a single inference operation is quite substantial. It is estimated, for example, that a search engine query costs between $0.0001 and $0.001.


In comparison, a single LLM inference operation might cost between $0.01 and $0.10, depending on model size, prompt complexity and cloud provider pricing. 


Costs might vary substantially if a general-purpose LLM is used, compared to a specialized, smaller LLM adapted for a single firm or industry, for example. It is not unheard of for a single inference operation using a general-purpose model to cost a few dollars, though costs of a few cents per operation are likely more common. 


In other words, an LLM inference operation might cost 10 to 100 times what a search query costs. 
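
A quick calculation with the estimates above shows where that range comes from; a sketch using this post's per-query figures (estimates, not measured values):

```python
# Cost ratio of LLM inference to search, using this post's estimates.
search_low, search_high = 0.0001, 0.001   # $ per search query
llm_low, llm_high = 0.01, 0.10            # $ per LLM inference

# Against the high-end search cost, the gap is 10x to 100x.
print(f"{llm_low / search_high:.0f}x to {llm_high / search_high:.0f}x")
# Against the low-end search cost, it is 100x to 1,000x.
print(f"{llm_low / search_low:.0f}x to {llm_high / search_low:.0f}x")
```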


Here, for example, are recent prices quoted for Google Cloud’s Vertex AI service. 


| Model | Type | Region | Price per 1,000 characters |
| --- | --- | --- | --- |
| PaLM 2 for Text (Text Bison) | Input | Global | Online requests: $0.00025; Batch requests: $0.00020 |
| | Output | Global | Online requests: $0.0005; Batch requests: $0.0004 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| | Reinforcement Learning from Human Feedback | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| PaLM 2 for Text 32k (Text Bison 32k) | Input | Global | Online requests: $0.00025; Batch requests: $0.00020 |
| | Output | Global | Online requests: $0.0005; Batch requests: $0.0004 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| PaLM 2 for Text (Text Unicorn) | Input | Global | Online requests: $0.0025; Batch requests: $0.0020 |
| | Output | Global | Online requests: $0.007; Batch requests: $0.0060 |
| PaLM 2 for Chat (Chat Bison) | Input | Global | Online requests: $0.00025 |
| | Output | Global | Online requests: $0.0005 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| | Reinforcement Learning from Human Feedback | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| PaLM 2 for Chat 32k (Chat Bison 32k) | Input | Global | Online requests: $0.00025* |
| | Output | Global | Online requests: $0.0005* |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| Embeddings for Text | Input | Global | Online requests: $0.000025; Batch requests: $0.00002 |
| | Output | Global | Online requests: no charge; Batch requests: no charge |
| Codey for Code Generation | Input | Global | Online requests: $0.00025; Batch requests: $0.00020 |
| | Output | Global | Online requests: $0.0005; Batch requests: $0.0004 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| Codey for Code Generation 32k | Input | Global | Online requests: $0.00025 |
| | Output | Global | Online requests: $0.0005 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| Codey for Code Chat | Input | Global | Online requests: $0.00025 |
| | Output | Global | Online requests: $0.0005 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| Codey for Code Chat 32k | Input | Global | Online requests: $0.00025 |
| | Output | Global | Online requests: $0.0005 |
| | Supervised Tuning | us-central1, europe-west4 | $ per node hour (Vertex AI custom training pricing) |
| Codey for Code Completion | Input | Global | Online requests: $0.00025 |
| | Output | Global | Online requests: $0.0005 |

But training and inference costs could well decline over time, experts argue. Smaller, more efficient models are likely to develop, using cost-reduction techniques such as parameter pruning, knowledge distillation and low-rank factorization. 
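
As a sketch of one of those techniques, low-rank factorization replaces a dense weight matrix with the product of two much thinner matrices; the shapes and rank below are illustrative, not drawn from any real model:

```python
import numpy as np

# Low-rank factorization: replace a dense m x n matrix W with a rank-k
# product A @ B, cutting parameters from m*n to k*(m + n).
m, n, k = 1024, 1024, 64
W = np.random.randn(m, n).astype(np.float32)

# Truncated SVD gives the best rank-k approximation of W.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]   # m x k, with singular values folded in
B = Vt[:k, :]          # k x n

print(f"parameters: {m * n:,} -> {k * (m + n):,}")   # 1,048,576 -> 131,072
print(f"relative error: {np.linalg.norm(W - A @ B) / np.linalg.norm(W):.2f}")
# A trained weight matrix with a decaying spectrum compresses far better
# than this random one, which is nearly full-rank.
```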


Sparse training methods that focus only on the parts of the model relevant to specific tasks also will help. 


Use of existing pre-trained models that are fine-tuned for specific tasks also can reduce training costs. 


Dedicated hardware optimized for LLM workloads already is appearing. In similar fashion, optimized training algorithms; quantization and pruning (removing unnecessary parameters); automatic model optimization (tools and frameworks that automatically tune models for specific hardware and inference requirements); and open source all will help lower costs. 
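
As a minimal sketch of what quantization does, the snippet below applies simple symmetric per-tensor int8 scaling; production toolchains add calibration, per-channel scales and hardware-specific kernels:

```python
import numpy as np

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
weights = np.random.randn(1024, 1024).astype(np.float32)

scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale   # approximate values used at inference

print(f"fp32: {weights.nbytes / 1e6:.1f} MB -> int8: {q.nbytes / 1e6:.1f} MB")  # 4x smaller
print(f"mean abs error: {np.abs(weights - dequantized).mean():.6f}")
```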


DIY and Licensed GenAI Patterns Will Continue

As always with software, firms are going to opt for a mix of "do it yourself" owned technology and licensed third party offerings....