Sunday, October 8, 2023

AI Drives GPU Channel Conflict

To the extent that Nvidia is positioning itself as a "computing or software as a service" supplier, using its own or partner data centers, a logical question is whether Nvidia has at least a window of opportunity, given its assumed lead in the hardware, software, tools, and services that power AI applications: code libraries, a base of developers, pre-built models, and integration with important software frameworks.


On the other hand, as the leading "cloud computing as a service" providers build up their own in-house GPU and chip capabilities, code libraries, pre-built models, and software framework integrations, one might project that any existing Nvidia advantage will erode.


Even granting an Nvidia lead, AWS, Google Cloud, and Azure, for example, already have their own hardware, libraries, models, and framework integrations in place, and are developing them rapidly in response to the perceived need for "AI as a service" capabilities.


Capabilities                          | NVIDIA                      | AWS                    | Google Cloud        | Azure
Hardware                              | High-performance GPUs       | FPGAs, GPUs            | GPUs, TPUs          | FPGAs, GPUs
Code libraries                        | cuDNN, cuBLAS, TensorRT     | MXNet, PyTorch         | TensorFlow, PyTorch | MXNet, PyTorch
Pre-built models                      | NVIDIA NGC catalog          | SageMaker models       | Vertex AI models    | Azure AI models
Integration with software frameworks  | Windows, VMware, Kubernetes | Windows, Linux, Docker | Linux, Kubernetes   | Windows, Linux, Kubernetes
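To make the "Code libraries" row concrete, here is a minimal sketch, assuming a CUDA build of PyTorch and a visible Nvidia GPU, of how framework-level code rides on Nvidia's libraries: the convolution below dispatches to cuDNN kernels, and any matrix multiply would go to cuBLAS, without the developer calling either library directly.

    import torch

    print(torch.cuda.is_available())       # True if a CUDA GPU is visible
    print(torch.backends.cudnn.version())  # cuDNN version this PyTorch build links against

    x = torch.randn(1, 3, 224, 224, device="cuda")           # dummy image batch
    conv = torch.nn.Conv2d(3, 16, kernel_size=3).to("cuda")
    y = conv(x)                            # executed by cuDNN kernels under the hood
    print(y.shape)                         # torch.Size([1, 16, 222, 222])

Each column of the table supplies its own version of this underlying stack beneath the frameworks developers actually touch.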


As we often have seen, suppliers can become competitors to their customers, and such competition generally spurs efforts by customers to reduce their reliance on key suppliers. The issue is how long any perceived Nvidia advantage can be sustained, and whether the self-reliance efforts by major customers will eventually outweigh the shorter-term revenue benefits Nvidia gets as the key supplier of GPUs and other infrastructure supporting AI operations. 


AWS, Google Cloud, and Azure are all investing heavily in AI hardware. For example, AWS has its own custom Arm-based Graviton processors and purpose-built AI accelerators, Trainium for training and Inferentia for inference.
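For illustration, a hypothetical sketch, using boto3's EC2 API, of requesting a Trainium-backed instance; the AMI ID, region, and account details below are placeholders, not working values.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder Deep Learning AMI ID
        InstanceType="trn1.2xlarge",      # Trainium (trn1) family; inf2 targets Inferentia
        MinCount=1,
        MaxCount=1,
    )
    print(resp["Instances"][0]["InstanceId"])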


AWS, Google Cloud, and Azure are all developing their own code libraries and pre-built models for AI, ML, and data science. For example, AWS offers SageMaker Neo, a service that compiles trained AI models for efficient deployment on target hardware.
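As a concrete illustration, a hedged sketch, using boto3's low-level SageMaker API, of starting a Neo compilation job; the job name, role ARN, bucket, model artifact, and framework version below are placeholders.

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_compilation_job(
        CompilationJobName="demo-neo-job",                       # placeholder name
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        InputConfig={
            "S3Uri": "s3://my-bucket/model.tar.gz",              # placeholder artifact
            "DataInputConfig": '{"input0": [1, 3, 224, 224]}',   # expected input shape
            "Framework": "PYTORCH",
            "FrameworkVersion": "1.8",                           # assumed framework version
        },
        OutputConfig={
            "S3OutputLocation": "s3://my-bucket/compiled/",
            "TargetDevice": "ml_c5",                             # compile for c5 instances
        },
        StoppingCondition={"MaxRuntimeInSeconds": 900},
    )

Neo then writes a hardware-optimized artifact to the output location, ready to deploy on the target instance family.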


Also, AWS, Google Cloud, and Azure are all improving their integration with platforms such as Windows, VMware, and Kubernetes. For example, Azure's Batch AI service, since folded into Azure Machine Learning, helps developers run AI workloads on Windows and Linux VMs.
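Since Batch AI's capabilities now live in Azure Machine Learning, here is a minimal sketch using the v1 azureml-core Python SDK rather than the retired Batch AI API; the workspace config file, the ./src/train.py script, and the "gpu-cluster" compute target are all assumed to exist already.

    from azureml.core import Workspace, Experiment, ScriptRunConfig

    ws = Workspace.from_config()        # reads a config.json downloaded from the portal
    config = ScriptRunConfig(
        source_directory="./src",       # assumed folder containing the training script
        script="train.py",
        compute_target="gpu-cluster",   # assumed pre-provisioned GPU cluster
    )
    run = Experiment(ws, "gpu-demo").submit(config)
    run.wait_for_completion(show_output=True)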


So Nvidia is gambling that the near-term revenue gains from competing with its best customers will outweigh the almost-certain long-term decline in GPU product and services sales to former customers turned competitors, a shift that would also compel Nvidia to change its business model.


Nvidia might wind up as a supplier of cloud computing and SaaS services rather than a supplier of GPUs and related services, at least in substantial part. 


The technology business is no stranger to former suppliers that emerge as competitors to their former customers, leading to a “coopetition” model.


Former Supplier      | Products                                      | Former Customers
Android              | Operating system                              | Samsung, LG, HTC, Motorola
ARM                  | Processor designs                             | Apple, Qualcomm, MediaTek
Amazon Web Services  | Cloud computing services                      | Netflix
Google               | Search engine, web browser, operating system  | Microsoft, Yahoo, Apple
Microsoft            | Software, operating system                    | IBM, Oracle, Apple
Intel                | Processors                                    | AMD, NVIDIA
Samsung              | Displays, memory, processors                  | Apple, Sony, Google
Sony                 | Displays, semiconductors, gaming consoles     | Microsoft, Nintendo
IBM                  | Mainframes, software, consulting services     | Hewlett Packard, Dell, Oracle
Oracle               | Software, databases, cloud computing services | SAP, Microsoft, Amazon Web Services
Hewlett Packard      | Computers, printers, servers                  | IBM, Dell, Lenovo

