Sunday, February 16, 2025

How Big is the High Performance Computing Market?

The high-performance computing segment of the overall computing market is big and, some forecasts suggest, destined to get bigger as artificial intelligence takes hold. The issue is "how big?"


Estimates from leading research firms including IDC and Gartner are substantial, suggesting a 2023 market of between $530 billion and $795 billion (though presumably there is some overlap or overcounting across categories, and from firm to firm).


Segment | 2023 Market Value (USD Billion) | Compound Annual Growth Rate (CAGR), 2023-2030
Chip/GPU Sales | $150 - $200 | 15-20%
Server & Storage | $100 - $150 | 12-18%
Networking & Interconnect | $50 - $80 | 10-15%
Software & Services* | $75 - $125 | 20-25%
Data Center Infrastructure | $125 - $175 | 10-15%
"GPU as a Service" | $20 - $40 | 30-40%
"AI as a Service" | $10 - $25 | 40-50%




* Software & Services breakdown:
HPC Software | $30 - $50 billion | 25-30% growth rate
HPC Consulting & Support | $25 - $45 billion | 18-22% growth rate
AI Software | $20 - $30 billion | 25-35% growth rate


As always, the assumptions are crucial. It often is hard to separate "high-performance computing" investments or revenue from the larger computing or cloud computing market. And the estimates might be on the optimistic side, as typically is the case.


Estimates from other firms are lower. Analysts at Grand View Research, for example, estimate a 2030 HPC market value of less than $90 billion.


Segment | 2023 Value (USD) | 2030 Value (USD) | CAGR
Overall Market | 52.65 billion | 87.31 billion | 7.5%
Data Center GPU Market | 14.87 billion | 76.76 billion | 28.5%
GPU as a Service (GPUaaS) | 3.35 billion | 16.74 billion | 21.6%
High Bandwidth Memory (HBM) | 767.1 million | 4.90 billion | 25.5%
Commercial High-Performance Computing | 25.33 billion | 42.48 billion | 9.0%
High-Performance Computing for Automotive | 1.24 billion | 2.32 billion | 9.4%
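
Projections like these can be sanity-checked with simple compounding: a compound annual growth rate applied to the 2023 base over seven years should roughly reproduce the 2030 figure. A minimal sketch in Python, using the Grand View Research overall-market row above (treating 2023 to 2030 as seven compounding years is an assumption of the check):

```python
def project(value_2023: float, cagr: float, years: int = 7) -> float:
    """Compound a base-year value forward at a constant annual growth rate."""
    return value_2023 * (1 + cagr) ** years

# Grand View Research overall HPC market: $52.65 billion in 2023, 7.5% CAGR
print(round(project(52.65, 0.075), 2))  # ~87.35, close to the published $87.31 billion 2030 figure
```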


Still, assuming the truth lies someplace between the extremes, we might reasonably conclude that the high-performance computing market will reach somewhere in the neighborhood of $325 billion to $420 billion by about 2030.


Market Segment | 2023 Estimated Value (USD) | 2030 Projected Value (USD) | Growth Rate
GPU/Chip Sales | $30-35 billion | $120-150 billion | ~20-25% CAGR
Data Center HPC Infrastructure | $25-30 billion | $80-100 billion | ~18-22% CAGR
AI/HPC Cloud Services | $15-20 billion | $60-80 billion | ~22-27% CAGR
AI Hardware Accelerators | $10-15 billion | $40-55 billion | ~19-24% CAGR
HPC Software & Services | $8-12 billion | $25-35 billion | ~16-20% CAGR
Total Market | $88-112 billion | $325-420 billion | ~20-25% CAGR


Saturday, February 15, 2025

"Chain of Thought" Reasoning Requires 10X the Compute of Simple Inference

One reason some observers are a bit skeptical about DeepSeek cost claims (training and inference) is that its architecture, including the use of "chain of thought" reasoning for inference, requires more processing, not less, compared with the "simple inference" approach used by models such as ChatGPT.


Granted, chain of thought models break down problems into smaller steps, showing the reasoning at each step. So one might argue that each of the smaller steps requires less processing. On the other hand, more processing steps must occur.


So chain of thought models require more processing power and time than simple inference approaches. Even so, CoT is viewed as better for more-complex problems.


source: Cerebras Systems 


Still, the CoT "more processing" profile is hard to square with claims that DeepSeek requires less compute.
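
One way to see the intuition is a back-of-envelope estimate in which decoder inference cost scales with model size times tokens generated; the model size and the roughly tenfold token multiplier below are illustrative assumptions, not DeepSeek or Cerebras figures:

```python
# Back-of-envelope: inference FLOPs scale roughly with (model parameters) x (tokens generated).
PARAMS = 70e9                 # assumed dense model size, for illustration
FLOPS_PER_TOKEN = 2 * PARAMS  # common ~2N FLOPs-per-token approximation for a dense decoder

simple_tokens = 100           # direct answer only
cot_tokens = 1_000            # answer plus intermediate reasoning steps (assumed ~10x more tokens)

simple_flops = FLOPS_PER_TOKEN * simple_tokens
cot_flops = FLOPS_PER_TOKEN * cot_tokens

print(f"CoT vs. simple inference compute: {cot_flops / simple_flops:.0f}x")  # -> 10x
```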


Friday, February 14, 2025

DeepSeek Might be an Example of Continuity More than We Think

Perhaps I will change my mind as I learn more, but right now it appears DeepSeek has added some clever innovations that reduce language model training and inference costs. But other key contestants seem to be working along parallel lines.


Watch for coming releases from Anthropic and OpenAI, for example. Other developers have been working on approaches similar to DeepSeek's. Gemini, for example, uses the "Mixture of Experts" approach also used by DeepSeek.


The point is that model training and inference costs already were dropping fast, as is typical for all computing processes. 


source: Bain

New Reasoning Models Coming from Anthropic, GPT-5?

Anthropic (backed by Amazon and Alphabet) is expected to soon introduce a model that uses a slightly different approach to reasoning, allowing designers to tweak the computational effort of the model (essentially, high, medium or low), resulting in differences in how long and how much effort the model puts into reasoning about a particular problem. 

Anthropic seems to be seeking a higher profile as an "enterprise" or "business" model supplier, whose products excel at the sorts of coding larger businesses require. For example, the new model is said to be "better at understanding complex codebases built from thousands of files and produces complete lines of code which work the first time."


That business user focus might also explain why Anthropic is putting effort into features that give developers more control over cost, speed and pricing. 


The model uses more computational resources to calculate answers to hard questions. The AI model can also handle simpler tasks faster without the extra work, by acting like a traditional large language model, or LLM.


The new model might be important for other reasons. One advantage DeepSeek has apparently demonstrated is the way it can reason and learn from other models. 


As was always to be expected, in a fast-moving AI field, important innovations by any single provider are going to be mimicked by other leading contenders as well. 


One might well argue that Anthropic’s new model will provide an example of that, perhaps also illustrating that the DeepSeek approach to reasoning has been under development or investigation by multiple developers.


OpenAI will likely incorporate similar forms of reasoning effort in GPT-5, some might argue.


Thursday, February 13, 2025

Lower LLM Costs Only Part of the Framework for AGI and Agentic AI

There often is a tendency to believe that lower-cost large language models (generative artificial intelligence) have direct implications for the cost of other forms of AI. That is at best partly true, one can argue.


Consider the relationship between LLMs and agentic AI or “artificial general intelligence.”  

While LLMs provide language fluency and broad knowledge, they lack deep reasoning, memory, planning, and real-world interaction. A true AGI would integrate LLMs with other AI paradigms, including: 

  • Symbolic AI for logic & reasoning

  • Reinforcement Learning for decision-making

  • Memory systems for persistent knowledge

  • Multimodal AI for vision, speech, and sensory input

  • Self-learning and world modeling for adaptability


Artificial General Intelligence (AGI) would require a system that can learn, reason, adapt, and generalize across a wide range of tasks, much like a human. While LLMs (Large Language Models) are powerful in processing and generating text, they have key limitations that prevent them from achieving AGI on their own. However, they can play an important role as a component within a larger AGI system.


Likewise, LLMs provide language understanding, reasoning, and decision support, making them useful for agentic AI in several ways:

  • Language comprehension and generation – LLMs enable agents to process natural language instructions, communicate with users, and generate responses.

  • Reasoning and planning – Through prompt engineering (e.g., Chain-of-Thought prompting), LLMs can simulate step-by-step problem-solving (a minimal prompt sketch follows this list).

  • Knowledge retrieval and synthesis – LLMs act as information processors, integrating and summarizing knowledge from different sources.

  • Code and automation – LLMs can generate and execute code, allowing agents to perform automated workflows.
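
A minimal illustration of the prompting idea; the instruction text here is an assumption for illustration rather than any particular vendor's template:

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a simple chain-of-thought style instruction."""
    return (
        "Think through the problem step by step, showing your reasoning, "
        "then state the final answer.\n\n"
        f"Question: {question}\nReasoning:"
    )

# The resulting string would be sent to whatever LLM client is in use.
print(cot_prompt("A train departs at 3:40 pm and the trip takes 95 minutes. When does it arrive?"))
```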


However, LLMs alone are reactive. They respond to prompts rather than initiating actions autonomously. To become true agents, AI needs additional capabilities.


The Role of LLMs in AGI

LLMs can serve as a language and knowledge engine in AGI by:

  • Understanding and generating natural language (communication)

  • Encoding vast amounts of world knowledge

  • Generating code, plans, and reasoning chains (for problem-solving)


However, AGI needs much more than just language modeling. It requires learning, reasoning, memory, perception, and real-world interaction.


To create an AGI system, additional AI subsystems beyond LLMs would be needed, including:

  • Memory and long-term knowledge retention: LLMs do not retain memory between sessions. AGI needs episodic memory (remembering past interactions) and semantic memory (storing structured facts over time), so LLMs would need to integrate with databases or vector memory systems.

  • Reasoning and planning: LLMs can do some reasoning, but they do not truly understand causality or plan long-term, so any AGI would require logic-based reasoning systems, similar to symbolic AI or neuro-symbolic approaches.

  • Learning beyond pretraining: AGI must be able to continually learn and update its knowledge based on new experiences. This might involve meta-learning, reinforcement learning, and active learning approaches.

  • Multimodal perception: AGI would need vision, audio, and sensor-based perception.

  • Goal-directed behavior and autonomy: AGI would need an autonomous agent system that can pursue objectives, optimize actions, and self-correct over time.

  • Embodiment and real-world interaction: some argue AGI will need a physical or simulated "body" to interact with the world, similar to how humans learn.


Instead of replacing LLMs, AGI systems may incorporate them as a central knowledge and communication layer while combining them with other AI components. 


Role of LLMs for Agentic AI


To transform LLMs into autonomous agents, researchers combine them with additional components, such as:

  • Memory-Augmented LLMs: vector databases (such as Pinecone, Weaviate, ChromaDB) store and retrieve past interactions, allowing agents to remember previous tasks and refine their behavior over time. AutoGPT and BabyAGI use memory to track goals and intermediate steps, for example (a minimal sketch follows this list).

  • Planning and Decision-Making Modules: LLMs are combined with reinforcement learning, symbolic AI, or search-based planning systems to enable structured reasoning. OpenAI’s tool-use framework lets LLMs call APIs, retrieve information, and solve complex problems step-by-step, for example.

  • API and Environment Interaction: LLM-powered agents need tools to execute actions, such as calling APIs, running scripts, or manipulating environments. LangChain and OpenAI Functions enable LLMs to interact with external tools (databases, automation scripts), for example.

  • Feedback and Self-Improvement Loops: agents use self-reflection to evaluate and refine their outputs.
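
A minimal sketch of the vector-memory idea: store past interactions, then recall the most similar ones for a new query. The toy bag-of-words "embedding" is a stand-in assumption; a real agent would use a hosted embedding model and a vector database such as those named above:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorMemory:
    """Store past interactions and recall the most similar ones for a new query."""
    def __init__(self):
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

memory = VectorMemory()
memory.add("User asked to summarize Q3 sales and preferred bullet points.")
memory.add("User reported a login bug on the mobile app.")
print(memory.recall("summarize the latest sales figures"))
```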


Several AI frameworks integrate LLMs into agentic systems:


  • AutoGPT and BabyAGI: LLM-based agents autonomously define objectives, plan tasks, execute steps, and iterate on results. An AutoGPT agent for market research might break the work into three major tasks: research competitors, summarize trends, and draft a strategy report.

  • LangChain Agents: these call external tools and application programming interfaces, store and recall memory, and plan and execute workflows. An example is a customer service agent that remembers user history and escalates issues as needed.

  • ReAct (Reasoning + Acting): an architecture allowing LLMs to reason about tasks, proceed step by step and decide on actions (a minimal loop is sketched below). For example, a travel agent would, on behalf of a user, conclude that "I need to find a flight to New York," so "I should check Google Flights" and then "compare prices before booking."
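
A minimal sketch of a ReAct-style loop, alternating a "Thought" with an "Action" and an "Observation." The scripted model steps and the toy flight-search tool are assumptions for illustration; a real agent would obtain each thought and action from an LLM and call real tools:

```python
# Toy "tools" the agent can invoke.
TOOLS = {
    "search_flights": lambda query: "Cheapest NYC flight found: $212, departs 9:05 am.",
}

# Scripted model responses standing in for real LLM calls (illustrative only).
SCRIPTED_STEPS = [
    ("Thought: I need to find a flight to New York.", ("search_flights", "flights to New York")),
    ("Thought: I have a price; I can answer now.", None),
]

def react_agent(task: str) -> str:
    """Alternate reasoning ('Thought') with tool use ('Action') until done."""
    observation = ""
    for thought, action in SCRIPTED_STEPS:
        print(thought)
        if action is None:
            return f"Final answer for '{task}': {observation}"
        tool_name, tool_input = action
        observation = TOOLS[tool_name](tool_input)
        print(f"Action: {tool_name}({tool_input!r}) -> Observation: {observation}")
    return observation

print(react_agent("Book a cheap flight to New York"))
```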


The main point is that LLMs are a functional part of platforms aiming to provide agentic AI and future AGI systems, but an LLM alone cannot enable them in the way an operating system enables a personal computer to function. An LLM is part of a suite of capabilities.


The implication is that lower LLM training and inference costs contribute to other developments in agentic AI and AGI, but they are not a sole and sufficient driver of those developments.


DeepSeek Cost Advantages Will Only Drive Competitors to Reduce Their Own Costs

To the extent that DeepSeek’s approach proves widely effective, a shift toward algorithmic efficiency and hardware versatility, rather than brute-force scaling with expensive GPUs, is among the possible outcomes for other competitors in the language model business.


That is why public firm valuations in the AI ecosystem, including those of Nvidia, Alphabet and Microsoft, have been challenged recently.


The response by model builders will seek to narrow any perceived DeepSeek advantages in cost of model training and inference operations. 


Every leading model builder will seek to further optimize its own models to achieve cost reductions in training and inference operations. Existing trends to reduce reliance on high-end Nvidia graphics processing units will probably intensify.


That might include moves to use custom tensor processor units, custom application-specific integrated circuits, or lower-cost GPUs.


More focus on lightweight, fine-tuned models for enterprise use is also possible.


Companies such as Meta, which are already pushing open-source models, may accelerate their efforts to develop cost-efficient open models that can be trained with less computational power.


From a technical standpoint, competitors will likely pursue several approaches. Algorithms might be changed to reflect:

  • Sparse Computation: More use of sparse mixture-of-experts (MoE) models, which activate only a fraction of the network during inference, reducing compute demands (a minimal routing sketch follows this list).

  • Efficient Transformer Variants: Exploration of architectures like FlashAttention, Linformer, and Mamba (state-space models) to reduce memory and computational overhead.

  • Gradient and Weight Quantization: Further refinement of low-bit precision training (e.g., 4-bit, 8-bit training) to allow high-performance training on less-powerful hardware.
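
A minimal sketch of top-k expert routing, the core of the sparse MoE idea; the expert count, sizes, and random gating weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 16
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]  # one weight matrix per expert
gate = rng.standard_normal((D_MODEL, NUM_EXPERTS))                               # router/gating weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs by gate weight."""
    logits = x @ gate
    top = np.argsort(logits)[-TOP_K:]                          # indices of the k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the selected experts only
    # Only TOP_K of NUM_EXPERTS expert matrices are applied, so compute scales with k, not total experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (16,)
```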


Different choices of hardware also will be explored, but that is an existing trend. 


Training methods are almost certain to be tweaked, with model builders possibly exploring:

  • Fine-grained parallelism, a way of optimizing model sharding and tensor parallelism to reduce bottlenecks in distributed training.

  • Training approaches that leverage many lower-powered chips across cloud and edge devices instead of relying on a few high-end GPUs.

  • Adaptive computation scaling that dynamically adjusts model complexity based on hardware constraints.


Training data and pre-training strategies also could change:

  • Synthetic Data Utilization: Since compute-efficient models may rely on less high-quality training data, competitors might advance synthetic data generation to enhance pretraining efficiency.

  • Progressive Training: Layer-wise or curriculum-based training methods that reduce compute-heavy early-stage processing.

Inference and deployment efficiency will be another focus:

  • Model Distillation: Greater emphasis on distilling large models into smaller, more efficient versions with minimal performance loss (a minimal distillation-loss sketch follows this list).

  • Token-Level Pruning: Techniques that dynamically adjust inference complexity based on task difficulty, reducing unnecessary computation.

  • Adaptive Inference Pipelines: Using lower-bit or pruned versions of models for simpler queries while reserving full-scale computation for complex tasks.
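
A minimal sketch of the standard distillation objective, in which a small student model is trained to match a large teacher's softened output distribution; the temperature, vocabulary size, and random logits below are illustrative assumptions:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits: np.ndarray, student_logits: np.ndarray, T: float = 2.0) -> float:
    """KL divergence between softened teacher and student distributions (per token)."""
    p = softmax(teacher_logits, T)   # teacher's softened targets
    q = softmax(student_logits, T)   # student's softened predictions
    return float(np.sum(p * np.log(p / q))) * T * T  # T^2 scaling keeps gradient magnitudes comparable

rng = np.random.default_rng(0)
teacher = rng.standard_normal(32_000)                   # logits over an assumed 32k-token vocabulary
student = teacher + 0.5 * rng.standard_normal(32_000)   # a student that roughly tracks the teacher
print(round(distillation_loss(teacher, student), 3))
```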


D-Day Plus 81

Some things should not be forgotten. D-Day, 81 years ago, is among them.