Thursday, January 30, 2025

DeepSeek was a Wakeup Call, But Hardly Unusual

Whatever the ultimate resolution of the claimed DeepSeek training and inference costs, ways of cutting inference and training costs were already emerging, and always were expected. DeepSeek has been an unexpected acceleration of the timetable, to be sure (assuming its cost advantages prove to be sustainable).


Year | Training Cost (Per Billion Parameters) | Inference Cost (Per 1M Tokens) | Key Cost Drivers
2020 | $10M - $20M | $1.00 - $2.00 | Expensive GPUs, early transformer models
2022 | $2M - $5M | $0.20 - $0.50 | Hardware efficiency (A100, TPUv4), model optimizations (Mixture of Experts)
2024 | $500K - $2M | $0.05 - $0.20 | Advanced chips (H100, TPUv5), quantization, distillation
2026* | $100K - $500K | $0.01 - $0.05 | Custom silicon (ASICs), edge inference, sparsity techniques
2028* | <$100K | <$0.01 | Breakthroughs in model efficiency, neuromorphic computing

(* projected)


Virtually all computing technologies show such cost declines with time. 

Source: Seeking Alpha
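As a rough illustration of the pace implied by the table above, the following sketch computes the compound annual rate of decline using the midpoints of the 2020 and 2024 ranges (the midpoints are my own reading of the quoted ranges, not figures from the source):

```python
# Back-of-the-envelope estimate of the annual rate of cost decline implied by
# the table above. Midpoints of the quoted ranges are assumed: ~$15M -> ~$1.25M
# per billion parameters for training, and ~$1.50 -> ~$0.125 per 1M tokens for
# inference, over the four years from 2020 to 2024.

def annual_decline_rate(start_cost: float, end_cost: float, years: int) -> float:
    """Compound annual rate of decline between two cost points."""
    return 1.0 - (end_cost / start_cost) ** (1.0 / years)

training = annual_decline_rate(15_000_000, 1_250_000, 4)
inference = annual_decline_rate(1.50, 0.125, 4)

print(f"Training cost per billion parameters: ~{training:.0%} decline per year")
print(f"Inference cost per 1M tokens: ~{inference:.0%} decline per year")
# Both work out to roughly a 46% decline per year on these assumptions.
```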


The trend already has been seen for supercomputer cost per floating-point operation, for example.

Year | Supercomputer | Cost per FLOP ($/FLOP) | Peak Performance (FLOPS)
1960s | IBM 7030 ("Stretch") | ~$1 | ~100 MFLOPS
1980s | Cray-1 | ~$0.10 | ~100 MFLOPS
1997 | ASCI Red | ~$0.001 | ~1 TFLOP
2008 | Roadrunner | ~$0.0001 | ~1 PFLOP
2018 | Summit | ~$0.00001 | ~200 PFLOPS
2022 | Frontier | ~$0.000001 | ~1.2 EFLOPS
2026* | TBD (projected) | <$0.0000001 | ~10 EFLOPS
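For a sense of how fast that decline has run, a similar back-of-the-envelope calculation on two points from the table (Cray-1 at ~$0.10 per FLOP, Frontier at ~$0.000001 per FLOP in 2022) gives the implied halving time of cost per FLOP; note that the mid-1980s date assigned to the Cray-1 is my own approximation of the table's "1980s" entry:

```python
import math

def halving_time(start_cost: float, end_cost: float, years: float) -> float:
    """Years for cost to halve, assuming a constant exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings

# Cray-1 (~$0.10/FLOP, taken as circa 1985) to Frontier (~$0.000001/FLOP, 2022).
t = halving_time(0.10, 0.000001, 2022 - 1985)
print(f"Cost per FLOP halved roughly every {t:.1f} years over that span")
# Roughly every 2.2 years on these assumptions, i.e. a five-orders-of-magnitude
# drop across about 37 years.
```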

