Whatever the ultimate resolution of the claimed DeepSeek training and inference costs, ways of cutting inference and training costs were already emerging, and were always expected. DeepSeek has been an unexpected acceleration of the timetable, to be sure (assuming its cost advantages prove sustainable).
| Year | Training Cost (Per Billion Parameters) | Inference Cost (Per 1M Tokens) | Key Cost Drivers |
|------|----------------------------------------|--------------------------------|------------------|
| 2020 | $10M - $20M | $1.00 - $2.00 | Expensive GPUs, early transformer models |
| 2022 | $2M - $5M | $0.20 - $0.50 | Hardware efficiency (A100, TPUv4), model optimizations (Mixture of Experts) |
| 2024 | $500K - $2M | $0.05 - $0.20 | Advanced chips (H100, TPUv5), quantization, distillation |
| 2026* | $100K - $500K | $0.01 - $0.05 | Custom silicon (ASICs), edge inference, sparsity techniques |
| 2028* | <$100K | <$0.01 | Breakthroughs in model efficiency, neuromorphic computing |

*Projected
Virtually all computing technologies show such cost declines with time.

source: Seeking Alpha
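As a rough check on the pace implied by the table above, a short sketch can back out the annual decline rate from two data points. The midpoints ($1.50 per 1M tokens in 2020, $0.125 in 2024) are my own reading of the table's ranges, and the constant-exponential-decline assumption is an illustration, not a claim about the underlying dynamics:

```python
# Estimate the implied annual cost-decline rate from two table midpoints,
# assuming a constant exponential decline between the two years.
def annual_decline(cost_start, cost_end, years):
    """Fraction by which cost falls each year under constant exponential decline."""
    ratio = (cost_end / cost_start) ** (1.0 / years)
    return 1.0 - ratio

# Inference cost per 1M tokens: ~$1.50 midpoint in 2020 -> ~$0.125 midpoint in 2024
rate = annual_decline(1.50, 0.125, 4)
print(f"Implied annual decline: {rate:.0%}")  # roughly 46% per year
```

On those midpoints, inference cost falls by nearly half every year, which is why the projected 2026 and 2028 rows are plausible extrapolations rather than outliers.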
The same trend has long been visible in supercomputer cost per floating-point operation, for example.
| Year | Supercomputer | Cost per FLOP ($/FLOP) | Peak Performance (FLOPS) |
|------|---------------|------------------------|--------------------------|
| 1960s | IBM 7030 ("Stretch") | ~$1 | ~100 MFLOPS |
| 1980s | Cray-1 | ~$0.10 | ~100 MFLOPS |
| 1997 | ASCI Red | ~$0.001 | ~1 TFLOP |
| 2008 | Roadrunner | ~$0.0001 | ~1 PFLOP |
| 2018 | Summit | ~$0.00001 | ~200 PFLOPS |
| 2022 | Frontier | ~$0.000001 | ~1.2 EFLOPS |
| 2026* | TBD (Projected) | <$0.0000001 | ~10 EFLOPS |
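The supercomputer table spans a factor of a million in cost per FLOP over roughly six decades, which translates into a remarkably steady halving time. The sketch below assumes the "1960s" row corresponds to about 1961 (when the IBM 7030 shipped); that date, and the constant-exponential-decline model, are assumptions for illustration:

```python
import math

# Estimate the cost-per-FLOP halving time implied by the table above:
# ~$1/FLOP in ~1961 (IBM 7030) down to ~$0.000001/FLOP in 2022 (Frontier).
def halving_time(cost_start, cost_end, years):
    """Years for cost to halve, assuming a constant exponential decline."""
    return years * math.log(2) / math.log(cost_start / cost_end)

t = halving_time(1.0, 1e-6, 2022 - 1961)
print(f"Cost per FLOP halves roughly every {t:.1f} years")  # about 3 years
```

A halving time of about three years, sustained for sixty years, is the backdrop against which AI training and inference cost declines should be read: they are the norm for computing, not the exception.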