Saturday, February 15, 2025

"Chain of Thought" Reasoning Requires 10X the Compute of Simple Inference

One reason some observers are skeptical of DeepSeek's cost claims (for both training and inference) is that its architecture, including the use of "chain of thought" reasoning for inference, requires more processing, not less, compared to the "simple inference" models used by ChatGPT, for example.


Granted, chain of thought models break problems down into smaller steps, showing the reasoning at each step. So one might argue that each of the smaller steps requires less processing. On the other hand, more processing steps must occur.


So chain of thought models require more processing power and time than simple inference approaches. On the other hand, CoT is viewed as better suited to more-complex problems.
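A rough way to see why is that transformer decoding cost scales with the number of tokens generated, and a chain of thought response emits reasoning tokens before the final answer. The sketch below is purely illustrative, with made-up token counts (not measured DeepSeek or ChatGPT figures), to show how the multiplier arises:

```python
# Illustrative sketch: decoding compute grows roughly with tokens generated,
# so emitting intermediate reasoning steps multiplies inference cost.
# All token counts below are invented for illustration.

def inference_cost(output_tokens: int, cost_per_token: float = 1.0) -> float:
    """Rough proxy: decoding cost proportional to tokens generated."""
    return output_tokens * cost_per_token

# Simple inference: the model answers directly.
simple = inference_cost(output_tokens=20)

# Chain of thought: same 20-token answer, preceded by (say) 180 tokens
# of step-by-step reasoning.
cot = inference_cost(output_tokens=20 + 180)

print(f"CoT uses {cot / simple:.0f}x the decoding compute")
```

Under these assumed numbers the reasoning trace dominates, which is where a "10X" style multiplier can come from; real ratios depend on how verbose the reasoning traces actually are.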


source: Cerebras Systems 


Still, the CoT "more processing" profile is hard to square with claims that DeepSeek requires less compute.


