Sometimes “more” can be “less.” It is akin to the U.S. Navy SEAL aphorism “slow is smooth; smooth is fast.” In other words, being frantic often is not “fast.” In the same way, a large language model’s use of chain-of-thought prompting (and reasoning) is an example of using “more” processing per query to produce results for complicated queries with “less” total computation.
One reason some observers are a bit skeptical about DeepSeek's cost claims (training and inference) is that the architecture, including its use of "chain of thought" at inference time, requires more processing, not less, compared to the "simple inference" models used by ChatGPT, for example.
Chain-of-thought models break down problems into smaller steps, showing the reasoning at each step. One might argue that each smaller step requires less processing. On the other hand, more steps must occur.
So chain-of-thought models require more processing power and time than simple inference approaches, at least for uncomplicated queries. One might argue the converse holds for very complex queries: CoT might well succeed on a very complex query faster and more efficiently than a “simple inference” architecture would.
On the other hand, a CoT approach to prompt engineering also means no model fine-tuning is required when adapting a model to a new set of tasks, in large part because the prompts themselves embed the reasoning. In other words, CoT’s key concept is that by providing a few examples (or exemplars) in which the reasoning process is explicitly shown, the LLM learns to include similar reasoning steps in its responses, as the sketch below illustrates.
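To make that concrete, here is a minimal sketch of a few-shot CoT prompt in Python. The exemplars show worked-out reasoning before the final answer; the call_llm() function mentioned in the usage note is a hypothetical stand-in for whatever model API is actually in use.

```python
# Minimal sketch of few-shot chain-of-thought prompting.
# Only the prompt structure matters here: each exemplar shows the
# reasoning steps explicitly, so the model imitates that format.

COT_EXEMPLARS = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and
bought 6 more. How many apples do they have?
A: The cafeteria started with 23 apples. They used 20, leaving 23 - 20 = 3.
They bought 6 more, so 3 + 6 = 9. The answer is 9.
"""

def build_cot_prompt(question: str) -> str:
    """Prepend worked exemplars so the model includes reasoning steps in its answer."""
    return f"{COT_EXEMPLARS}\nQ: {question}\nA:"

# Usage (call_llm is a hypothetical model API call):
# answer = call_llm(build_cot_prompt("A train travels 60 miles in 1.5 hours. What is its speed?"))
```

Note that nothing about the model changes; the adaptation to a new task lives entirely in the prompt, which is why no fine-tuning is needed.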
The point is that, in this case, more processing might well bring model training advantages (no fine-tuning required) but possibly also the ability to solve more-complex problems with less compute. Paradoxically, spending more compute per step on a complex problem might require less total compute to reach the answer.
Also, CoT enables smaller model sizes. Smaller models can run on less-capable hardware and platforms, for example, enabling a “lower-cost” approach to model use. “Less” hardware is also possible. All of that has cost implications.
The key point is that CoT can lead to lower costs, even when requiring “more processing” for queries.
CoT prompting enables smaller models to perform complex reasoning tasks that were previously only possible with much larger models. So smaller, more efficient models can be used for tasks that once required massive language models.
Also, CoT reasoning capabilities can be transferred from larger models to smaller ones through knowledge distillation. For example, fine-tuning a T5 XXL model (11 billion parameters) on CoT outputs from a much larger PaLM-540B model improved its accuracy on complex math problems from 8.11% to 21.99%.
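In outline, that distillation process looks something like the sketch below. The teacher_generate() function and the question list are hypothetical placeholders, and the Hugging Face transformers calls only indicate where a standard sequence-to-sequence fine-tune would go; this is an illustration of the idea, not a reproduction of the published setup.

```python
# Rough sketch of CoT knowledge distillation: collect step-by-step
# answers from a large "teacher" model, then fine-tune a smaller
# "student" model on the (question, reasoning + answer) pairs.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def build_distillation_pairs(questions, teacher_generate):
    """Ask the teacher model for chain-of-thought answers to use as training targets."""
    pairs = []
    for q in questions:
        # The teacher is prompted to show its reasoning, not just the final answer.
        rationale_and_answer = teacher_generate(f"Answer step by step.\nQ: {q}\nA:")
        pairs.append({"input": q, "target": rationale_and_answer})
    return pairs

# The student (a T5-family model here; "t5-small" stands in for T5-XXL)
# would then be fine-tuned on these pairs with the usual seq2seq objective.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```

The design point is that the expensive reasoning happens once, when the teacher generates training data, rather than every time the small model is used.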
CoT prompting can improve model performance across various tasks without the need for task-specific fine-tuning. This reduces the computational resources required for adapting models to new tasks.
By breaking down complex queries into smaller, manageable steps, CoT allows models to solve problems more efficiently. This can potentially reduce the overall computational load compared to attempting to solve the entire problem at once.
The ability to transfer CoT reasoning capabilities from larger to smaller models suggests that we may be able to create more compact, efficient models that retain the advanced reasoning abilities of larger (and more expensive) models.
Slow is smooth; smooth is fast. More can be less.