Recursive training on synthetic data can produce what is often called "model collapse," a significant challenge for artificial intelligence models. Apple researchers made a related point recently: a paper entitled The Illusion of Thinking suggests that reasoning language models underperform standard models on simple problems, show some advantage on medium-complexity tasks, but collapse under the weight of high-complexity tasks.
“We acknowledge that our work has limitations,” the authors note. “While our puzzle environments enable controlled experimentation with fine-grained control over problem complexity, they represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive reasoning problems.”
Some might note that the problem is not new. In machine learning, recursive training generally refers to a process in which a model is trained and then its own outputs (or outputs from a previous version of itself) are used as part of the training data for subsequent iterations of the model.
This creates a feedback loop where the model is essentially "learning from itself" or from other models that have similarly learned from generated data.
Left unchecked, this leads to a degenerative process in which errors compound and the model's outputs become increasingly narrow, repetitive, and nonsensical.
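That narrowing can be seen even in a toy setting. The sketch below, which assumes nothing beyond NumPy, repeatedly fits a simple Gaussian model to samples drawn from the previous generation's fit; it is an illustration of the feedback loop, not a model of any production system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from the true distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(20):
    # Fit a simple model (here, just a Gaussian) to the current data.
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation:2d}: mu={mu:+.3f}, sigma={sigma:.3f}")
    # Train the next generation exclusively on the model's own samples.
    data = rng.normal(loc=mu, scale=sigma, size=50)
```

Because each finite sample slightly underestimates the true spread and no fresh real data ever re-enters the loop, the estimated sigma tends to drift downward over generations: a toy analogue of the narrowing described above.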
As always, developers and architects have methods for reducing such distortions.
The most crucial strategy is to retain a meaningful portion of original, human-generated "real" data in the training mix. By some estimates, even a small percentage of real data, around 10 percent, can significantly slow down or prevent model collapse.
Continuously introducing new, diverse real data into the training pipeline can also counteract the degradation caused by synthetic data; one simple way to enforce such a mix is sketched below.
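Here is a minimal sketch of that retention idea in Python: assemble each training batch with a guaranteed floor of real examples. The function name, batch size, and the 10 percent default are illustrative choices, not taken from any particular library.

```python
import random

def mixed_batch(real_pool, synthetic_pool, batch_size=32, real_fraction=0.1):
    """Draw a batch that always contains a floor of real, human-generated data.

    real_fraction=0.1 mirrors the rough 10 percent estimate above;
    it is a tunable placeholder, not a recommendation.
    """
    n_real = max(1, int(batch_size * real_fraction))
    batch = random.sample(real_pool, n_real)
    batch += random.sample(synthetic_pool, batch_size - n_real)
    random.shuffle(batch)  # avoid ordering effects within the batch
    return batch
```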
Architects can also implement robust processes to verify the quality and accuracy of synthetic data before it is used for training.
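What such a verification step looks like depends entirely on the pipeline; the following is one hypothetical shape for it, where each synthetic example must clear every project-defined check before it reaches the training set. Both the scorers and the thresholds here are placeholders.

```python
def passes_quality_gate(example, scorers, thresholds):
    """Return True only if every scorer clears its threshold.

    `scorers` maps a check name (e.g. a format validator or a
    reference-model score) to a callable returning a float; the
    checks and thresholds are project-specific placeholders.
    """
    return all(scorers[name](example) >= thresholds[name] for name in scorers)

def filter_synthetic(examples, scorers, thresholds):
    """Keep only the synthetic examples that pass the quality gate."""
    return [ex for ex in examples if passes_quality_gate(ex, scorers, thresholds)]
```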
Architects can likewise focus on creating synthetic data that addresses specific model "blind spots" or scenarios underrepresented in the real data.
It also helps to ensure that generated synthetic data is diverse and representative of the desired data distribution, rather than simply mimicking the most common patterns.
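One crude but common way to watch for that kind of pattern-repetition is an n-gram diversity score such as distinct-n; the sketch below computes it for a batch of generated text. The 0.5 rejection floor is an arbitrary illustration.

```python
from collections import Counter

def distinct_ngrams(texts, n=2):
    """Fraction of n-grams that are unique across a corpus.

    Values near 0 suggest the generator is repeating the same
    patterns; values near 1 suggest high variety.
    """
    counts, total = Counter(), 0
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(counts) / total if total else 0.0

def diverse_enough(texts, floor=0.5):
    """Reject a synthetic batch whose bigram diversity falls below a floor."""
    return distinct_ngrams(texts) >= floor
```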
Human feedback also helps, and could involve people evaluating the quality of generated data and providing guidance for the next generation.
Regularization methods (L1/L2 penalties, dropout) can also be applied during training to prevent overfitting and encourage the model to learn more robust representations. Reinforcement Learning from Human Feedback (RLHF) can align the model's outputs with human preferences, effectively guiding the generation process toward more desirable and accurate results, even when synthetic data is in the mix.
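As a minimal sketch of the regularization half of that advice, here is dropout plus an L2 penalty in PyTorch; with plain SGD, the optimizer's weight_decay argument is exactly an L2 penalty. The layer sizes and rates are illustrative, not tuned.

```python
import torch.nn as nn
from torch.optim import SGD

# Dropout randomly zeroes activations during training, discouraging
# co-adapted features; weight_decay adds an L2 penalty on the weights.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),          # dropout regularization
    nn.Linear(256, 10),
)
optimizer = SGD(model.parameters(), lr=0.1, weight_decay=1e-4)  # L2 penalty
```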
The core principle in combating model collapse is to ensure that the model always has access to a reliable source of "ground truth" information, whether through direct inclusion of real data or through careful curation and validation of synthetic data.