Thursday, August 3, 2023

In Some Cases, Small Language Models Will be Needed

Though most of the present attention to generative AI is focused on large language models such as ChatGPT, Bard and others, small and medium language models might be more interesting for many mobile device suppliers, application providers and end users, primarily because they can be trained and run on relatively smaller amounts of data. 


Of course, “smaller” does not mean “small.” Right now, the general state of the art is that a large language model requires ingesting billions of words, while a small language model might only require “millions.” But lots of small entities might not even boast web-accessible content amounting to millions of words. 


Setting aside the matter of having a sufficient critical mass of words to ingest, processing even a small dataset, though perhaps less costly than processing a very large one, is probably still cost-prohibitive for small entities.


So what approaches might work? Right now, transfer learning takes a model already created for one task and reuses it for a different task. Such “pre-trained” models might, or might not, be a precise fit for the new task, so greater imprecision will be an issue.  
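To illustrate, the simplest form of transfer learning reuses a pre-trained model without retraining it at all. The sketch below is only a rough illustration: it assumes the publicly available DistilBERT model from the Hugging Face transformers library, and the two labeled support questions are made up. Nearest-neighbor matching over the pre-trained embeddings stands in for a task the model was never trained on.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
encoder.eval()  # the pre-trained weights are reused as-is, with no training

def embed(texts):
    # Tokenize, encode, then mean-pool token vectors into one vector per text.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

# Hypothetical labeled examples for the new task (routing support questions).
examples = ["my invoice total looks wrong", "the app crashes when it opens"]
labels = ["billing", "technical"]
example_vectors = embed(examples)

def classify(text):
    # Nearest neighbor over pre-trained embeddings stands in for a new task head.
    similarity = torch.nn.functional.cosine_similarity(embed([text]), example_vectors)
    return labels[int(similarity.argmax())]

print(classify("I was charged twice this month"))  # likely "billing"
```

The imprecision risk is visible even here: the pre-trained encoder was never tuned on the new domain, so its notion of similarity might not match the distinctions the new task actually requires.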


Pre-trained models can be fine-tuned, but that requires specialized information technology expertise that a smaller entity might have to pay for. Such customization might put generative AI out of reach, financially, for small entities. 
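A minimal sketch of what that fine-tuning work looks like follows, again assuming DistilBERT and a pair of made-up labeled examples. Unlike the frozen-model approach above, here the pre-trained weights themselves are updated on domain-specific data, which is the part that tends to require paid expertise and compute.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy domain-specific examples; a real effort would need carefully labeled data.
texts = ["my invoice total looks wrong", "the app crashes when it opens"]
labels = torch.tensor([0, 1])  # 0 = billing, 1 = technical

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # a few passes over the tiny dataset
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# The fine-tuned model now scores new text against the domain-specific labels.
model.eval()
with torch.no_grad():
    new_batch = tokenizer(["I was charged twice this month"], return_tensors="pt")
    prediction = model(**new_batch).logits.argmax(-1)
```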


Another way that organizations might be able to use generative AI without millions of words of content is “few-shot learning,” some argue. 


Few-shot learning is a technique in which a model is trained on only a small number of examples. Combined with data augmentation, new examples are created by transforming existing ones. Again, the issue is precision. 
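A rough sketch of that augmentation step is below. The seed sentences and the transformation rules (random word dropout and swapping adjacent words) are illustrative assumptions; real pipelines might use synonym replacement or back-translation instead, and the precision concern is the same: the variants are only approximations of genuinely new data.

```python
import random

seed_examples = [
    "how do I reset my password",
    "where can I download my invoice",
]

def word_dropout(text, p=0.15):
    # Randomly drop words to create a slightly different phrasing.
    words = text.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else text

def swap_adjacent(text):
    # Swap one random pair of adjacent words.
    words = text.split()
    if len(words) > 1:
        i = random.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

augmented = []
for example in seed_examples:
    for _ in range(3):  # generate a few variants per seed example
        augmented.append(swap_adjacent(word_dropout(example)))

print(len(seed_examples), "seeds ->", len(augmented), "augmented examples")
```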


As with earlier versions of business technology, versions scaled for small or mid-sized businesses might be de-featured and therefore less expensive, focused on answering simpler questions related to customer-facing use cases. 


Such small language models might be important for other reasons. A developing set of applications and use cases might require only small and medium language models, enabling distributed AI conducted by and on devices, using models appropriate for industry-specific or firm-specific use cases where the data sets are bounded and smaller. 


Small language models, which cost less to train, might be appropriate for personalized recommendations, in-app translation or customer support specifically geared to a smaller company’s customers or products, for example. 


At least in principle, lighter-weight models might be able to support on-device personalized content generation, real-time translation, some forms of creative content or chatbot and personal assistant functions. 
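As a very rough stand-in for that kind of narrow, lighter-weight task, the sketch below runs a single translation request through a small public model ("t5-small," roughly 60 million parameters) via the Hugging Face pipeline API. A real on-device deployment would more likely use a quantized model and a mobile runtime, so this is only meant to show the scale of model involved.

```python
from transformers import pipeline

# A small model handling one narrow, customer-facing task: English-to-French translation.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Where is my order?")[0]["translation_text"])
```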


And even such lighter-weight use cases might still require off-device data stores to a large extent, even if actual processing is onboard.


So, eventually, some new business models might develop that focus on small language models aimed at smaller business users and devices.

