Despite the occasional large language model (LLM) hallucination (an instance where the model generates output that is factually incorrect or nonsensical), most of us are probably more than a little surprised that the applied mathematics works as often as it does.
Although we call generative artificial intelligence a form of "intelligence," LLMs do not "think" at all. They are powerful statistical and probability engines: having been trained on human language and the relationships within it, they infer (probabilistically) what the next word in a sentence should be.
That should sound almost crazy, but that is the actual process, and it illustrates one important reason computing is so powerful: machines can process fantastically huge amounts of data very quickly. LLMs, in particular, are trained on an enormous share of recorded human knowledge, principally as found on the global internet.
If the input to the LLM is ambiguous or lacks sufficient context, the model may make incorrect assumptions and generate inaccurate information.
But, basically, an LLM is applied math, a probability engine, working with "tokens" (think of tokens as the model's "bytes"), which are chunks of text: whole words, word fragments, or punctuation marks. For text operations, the model then predicts the next token in a sentence or group of sentences, one token at a time.
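To make that concrete, here is a minimal sketch in Python of next-token prediction. The tiny vocabulary and hand-built probability table are illustrative assumptions, not a real trained model, but the loop mirrors what an LLM does: look at the context, then sample a probable next token.

import random

# Toy "model": for each context token, a probability distribution
# over possible next tokens. A real LLM learns billions of these
# relationships from training data; this table is made up.
next_token_probs = {
    "the": {"cat": 0.5, "dog": 0.3, "sky": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"ran": 0.7, "sat": 0.3},
    "sky": {"fell": 1.0},
    "sat": {".": 1.0},
    "ran": {".": 1.0},
    "fell": {".": 1.0},
}

def predict_next(context_token: str) -> str:
    # Sample the next token according to its probability.
    dist = next_token_probs[context_token]
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Generate text one token at a time: each step is a probabilistic
# guess at "what comes next," exactly as described above.
token, output = "the", ["the"]
while token != ".":
    token = predict_next(token)
    output.append(token)
print(" ".join(output))  # e.g. "the cat sat ."

Run it a few times and the output varies, because each step is a draw from a distribution rather than a fixed lookup.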
For visual analysis, image-capable LLMs typically work with "patches": the image is divided into a grid of small patches, and each patch becomes a token. Alternatively, images are analyzed by features such as edges, shapes, colors, and textures, which are treated as tokens.
In still other cases, an image is analyzed to identify and locate the objects within it, and those detected objects are treated as tokens.
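As a rough illustration of the "patch" approach, here is a short Python sketch that divides an image into a grid of patches and flattens each one into a vector, producing the token sequence a vision model would consume. The 64x64 image and 16x16 patch size are assumptions for the example, not any particular model's settings.

import numpy as np

# Fake 64x64 grayscale image (just a ramp of numbers for the demo).
image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
patch = 16  # each 16x16 patch becomes one "token"

# Split the image into a grid of patches, then flatten each patch
# into a vector; the resulting sequence of vectors is the token input.
patches = (
    image.reshape(64 // patch, patch, 64 // patch, patch)
    .transpose(0, 2, 1, 3)
    .reshape(-1, patch * patch)
)
print(patches.shape)  # (16, 256): 16 patch tokens, 256 values each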
For music-focused large language models, tokens represent structured, interpretable chunks of musical information such as notes, chords, rhythmic patterns, beats, and pitch and time intervals.
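Here is a similarly hedged Python sketch of how a melody might be turned into tokens. The event vocabulary (note names paired with durations) is made up for illustration; real music models use richer, model-specific schemes.

# A made-up melody: (pitch, duration-in-beats) pairs.
melody = [("C4", 1.0), ("E4", 0.5), ("G4", 0.5), ("C5", 2.0)]

# Turn each musical event into a string, then build a vocabulary
# mapping each distinct event to an integer token id.
events = [f"NOTE_{pitch}_DUR_{dur}" for pitch, dur in melody]
vocab = {event: i for i, event in enumerate(sorted(set(events)))}

token_ids = [vocab[event] for event in events]
print(events)     # ['NOTE_C4_DUR_1.0', 'NOTE_E4_DUR_0.5', ...]
print(token_ids)  # the integer sequence the model would actually see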
The point is that all LLMs are predictive and probabilistic, whether analyzing input or producing output. What should almost shock us is how often they "get it right" when doing so.