Researchers studying Anthropic’s Claude 3.5 Haiku language model have found evidence that Claude sometimes thinks in a conceptual space shared between languages, suggesting it has a kind of universal “language of thought.”
Perhaps more notably, the researchers say Claude “will plan what it will say many words ahead, and write to get to that destination” — going beyond the basic language model function of predicting the next word in a sentence.
“Language models are trained to predict the next word, one word at a time,” researchers note. “Given this, one might think the model would rely on pure improvisation.”
“However, we find compelling evidence for a planning mechanism,” they say. “First, the model uses the semantic and rhyming constraints of the poem to determine candidate targets for the next line. Next, the model works backward from its target word to write a sentence that naturally ends in that word.”
Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.
“We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there,” researchers note. “This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so.”
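The two-step process the researchers describe — pick candidate rhyming target words first, then compose a line that lands on one of them — can be sketched as a toy simulation. This is an illustrative Python sketch only, not Anthropic’s actual mechanism or code; the rhyme table and filler words are hypothetical stand-ins for what a real model would compute.

```python
# Toy illustration of "plan the target word, then work backward" —
# NOT Claude's actual internals, just the shape of the described strategy.

# Hypothetical rhyme table standing in for the model's learned knowledge.
RHYMES = {"rabbit": ["habit", "grab it"]}

def plan_next_line(prev_end_word: str) -> str:
    # Step 1: use the rhyming constraint to pick candidate target words
    # (a real model would also weigh semantic fit, not just take the first).
    candidates = RHYMES.get(prev_end_word, [])
    target = candidates[0]
    # Step 2: work backward from the target, composing a line that
    # naturally ends in that word.
    filler = ["his", "hunger", "was", "a", "powerful"]
    return " ".join(filler + [target])

# If the previous line ended in "rabbit", the planned line ends in "habit".
line = plan_next_line("rabbit")
```

The point of the sketch is the ordering: the end-of-line word is chosen before the rest of the line is written, which is the opposite of pure left-to-right improvisation.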
All of this matters as we try to document and lay bare the “thought” processes used by Claude and other language models. A language model may be a “black box,” but we still need to understand what happens inside the box.