Evernight
I also used to strongly believe that LLMs are merely stochastic parrots, i.e., statistical models that solely follow a probability distribution to predict the next token of a given input. Yet I am still amazed by the capabilities and recent innovations introduced by transformer-based AI models. It remains quite obscure to me how LLMs are able to produce content that looks eloquent, as if the hidden machinery genuinely understood the prompt. In spite of these innovations, the familiar "Chinese room" argument is still invoked. It goes as follows:
Imagine that a person who does not speak Chinese is locked in a room with a set of rules and a large batch of Chinese characters. The person is given a piece of paper with a Chinese sentence written on it, and they follow the rules to produce a response in Chinese. The person does not understand the meaning of the Chinese characters, yet they are able to produce a response that is grammatically correct and even clever.
The question is: Does the person in the room truly "understand" the meaning of the Chinese characters? The answer, of course, is no: the person has no comprehension of the language or of what the characters mean.
Now, LLMs process and generate text by manipulating symbols (words, characters, etc.) according to the rules of the model. On this view, they don't truly "understand" the meaning of the text; they merely rearrange symbols to produce coherent-seeming output. They are trained on vast amounts of data, which can lead to overfitting and memorization, and this often results in the model producing responses that are statistically likely to be correct but lack true understanding.
It is, of course, clear that whatever machine learning models do bears little resemblance to what humans have. Scientific or statistical models, for example, might correctly predict outcomes in some specific domain, but they don't "understand" it. A human might understand how orbital mechanics works, and an algebraic expression or a computer program can predict the positions of satellites just as well, but the human's understanding is something different. Still, I will adopt a functionalist view of this question, since the internal experience of models, or the lack thereof, isn't especially relevant to their performance.
Still, can a stochastic parrot understand? Is having something that functions like a model enough? Large language models (LLMs) do one thing: predict a likelihood for every possible next token and then emit one, probabilistically.
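To make that one thing concrete, here is a toy sketch of the prediction step: a softmax turns raw scores into a probability distribution over a vocabulary, and the next token is sampled from it. The vocabulary and logits below are made up for illustration; nothing here is a real model.

```python
# Toy sketch of "predict token likelihoods and emit one, probabilistically".
# The vocabulary and logits are invented; a real model produces the scores.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]
logits = np.array([2.1, 0.3, 1.7, 0.2, 0.9, -1.0])  # hypothetical model scores

# Softmax turns raw scores into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sampling is the "stochastic" part of the stochastic parrot.
rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```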
But if an LLM can explain something correctly, then somewhere in the model is a set of weights that contains the information needed to probabilistically predict the answer. An LLM doesn't "understand" that Darwin is the author of On the Origin of Species, but it does correctly answer the question with a probability of X% - meaning, again, that somewhere in the model there are weights that say a specific set of tokens should be generated with high likelihood in response. (Does this, on its own, show that it knows who Darwin was or what evolution is? Of course not. It also doesn't mean that the model knows what the letter D looks like. But then, neither does asking the question of a child who memorized the fact to pass a test.)
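You can actually watch those weights at work by inspecting the probability a model assigns to a specific continuation. The sketch below is my own illustration, not anything from a particular paper: it assumes the Hugging Face transformers library and uses the small GPT-2 model as a stand-in for "an LLM". The exact number it prints is beside the point; what matters is that the answer lives in the distribution the weights produce.

```python
# Sketch: ask a small causal LM how likely " Darwin" is as the next token.
# Assumes the Hugging Face `transformers` library and GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The author of On the Origin of Species is Charles"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Distribution over the next token, given the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Probability mass placed on " Darwin" (note the leading space in GPT-2's vocabulary).
darwin_id = tokenizer.encode(" Darwin")[0]
print(f"P(' Darwin' | prompt) = {next_token_probs[darwin_id].item():.3f}")
```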
For example, a recent paper (https://arxiv.org/abs/2310.02207) argues that language models contain a geographical model of the world (in terms of longitude and latitude), in addition to temporal representations of when events occurred.
The authors use linear probes to find these representations and argue that the representations are linear because more complicated, nonlinear probes don't perform any better.
They also look for neurons whose weights are similar to the probes', as evidence that the model itself actually uses these representations.
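To make the probing recipe concrete, here is a minimal sketch on synthetic data; the stand-in activations and the planted "latitude/longitude" directions are my own illustration, not the paper's models or datasets. The steps mirror the general approach: fit a linear map from hidden activations to coordinates, check the held-out fit, and then compare the learned probe direction with a neuron's weight vector.

```python
# Sketch of the linear-probing recipe on synthetic stand-in data:
# 1) collect hidden activations for a set of entities,
# 2) fit a linear map from activations to a property (here, latitude/longitude),
# 3) check the held-out fit,
# 4) compare the learned probe direction to a neuron's weight vector.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_entities, d_model = 2000, 256

# Pretend activations: in the real recipe these would come from a language
# model's hidden states at the position of a place name.
acts = rng.normal(size=(n_entities, d_model))

# Plant a linear "latitude/longitude" structure so the probe has something to find.
true_dirs = rng.normal(size=(d_model, 2))
coords = acts @ true_dirs + 0.1 * rng.normal(size=(n_entities, 2))

X_train, X_test, y_train, y_test = train_test_split(acts, coords, random_state=0)
probe = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", round(probe.score(X_test, y_test), 3))

# Follow-up check in the spirit of the paper: does some neuron's weight vector
# point the same way as the probe? (Here we compare against the planted direction.)
probe_dir = probe.coef_[0]        # latitude direction found by the probe
neuron_w = true_dirs[:, 0]        # a hypothetical neuron's weight vector
cos = probe_dir @ neuron_w / (np.linalg.norm(probe_dir) * np.linalg.norm(neuron_w))
print("cosine similarity with the neuron's weights:", round(float(cos), 3))
```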