Tuesday, September 16, 2025

complexity and AI: why LLMs succeeded where generative linguistics failed

"Chomsky's approach to linguistics has effectively been disproven by AI: 1. general learning is enough for language (no innate language skill necessary), 2. language is fundamentally scruffy (not neat), 3. language is enough to learn language (no grounding)."

-- Joscha Bach, Nov. 6, 2024 on X

Disproven?

In the quote above, Bach focuses on confirmatory evidence -- what AI and humans can do rather than what they can't -- missing both the key argument in favor of Chomsky's computational-generative program and that program's argument against neural network LLMs as a model of natural language. Bach's comment may also exaggerate the success of neural networks, but let's set that aside.

Focusing on confirmation while ignoring disconfirmatory evidence or counterfactual support is quite common in human reasoning. And Bach's is a common response to Chomsky: if AI can produce natural language better than a computational model, humans must not need an innate, hardwired syntax program. Children could learn all of language from bottom-up empirical impressions, with no need for a top-down, evolutionarily preset computational grammar.

What Bach and his fellow engineers ignore is the limit problem that drives the whole Chomsky model. An account of human language-learning does not depend on which languages humans or machines can learn, but on which languages, if any, humans cannot learn, and on which machines, if any, mechanically and structurally cannot learn those same languages while nonetheless being able to learn human languages. If there are such humanly impossible languages and such machines, then 1) it is highly probable that the human language faculty is isomorphic with those machines, and 2) any machine that can parse or produce those languages manifestly cannot be a model of the human faculty, much less explain or even illuminate it.

This is why Chomsky repeats over and again, to the puzzlement and frustration of the AI engineering community, that AI technology tells us nothing about the human language faculty. Nothing. 

There's a popular view, apparently held by engineers as well, that technology is a branch of science because technology depends on the discoveries of the sciences. But the goals of the natural sciences are not the goals of technology. Technologies are devised to accomplish practical tasks, often market-driven or military ones. The goal of the natural sciences is to understand nature, not to manipulate nature into performing a task. The goal of understanding nature includes explaining what nature doesn't or can't do. That's not the task of technology, and it's one reason why AI alignment is such a challenge: there is no natural selection shaping AI learning to limit itself to subservience and harmlessness toward humans. For the natural sciences, those limits need to be explained.

Now, you ask, and should ask: are there languages that human children can't learn? Under current ethical conditions, we can't experiment to find out. But we can look at syntactic structures in known languages that speakers of those languages cannot parse. I provide examples below. First, I want to respond to a more pressing question: why has the generative model failed where AI technology has succeeded?

On this question, Bach offers an important clue. Language is scruffy. Even if it exhibits a core of recursive computational structures, it also has a lot of vagaries. Worse, the vagaries are not always isolated quirks. They can be recursive and productive, so they can behave and look just like algorithmic, recursive, productive computations.

In the 1990s, during my doctoral studies, it was pretty clear to me that it was possible and even probable that the data of language included variants that were non-structural, mere irregular quirks. In the US, we say "go to school" but "go to the hospital" or "to a hospital". Brits say "go to hospital". To conclude from this contrast that there must be a deep syntactic, structural difference with reflexes throughout the language seems way too much for a theory to explain. It's more likely a historical accident of semantics that has somehow seeped into one dialect and not the other: it could be that for Americans schooling is an activity but "hospitaling" is not, and the dialect reflects that; "hospital", like "congress", can be treated like a name, as Brits treat it, unlike, say, "the capital", a common noun. But if a linguist can't definitively tell which datum belongs to the syntax machine and which is learnt by the general learning faculty, then all the data are effectively in a black box.

The clear evidence that there are structures that English speakers cannot parse (see examples below) convinced me that Chomsky's explanatory model was probably right. But because the data were all jumbled together in a black box, I was also convinced that the program of discovering the specifics of the model was doomed to failure -- despite being the right explanation.

As long as the evidence is behavioral (sentences), the model will be unfalsifiable if the model data are mixed with general learning data. Judgment will have to wait for psycholinguistic or neurological experiment to sift out the innate machine recursions from the general learning ones. 

The sciences are not as straightforward as the public may assume. Darwin's theory of evolution, for example, is entirely post hoc. It can't predict any creature; it can only give a kind of tautological, post hoc just-so story: if the species exists, it has survived (tautologically obvious), and that survival depends on adaptation to the environment (that's the post hoc just-so story), explaining at once the divergence of species and their inheritance from common origins (that's the revelation that explains all the data of zoology together with elegant simplicity, settling all the questions the theological theory left unanswered, like why all the mammals have two eyes, a spine and a butthole, the last being a particularly difficult and comical challenge to the image of the deity). Physicists can't predict the next move of a chess piece, let alone the fall of a plastic bag. That's one reason to question pundit economists. A theory, even a great and powerful and pervasive theory, can be right without being predictive. In language, generativism can be right without being able to generate the language.

A technology, however, must work at least well enough that the bridge doesn't fall. If that means it's wise to use girders stronger than the science prescribes, you use them, no questions asked. Or rather, the only question to ask would be cost. Note that such costs are utterly irrelevant to the value of a science, where the only "cost" is the simplicity of the theory (more accurately its probability; see the post on entropy and truth). You see the difference. Tech is about the practicalities of an innovation in a realm of action; science is about understanding what already exists.

And this is why the AI neural-network program (LLM chatbots) has succeeded where the top-down computational program has failed. A computational model can only produce by algorithm and can only parse by algorithm. It cannot by itself capture quirks without adding ad hoc doodads to handle each quirky case -- not a theory but a cat for every mousehole. A behavioral, empirical mimicry machine can learn to mimic vagaries just as well as algorithmic functional outputs. They are all equally outputs from the perspective of a behavioral monkey-see-monkey-do machine. The mimic isn't interested in the source of the behavior or its explanation. There's no causal depth to mimicry and no need for it. Algorithmic behaviors and quirky behaviors are equally just empirical facts to the mime, as the toy sketch below illustrates.
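To see how little machinery mimicry needs, here is a toy sketch (my own illustration in Python, far cruder than any real neural network, with a tiny invented corpus): a word-bigram sampler trained on a handful of sentences, including the school/hospital quirk from above. It reproduces quirk and regularity alike, because to the sampler they are nothing but observed transitions.

import random
from collections import defaultdict

# A deliberately crude "mimicry machine": a word-bigram sampler.
# It has no notion of rule versus quirk; it just reproduces whatever
# transitions it has seen. (Toy corpus invented for illustration.)
corpus = [
    "I go to school",
    "I go to the hospital",
    "she goes to school",
    "she goes to the hospital",
]

transitions = defaultdict(list)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        transitions[prev].append(nxt)

def mimic(seed):
    # Generate a sentence by copying observed transitions, nothing more.
    rng = random.Random(seed)
    word, out = "<s>", []
    while True:
        word = rng.choice(transitions[word])
        if word == "</s>":
            return " ".join(out)
        out.append(word)

print(mimic(1))  # e.g. "I go to the hospital"
print(mimic(2))  # e.g. "she goes to school"

An LLM is vastly more sophisticated than this, but the epistemic stance is the same: outputs are facts to be matched, not effects to be explained.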

This is not to disparage the methods AIs use to "learn" behaviors and generate like behaviors. Neural networks are an amazingly successful technology. They may even be used by scientific research to understand language. But they are too successful to stand as a model of human language learning or, possibly, of human intelligence as well.

So even though, or actually because, neural networks are restricted to a shallow, non-causal, impoverished empiricism, they can accurately reproduce the full range and complexity of language -- its scruffy idioms and recursive vagaries, as well as its algorithmic functional returned values -- whereas the top-down model could at best account for the returned functional values. But in practice the top-down computational model can't do even that, because it relies on the restricted data of returned values, and the linguist can't identify with certainty which data are restricted to the faculty's algorithm and which are ordinary human empirical behavioral learning.

There's an important Hayekian lesson here for social planning and economic policy: policy and planning algorithms are too narrow to predict social behaviors. But learning machines might not be much better since they are trained post hoc. 

Chomsky's generativism was a Copernican shift for the sciences in the 20th century. He not only blew behaviorism out of the water, he brought thought and the mind back into the sciences and philosophy. Now the zombie corpse of behaviorism, in the form of LLMs, has been resurrected from its swampy depths to walk again, but this time with a sophisticated information-theoretic learning technology.

None of this top-down computational modeling relies on or even mentions consciousness. It's only a question of whether humans compute or merely mimic with the exceptional range and sophistication of AI mimicry, its ascending and descending gradients and suchlike Markov walks. The answer is that the means AI uses are too powerful to explain human language, and maybe human learning in general.

As promised above, here's a handful of sentences showing what English speakers can and can't parse:

a. The drunk at the end of the bar is finally leaving. (English speakers can parse this easily)

b. The drunk on the chair with three legs at the end of the bar wants to talk to you. (again, it's easy to parse that it's the chair that's at the end of the bar, not the legs at the end, and not a drunk with three legs)

c. The drunk on the chair with three legs in the suit with the stripes at the end of the bar wants to talk to you. (again, easy to parse that it's the drunk here in the suit) 

d. The drunk at the end in the suit of the bar wants to talk to you. (not just hard but impossible even if you know that it's intended to mean "the drunk in the suit" and "the end of the bar") 

and it's not because "of" requires proximity: 

e. The drunk on the chair in the T-shirt with three legs wants to talk to you. (no way to get the three-legged chair)

[Scroll down to the bottom for a mechanical diagram of the cross relation problem.]

There are no gradations in difficulty. a-c are automatic, d and e impossible, even though they're simpler than b or c (!) and even if you're told what they're supposed to mean! That's a reliable sign of machine operation -- a structural failure. And in fact a push-down automaton can produce a-c, just as English speakers can. A push-down mechanically can't parse d or e, and neither can you.
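To make the push-down point concrete, here is a minimal sketch (my own illustrative code, not a parser of English; the word positions and arc pairs are simply my encoding of the intended attachments): each attachment is treated as a bracket pair over word positions, and a single stack tries to match them. The nested attachments of b go through; the crossing attachments of d cannot, whatever the words are.

def stack_parsable(arcs, n):
    # True if the attachment arcs nest like brackets, i.e. can be matched
    # with a single stack -- which is all a push-down automaton has.
    arcs = [tuple(sorted(a)) for a in arcs]
    opens, closes = {}, {}
    for a in arcs:
        opens.setdefault(a[0], []).append(a)
        closes.setdefault(a[1], []).append(a)
    stack = []
    for pos in range(n):
        # an arc ending here must be the most recently opened one still open
        for a in sorted(closes.get(pos, []), key=lambda a: -a[0]):
            if not stack or stack[-1] != a:
                return False  # crossing: stack discipline violated
            stack.pop()
        # push longer arcs first so that nesting can succeed, as in "( [ ] )"
        for a in sorted(opens.get(pos, []), key=lambda a: -a[1]):
            stack.append(a)
    return not stack

# Sentence b, positions: 0 drunk, 1 chair, 2 legs, 3 end, 4 bar
nested = [(0, 1), (1, 2), (1, 3), (3, 4)]
# Sentence d, intended reading, positions: 0 drunk, 1 end, 2 suit, 3 bar
crossing = [(0, 2), (1, 3)]
print(stack_parsable(nested, 5))    # True  -- within a push-down's reach
print(stack_parsable(crossing, 4))  # False -- the cross defeats the stack

The point of the sketch is only the stack discipline: a push-down automaton has one stack, so it can match any depth of nesting but can never match two attachments that interleave.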

That is the counterfactual argument for generative grammar: the limit tells us what kind of machine the brain uses. A neural-network LLM learning machine can produce d and e if it is trained on sentences of this kind. Therefore LLMs tell us nothing about why d and e are impossible for English speakers. LLMs are a successful technology, not a scientific explanation. LLMs are too successful to be informative about natural language learning.

The counterfactual above is not a proof, it's just support. It would be proof if an LLM actually produced a sentence with the structure of d and e. And the fact that it doesn't may mean only that it hasn't encountered such structures in its training. If a child learned to produce or parse structures like d and e, these sentences would be less interesting. Any of these results would be relevant to our understanding of human learning and LLM learning. But Bach's 1-3 don't address the counterfactual at all. They are mere excitement over a technology that can do anything and everything. That's to miss the scientific goal entirely. In other words, the jury is still out on Chomsky's model, pace Bach.

Even a behavioral test like the Turing test would distinguish an LLM from human intelligence, since the LLM could perform what humans can't. It's the weakness of the Turing test that it tests for how stupid the machine must be, or appear to be, to emulate humans. It's ironic, but not surprising, that it would be used as a test of intelligence. Not surprising, because Turing's test reflects the despairing limitations of behaviorism prior to Chomsky's Syntactic Structures in 1957.

Here are diagrams that make it clearer what the machine structure can't do:

[The red X below indicates the cross relation that neither humans nor "push-down" automata can parse.]
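In rough text form, the contrast looks like this (my own sketch; the parenthesis and bracket simply label the two intended attachments in d):

b.  [ the drunk [ on the chair [ with three legs ] [ at the end [ of the bar ] ] ] ]
    -- the attachments nest like matched brackets, so a stack can check them

d.  the drunk (   at the end [   in the suit )   of the bar ]
    -- "the drunk ... in the suit" and "the end ... of the bar" interleave as ( [ ) ], a pattern no stack can match; that interleaving is the crossed relation marked by the X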


