Tuesday, September 16, 2025

complexity and AI: why LLMs succeeded where generative linguistics failed

"Chomsky's approach to linguistics has effectively been disproven by AI: 1. general learning is enough for language (no innate language skill necessary), 2. language is fundamentally scruffy (not neat), 3. language is enough to learn language (no grounding)."

-- Joscha Bach, Nov. 6, 2024 on X

In the quote above, Bach focuses on confirmatory evidence -- what AI and humans can do rather than what they can't -- missing both the key argument in favor of Chomsky's computational-generative program and that program's argument against neural network LLMs as a model of natural language. His comment may also exaggerate the success of neural networks, but let's set that aside.

Focusing on confirmation while ignoring disconfirmatory evidence or counterfactual support is quite common in human reasoning. And Bach's is a common response to Chomsky: if AI can produce natural language better than a computational model, humans must not need an innate hardwired syntax program. Children could learn all of language from bottom-up empirical impressions alone, with no need for a top-down, evolutionarily preset computational grammar.

What Bach and his fellow engineers ignore is the limit problem that drives the whole Chomsky model. An account of human language learning does not depend on which languages humans or machines can learn, but on which languages, if any, humans cannot learn, and which machines, if any, are mechanically, structurally unable to learn those same languages while still being able to learn human languages. If there are such impossible languages and such machines, then 1) it is highly probable that the human language faculty is isomorphic with those machines, and 2) any machine that can parse or produce those impossible languages manifestly cannot be a model of the human faculty.

This is why Chomsky repeats over and again, to the puzzlement and frustration of the AI engineering community, that AI technology tells us nothing about the human language faculty. Nothing. 

There's a popular view, apparently held by engineers as well, that technology is a branch of science because technology depends on the discoveries in the sciences. But the goals of the natural sciences and the goals of technology are unrelated. Technologies are devised in order to accomplish practical tasks. The goal of the natural sciences is to understand nature, not to manipulate nature into performing a task. 

Now, you ask, and should ask, are there languages that human children can't learn? Under current ethical conditions, we can't experiment to find out. But we can look at syntactic structures in known languages that speakers of those languages cannot parse. I provide examples below. First, I want to respond to a more pressing question: why has the generative model failed where AI technology has succeeded?

On this question, Bach's quote holds an important clue. Language is scruffy. Even if it exhibits a core of recursive computational structures, it also has a lot of vagaries. Worse, the vagaries are not always isolated quirks. They can be recursive and productive, so they look and behave just like the outputs of algorithmic, recursive, productive computation.

In the 1990s, during my doctoral studies, it was pretty clear to me that it was possible, even probable, that the data of language included variants that were non-structural, mere irregular quirks. In the US, we "go to school" but "go to the hospital" or "to a hospital"; Brits go "to hospital". To conclude from this contrast that there must be a deep syntactic structural difference with reflexes throughout the language seems far too much for a theory to explain. It's more likely a historical accident of semantics that has somehow seeped into one dialect and not the other: for Americans, schooling is an activity, but hospitaling is not; "hospital", like "congress", can be treated like a name, as Brits do, unlike, say, "the capital", a common noun. But if a linguist can't definitively tell which datum belongs to the syntax machine and which is learnt by the general learning faculty, then all the data are effectively in a black box.

The clear evidence that there are structures that English speakers cannot parse (see examples below) convinced me that Chomsky's explanatory model was probably right. But because the data were all jumbled together in a black box, I was also convinced that the program of discovering the specifics of the model was doomed to failure -- despite being the right explanation.

The sciences are not as straightforward as the public may assume. For example, Darwin's theory of evolution is entirely post hoc. It can't predict any creature; it can only give a kind of tautological, post hoc just-so story: if the species exists, it has survived (obviously), and that survival depends on adaptation to the environment, explaining at once the divergence of species and their inheritance from common origins. A theory, even a great and powerful and pervasive theory, can be right without being predictive. In language, generativism can be right without being able to generate the language.

And this is why AI has succeeded where the top-down computational model has failed. A computational model can only produce by algorithm and can only parse by algorithm. It cannot by itself capture quirks. A behavioral, empirical mimicry machine can learn to mimic vagaries just as well as algorithmic functional outputs. They are all equally outputs from the perspective of a behavioral see-and-do machine. The mimic isn't interested in the source of the behavior or its explanation. There's no causal depth to mimicry and no need for it. Algorithmic behaviors and quirky behaviors are equally just empirical facts to the mime.
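As a toy illustration of that point (a sketch of my own, not Bach's or anyone's actual model), consider a bigram "see-and-do" learner. It records the regular pattern and the dialectal quirk in exactly the same way, as observed transitions, with nothing marking which is rule-governed and which is a historical accident:

```python
from collections import defaultdict

def train_bigrams(corpus):
    """Record word-to-word transitions; rules and quirks are stored identically."""
    table = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for w1, w2 in zip(words, words[1:]):
            table[w1].append(w2)
    return table

corpus = [
    "we go to school",        # regular bare-noun pattern
    "we go to the hospital",  # American quirk: article required
    "they go to hospital",    # British quirk: no article
]

table = train_bigrams(corpus)
# The mimic can continue either dialect's pattern, but it has no notion of which
# continuation is structural and which is an accident of history.
print(table["to"])  # ['school', 'the', 'hospital'] -- all equally just observed facts
```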

That means neural networks can reproduce the complexity of language -- its idioms, its recursive vagaries, as well as its algorithmic functional returned values -- whereas the top-down model could at best account for the returned functional values. But the top-down computational model can't do even that, because it relies on the restricted data of returned values, and the linguist can't identify with certainty which data are restricted to the faculty's algorithm and which are the product of ordinary human empirical, behavioral learning.

Chomsky's generativism was a Copernican shift for the sciences in the 20th century. He not only blew behaviorism out of the water, he brought thought and the mind back into the sciences and philosophy. Now the zombie spirit of behaviorism in the form of LLMs has been resurrected from the swampy depths to spread again, but this time with a sophisticated information-theoretic learning technology.

Here's a handful of sentences showing what English speakers can and can't parse:

a. The drunk at the end of the bar wants to talk to you. (English speakers can parse this easily)

b. The drunk on the chair with three legs at the end of the bar wants to talk to you. (again, it's easy to parse that it's the chair that's at the end of the bar, not the legs at the end, and not a drunk with three legs)

c. The drunk on the chair with three legs, in the suit with the stripes, at the end of the bar wants to talk to you. (again, easy to parse that here it's the drunk that's in the suit)

d. The drunk at the end in the suit of the bar wants to talk to you. (not just hard, but impossible even if you know that it's intended to mean "the drunk in the suit" and "the end of the bar")

There are no gradations in difficulty: a-c are automatic; d is impossible even though it's simpler than b or c! That's a reliable sign of machine operation. And in fact a push-down automaton can produce a-c, just as English speakers can. A push-down automaton mechanically can't parse d, and neither can you. That is the counterfactual argument for generative grammar: the limit tells us what kind of machine the brain uses. A neural network LLM learning machine can produce d if it is trained on sentences of this kind. Therefore LLMs tell us nothing about why d is impossible for English speakers. LLMs are a successful technology, not a scientific explanation.
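To make the limit concrete, here is an informal sketch of my own (the word positions and attachment arcs are my illustrative choices, not a formal proof): a push-down automaton is at bottom a stack machine, and a stack can only build nested dependencies. If we draw each PP attachment as an arc between word positions, the arcs in b nest, while the intended reading of d forces two arcs to cross, which no stack discipline can produce.

```python
def arcs_are_nested(arcs):
    """Return True if no two attachment arcs cross, i.e. a stack-style parse exists."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for i, (a1, b1) in enumerate(spans):
        for a2, b2 in spans[i + 1:]:
            # Two arcs cross when exactly one endpoint of one lies strictly inside the other.
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
                return False
    return True

# (b) "The drunk(1) on the chair(2) with three legs(3) at the end(4) of the bar(5) ..."
# Reading given above: drunk->chair, chair->legs, chair->end, end->bar.
b_arcs = [(1, 2), (2, 3), (2, 4), (4, 5)]

# (d) "The drunk(1) at the end(2) in the suit(3) of the bar(4) ..."
# Intended reading: drunk->end, drunk->suit, end->bar; the last arc must cross drunk->suit.
d_arcs = [(1, 2), (1, 3), (2, 4)]

print(arcs_are_nested(b_arcs))  # True  -- nested arcs; a stack (push-down) parse exists
print(arcs_are_nested(d_arcs))  # False -- crossed arcs; no nested, stack-based parse
```

Any attachment pattern that requires crossing arcs falls outside what a nested, stack-based parse can deliver, and that is the shape of the limit the examples above are meant to expose.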

The counterfactual above is not a proof; it's just support. It would be proof if an LLM actually produced a sentence with the structure of d, or if a child learned to produce or parse such structures. Either of those would be proof. But Bach's 1-3 don't address the counterfactual at all. They are mere excitement over a technology that can do anything. That's to miss the scientific goal entirely.

Even a behavioral test like the Turing test would distinguish an LLM from human intelligence, since the LLM could perform what humans can't. It's the weakness of the Turing test that it tests for how stupid the machine must be, or appear to be, to emulate humans. It's ironic, but not surprising, that it would be used as a test of intelligence. Not surprising, because Turing's test reflects the despairing limitations of behaviorism prior to Chomsky's 1957 Syntactic Structures.
