Friday, November 21, 2025

the lesson of AI

If there's a lesson to learn from LLMs, it's that humans don't think and aren't intelligent. 

Every time I hear "AI is intelligent!" I feel like Andersen's little boy in the Emperor's New Clothes. "Don't you see: just because AI can do what we do doesn't mean AI can think. It means we don't think, people!" 

It's not that AI means we're merely thinking meat machines. We already knew that. It's that what amounts to thinking in us is unintelligent, lame mimicry, just like AI.

I've written a lot on this blog about the computational character of language (respect to Chomsky) and the algorithmic nature of words and ideas (respect to Plato and Kant). But I don't really believe it. I write it because I wish to believe it: because I want to believe that we humans think intelligently, and because I want to believe that Wittgensteinian behaviorism just can't be so, if only because it's so vacuous. But the sad truth seems to be that we don't think intelligently. We "think" by picking up habits of thought and expectation, often causal stories that we don't bother to question. AI's reinforce, reinforce, reinforce amounts, in behavioral-psych terms, to confirm, confirm, confirm, never think. 

Here are a few examples: 

When asked which is more likely, that San Francisco will be under water in 2035, or that in 2035 a great earthquake will sift and shuffle the soil beneath San Francisco and send the mother of all tsunamis over the city so that it slips under water, people choose the latter scenario as more likely than the former. They don't think to themselves that the first scenario could be the result of an earthquake and tsunami, but also of global warming's sea rise, or North Korea lobbing a nuke nearby, or a meteor -- you get the picture. Logically, the more general story is less particular, so it's more likely. But people look for a familiar story with a familiar explanation. That's not thinking. That's mere mimicry, just like a neural network! (Now, the presentations of the two scenarios are loaded -- "It will be under water" implicates that it's under water for no given reason, and we trust that what we're being told is the whole truth and nothing but. That just shows we prioritize trust over our own thinking.)
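To see the arithmetic behind that, here's a minimal sketch with invented probabilities (the numbers are mine, purely for illustration): the general "under water" claim is the sum over all its possible causes, so no single detailed cause can be more likely than it.

```python
# Toy illustration of the conjunction rule, with made-up probabilities.
# The general claim "SF is under water in 2035" covers every possible cause,
# so it can never be less likely than any one detailed cause.
causes = {
    "earthquake + tsunami": 0.010,
    "sea-level rise": 0.004,
    "nearby nuke": 0.001,
    "meteor": 0.0005,
}
p_general = sum(causes.values())            # under water, for whatever reason
p_specific = causes["earthquake + tsunami"] # the vivid, particular story
print(p_general, p_specific)                # 0.0155 vs 0.01
assert p_specific <= p_general              # the familiar story can't be more likely
```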

I ask: why do we hold a door open for others behind us? Everyone I've ever asked responds with a positive cause: it's polite, it's helpful (even though people open doors by themselves regularly, and the people behind often feel compelled to hurry a bit so as not to inconvenience the door-holder -- that is to say, by holding the door you're actually inconveniencing them by compelling them to hurry). Never has anyone thought, 'What if I didn't hold the door open? Then I'd be slamming the door in front of someone right behind me. They'd think I'm a dick. Ah, that must be why I hold the door open. It's to avoid being judged a dick. It's all virtue signaling.' And isn't that the essence of morality? In a social species, approval-seeking is the glue that keeps us all together. That's an important bit of self-understanding and species-understanding. But no one thinks it through. Because we don't think. We just mimic. Hold the door because ... polite? helpful? nice? We miss it all for the lack of a little thinking. 

Same with free speech. I ask what's the benefit of having freedom of speech if the market of ideas never persuades anyone? And everyone knows that the NYTimes reader is uninfluenced by Fox News, and vice versa, so what's the point? The response I get is never "What if we had no such freedom? How would it be enforced? Would we have to lie to each other on pain of punishment? Wouldn't we all know that we're all lying to each other? How could we ever trust anyone? The point of language would be utterly defeated. Why converse at all? How could a social species even survive without trust in shared information?" No one asks this. 

I see a long literature advising how to overcome bias, especially confirmation bias (grasping for evidence in favor of one's beliefs or what one wants to believe) and myside bias (attacking evidence against what one already believes or wants to believe). Always at the top of the list is "be humble". But what does this mean beyond "don't be attached to your beliefs"? It's question-begging. "Humble" is just another word for "don't be so biased in favor of your beliefs." If you're looking for a way to free yourself of your biases, how is "so free yourself of your biases, bro" thoughtful advice? It's nothing but a virtus dormitiva. That's monkeying, not thinking. And yet we set great value on such advice. "Be humble" -- what a useless, stupid piece of advice. It's just words, empty words. 

We don't think. We mimic with familiar narratives and habits, and monkey with metaphors and analogies. That's not thinking. That's repeating. 

Psychological anthropology got this right with schema theory in the 1990s. The idea is that our interaction with the world -- experience -- provides each of us with little schemas (schemata?) of how things work. That's how and what we learn. One consequence of schema theory is that we're learning causal stories, not logical entailments. The theory explains why modus tollens is really hard for us when there's no causal story between antecedent and consequent: "If residents are leaving California, Ukraine is defeating Russia. Ukraine is not defeating Russia, therefore residents are not leaving California." That deduction is hard for us. But "If residents leave California, rents there will fall. Rents are not falling, therefore residents are not leaving" is an easy deduction to grasp, even if we think the premise is wrong. Logic takes work. 
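To make the contrast with causal stories concrete, here's a minimal sketch (illustrative only) checking that modus tollens holds on every truth assignment, with no causal content anywhere in sight:

```python
from itertools import product

# Modus tollens: from (P -> Q) and (not Q), infer (not P).
# The check below runs over every truth assignment; the content of P and Q never
# enters, which is why a causal story linking them is no help to the logic itself.
def implies(p, q):
    return (not p) or q

for p, q in product([True, False], repeat=2):
    if implies(p, q) and not q:
        assert not p  # holds on every assignment: the inference is content-free
print("modus tollens is valid regardless of any causal story")
```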

Similarly with language. In the post on why AI succeeded where generative linguistics failed, there's a demonstration of how generative syntax allows us to easily parse extremely complex relations with simple recursive functions of machine syntax, the machine in Broca's area of your brain. It mechanically churns them out quickly and easily. What I didn't mention there is that semantic or logical functions are not syntactic, are not mechanical, and take a lot of effort. It's not the case that it's not true that it's not the flat earthers who don't believe the earth is not round because they don't understand the science, it's the round earthers. Honestly, I don't know how to parse that sentence. To parse it I'd have to count the negatives and apply the simple logical arithmetic: an even number of negatives makes a positive, an odd number a negative. Counting arithmetically is not mechanical for humans. We have to learn it. Same with logic. Yet any string of recursive prepositional phrases, even embedded ones, is easily and quickly parsed, because prepositional phrases are syntactic functions. So just because we have a generative capacity for language doesn't mean we think intelligently. We don't. 
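For what it's worth, the only way I know to handle that flat-earther sentence is to do the counting deliberately. Here is a toy sketch of that parity rule (the helper is hypothetical and only matches "not" and "n't"):

```python
# A crude parity count of the negatives: an even count cancels to a positive claim,
# an odd count leaves a negative one. Nothing in the syntax machine does this
# arithmetic for us; we have to count deliberately, which is why the sentence
# above defeats the ear.
def negation_parity(sentence: str) -> str:
    words = sentence.lower().split()
    count = sum(1 for w in words if w == "not" or w.endswith("n't"))
    sign = "positive" if count % 2 == 0 else "negative"
    return f"{count} negatives -> net {sign} claim, by arithmetic, not by ear"

print(negation_parity(
    "it's not the case that it's not true that it's not the flat earthers "
    "who don't believe the earth is not round"))
```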


the obvious evidence that propaganda persuades no one

Is the Fox News watcher persuaded by what the NYTimes says? Not at all. Whatever the NYTimes presents only confirms the Fox watcher's beliefs about the NYTimes' bias. Is the NYTimes reader persuaded by Fox News? Nope. Propaganda serves to confirm what its audience already believes or wants to believe. Propaganda does not persuade. Far from it: it confirms the negative judgments about the out-group. Propaganda serves to confirm one's conviction that the out-group is morally and informationally defective and dangerous.  

Two facts emerge: propaganda doesn't persuade, and political polarization belies the postmodernist and Marxist assumption that the culture has a single discourse of power. There's no systemic belief structure directing all minds. Liberal democracy has a structure of mutual out-grouping, with the in-group significantly determined by accepting anything the out-group rejects, all the while developing and innovating means of rejecting the Other. 

In this innovation the Right has been particularly fecund. It used to be the conservatives who were stuck in the mud of the past. Now the Right is full of reinvented nationalisms and conspiracy theories, while the Left is stuck with its Enlightenment principles and its self-righteous moral superiority and censoriousness. 


sunk cost: loss aversion or Bayesian failure?

Loss aversion is a naturally selected emotion tied to survival. Loss has a finite bottom boundary -- no bananas means starvation and death -- whereas acquisition has an infinite or unbounded superfluity. No one needs forty billion bananas, and using them takes some effort and imagination, like maybe using them for bribes toward world domination. The normal person wants to assure herself first that there's something to eat tonight. World domination later, if we're still interested after dinner.

So the sunk cost fallacy is an emotional attachment to what's been spent. But it is also a failure of Bayesian analysis of time. You stay in the movie theater not only because you don't want to throw away the ticket you spent money on, but also because that emotional attachment -- loss aversion -- has blinded you to the time outside the theater. The ticket has focused your attention on the loss rate of leaving: 100% of the next hour will be lost. But that forgets all the value outside the theater. 
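Here's the focus error in toy numbers (invented for illustration): the ticket cancels out of both branches, so the only live comparison is the next hour inside versus outside.

```python
# The focus error in toy numbers (invented, purely illustrative).
# The ticket price is spent in both branches, so it cancels out of the comparison;
# all that differs is the value of the next hour inside vs. outside the theater.
ticket_price = 15            # sunk either way
value_of_movie_hour = 2      # a movie you now know you dislike
value_of_outside_hour = 8    # whatever the hour is worth elsewhere

stay = value_of_movie_hour - ticket_price
leave = value_of_outside_hour - ticket_price
print("stay:", stay, "leave:", leave)   # the ticket shifts both options equally
# The choice turns only on the hour, never on the ticket:
assert (leave > stay) == (value_of_outside_hour > value_of_movie_hour)
```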

This Bayesian interpretation predicts that people whose time is extremely valuable -- people with many jobs, or jobs with high returns, whether in financial wealth or real wealth (personal rewards) -- are less likely to stay in the theater. Their focus will be trained on the time outside the theater. The losses will be adjusted for the broader context of the normal. We should expect the very busy or very productive to be resistant to the fallacy. 

Of course, there are also the rich, who don't worry about throwing a ticket away because the marginal value of that money is low or negligible. But overall, the sunk cost fallacy should occur only with people who have time to waste, whose time is not pressingly valuable. The sunk cost fallacy may be an arithmetic fallacy of focus, not just an evolutionary psychology of risk aversion. 

Freud and Haidt got it backwards: the unconscious is rational; the conscious mind is not

A friend insists that I'm disciplined, since he sees that I take time every day to work out on the gymnastics bars in the local park. I object: I work out because I enjoy getting out of the house. He concedes that we do what we enjoy without discipline. 

We both have it all wrong. I'm well aware that the opportunity cost of staying at home is, on most days, far greater than the opportunity cost of going to the park to socialize while practicing acrobatics. I know that I need to socialize every day and maintain my strength and agility. But that rational cost-benefit equilibrium never motivates me. I'm comfortable at home, I don't feel energetic enough to brave the cold -- there's any number of reasons to stay home. If I debate with myself over whether to go out, I will stay. The immediacy of laziness -- the comfort of now -- overcomes any rational equilibrium. So how do I get to the park? It's not discipline. I don't even understand what "discipline" means. Is there an emotion of discipline? Is it suppressing one's thinking -- including one's deliberating and second-guessing and procrastinating and self-distracting -- and just doing it? 

As mentioned in this post, the unconscious mind's rational intentions will make decisions without consulting the conscious mind, as long as the conscious mind is distracted. Focus my conscious attention on going out, and immediately I'm feeling comfortable and lazy and calling up all the reasons to stay home. Think about anything else, and soon enough it seems time to grab the wool sweater and go. Sometimes I'll watch myself grab the wool without knowing when I made the decision to grab it. It just happens.

It's the unconscious mind that knows what's best for my long-term goals. It's my conscious mind that's swayed by the emotions of now. Haidt treats emotions as the unconscious mind, sort of following Kahneman. But this rational, disciplined, far-sighted unconscious mind is distinct from the emotions. It's the rational nagging mind of what I know I should do, and that I would do but for the interventions of my conscious, biased, instant-gratification emotions. The emotions are always immediate -- they are feelings and have to be felt in the now. The unconscious mind isn't in the now at all. It's a hidden subterfugeal world of long-term rational sabotages against my conscious will. Freud misplaced the conscience. It's not the superego, it's the subterego, the intuitive fast system that's thinking far ahead, working to keep me well against my will. 

the spirituality paradox

Spirituality often cloaks itself in moral guise as shedding selfishness in favor of embracing the Other whether it be other sentient beings or the world of inanimate phenomena: amor vincit omnia.

The goal of such spirituality is to transcend the self, but the purpose is to improve the self. So, for example, a spiritual cult or movement targets the individual member for the spiritual elevation of that individual. It's not a movement to save the cows and chickens, or preserve pristine nature. It's a movement to bring the individual's self to a higher spiritual state. Saving the chickens is a by-product. In other words, it's a selfish purpose with a selfless goal. 

From my biased perspective it's not merely contradictory and self-defeating (I mean the doctrine is defeating the doctrine -- a doctrine at cross-purposes with itself), but also self-serving, decadent and essentially degenerate. Yes, you have only one life to live, so there's plenty of incentive to perfect that life for itself -- I'm down with that, for sure -- but there are billions of others and possibly an infinity of other interests to pursue beyond this one self. Arithmetically, the others should win, were it not for the infinite value of one's own life. 

But here's the difference: attending to things beyond oneself also perfects or augments one's own meagre life. One path to transcendent enlightenment is studying the Other, instead of limiting oneself to navel-gazing. That's a path towards two infinities added together: the broad study of, say, the psychology of the Other will shed equal light on one's own psychology, while the study of, say, thermodynamics or information theory, will take you far beyond oneself. 

Arithmetically, two infinite series are no greater than one infinite series, but you can still see the advantage of an infinite series within yourself plus an infinite series of yourself and all the others: the infinitesimal within plus the infinite outside. 

love-fragility inequality

better to have loved and lost than never to have loved at all??

Romance might be the most wonderful experience in life. It is also the most precarious. Is the precariousness worse than the wonderfulness is good? Kahneman and Tversky and Thaler and Gilovich tell us that we're more loss-averse than gain-embracing. If so, the epigraph above must be a fiction. 

It's hard to measure such extreme emotions, but if it's true, as is widely reported, that losing a job is worse than losing a loved one, then maybe romance is an exception to behavioral psychology's "losing is twice as bad as gaining is good". So "better to have loved and lost than never to have loved at all" is a good gamble, since there are worse things than losing in romance, but nothing better than loving. 
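As a rough sketch of that gamble, using the post's own rule of thumb that losing weighs about twice as much as gaining (the joy and grief intensities below are invented):

```python
# "Better to have loved and lost" as a toy prospect-theory gamble.
# Uses the rule of thumb that losing is about twice as bad as gaining is good;
# the intensities are invented for illustration.
LOSS_AVERSION = 2.0

def net_value(joy_of_loving, grief_of_losing, lam=LOSS_AVERSION):
    return joy_of_loving - lam * grief_of_losing

print(net_value(joy_of_loving=10, grief_of_losing=6))  # -2.0: a bad gamble by the rule
print(net_value(joy_of_loving=10, grief_of_losing=4))  #  2.0: worth it
# The epigraph is a good bet only if the joy outruns twice the grief,
# unless romance is the exception, where nothing is better than loving.
```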

Tuesday, September 16, 2025

complexity and AI: why LLMs succeeded where generative linguistics failed

"Chomsky's approach to linguistics has effectively been disproven by AI: 1. general learning is enough for language (no innate language skill necessary), 2. language is fundamentally scruffy (not neat), 3. language is enough to learn language (no grounding)."

-- Joscha Bach, Nov. 6, 2024 on X

Disproven?

In the quote above, Bach focuses on confirmatory evidence -- what AI and humans can do rather than what they can't -- missing both the key argument in favor of Chomsky's computational-generative program and that program's argument against neural-network LLMs as a model of natural language. Bach's comment may also exaggerate the success of neural networks, but let's set that aside. 

Focusing on confirmation while ignoring disconfirmatory evidence or counterfactual support is quite common in human reasoning. And Bach's is a common response to Chomsky: if AI can produce natural language better than a computational model, humans must not need an innate hardwired syntax program. Children could learn all of language simply by bottom-up empirical impressions, with no need for a top-down, evolutionarily preset computational grammar. 

What Bach and his fellow engineers ignore is the limit problem that drives the whole Chomsky model. An account of human language-learning does not depend on which languages humans or machines can learn, but on which languages, if any, humans cannot learn, and on which machines, if any, mechanically and structurally cannot learn those same languages while nonetheless being able to learn human languages. If there are such humanly impossible languages and such machines, then 1) it is highly probable that the human language faculty is isomorphic with those machines, and 2) any machine that can parse or produce those impossible languages manifestly cannot be a model of the human faculty, much less explain or even illuminate it. 

This is why Chomsky repeats over and again, to the puzzlement and frustration of the AI engineering community, that AI technology tells us nothing about the human language faculty. Nothing. 

There's a popular view, apparently held by engineers as well, that technology is a branch of science because technology depends on the discoveries of the sciences. But the goals of the natural sciences are not the goals of technology. Technologies are devised to accomplish practical tasks, often market-driven or military ones. The goal of the natural sciences is to understand nature, not to manipulate nature into performing a task. The goal of understanding nature includes explaining what nature doesn't or can't do. That's not the task of technology, and it's one reason why AI alignment is such a challenge: there is no natural selection that has trained AI to limit its learning to human subservience and harmlessness. For the natural sciences, those limits need to be explained. 

Now, you ask, and should ask: are there languages that human children can't learn? Under current ethical conditions, we can't experiment to find out. But we can look at syntactic structures in known languages that speakers of those languages cannot parse. I provide examples below. First, I want to address a more pressing question: why has the generative model failed where AI technology has succeeded? 

On this question, Bach holds an important clue. Language is scruffy. Even if it exhibits a core of recursive computational structures, it also has a lot of vagaries. Worse, the vagaries are not always isolated quirks. They could be recursive and productive, so they could behave and look just like algorithmic recursive productive computations. 

In the 1990s, during my doctoral studies, it was pretty clear to me that it was possible and even probable that the data of language included variants that were non-structural, mere irregular quirks. In the US, we say "go to school" but "go to the hospital" or "to a hospital". Brits say "go to hospital". To conclude from this contrast that there must be a deep syntactic structural difference with reflexes throughout the language seems way too much for a theory to explain. It's more likely a historical accident of semantics that has somehow seeped into one dialect and not the other: it could be that for Americans, schooling is an activity but hospitaling is not, and that this is reflected in the dialect; or that "hospital", like "congress", can be treated like a name, as Brits do, unlike, say, "the capital", a common noun. But if a linguist can't definitively tell which datum belongs to the syntax machine and which is learnt by the general learning faculty, then all the data are effectively in a black box. 

The clear evidence that there are structures that English speakers cannot parse (see examples below) convinced me that Chomsky's explanatory model was probably right. But because the data were all jumbled together in a black box, I was also convinced that the program of discovering the specifics of the model was doomed to failure -- despite being the right explanation.

As long as the evidence is behavioral (sentences), the model will be unfalsifiable if the model data are mixed with general learning data. Judgment will have to wait for psycholinguistic or neurological experiment to sift out the innate machine recursions from the general learning ones. 

The sciences are not as straightforward as the public may assume. For example, Darwin's theory of evolution is entirely post hoc. It can't predict any creature; it can only give a kind of tautological post hoc just-so story: if the species exists, it has survived (tautologically obvious), and that survival depends on adaptation to the environment (that's the post hoc just-so story), explaining at once the divergence of species and their inheritance from their common origins (that's the revelation explaining all the data of zoology together with elegant simplicity, settling all the questions that the theological theory left unanswered, like why all mammals have two eyes, a spine and a butthole, the last being a particularly difficult and comical challenge to the image of the deity). Physicists can't predict the next move of a chess piece, let alone the fall of a plastic bag. That's one reason to question pundit economists. A theory, even a great and powerful and pervasive theory, can be right without being predictive. In language, generativism can be right without being able to generate the language. 

A technology, however, must work at least well enough that the bridge doesn't fall. If that means it's wise to use girders stronger than the science prescribes, you use them, no questions asked. Or rather, the only question to ask would be cost. Note that such costs are utterly irrelevant to the value of a science, where the only "cost" is the simplicity of the theory (more accurately the probability; see the post on entropy and truth). You see the difference. Tech is about the practicalities of an innovation in a realm of action; science is about understanding what already exists.

And this is why the AI neural-network program (LLM chatbots) has succeeded where the top-down computational program has failed. A computational model can only produce by algorithm and can only parse by algorithm. It cannot by itself capture quirks without adding ad hoc doodads to handle each quirky case, not a theory but a cat for every mousehole. A behavioral, empirical mimicry machine can learn to mimic vagaries just as well as algorithmic functional outputs. They are all equally outputs from the perspective of a behavioral monkey-see-monkey-do machine. The mimic isn't interested in the source of the behavior or its explanation. There's no causal depth to mimicry and no need for it. Algorithmic behaviors and quirky behaviors are equally just empirical facts to the mime. 

This is not to disparage the methods AI's use to "learn" behaviors and generate like behaviors. Neural networks are an amazingly successful technology. And it may even be used by scientific research to understand language. But it is too successful to stand as a model of human language learning or, possibly, human intelligence as well. 

So even though, or actually because, neural networks are restricted to a shallow, non-causal, impoverished empiricism, they can accurately reproduce the full range and complexity of language -- its scruffy idioms and recursive vagaries, as well as its algorithmic functional returned values -- whereas the top-down model could at best account for the returned functional values. In fact, a top-down computational model can't even do that, because it relies on the restricted data of returned values, and the linguist can't identify with certainty which data are restricted to the faculty's algorithm and which are ordinary human empirical behavioral learning. 

There's an important Hayekian lesson here for social planning and economic policy: policy and planning algorithms are too narrow to predict social behaviors. But learning machines might not be much better since they are trained post hoc. 

Chomsky's generativism was a Copernican shift for the sciences in the 20th century. He not only blew behaviorism out of the water, he returned thought and the mind back into the sciences and philosophy. Now the zombie corpse of behaviorism in the form of LLMs has been resurrected from its swampy depths to walk again, but this time with a sophisticated information-theoretic learning technology. 

None of this top-down computational modeling relies on or even mentions consciousness. It's only a question of whether humans compute or merely mimic with the exceptional range and sophistication of AI mimicry, its ascending and descending gradients, and suchlike Markov walks. The answer is that the means AI uses are too powerful to explain human language, and maybe human learning in general. 

As promised above, here's a handful of sentences showing what English speakers can and can't parse:

a. The drunk at the end of the bar is finally leaving. (English speakers can parse this easily)

b. The drunk on the chair with three legs at the end of the bar wants to talk to you. (again, it's easy to parse that it's the chair that's at the end of the bar, not the legs at the end, and not a drunk with three legs)

c. The drunk on the chair with three legs in the suit with the stripes at the end of the bar wants to talk to you. (again, easy to parse that it's the drunk here in the suit) 

d. The drunk at the end in the suit of the bar wants to talk to you. (not just hard but impossible even if you know that it's intended to mean "the drunk in the suit" and "the end of the bar") 

and it's not because "of" requires proximity: 

 e. The drunk on the chair in the T-shirt with three legs wants to talk to you. (no way to get the three-legged chair)

[Scroll down to the bottom for a mechanical diagram of the cross relation problem.]

There are no gradations in difficulty. Sentences a-c are automatic, d and e impossible, even though they're simpler than b or c (!) and even if you're told what they're supposed to mean!! That's a reliable sign of machine operation -- a structural failure. And in fact a push-down automaton can produce a-c, just as English speakers can. A push-down automaton mechanically can't parse d or e, and neither can you. 
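Here's a minimal sketch of the structural point, a toy stack rather than the grammar of English: nested attachments like b and c are exactly what a push-down (stack) parser handles, while the crossed attachments of d and e require closing a relation that's buried beneath a later one, which a stack cannot do.

```python
# A toy stack (push-down) checker over labeled dependencies. Uppercase opens a
# dependency, lowercase closes the most recently opened matching one.
# "Most recent first" is the whole power, and the whole limit, of a stack.
def stack_parse(tokens):
    stack = []
    for t in tokens:
        if t.isupper():
            stack.append(t.lower())       # open a dependency
        elif stack and stack[-1] == t:
            stack.pop()                   # close the innermost open dependency
        else:
            return False                  # a crossed (out-of-order) closure
    return not stack                      # everything opened must be closed

# Nested attachments, as in (b) and (c): the inner relation closes before the outer.
print(stack_parse(list("ABba")))   # True
# Crossed attachments, as in (d) and (e): A must close while B is still open.
print(stack_parse(list("ABab")))   # False -- the stack can't reach under B
```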

That is the counterfactual argument for generative grammar: the limit tells us what kind of machine the brain uses. A neural network LLM learning machine can produce d and e if it is trained on sentences of this kind. Therefore LLMs tell us nothing about why d and e are impossible for English speakers. LLMs are a successful technology, not a scientific explanation. LLMs are too successful to be informative of natural language learning. 

The counterfactual above is not a proof, it's just support. It would be proof if an LLM actually produced a sentence with the structure of d and e. And the fact that it doesn't may mean only that it hasn't encountered such structures in its training. If a child learned to produce or parse structures like d and e, these sentences would be less interesting. Any of these results would be relevant to our understanding of human learning and LLM learning. But Bach's 1-3 don't address the counterfactual at all. They are mere excitement over a technology that can do anything and everything. That's to miss the scientific goal entirely. In other words, the jury is still out on Chomsky's model, pace Bach. 

Even a behavioral test like the Turing test would distinguish an LLM from human intelligence, since the LLM could perform what humans can't. It's the weakness of the Turing test that it tests for how stupid the machine must be, or appear to be, to emulate humans. It's ironic, but not surprising, that it would be used as a test of intelligence. Not surprising, because Turing's test reflects the despairing limitations of behaviorism prior to Chomsky's 1957 Syntactic Structures. 

Here are diagrams that make it clearer what the machine structure can't do:

[The red X below indicates the cross relation that neither humans nor "push-down" automata can parse.]