
Thursday, May 16, 2024

Why explain?

Originally published on Language and Philosophy, March 15, 2011

I just read Steven Weinberg’s “Can science explain everything? Anything?” in his Lake Views. Weinberg is a Nobel-prize-winning physicist who writes frequently on science in, among other venues, the NY Review of Books. Here he addresses a basic problem for the philosophy of science: what is an explanation?

Many contend, including some scientists, that science only describes. What purports to be an explanation, they say, is just more description or more general description or broader and more comprehensive description. Others, more contentious still, claim that science should only describe; that any pretense to get beyond description, underneath the phenomena, inside the phenomena, anything but just the phenomena, is philosophy maybe, but not science.

There are, for example, linguists who believe that science must limit itself to the statistical occurrences among words. To them, there is no fixed grammar, only statistical correlations among words. The study of language is the study of those statistical relations only. Positivists, empiricists and behaviorists hold to this purity in their research. It’s admirably spare. For them, language is nothing but the stream of sound and its meaning.

At the other extreme are the linguists who find that the statistical correlations are just the descriptive first step. Discovering the underlying machinery that generates the correlations is the scientific goal. They are looking for the explanatory story, the explanation of the correlations. The empiricists believe that such explanations are unjustified, chimeras. Weinberg asks as well: which does the explaining, the correlations or the discovered machinery? Do Newton’s laws explain Kepler’s, or do Kepler’s explain Newton’s? Newton derived his laws from Kepler’s, yet we think Newton explained Kepler. What’s going on?

The romantic scientist dreams of explaining, telling a long and complicated story, full of surprises and apparent digressions that turn out to be essential, a plot that brings all the characters and twists into one simple conclusion. Satisfaction and surprise, that’s what the romantic scientist promises.

I’m a romantic. So I’m going to try my hand at making simplicity out of this troublesome conflict over explanation.

Suppose you’ve got before you a mysterious machine with a screen and a keyboard. When you tap a key a letter appears on the screen. Each keystroke brings a different letter on the screen. What is this thing before you?

The strict empirical positivist will answer with an investigation into the correlation between the keys and the letters. This key to the far left brings up an “a,” the one to its right brings up an “s.” When he’s done, you’ve got a complete inductive account of the keyboard.

Does the “a” key always bring up the letter “a” on the screen? Induction can’t go so far as to prove it, but that is the inductive hypothesis. And if it happens that the “a” fails, then the inductive hypothesis is falsified and the inductive account comes to nothing but a statistical probability.

Now, you’re already way ahead of my story. You want to break open the mysterious machine, look at its parts, see how it functions and give an account of why and how the keys relate to what appears on the screen. Why stop at the statistical correlation of the mere phenomena? Don’t we want an explanation of that correlation?

Is that an explanation? Or is it just more description — a deeper description or a description of more stuff related to the correlation? How can science be anything more than just description?

Well, there is a difference between mere description of phenomena and a description that explains. Suppose you’ve figured out the machine and how it appears to work. And suppose now that the “a” stroke suddenly fails to bring up the letter on the screen. Has your hypothesis about the machine failed? Not at all.

When your computer keyboard doesn’t respond, you don’t come to the conclusion that you were wrong all the while about computers: the keys aren’t designed to bring up letters, that’s just a statistical likelihood. Sometimes it works, sometimes it doesn’t. There’s no more to be said.

No; you don’t stop there. When your keyboard doesn’t work, you think: either the keyboard is broken, or the connection is loose, or the software has a bug or the processor has got a virus, or — you know there’s an explanation in that machine. You know where to look. If worse comes to worst, you know where to go for help at the Apple Store. You see, the statistical probability is minimally informative, too minimal to be qualitatively useful. It’s not explanatory. It doesn’t tell you why. It doesn’t tell you anything when the inductive hypothesis fails. You have no reply to its falsification.

When you’ve explained the machine (described how and why the keys work) and a key fails, your hypothesis doesn’t fail at all. Instead, the failure gives your hypothesis an opportunity for counterfactual support: according to your hypothesis, if a key fails, there must be a failure somewhere in the mechanism’s hardware or software. This is the moment of experimentation. You look for the mechanical failure. If you find it, you have additional support for your hypothesis.

And you can experiment further. If you understand how each mechanical piece works, you can fool around with the mechanism and predict how those changes will alter its operation. If you succeed, you’ve got more counterfactual support. Remove this piece, no “a” on the screen. Replace the piece, restore the “a.”

Getting back to the linguists. The positivist, behaviorist linguist objects to the use of made-up sentences, a hallmark of generative linguistics. If language is just the stream of speech — he’s pounding his fist on this one — how can anything be learnt by inventing experimental sentences that have never been said?

Believe it or not, there are schools of linguistics that hold this purism. No experiments. English is speech spoken among those who understand English. The data of English are only utterances from those speakers.

(Note that the empiricist has a chicken and egg problem. How does he know who the English speakers are? But I think even generativists have to face this one too.)

For starters, the empiricist ignores that comprehension of English is just as much a part of English as speech is. Comprehension may not be the same faculty, but it is patently a part of English and it is closely related to the speech faculty, since people who can’t speak English generally can’t understand it either. That correlation is more than just a coincidence.

And if comprehension is part of English, then the comprehension or lack of comprehension of experimental sentences is a datum of the language, even if the sentence has never been spoken. So there is nothing unscientific in making up experimental sentences. At the very least they tell us something about comprehension. And since comprehension is integral to the language, they also tell us about the structure of the language itself: about comprehension directly, and about speech indirectly.

What’s more, when you’ve analysed the grammar through counterfactually supportive experimental sentences that define the language, laying out its boundaries, then you can say when an utterance is a fumbled sentence, or an unfinished sentence, or a sentence distracted midway and returned to. The empiricist, relying only on speech, can at best identify statistical aberrations. The generativist can say with confidence: that sentence was half finished; it’s not reflective of English as English speakers know and understand it.

That’s just the beginning of explanatory power. But you have to look behind and beyond the correlations of the phenomena. You have to look at what it is that’s generating those correlations. When you’re done, of course you’ve got another description — a description of a generative machine. But that generative machine does something new for your phenomenal correlations. What had been statistical aberrations in a pure but naive view of the phenomena are now counterfactual support for what’s really going on, a reality that was not apparent before but which is now both apparent in itself and apparent in the workings of the phenomenon.

Addendum

In linguistics it’s generally not possible to open up the machine and look at it. Linguists usually figure out the machine by experimenting with sentences (see the entry “Syntax for the uncertain” below), not by opening up the skull as you might open up a computer to see what’s inside.


Syntax for the uncertain

 Originally published on Language and Philosophy, June 11, 2007

(This entry is for the Chomsky skeptic: the type of long distance relationship prohibited among prepositional phrases provides strong evidence for a generativist view of grammar and a computational view of syntax in the brain.)

Anti-Chomskians have focused their attacks on productivity, claiming that novel syntactic structures are rare. Certainly formulaic utterances are rampant in speech and have justly received much attention recently. Diana Sidtis, who has published widely on formulaic utterances, adds schematic utterances to these: utterance patterns structurally fixed like formulae, but not fixed for content. The claim seems to be that if schemata and formulae dominate speech patterns, the generative element is marginal at best, a mere intuitive capacity largely unused.

Setting aside the question of why humans would have such an unused capacity, this argument ignores the essential duality of the Chomsky program. The goal is not just to generate all the sentences of natural language. It’s to generate all and only the sentences of natural language. It doesn’t just explain novelty and unbounded productivity. The really dramatic, interesting and compelling side of Chomsky’s work from the very outset was the other horn of the bull: discovering one mechanism that generates all the sentences yet doesn’t overgenerate. Generative syntax crucially explains why some extremely simple sentences are unprocessable, even when they contain the same structures as more complex and easy-to-process sentences.

Sometimes I think Chomsky and syntax have garnered so many vitriolic enemies because Chomsky’s original examples were not chosen for pedagogical perspicuousness and the computational origins of generative theory are not consistently taught. So here’s an attempt at pedagogical perspicuity which I hope will convert both agnostics and scoffers-in-good-faith.

Both long distance and local relations are possible for prepositional phrases

You walk into the lobby of the hotel. There are several people sitting at the bar and in the lounge, some in suits. You approach the front desk. The attendant tells you you received a call, using one of these sentences:

1. The guy at the end of the bar in the suit with the stripes on the chair with three legs called.
2. The guy at the end in the suit of the bar called.
3. The guy on the chair with three legs at the bar called.

Notice that sentence (1) is easy to understand even though it is long and complex. I’ve yet to encounter a class of undergrads who didn’t understand it instantly. Yet it contains no fewer than three pairs of prepositional phrases, each pair holding a local relation within the pair and a long distance relation with the subject of the sentence. So

the chair with three legs

is a noun phrase with a prepositional phrase [with three legs] related directly to [the chair]. It’s the chair that has three legs, not the guy.

On the other hand, the stripes are not on the chair, it’s the guy who is on the chair. So there is no relation in this sentence [the stripes on the chair] even though there is a relation [the chair with three legs].

So these prepositional phrases can relate over long distances to the subject, or they can hold a purely local relationship with the nearest noun phrase. Both long distance and local relations are possible for prepositional phrases.

Some long distance relationships are impossible

But now consider sentence (2). It is a simpler string of words: only three prepositional phrases — yet I have not met any English speaker who can process it to get [of the bar] to relate to [at the end] even though it’s semantically obvious and it’s the only semantic possibility. This sentence is not difficult to process; it is impossible! Even when you know what it’s intended to mean, you still can’t get it to mean that.

And yet, it contains the same prepositional phrases, some with local relationships and some with long distance relationships, in no way different from (1), except (2) is simpler and (1) is a great deal more complex. Why is the more complex sentence easy and the simple sentence strictly impossible?

Is it because a prepositional phrase cannot intervene between two related prepositional phrases? Sentence (3) shows this cannot be the reason.

Sentence (3) has the most complex relationships of all three sentences, and yet it too is relatively easy to process. Imagine there are two guys sitting on three-legged chairs, one chair at the bar and one in the lounge.

3. The guy on the chair with three legs at the bar called.

means

The guy on [the chair [with three legs] [at the bar]]

where the chair is both at the bar and has three legs.

It’s not hard to understand, even though there is a prepositional phrase intervening between [the chair] and [with three legs].

So prepositional phrases may intervene sometimes but not always. What’s the explanation?

What determines which are possible and which are impossible?

Computational theory early on gave us the answer. A machine that processes language word by word cannot exclude sentences like (2) while including sentences like (1) and (3). But a machine that processes phrases as well as words can. A finite automaton can produce any and all of the prepositional relationships above, including, unfortunately, (2), which is not possible for native English speakers. A push-down automaton, however, can produce (1) and (3) without any trouble, but is mechanically, physically, structurally, logically unable to produce (2).

The internal structure of a prepositional phrase can be processed by a machine, like a finite automaton, that reads one grammatical category at a time

prep + determiner + noun

in that order. Such a machine consists of a set of states, including an initial state and at least one final state, and a set of transition functions that take one state into another depending on the input. The initial state here accepts a preposition, which takes the machine into a new state accepting a determiner. Feeding the machine a determiner at this point takes it to a noun-accepting state. (When I have a chance, I’ll flesh this out a bit. Meanwhile, if you’re curious, any textbook on computer theory will have a good description of how finite automata work and the push-downs mentioned below.)
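To make this concrete, here is a minimal sketch of such a finite automaton in Python. It is my own illustration, not from the original post; the state names and category labels are invented only to mirror the prep + determiner + noun pattern described above.

# A finite automaton that accepts exactly one prepositional phrase,
# read one grammatical category at a time: preposition, determiner, noun.
TRANSITIONS = {
    ("start", "P"): "saw_prep",    # the initial state accepts a preposition
    ("saw_prep", "D"): "saw_det",  # then a determiner
    ("saw_det", "N"): "saw_noun",  # then a noun; this is the final state
}
FINAL_STATES = {"saw_noun"}

def accepts(categories):
    """Return True if the category sequence is exactly P, D, N."""
    state = "start"
    for cat in categories:
        state = TRANSITIONS.get((state, cat))
        if state is None:          # no transition defined: reject
            return False
    return state in FINAL_STATES

print(accepts(["P", "D", "N"]))    # True: e.g. "at the bar"
print(accepts(["P", "N"]))         # False: missing determiner

The machine remembers nothing beyond its current state, which is exactly why, scaled up to whole sentences, it cannot keep track of which earlier phrase a later prepositional phrase belongs to.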

To accommodate (1), such a machine could have a structure corresponding to a regular expression like

[PDN[PDN]]*
(P = preposition, D = determiner, N = noun, * = any number of times, including zero)

and to get (2) and (3), it needs simply

[PDN]*
where any relationships among the prepositional phrases are allowed.

Such a grammar will allow any number of pairs of locally related prepositional phrases along with unrelated intervening prepositional phrases. In other words, a machine that processes one word at a time can be constructed to process all three sentences: it overgenerates to produce (2) as well.
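To illustrate the overgeneration (my own sketch, not the author’s): read each sentence as a string of grammatical categories and match it against an ordinary regular expression of the [PDN]* sort. The flat, word-by-word pattern happily accepts all three, including (2).

import re

# Category strings for the three hotel-lobby sentences: subject noun phrase
# followed by its prepositional phrases (D = determiner, N = noun, P = preposition).
# The verb is omitted and the numeral "three" is treated as a determiner for simplicity.
sentence_1 = "DN" + "PDN" * 6   # the guy + six prepositional phrases
sentence_2 = "DN" + "PDN" * 3   # the guy + three prepositional phrases
sentence_3 = "DN" + "PDN" * 3   # same category string as (2)

# A finite-state pattern: a noun phrase followed by any number of prepositional
# phrases. It sees only the linear string, never the attachment structure.
flat_pattern = re.compile(r"DN(PDN)*$")

for name, s in [("(1)", sentence_1), ("(2)", sentence_2), ("(3)", sentence_3)]:
    print(name, bool(flat_pattern.match(s)))   # True for all three: overgeneration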

But a push-down automaton — the kind of machine that accepts context free grammars — can’t be designed to produce (2) and needs no special complexity to accommodate the long distance and local relations of (1) and (3).

The simplest context free grammar that can be constructed to process (1) is:
(S = sentence, NP = noun phrase, VP = verb phrase, PrP = prepositional phrase, D = determiner, N = noun, Pr = preposition, V = verb)
S => NP, VP
NP => D, NP
NP => N
NP => NP, PrP
PrP => Pr, NP
VP => VP, PrP
VP => VP, NP
VP => V

This simplest grammar, exactly as it is, will also generate (3), but no context free grammar can give (2) the structure it needs, with [of the bar] attached to [at the end] across the intervening [in the suit]: a tree cannot have crossing branches. (This is all much easier to see with trees, but trees are tough to draw on a blog.)
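Since trees are hard to draw here, the following small sketch (again my own illustration, not part of the original argument) makes the same point numerically. A phrase-structure tree can only encode attachments that nest inside one another; it can never encode attachments that cross. The function below simply checks whether any two attachment links cross.

# Each attachment links a prepositional phrase to the phrase it modifies,
# written as a pair (modified, modifier) over the head nouns, numbered
# 0, 1, 2, ... in left-to-right order. A context-free (tree) structure can
# represent a set of attachments only if no two of the links cross.

def has_crossing(attachments):
    """Return True if any two (head, dependent) links cross each other."""
    for a in attachments:
        for b in attachments:
            lo1, hi1 = sorted(a)
            lo2, hi2 = sorted(b)
            if lo1 < lo2 < hi1 < hi2:   # the spans partially overlap
                return True
    return False

# (2) "The guy(0) at the end(1) in the suit(2) of the bar(3)":
# the only sensible reading attaches [in the suit] to "guy" and [of the bar]
# to "end"; those two links cross.
reading_2 = [(0, 1), (0, 2), (1, 3)]
print(has_crossing(reading_2))   # True: no tree can encode this reading

# (3) "The guy(0) on the chair(1) with three legs(2) at the bar(3)":
# both later phrases attach to "chair"; nothing crosses.
reading_3 = [(0, 1), (1, 2), (1, 3)]
print(has_crossing(reading_3))   # False: a tree handles it easily

Running the same check on the six attachments of sentence (1) also comes out non-crossing, which is why the longer, more complex sentence is the easy one.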

This is very powerful evidence that the brain has a context free grammar represented in it — not necessarily in a specific place, possibly only in a process distributed through a variety of locations in the brain — but represented somehow.

I haven’t touched here on examples that show that a context free grammar cannot handle all the phenomena of language or on examples that suggest that elements can be moved around by the brain. English speakers have more powerful machinery between their ears capable of taking this fundamental push down structure and playing with it, within some limits. Figuring out the limits is the stuff of current linguistic theory. I am interested here only in presenting sentences that demonstrate that the brains of English speakers must have a pushdown structure that prevents the generation of sentences like (2) which are strictly impossible for native English speakers to process. This demonstration is just for the agnostics and scoffers: How else can you explain why (2) is impossible?