
Thursday, May 16, 2024

Art, craft, game-theoretic cognition and machine learning

Originally published on Language and Philosophy, July 12, 2022

Here in Istanbul, you cannot but admire the Turkish carpet and the mosques of Sinan, the carpet a wonder of intricacy, the more complex and detailed the more wondrous, and Sinan’s grand mosques a wonder of simplicity, purity and restraint even when scaled to the most expansive heights. If you are an idle wonderer with time to think about questions almost too obvious to ask, you might puzzle over why there are no simple carpets when the simplicity of the mosques is so overwhelmingly effective. Why can’t carpet makers avail themselves of modest simplicity in their craft, when purity and humility can reach so deeply into the human heart and mind?

The goals of craft are not the goals of art, no doubt. But what’s the difference? Or better, why such a difference? A good libertarian, and a good Darwinian — and they might as well be the same — would ask first where the market incentives lie. The answer will go a long way toward explaining the traditional crafts of intricacy. Maybe not so far with art.

First, let’s look at the differences. Craft can be learnt by almost anyone. That this is so couldn’t be more obvious from the tradition of handing the craft from generation to generation. Not so with art. There are a few great arts families, the Bachs and Holbeins and Mendelssohns, and certainly Beethoven and Mozart grew up in musical families, but how many of us listen avidly to Beethoven senior’s compositions (I’ve never heard of a single one), or papa Mozart’s, or even Bach’s sons’, famous as they were in their day? I’d guess that Holbein’s brother, had he not died so young, would have surpassed Hans, but this is the exception that proves the rule. That is, Ambrosius proves by demonstration that extraordinary artistic talent can be shared within a family. So why not the Bachs, the Mozarts, the Beethovens? And whatever happened to Vincent’s brother Theo, the art dealer, and all the other artistically talentless siblings, parents and children?

So maybe the incentive is to blame. The traditional crafts provide a reliable source of income, the arts don’t, so children or siblings of artists might choose an alternative route. But this unreliability is only relevant to unsuccessful artists, so the incentive argument begs the question. Successful artists can be far wealthier than any craftsperson. The question has merely been restated: why is a career in the arts so risky and a career in the crafts so safe; why are the arts not reliable sources of income? And now there’s also a mystery: why does anyone pursue art at all if the financial incentive is so unreliable?

Stay for a moment with the crafts, handed from generation to generation, a traditional income for the whole family. If the craft is a reliable source of income, and that reliability is what draws the income-seeking, then the goal of craft is income. This too seems obvious just looking at the intricacy of such crafts, since intricacy is labor made visible. The more labor devoted to the artifact, and the more evidence of it in the product, the more value. What is the patron looking for and paying for in a traditional craft, after all? The wealthy patron wants an artifact that demonstrates visibly to the patron’s friends or clients or guests that this object fetched a high price, so he must be wealthy, with money to burn. There’s your incentive, and there’s the explanation for all that intricacy. The more intricate, the more proof of labor, the more value in the artifact, the more evidence of the wealth of the patron. Simplicity in such a context signals absence of labor, lack of value. There’s no room for the virtues of simplicity. It plays a role only if there is a down market of cheap goods for fashion followers who haven’t the resources to buy an expensive carpet.

So the financial incentive yields an artifact that confines itself to intricacy for the purpose of appealing to the patron’s pocket, not to the patron’s emotions, not to his ideas or his politics, not to his morality, not to his mind. Just his pocket. Pay more, get more labor in the artifact in the form of greater intricacy. The craft and its artifact are a relationship between the laborer’s skill, the material she or he works, and the pay the work gets, nothing more. The only modulations are between more labor (intricacy) and more material — more items or larger items.

What about the arts? An artist must also master the skill of manipulating a material, but has to manipulate as well the emotions or ideas, the politics or moral sense of the audience. The art is a relation between the artist, the material and the mind of the audience, and really the primary material is the audience’s mind. To manipulate the mind, the artist will often hide the craft (the labor) to achieve a seamless illusion of reality, not display the artist’s intent to manipulate, which would undermine the manipulation. (That’s why Brecht’s breaking the fourth wall was radically innovative — the point of drama had been to create the illusion of reality, not draw attention to the author.)

The art will also have content, not just intricacy. The content may refer beyond the material. It may consider the context surrounding the art — in drama the context might be the society and its ills, say, or for architecture the entire cityscape. The artifact must not be just an elaborate structure. It must have a place in the world, in the audience’s mind. It can even create a world. Art is an interactive game between artist and other minds and anything that could be contained in that mind. It may contain many worlds, unbounded in number and form.

What about the incentive?

Well, yes, what about the incentive. Like the best of the sciences and maybe all of sports, the primary incentive is not financial; it’s some other kind of reward. Recognition and approval, esteem and pride must be in there, and competition among peers, but surely, above all, the love of engagement with the audience through the artform. Money? That might be a necessary condition, but not a sufficient one for an artist’s choice.

What is it about this game theoretic activity — the manipulation of other minds — that draws or cultivates such extraordinary abilities? It’s not dull labor for financial gain. That’s a one-dimensional activity. The artist might even challenge and insult the audience. Not a craftsperson. Artistry is more social and more fun.

So how is all this related to machine learning and Grice?

*****

I was listening to a couple of Sean Carroll podcasts a few weeks ago, one interviewing Tai-Danae Bradley, the other with Gary Marcus, both about machine learning. The Yoneda Lemma is the connection. The Yoneda Lemma, according to Tai-Danae Bradley, tells us that the meaning of a word can be completely derived from all its contexts.

https://www.youtube.com/watch?v=OynLbSzLS9s
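For reference, here is the lemma itself, in my gloss rather than Bradley’s: for a locally small category $\mathcal{C}$, an object $A$, and a functor $F \colon \mathcal{C} \to \mathbf{Set}$, the natural transformations from the hom-functor $\mathrm{Hom}_{\mathcal{C}}(A,-)$ to $F$ correspond exactly to the elements of $F(A)$:

$$\mathrm{Nat}\bigl(\mathrm{Hom}_{\mathcal{C}}(A,-),\,F\bigr) \;\cong\; F(A).$$

The corollary doing the work in the linguistic analogy is that $\mathrm{Hom}_{\mathcal{C}}(A,-) \cong \mathrm{Hom}_{\mathcal{C}}(B,-)$ implies $A \cong B$: an object is determined, up to isomorphism, by its relationships to everything else, as a word would be by all its contexts.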

I can think of at least four challenges to this broad behaviorist, reductionist assertion. For one, a neologism may be introduced by its coiner with a clear meaning, though its full contexts are underdetermined. The coiner will understand it in detail, but a machine learner won’t have access to that information. To assert that the meaning of the word is therefore underdetermined would be circular on the one hand; on the other, it would ignore that the coiner may have a clear and distinct concept of its meaning even though others in the speech community might not, and that those others, reading this coinage, will likely be able to guess its meaning by a kind of process of elimination, looking first at which familiar words the new word might be replacing. That is, we can learn not just from the contexts of the word, but by using contexts analogous to the context at hand, inserting an expected word to derive the expected meaning, and then reverse-engineering the meaning of the new word.
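Here is a toy sketch of that elimination procedure. The invented verb “zarfle”, the candidate list, and the scoring are all mine, just to make the mechanics visible:

```python
# Guess a neologism's meaning from contexts analogous to the one at hand:
# find which familiar word usually fills the same slot, then transfer that
# word's meaning to the new word.
def guess_meaning(candidates, analogous_contexts):
    scores = {word: sum(word in ctx for ctx in analogous_contexts)
              for word in candidates}
    best = max(scores, key=scores.get)
    return candidates[best]

candidates = {
    "devoured": "ate quickly and completely",
    "reheated": "warmed up again",
}
analogous = [
    "he devoured the whole cake before the guests arrived",
    "she devoured the sandwich in one bite",
    "he reheated the soup on the stove",
]
# Novel sentence: "He zarfled the whole pizza before anyone else sat down."
print(guess_meaning(candidates, analogous))  # -> ate quickly and completely
```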

A second challenge is a sorites-type puzzle. Contexts may be inconsistent. At what point do we judge that, say, “meme” refers to ideas that circulate among humans (following its coiner, Dawkins) or to an online gif, often including either a cat or a movie clip, used in place of a linguistically articulated judgment? This puzzle usually has an easy solution. The word has become ambiguous, with two historically related but now very distinct meanings. The slippery slope here isn’t disastrous either. It’s no skin off my tooth to grant that the Australian Prime Minister’s idiolect — his personal dialect of English — has a “suppository of wisdom”, or that Rick Perry’s has “lavatories of innovation”. I’m sure I’ve made equally silly maladropisms without even confusing the reader. More likely the reader is amused or gloating, depending on how sophisticated or how much of a troll the reader is, but not confused. So let a thousand flowers boom.

A third challenge raises an old behaviorist, reductionist quandary. It comes from Willard Van Orman Quine. Consider any two concepts that denote the same set of individuals. His example was cordates and renates: animals with a heart and animals with a kidney. All mammals have hearts and also have kidneys, so the expressions “mammalian cordates” and “mammalian renates” designate the same set, but they clearly don’t have the same meaning. And if every known, actual use (as distinct from every possible use) of one could be exchanged for the other, then, as with the neologism, the contexts won’t access the difference of meaning in the mind of their users.
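A minimal sketch of the point, with a toy extension of my own choosing:

```python
# Two co-extensional predicates are indistinguishable from extensional data.
mammals = ["whale", "bat", "human", "dog", "cat"]

def cordate(x):   # "has a heart": true of every mammal
    return x in mammals

def renate(x):    # "has a kidney": also true of every mammal
    return x in mammals

data_cordate = [(x, cordate(x)) for x in mammals]
data_renate = [(x, renate(x)) for x in mammals]
assert data_cordate == data_renate  # the learner sees exactly the same pairs
# Yet the meanings differ: in a possible world containing a heartless
# kidney-bearer, the two extensions would come apart.
```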

All these cases assume that we know the meanings of words somehow beyond the contexts in which they are used. If we humans knew the meaning of words only by their contexts, then we would be just such big-data learners, and the words “renate” and “cordate”, if the actual contexts never distinguished them, would have to be considered synonymous. The reason we don’t is simply because we avail ourselves of the dictionary. Of course, the dictionary is a context too, so if we include the dictionary in machine learning then there will be strong evidence that the Yoneda Lemma is true. But if the machine learner avails itself of the dictionary, then who needs a Yoneda Lemma or big data? Just consult the dictionary — insert the meaning of words as brute input. The machine now knows, but no learning has happened. And no prediction of language shift either, since the dictionary is just a reflection of historical use, not a determiner of it.

This is really all to say that what’s in the mind is not necessarily accounted for by the actions that proceed from the possessor of the mind. Knowing and doing are not identical, so there should be circumstances in which the two can be distinguished, such that gathering the one will not entirely account for the other. One might imagine having taken an action without actually having done it. It’s delusional, but it happens, probably more often than we’d like to admit. Or the reality we believe we live in might not have an unambiguous relation to the actual physical world we live in. We might be certain of what a zipper is without realizing that we don’t know how it works, or think we know exactly what some deity is but, when pushed, can’t say exactly any of its properties. Conversely, as in the Quine cases, one’s actions might be ambiguous evidence for one’s decisions or ideas.

Gary Marcus in his interview addresses similar problems for machine learning. Not everything humans know can be known by extensional learning — big behavioral data. (I think we should give it the name BBD, because it is a very specific kind of limited data, and I’m going to suggest a different kind of data for learning.) Gathering behaviors may account for actual actions, but not possible actions, and the possible actions reveal the difference between the mere denotation in the actual world and the meaning of the word.

Machine learning is fully adequate to extensional uses of language — the actual uses in the real world — but not to intensions, which include all possible uses beyond the actual ones; there machine learning runs straight up against the inductive fallacy. (This is a pretty thorough historical treatment from Frege to Montague and Quine; this is a very brief summary of the background issues. Here’s the Marcus interview.)

So much for the familiar challenges. I want to look at a fourth problem for machine learning, a game-theoretic problem that seems to be missing from the AI discussion. The meaning of an expression in use — in conversation, which is the point of language and without which symbolic language would never have evolved — is a game-theoretic equilibrium, not restricted to the value (meaning or reference) of the word defined in the code/language.

Think of the difference between coding for a simple input-output device and coding for an interactive interface in which the algorithm must “guess” at the intentions of the user. This is analogous to what I was saying about the traditional decorative “intricacy” crafts, which are a relation between the craftsperson and the material (for the sake of exhibiting labor in the product for the rich patron) via the tools of the technology that manipulate the material, as compared with the arts, where the relation is between the artist and the audience via those tools but also via Theory of Mind, to manipulate the mind of the audience — the user. So for machine learning, a sarcastic use of, say, “brilliant” (“You got an A: brilliant!” “You dumped hot coffee in my lap. Brilliant!”) will be interpreted as homophony or auto-antonymy — the sound sequence “brilliant” having two meanings, in effect two words with identical sounds, like “fast” (run fast, an intermittent fast) or “left” (she left home, she took a left turn), or auto-antonyms like “cleave” (cleave to a friend, cleave apart) or “dust” (dust a field with glyphosate, dust the table with a rag).
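To make the worry concrete, here is a caricature of context-clustering, entirely my own construction rather than anything from the podcasts: faced with praise contexts and mishap contexts, the learner posits two lexical entries, as if sarcasm were homophony.

```python
# Occurrences of "brilliant" are clustered by crude sentiment cues, and each
# cluster is treated as a separate word: sarcasm mistaken for homophony.
occurrences = [
    "you got an A brilliant",
    "you won the prize brilliant",
    "you dumped hot coffee in my lap brilliant",
    "you locked us out of the car again brilliant",
]

positive_cues = {"A", "won", "prize"}  # hand-coded stand-in for sentiment

lexicon = {"brilliant_1 (smart)": [], "brilliant_2 (not smart)": []}
for context in occurrences:
    words = set(context.split())
    cluster = ("brilliant_1 (smart)" if words & positive_cues
               else "brilliant_2 (not smart)")
    lexicon[cluster].append(context)

for entry, contexts in lexicon.items():
    print(entry, contexts)
```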

But for English speakers, sarcasm is not homophony or auto-antonymy. It’s a self-same word used differently depending on the mutual knowledge of the conversation’s members: you got an A, brilliant; you spilled hot coffee in my lap, brilliant — as code, the symbol still means “smart”. Proof: replace “brilliant” with a synonym like “smart” and the two meanings are unchanged — not so with “cleave” or “fast” (run quickly, an intermittent quickly??). The value in the conversational game is an equilibrium based on mutual information (speaker and addressee both know spilling coffee is not brilliant according to its conventional meaning or use in English).

So this is yet another failure of extensional inductive learning — a particularly narrow, one-sided materialist reductionism. In other words, a truly successful machine learner, beyond merely gathering or analysing data, would have to *experiment* with synonyms of “brilliant” and “fast” to figure out the type of use difference, and then *speculate* on why there’s such a difference — *conjecturing* about what’s actually represented in the minds of the speakers. It’d have to engage in the speculative process of scientific theory creation, not just calculate averages. Prediction and experiment, followed by error correction — but not just Popper-style or, if you prefer, Friston-style: a game-theoretic prediction, Darwin- or Dawkins-style (as in sexual selection or the extended phenotype, where the products of selection themselves participate in the selection), answering not the question “what is this symbol’s fixed reference in use?” but “what are the rules of this game?” You see the difference.

Sarcasm is only one of many consequences of this Gricean game-theoretic equilibrium. Grice mentions “possibly” and its (defeasible) implicature of “not actual”. So in “It’s possible to construct a car that runs at 300 mph. In fact, Fiat made one but couldn’t market it,” the “in fact” phrase is used to remove the implicature that “it’s possible, but merely possible, not yet actual,” an implicature which would stand were it not for the “in fact” phrase. Similarly, “it might be raining” implicates that it might also not be raining. These implicatures are explained by Grice’s game-theoretical rules. All this also has broad evolutionary implications for human gullibility, including adherence to religious beliefs (yet another discussion) and cases in which humans fail where chimps succeed, and it further supports Marcus’ point in the podcast.
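A toy sketch of that defeasibility, assuming a crude string format for utterances (the function and the format are mine, not Grice’s):

```python
# "possibly p" implicates "not actually p" -- unless a stronger assertion
# like "in fact p" later in the discourse cancels it.
def implicatures(discourse):
    implied = set()
    for utterance in discourse:
        if utterance.startswith("possibly "):
            implied.add("not actually " + utterance[len("possibly "):])
        elif utterance.startswith("in fact "):
            implied.discard("not actually " + utterance[len("in fact "):])
    return implied

print(implicatures(["possibly a 300 mph car"]))
# {'not actually a 300 mph car'}: the implicature stands

print(implicatures(["possibly a 300 mph car", "in fact a 300 mph car"]))
# set(): the "in fact" phrase removes it, as in the Fiat example
```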


intended paradox

Originally published on Language and Philosophy, June 16, 2013

“I’m very witty!” someone wrote in a comment box in response to the criticism “You have no wit.”

“I’m very witty” might seem at first a witless and therefore unpersuasive response, unless it is sarcastic, in which case it is actually witty. If it’s sarcastic, the meaning it’s intended to convey is that the author isn’t witty, and therefore it implies that the comment itself also is not witty. The joke is, the author knows it’s not witty; yet that’s what makes it witty. So if it’s witty, it’s a lie; if it’s a lie, it’s not witty: a liar paradox.

But if the comment is merely false, then there’s no paradox — just a reply by someone who thinks he’s witty but is too dull to know he’s not witty, and hasn’t enough wit to say so wittily.

So if it’s a lie, then it is a meta-witty paradox; if an honest falsehood, it’s just stupid.

What’s interesting is that the intention or speaker’s attitude or character of mind induces the paradox, not the words alone. The paradox depends on who’s speaking, liar or dolt, wit or fool.


Saving Grice’s theory of ‘and’ (with Kratzer lumps!)

Originally published on Language and Philosophy, June 11, 2007

I’ve always considered Grice’s theory of conversational implicature to be one of the most beautiful theories around. But nowhere is beauty so tightly yoked to truth as in the sciences, where beauty, in the form of simplicity, will decide the truth of two otherwise equally powerful theories. (It’s kind of remarkable when you think about it — truth and simplicity seem not only distinct, but unrelated, unlike, say, truth and accuracy or consistency. A complex theory will cause more complexity in its relation to other theories, but if it’s still true, why should complexity ever matter? Is the preference for simplicity just a bias?) Truth seems to be a necessary condition for the beauty of a theory in science, so if Grice’s theory isn’t true, all its beauty is lost. The application of conversational co-operation gets messy at “and”, impugning its truth. I’ve got an idea on how to clean up the mess and restore the symmetry of the structure.

Grice’s analysis of “and” goes like this:

Sometimes “and” is interpreted as simple logical conjunction:

1. I brought cheese and bread and wine.

The order of conjuncts doesn’t change the meaning: I brought bread and cheese and wine; wine and cheese and bread; bread and wine and cheese; wine and bread and cheese; it’s all the same. This use of “and” is symmetric, exactly like the logical conjunction &: A & B <=> B & A.

But sometimes “and” carries the sense of temporal order, “and then”:

2. I took off my boots and climbed into bed.

(I think I got this example from Ed Bendix some years ago)

This conjunction is not symmetric: taking off your boots and then climbing into bed is not the same as climbing into bed and then taking off your boots, and the proof of the difference, you might say, comes out in the wash.

The difference in meaning, according to Grice, arises from the assumption that the speaker would not withhold relevant information or present it in a confusing form. If the order of events matters, the order of presentation will follow the order of events, unless otherwise specifically indicated. So if I said

I climbed into bed and took off my boots

you’d be justified in surmising that I’d come home very late and very drunk.

The theory of conversational implicature avoids the undesirable circumstance that there might actually be two homonymic “and”s in English, one meaning “&” and the other meaning “and then.”

A problem for Grice was observed long ago by Bar-Lev and Palacas (1980, “Semantic command over pragmatic priority,” Lingua 51). They noted this wonderful minimal pair:

3. I stayed home. I got sick.
4. I stayed home and got sick.

If Grice is right, (3) should mean

3′. I stayed home and then got sick.

But it doesn’t. It means

3″. I got sick and therefore stayed home.

Now unless we are willing to say that the sentential boundary is a morpheme with meaning, we are compelled to drop Grice. Worse still, even though (3) means (3″), the sense of “and then” returns as soon as we add “and” between the sentences. (4) means

4″. I stayed home and then I got sick.

even though that’s semantically unexpected. So it’s not about semantic bias, this violation of Grice’s principle. It’s a very real problem that Bar-Lev and Palacas pointed out.

So what’s with “and”?

Here’s my suggestion.

a. In order to use “and” you’ve got to be introducing something new. Think of Angelika Kratzer’s lumps of thought: you’d never say “I painted a portrait and my sister” if you’d only painted one portrait and it was of your sister. Information is structured in lumps of truths that the logical connectives don’t respect. Yes, a portrait was painted and a sister was painted, but if these two things were accomplished in the same act of painting a portrait of one’s sister, then they are in some sense the same fact, though two truths. Now notice the difference between:

“I painted a portrait. I painted my sister.”

Could be the same event. Not so easy to get the same-event interpretation from

“I painted a portrait and I painted my sister.”

The “and” implies a distinct, newly introduced fact not lumpable with the antecedent event.

b. Causal relations are internal to an event.

Put (a) and (b) together and you have an explanation for (3) and (4). I have a good deal more to say about this, but it’s really nice out, and I’ve been in all day.

More about and: a contextual, situational connective?

 

A few examples:

1. Pat washed her sweater and ruined it.

2. Pat ruined her sweater and washed it.

3. Pat ruined her sweater. She washed it.

(1) means, I think, that by washing it Pat ruined it. The sentence allows “and” because washing doesn’t entail ruining; ruining is a consequence, not a cause.

(2) means that Pat ruined the sweater and then washed it, presumably in an attempt to fix it, the outcome of which attempt the sentence doesn’t reveal. It can’t be read, as (3) can, to mean: Pat ruined her sweater by washing it.

Now, (3) can be read also as: she ruined her sweater then washed it. That’s not surprising. What’s surprising is that (3) has the grammatical-consequent-as-semantic-antecedent reading as well, while (2) doesn’t. So the explanation above has to be modified a bit:

a’) consequences are external to an event — they are new facts justifying and

a”) causes are internal to an event — they lump with their consequence and don’t justify and

Bar-Lev and Palacas use another example that goes something like this:

Napoleon took thousands of prisoners and defeated the army. (=and then)

Napoleon defeated the army and took thousands of prisoners. (=and then)

Napoleon took thousands of prisoners. He defeated the army. (=backwards cause)

So even when the real-world knowledge bias leans in favor of backward cause, “and” prevents it.

Here’s another strong example against real-world experiential bias. In answer to the question, “What did you do today?”:

I went to the store and I went out. (two unrelated round-trip forays outside, the latter possibly to a bar or club)

I went out and I went to the store. (two related events: one followed by a consequence: and = and then)

I went out. I went to the store. (one round-trip foray, the consequent explaining the antecedent)

I went to the store. I went out. (two events: the consequent can’t explain the antecedent, so they are interpreted as two distinct events)

Given a context in which going out explains going to the store, this last sentence pair should reduce to one event, if this analysis of and is right. I think it does: if the question is, “Did you or did you not go out today?” the answer: “I went to the store. I went out,” indicates one event, the antecedent indicating the specific event and the consequent clause explaining how the antecedent is an answer to the question.

This last example also shows that it’s not just cause that is internal to an event, but anything that explains the antecedently described event. Explanation seems to be the informationally relevant function from utterance to acceptability. Explanations are internal to a fact. The next step in this investigation would be to figure out what kinds of information qualify as explanations / internal to the fact, and what kinds as additional, new information external to the fact.

And or &: ideas for a contextual logic

On one view of this analysis, it looks like Grice was partly right about “and”. There’s just one “and”. But he was wrong to equate English “and” with logical conjunction &. The one “and” in English carries a conventional implicature just as “but” does, but where the conventional implicature of “but” requires the denial of some association of the antecedent clause, the conventional implicature of “and” requires that the consequent add some information external to the antecedent clause. “And” always means something like “and also”, carrying the conventional implicature that what follows “and” is additional information external to what preceded it.

There’s an alternative to explore for fun. Suppose Grice was completely right that “and” means the logical connective &. It’s just that the logical connective & is not the familiar one. Its truth values are dependent on the relationship of consequent with antecedent. I mean, why couldn’t we have a causal-relations-based logic? It would be very different from familiar freshman logic, but it might be a lot of fun, and useful too. This connective (I’ll use “+” to avoid confusion with traditional conjunction “&”) would not be symmetric:

a + b ≠ b + a

and there could be two ways of dealing with the truth tables:

if a and b are true and a causes b, then a + b = t

if a and b are true and b causes a, then a + b = f

if a and b are true, and a and b denote distinct facts that are causally unrelated, then a + b = t

otherwise a + b = f.

The last two clauses cover the “I painted a painting and painted a portrait” case — two conjuncts that may denote the same fact. That sentence will be false if it denotes one fact/event, true if it denotes two causally unrelated, distinct facts/events (assuming that there is no causal relation in this sentence in either direction).
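As a sanity check on those clauses, here is one way they might be implemented. The flag-based interface and the argument names are my own sketch, not a worked-out logic:

```python
# The non-symmetric contextual conjunction "+". Besides the truth values of
# the conjuncts, it consults the context: causal direction and fact identity.
def plus(a_true, b_true, a_causes_b, b_causes_a, same_fact):
    # a_causes_b is listed only to mirror the clauses above: forward
    # causation and causal unrelatedness both come out true.
    if not (a_true and b_true):
        return False    # as with ordinary conjunction
    if same_fact:
        return False    # conjuncts lump into one fact: nothing new is added
    if b_causes_a:
        return False    # backward cause is internal to the event: excluded
    return True         # forward cause, or a causally unrelated new fact

# "I got sick and stayed home": forward cause, so true
assert plus(True, True, a_causes_b=True, b_causes_a=False, same_fact=False)

# "I stayed home and got sick" on the backward-cause reading: false,
# matching Bar-Lev and Palacas's observation that "and" blocks that reading
assert not plus(True, True, a_causes_b=False, b_causes_a=True, same_fact=False)

# "I painted a portrait and I painted my sister" as one act of painting: false
assert not plus(True, True, a_causes_b=False, b_causes_a=False, same_fact=True)
```

Note that the first two cases assign a + b different values from identical truth values of the conjuncts; that is exactly the failure of truth-functionality taken up below.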

(Now, I’ve forgotten the second way I was going to do this. Well, it’ll come to me.)

Ah, yes. [Two years later.] How about defining what is included in an event or using the connective to do that work?

a + b entails that b is not included in a

where “included” means either ‘denoting the same event’ or ‘causing’.
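Under the same assumptions as the sketch above, this version collapses the two context checks into a single inclusion test (the names are again mine):

```python
# b is "included" in a if they denote the same event or b causes a.
def included(same_event, b_causes_a):
    return same_event or b_causes_a

# a + b is true iff both conjuncts are true and b is not included in a.
def plus_v2(a_true, b_true, same_event, b_causes_a):
    return a_true and b_true and not included(same_event, b_causes_a)
```

On these definitions plus_v2 agrees with the truth clauses of the earlier sketch.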

It may seem odd to contextualize truth values so that they depend on denotations and situational relations, but truth values are themselves semantic and denotational. We’re just shoving the contextualization deeper in the muddy murk. Why not have logical connectives that reflect the language or reflect thought?

One application would be to lumps of thought. The whole notion of lumps is model-dependent / context-dependent. Here’s a context-dependent (model-dependent) connective that reflects the lumping of reality.

I can think of some obvious objections to a context-dependent logic. It’s not really truth-functional in its syntax. The falsehood, for example, of a + b does not entail either the falsehood of a or the falsehood of b: a + b could be false simply because b is included in a. But something like this is true of other familiar logical connectives. For example, the truth of a ∨ b does not entail the truth of a or the truth of b. It might be that b is true and a false, or a true and b false. The difference between + and ∨ is that the truth value of ∨ depends only on the truth values of the statements it joins, while the value of + depends also on event/fact inclusion.

How are the connectives syntactically interdefined? How can deductions be proved syntactically? What would the laws of deduction look like?

Cliff-hanger.