Sunday, September 8, 2013

Keynes, Bayes, and the law

This is a combined response to a post by John Kay on math and storytelling, followed by another by Lars Syll on probabilistic reductionism, and a final one by Philip Pilkington on multiple versions of probability. Since I read the posts in reverse order, I'll structure my response the same way.

So starting with Pilkington: to say that there are alternative probabilities, one preferred by trained statisticians and another adopted by lawyers and judges, is akin to saying that there are alternative versions of chemistry, one suitable for the laboratory and another, more subtle and full of nuance, adopted by the refined minds of cooks and winemakers, honed over hundreds or even thousands of years of experience. This is clearly nonsense, of course: the fact that a cook or winemaker relies on tradition, taste, and rules of thumb does not change the underlying chemistry. The same goes for probabilities and the courtroom.

Before leaving Pilkington's post behind, let me just observe that he seems to be under the impression that confidence intervals and the like are the domain of Bayesian statistics, whereas arguments based on "degree of belief" are something else altogether. But anyone with a basic understanding of both confidence intervals and Bayesian statistics knows that nothing could be further from the truth, as explained in this very clear post by normaldeviate, where one can find the statement that "Bayesian inference is the Analysis of Beliefs" - as simple as that.

But Pilkington can be (partially) excused by the fact that he gets his definitions from Lars Syll, who scores no better at understanding or explaining Bayesianism. Syll gives the following example:

Say you have come to learn (based on own experience and tons of data) that the probability of you becoming unemployed in Sweden is 10%. Having moved to another country (where you have no own experience and no data) you have no information on unemployment and a fortiori nothing to help you construct any probability estimate on. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1, if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 10% to becoming unemployed and 90% of becoming employed.

While it is certainly true that a Bayesian would argue that you have to assign probabilities to the mutually exclusive events and that these have to add up to 1, no Bayesian would EVER say, based on symmetry or anything else, that a rational individual would have to assign probability 10% to becoming unemployed and 90% to becoming employed. A Bayesian could not care less how someone comes up with their priors. All a Bayesian says is that the priors need to add up to one and subsequently be revised in the face of experience according to Bayes theorem. In this example, an assignment of 10% and 90% is just as rational as the exact opposite, namely 90% and 10%. What matters is that these priors eventually get corrected by new evidence. The only effect of a bad prior is that the correction takes slightly longer and requires a bit more evidence, that's all (for the technically minded, I should add that the only priors that don't get corrected by evidence are the degenerate ones assigning 0% to one event and 100% to the other - no amount of new evidence can change these). Although trivial, this point is important for understanding Syll's rejection of Bayesianism. For example, in this other post he explains why he's a Keynesian and not a Bayesian in terms of a "paradox" created by "the principle of insufficient reason", which is just yet another way to select a prior and has precious little to do with Bayesianism.
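To make this concrete, here is a toy sketch of my own (the numbers are mine, not from any of the posts): two hypotheses about a biased coin, updated flip by flip with Bayes theorem. A good prior and a bad prior both end up in the right place; only the dogmatic 0%/100% prior never moves.

```python
# Toy illustration: H1 says P(heads) = 0.9, H2 says P(heads) = 0.1.
def update(p_h1, flip):
    """Posterior probability of H1 after observing one flip."""
    lik_h1 = 0.9 if flip == "H" else 0.1   # P(flip | H1)
    lik_h2 = 0.1 if flip == "H" else 0.9   # P(flip | H2)
    num = lik_h1 * p_h1
    return num / (num + lik_h2 * (1.0 - p_h1))

flips = ["H"] * 8 + ["T"] * 2              # data from a heads-biased coin

for start in (0.9, 0.1, 0.0):              # good, bad, and dogmatic priors
    p = start
    for f in flips:
        p = update(p, f)
    print(f"prior {start:.1f} -> posterior {p:.4f}")
```

With ten flips, both the 0.9 and the 0.1 starting priors are pushed above 0.99, while the prior of exactly 0 stays at 0 forever - the bad prior just needed the evidence to do a little more work.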

Next, moving on to Kay and his person-hit-by-a-bus example: evidently no court should find Company A liable simply because it has more buses than Company B, but absent any other information, this is a pretty decent way to form a prior. Another is to come up with a narrative about the person and the bus. In either case, though, the court should look at further evidence and recalculate its belief that a bus from Company A actually hit the person. For example, it could hear testimony from eyewitnesses or look at video footage and use Bayes theorem to find the posterior probabilities, which would then enter the "balance of probabilities" leading to a decision. A court that finds Company A liable purely on the strength of a story, without looking at evidence, is just as foolish as one that bases its decision on the number of buses each company operates.
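The calculation the court would be doing is a one-liner. With hypothetical numbers of my choosing - Company A runs 80% of the buses in town, and a witness who identifies bus companies correctly 75% of the time testifies that the bus was Company B's - Bayes theorem gives:

```python
# Hypothetical numbers for the bus example (mine, not Kay's).
prior_a = 0.80                      # P(A): fleet sizes alone
p_says_b_given_a = 0.25             # witness misidentifies an A bus
p_says_b_given_b = 0.75             # witness correctly identifies a B bus

# Bayes theorem: P(A | witness says B)
posterior_a = (p_says_b_given_a * prior_a) / (
    p_says_b_given_a * prior_a + p_says_b_given_b * (1 - prior_a)
)
print(round(posterior_a, 3))
```

The posterior comes out at about 0.57: the testimony shifts the belief substantially away from the 0.80 prior, yet Company A remains (slightly) the more probable culprit, because the base rate from the fleet sizes still carries weight. That is exactly the kind of recalculation the "balance of probabilities" should rest on.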

But the key passage in Kay's piece is:

Such narrative reasoning is the most effective means humans have developed of handling complex and ill-defined problems. A court can rarely establish a complete account of the probabilities of the events on which it is required to adjudicate. Similarly, an individual cannot know how career and relationships will evolve. A business must be steered into a future of unknown and unknowable dimensions. 
So while probabilistic thinking is indispensable when dealing with recurrent events or histories that repeat themselves, it often fails when we try to apply it to idiosyncratic events and open-ended problems. We cope with these situations by telling stories, and we base decisions on their persuasiveness. Not because we are stupid, but because experience has told us it is the best way to cope. That is why novels sell better than statistics texts.

So let me address this carefully, lest I be accused of hand-waving or of not having understood the criticism. There are two fundamental misunderstandings here. The first has to do with what it means to give a "complete account of the probabilities of the events"; the second is the idea that probabilistic thinking involves some form of definite knowledge about a future that has "unknown and unknowable dimensions".

Now if by a "complete account of the probabilities of the events" one means "to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1", then as we have seen this is exactly what Bayes requires. But notice that "complete account" here simply means slicing up all the mutually exclusive events that one is interested in (the technical term is a partition), and this can be as simple as two events (say hit by a bus from Company A or from Company B, or being employed or unemployed in Norway). It does NOT mean a complete account of all the complex and ill-defined phenomena that led a person to be hit by a bus, or to be found in a cafe in Oslo with a diminishing amount of money in their pocket. Once the events of interest are identified, ANY method for assigning priors is fair game. This could be a narrative, or historical data, or an agent-based model of people, buses, and firms.

Finally, regarding the notion that probabilistic thinking requires strong assumptions about the future: one often hears that because economics (or law, or politics, or baseball) is not ergodic, past experience is no guide to the future, and therefore there is no hope of using probability. As I said elsewhere, Bayes theorem is a way to update beliefs (expressed as probabilities) in the face of new information, and as such

could not care less if the prior probabilities change because they are time-dependent, the world changed, or you were too stupid to assign them to begin with. It is only a narrow frequentist view of prediction that requires ergodicity (and a host of other assumptions like asymptotic normality of errors) to be applicable.

A few related exclamations to conclude:

Mathematical models are stories like any other!

Non-ergodicity is the friend of good modellers and storytellers alike!

So is irreducible uncertainty!

Think probabilistically! Estimate nothing!



  1. You make some bold claims. I couldn't help thinking of the case against the Assad regime. It may be that a Bayesian does not care where Kerry's priors come from, but many do. I also do not get your ergodicity argument. Suppose that you have masses of data about a probabilistic process, and then no data for many years. What difference does it make whether you know that the process is probabilistically fixed, or whether it is liable to have changed? For example, a roulette wheel that was fair but is now wearing, though you don't know how. Could you do a toy Bayes calculation in the two cases?

  2. I never said that the quality of the priors doesn't matter. Poor priors lead to poor bets and ultimately poor outcomes (i.e. you lose your bets more often than you win). All I'm saying is that Bayesianism is entirely agnostic about where you get your priors from, while offering a well-defined prescription to update them. It's all there in one elegant theorem.

    Also notice that not all evidence is created equal. Strong evidence (as measured by the conditional probabilities that enter Bayes theorem) corrects priors very quickly, whereas weak evidence doesn't.

    So the quality of both the priors and the evidence presented is hugely important. But there are no other hidden assumptions in the argument (e.g. symmetry), either in the way priors are chosen or in the way evidence is incorporated. Everything is laid out explicitly.
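A quick sketch of the strong-versus-weak point, with numbers of my own choosing: starting from a poor prior of 0.1 on the true hypothesis, count how many observations it takes for repeated Bayes updates to push the posterior past 0.99, for strong evidence (likelihood ratio 9) and for weak evidence (likelihood ratio about 1.22).

```python
# Toy comparison: how fast evidence of a given strength corrects a poor prior.
def steps_to_correct(lik_true, lik_false, prior=0.1, target=0.99):
    """Number of confirming observations until the posterior exceeds target."""
    p, steps = prior, 0
    while p < target:
        num = lik_true * p
        p = num / (num + lik_false * (1.0 - p))
        steps += 1
    return steps

strong = steps_to_correct(0.9, 0.1)     # likelihood ratio 9
weak = steps_to_correct(0.55, 0.45)     # likelihood ratio ~1.22
print(strong, weak)
```

With these numbers the strong evidence gets there in a handful of observations while the weak evidence needs dozens - same theorem, same prior, very different speeds.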

    Same goes for ergodicity. If you have masses of data about a roulette wheel, you can use the data to assign your initial priors as if it were fair. But as you continue playing and realize it is wearing (how quickly you realize this depends on how obvious the wearing is - e.g. it spins once and comes to a screeching halt), you use Bayes theorem to update your priors.
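Here is the requested toy calculation, with numbers I made up for illustration. Keep the belief about P(red) as a Beta distribution (ignoring the zero pocket for simplicity): the historical data on the fair wheel becomes a prior concentrated around 0.5, and each spin of the now-worn wheel updates it by simple counting.

```python
# Toy Bayes calculation for the wearing roulette wheel (my numbers).
# Prior: ~360 remembered spins of a fair wheel -> Beta(180, 180), mean 0.5.
alpha, beta = 180.0, 180.0

# The wheel has since worn and now favours red about 70% of the time.
worn_spins = ["red"] * 140 + ["black"] * 60

for spin in worn_spins:
    if spin == "red":
        alpha += 1                  # a Beta-Bernoulli update is a count
    else:
        beta += 1

posterior_mean = alpha / (alpha + beta)
print(round(posterior_mean, 3))
```

After 200 worn spins the posterior mean has drifted from 0.5 to about 0.57, on its way to 0.7. The heavier the mass of historical data baked into the prior, the slower the drift - which is exactly the sense in which how quickly you realize the wheel is wearing depends on how obvious the wearing is.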