Noise: A Flaw in Human Judgment, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein
This book comes in six parts. In part 1, we explore the difference between noise and bias. In part 2, we investigate the nature of human judgment and explore how to measure accuracy and error. Part 3 takes a deeper look at one type of judgment that has been researched extensively: predictive judgment. Part 4 turns to human psychology. We explain the central causes of noise. These include interpersonal differences arising from a variety of factors. Part 5 explores the practical question of how you can improve your judgments and prevent error. What is the right level of noise? Part 6 turns to this question.
Judgment can therefore be described as a measurement in which the instrument is a human mind. Implicit in the notion of measurement is the goal of accuracy—to approach truth and minimize error. The goal of judgment is not to impress, not to take a stand, not to persuade.
Level noise is when judges show different levels of severity. Pattern noise is when they disagree with one another on which defendants deserve more severe or more lenient treatment. And part of pattern noise is occasion noise—when judges disagree with themselves.
In a perfect world, defendants would face justice; in our world, they face a noisy system.
Bullshit has become something of a technical term since Harry Frankfurt, a philosopher at Princeton University, published an insightful book, On Bullshit, in which he distinguished bullshit from other types of misrepresentation.
When people are introduced to clinical and mechanical prediction, they want to know how the two compare. How good is human judgment, relative to a formula?
The question had been asked before, but it attracted much attention only in 1954, when Paul Meehl, a professor of psychology at the University of Minnesota, published a book titled Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence. Meehl reviewed twenty studies in which a clinical judgment was pitted against a mechanical prediction for such outcomes as academic success and psychiatric prognosis. He reached the strong conclusion that simple mechanical rules were generally superior to human judgment. Meehl discovered that clinicians and other professionals are distressingly weak in what they often see as their unique strength: the ability to integrate information.
When there is a lot of data, machine-learning algorithms will do better than humans and better than simple models. But even the simplest rules and algorithms have big advantages over human judges: they are free of noise, and they do not attempt to apply complex, usually invalid insights about the predictors.
AI often performs better than simpler models do. In most applications, however, its performance remains far from perfect.
Wherever there is prediction, there is ignorance, and probably more of it than we think.
When you trust your gut because of an internal signal, not because of anything you really know, you are in denial of your objective ignorance.
Models do better than people, but not by much. Mostly, we find mediocre human judgments and slightly better models. Still, better is good, and models are better.
Among doctors, the level of noise is far higher than we might have suspected. In diagnosing cancer and heart disease—even in reading X-rays—specialists sometimes disagree. That means that the treatment a patient gets might be a product of a lottery.
Doctors like to think that they make the same decision whether it’s Monday or Friday or early in the morning or late in the afternoon. But it turns out that what doctors say and do might well depend on how tired they are.
Medical guidelines can make doctors less likely to blunder at a patient’s expense. Such guidelines can also help the medical profession as a whole because they reduce variability.
In traditional, informal interviews, we often have an irresistible, intuitive feeling of understanding the candidate and knowing whether the person fits the bill. We must learn to distrust that feeling.
Traditional interviews are dangerous not only because of biases but also because of noise.
We must add structure to our interviews and, more broadly, to our selection processes. Let’s start by defining much more clearly and specifically what we are looking for in candidates, and let’s make sure we evaluate the candidates independently on each of these dimensions.
There are seven major objections to efforts to reduce or eliminate noise.
First, reducing noise can be expensive; it might not be worth the trouble. The steps that are necessary to reduce noise might be highly burdensome. In some cases, they might not even be feasible.
Second, some strategies introduced to reduce noise might introduce errors of their own. Occasionally, they might produce systematic bias. If all forecasters in a government office adopted the same unrealistically optimistic assumptions, their forecasts would not be noisy, but they would be wrong. If all doctors at a hospital prescribed aspirin for every illness, they would not be noisy, but they would make plenty of mistakes.
Third, if we want people to feel that they have been treated with respect and dignity, we might have to tolerate some noise. Noise can be a by-product of an imperfect process that people end up embracing because the process gives everyone (employees, customers, applicants, students, those accused of crime) an individualized hearing, an opportunity to influence the exercise of discretion, and a sense that they have had a chance to be seen and heard.
Fourth, noise might be essential to accommodate new values and hence to allow moral and political evolution. If we eliminate noise, we might reduce our ability to respond when moral and political commitments move in new and unexpected directions. A noise-free system might freeze existing values.
Fifth, some strategies designed to reduce noise might encourage opportunistic behavior, allowing people to game the system or evade prohibitions. A little noise, or perhaps a lot of it, might be necessary to prevent wrongdoing.
Sixth, a noisy process might be a good deterrent. If people know that they could be subject to either a small penalty or a large one, they might steer clear of wrongdoing, at least if they are risk-averse. A system might tolerate noise as a way of producing extra deterrence.
Finally, people do not want to be treated as if they are mere things or cogs in some kind of machine. Some noise-reduction strategies might squelch people’s creativity and prove demoralizing.
People value and even need face-to-face interactions. They want a real human being to listen to their concerns and complaints and to have the power to make things better. Sure, those interactions will inevitably produce noise. But human dignity is priceless.
Moral values are constantly evolving. If we lock everything down, we won’t make space for changing values. Some efforts to reduce noise are just too rigid; they would prevent moral change.
If you want to deter misconduct, you should tolerate some noise. If students are left wondering about the penalty for plagiarism, great—they will avoid plagiarizing. A little uncertainty in the form of noise can magnify deterrence.
If we eliminate noise, we might end up with clear rules, which wrongdoers will find ways to avoid. Noise can be a price worth paying if it is a way of preventing strategic or opportunistic behavior.
Creative people need space. People aren’t robots. Whatever your job, you deserve some room to maneuver. If you’re hemmed in, you might not be noisy, but you won’t have much fun and you won’t be able to bring your original ideas to bear.
In the end, most of the efforts to defend noise aren’t convincing. We can respect people’s dignity, make plenty of space for moral evolution, and allow for human creativity without tolerating the unfairness and cost of noise.
Rules simplify life and reduce noise. But standards allow people to adjust to the particulars of the situation.
Rules or standards? First, ask which produces more mistakes. Then, ask which is easier or more burdensome to produce or work with.
We often use standards when we should embrace rules—simply because we don’t pay attention to the noise.
Noise reduction shouldn’t be part of the Universal Declaration of Human Rights—at least not yet. Still, noise can be horribly unfair. All over the world, legal systems should consider taking strong steps to reduce it.
Types of noise:
System noise can be broken down into level noise and pattern noise. Some judges are generally more severe than others, and others are more lenient; some forecasters are generally bullish and others bearish about market prospects; some doctors prescribe more antibiotics than others do. Level noise is the variability of the average judgments made by different individuals. The ambiguity of judgment scales is one of the sources of level noise. Words such as likely or numbers (e.g., 4 on a scale of 0 to 6) mean different things to different people. Level noise is an important source of error in judgment systems and an important target for interventions aimed at noise reduction.
System noise includes another, generally larger component. Regardless of the average level of their judgments, two judges may differ in their views of which crimes deserve the harsher sentences. Their sentencing decisions will produce a different ranking of cases. We call this variability pattern noise (the technical term is statistical interaction).
The main source of pattern noise is stable: it is the difference in the personal, idiosyncratic responses of judges to the same case. Some of these differences reflect principles or values that individuals follow, whether consciously or not.
Pattern noise also has a transient component, called occasion noise. We detect this kind of noise if a radiologist assigns different diagnoses to the same image on different days or if a fingerprint examiner identifies two prints as a match on one occasion but not on another.
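The split of system noise into level noise and pattern noise can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the sentencing figures are invented, the function name decompose_system_noise is hypothetical, and each judge sees each case exactly once, so occasion noise cannot be separated and stays inside the pattern component.

```python
from statistics import mean, pvariance

# Hypothetical sentences in years: one row per judge, one column per case.
# These numbers are invented for illustration only.
sentences = [
    [2.0, 5.0, 9.0, 4.0],    # judge A
    [4.0, 6.0, 12.0, 6.0],   # judge B
    [3.0, 10.0, 9.0, 2.0],   # judge C
]


def decompose_system_noise(matrix):
    """Return (system, level, pattern) noise as variances."""
    n_judges = len(matrix)
    n_cases = len(matrix[0])

    grand_mean = mean(v for row in matrix for v in row)
    judge_means = [mean(row) for row in matrix]
    case_means = [
        mean(matrix[j][c] for j in range(n_judges)) for c in range(n_cases)
    ]

    # System noise: for each case, how much do judges disagree? Then average.
    system_var = mean(
        pvariance([matrix[j][c] for j in range(n_judges)])
        for c in range(n_cases)
    )

    # Level noise: variability of the judges' average severity.
    level_var = pvariance(judge_means)

    # Pattern noise: what remains after removing each judge's average level
    # and each case's average sentence (the judge-by-case interaction).
    residuals = [
        matrix[j][c] - judge_means[j] - case_means[c] + grand_mean
        for j in range(n_judges)
        for c in range(n_cases)
    ]
    pattern_var = mean(r * r for r in residuals)

    return system_var, level_var, pattern_var


system_var, level_var, pattern_var = decompose_system_noise(sentences)
print(f"system noise (variance):  {system_var:.3f}")
print(f"level noise (variance):   {level_var:.3f}")
print(f"pattern noise (variance): {pattern_var:.3f}")
```

With population variances, the system variance equals the sum of the level and pattern variances, which is the additive relationship described above. Separating occasion noise would require the same judge to assess the same case more than once.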
The judges’ cognitive flaws are not the only cause of errors in predictive judgments. Objective ignorance often plays a larger role.
Psychological biases are, of course, a source of systematic error, or statistical bias. Less obviously, they are also a source of the noise. When biases are not shared by all judges, when they are present to different degrees, and when their effects depend on extraneous circumstances, psychological biases produce noise.
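The distinction between systematic error (bias) and variable error (noise) can be made concrete with a short calculation. The sketch below is an illustration only: the forecasts and the true value are invented, and the function name error_components is hypothetical. It relies on the standard statistical identity that mean squared error equals bias squared plus the variance of the errors.

```python
from statistics import mean, pvariance

# Hypothetical forecasts of the same quantity by five forecasters,
# and the value that eventually materialized. Invented for illustration.
forecasts = [48.0, 55.0, 52.0, 61.0, 54.0]
true_value = 50.0


def error_components(judgments, truth):
    """Return (mse, bias, noise_variance) for a set of judgments of one case."""
    errors = [j - truth for j in judgments]
    bias = mean(errors)                    # shared, systematic error
    noise_var = pvariance(errors)          # scatter of the errors around the bias
    mse = mean(e * e for e in errors)      # overall error
    return mse, bias, noise_var


mse, bias, noise_var = error_components(forecasts, true_value)
print(f"MSE = {mse:.2f}, bias^2 = {bias ** 2:.2f}, noise^2 = {noise_var:.2f}")
```

Because the two terms add up, reducing noise lowers overall error by the same amount as an equal reduction in squared bias, whatever the direction of the bias.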
How to Reduce Noise (and Bias, Too)
There is reason to believe that some people make better judgments than others do. Task-specific skill, intelligence, and a certain cognitive style (best described as being actively open-minded) characterize the best judges. Unsurprisingly, good judges will make few egregious mistakes. Given the multiple sources of individual differences, however, we should not expect even the best judges to be in perfect agreement on complex judgment problems. The infinite variety of backgrounds, personalities, and experiences that make each of us unique is also what makes noise inevitable.
One strategy for error reduction is debiasing. Typically, people attempt to remove bias from their judgments either by correcting judgments after the fact or by taming biases before they affect judgments.
Our main suggestion for reducing noise in judgment is decision hygiene. We chose this term because noise reduction, like health hygiene, is prevention against an unidentified enemy.
A noise-reduction effort in an organization should always begin with a noise audit (see appendix A). An important function of the audit is to obtain a commitment of the organization to take noise seriously. An essential benefit is the assessment of separate types of noise.
We now recapitulate six principles that define decision hygiene, describe how they address the psychological mechanisms that cause noise, and show how they relate to the specific decision hygiene techniques we have discussed.
The goal of judgment is accuracy, not individual expression.
Think statistically, and take the outside view of the case.
Structure judgments into several independent tasks.
Resist premature intuitions.
Obtain independent judgments from multiple judges, then consider aggregating those judgments.
Favor relative judgments and relative scales.
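The principle of aggregating independent judgments can be illustrated with a small simulation. This is a sketch under assumptions that do not come from the notes above: judges are modeled as unbiased, their errors are independent and normally distributed, and the function name noise_of_average and all numeric parameters are hypothetical.

```python
import random
from statistics import mean, pstdev


def noise_of_average(n_judges, trials=2000, true_value=100.0,
                     judge_sd=10.0, seed=0):
    """Estimate the standard deviation of the average of n independent judgments."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = [
        mean(rng.gauss(true_value, judge_sd) for _ in range(n_judges))
        for _ in range(trials)
    ]
    return pstdev(averages)


for n in (1, 4, 16):
    print(f"{n:>2} judge(s): noise of the average ~ {noise_of_average(n):.2f}")
```

Under these assumptions, the noise of the average shrinks roughly with the square root of the number of judgments, so quadrupling the number of judges about halves it. The benefit depends on independence: if judges influence one another, their errors become correlated and averaging removes less noise.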
Bias leads to errors and unfairness. Noise does too—and yet, we do a lot less about it. Judgment error may seem more tolerable when it is random than when we attribute it to a cause, but it is no less damaging. If we want better decisions about things that matter, we should take noise reduction seriously.