
Book Review and Notes: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

Book: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

My rating: 5 of 5 stars

The subtitle says this book is about how big data increases inequality and threatens democracy. And it is exactly that. It’s a good overview for people who are new to the topic.

I wish she had offered more pragmatic proposals or solutions, but it’s understandable that she doesn’t. That shouldn’t stop us from becoming aware and starting a conversation.

I, for one, am convinced that the gains in efficiency and savings from making decisions with big data come at the expense of equality and fairness, and are therefore not worth it.

My notes and highlights

This underscores another common feature of WMDs. They tend to punish the poor. This is, in part, because they are engineered to evaluate large numbers of people. They specialize in bulk, and they’re cheap. That’s part of their appeal. The wealthy, by contrast, often benefit from personal input. A white-shoe law firm or an exclusive prep school will lean far more on recommendations and face-to-face interviews than will a fast-food chain or a cash-strapped urban school district. The privileged, we’ll see time and again, are processed more by people, the masses by machines. LOCATION: 207

The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves. LOCATION: 249

My goal was to mobilize fellow mathematicians against the use of sloppy statistics and biased models that created their own toxic feedback loops. LOCATION: 259

But the point is not whether some people benefit. It’s that so many suffer. These models, powered by algorithms, slam doors in the face of millions of people, often for the flimsiest of reasons, and offer no appeal. They’re unfair. LOCATION: 524

They considered them sensible investments. But consider the opacity. Investors remained blind to the quality of the mortgages in the securities. Their only glimpse of what lurked inside came from analyst ratings. And these analysts collected fees from the very companies whose products they were rating. Mortgage-backed securities, needless to say, were an ideal platform for fraud. LOCATION: 631

The AAA ratings on defective products turned into dollars. The dollars in turn created confidence in the products and in the cheating-and-lying process that manufactured them. The resulting cycle of mutual back-scratching and pocket-filling was how the whole sordid business operated until it blew up. LOCATION: 672

Snake oil vendors, of course, are as old as history, and in previous real estate bubbles unwitting buyers ended up with swampland and stacks of false deeds. LOCATION: 675

But this time the power of modern computing fueled fraud at a scale unequaled in history. LOCATION: 676

The lobbyists succeeded, for the most part, and the game remained the same: to rope in dumb money. Except for a few regulations that added a few hoops to jump through, life went on. LOCATION: 701

In the time it took me to type two words into my résumé, I was a newly proclaimed Data Scientist, LOCATION: 733

More and more, I worried about the separation between technical models and real people, and about the moral repercussions of that separation. LOCATION: 760

What does a single national diet have to do with WMDs? Scale. A formula, whether it’s a diet or a tax code, might be perfectly innocuous in theory. But if it grows to become a national or global standard, it creates its own distorted and dystopian economy. This is what has happened in higher education. LOCATION: 780

U.S. News’s first data-driven ranking came out in 1988, and the results seemed sensible. However, as the ranking grew into a national standard, a vicious feedback loop materialized. The trouble was that the rankings were self-reinforcing. If a college fared badly in U.S. News, its reputation would suffer, and conditions would deteriorate. Top students would avoid it, as would top professors. Alumni would howl and cut back on contributions. The ranking would tumble further. The ranking, in short, was destiny. LOCATION: 810

He admitted that the most relevant data—what the students had learned at each school—was inaccessible. But the U.S. News model, constructed from proxies, was the next best thing. LOCATION: 839

However, when you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent. LOCATION: 841

Let’s say a website is looking to hire a social media maven. Many people apply for the job, and they send information about the various marketing campaigns they’ve run. But it takes way too much time to track down and evaluate all of their work. So the hiring manager settles on a proxy. She gives strong consideration to applicants with the most followers on Twitter. That’s a sign of social media engagement, isn’t it? Well, it’s a reasonable enough proxy. But what happens when word leaks out, as it surely will, that assembling a crowd on Twitter is key for getting a job at this company? Candidates soon do everything they can to ratchet up their Twitter numbers. Some pay $19.95 for a service that populates their feed with thousands of followers, most of them generated by robots. As people game the system, the proxy loses its effectiveness. Cheaters wind up as false positives. LOCATION: 842
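
NOTE: A minimal sketch of how gaming breaks a proxy, using the follower-count example from the highlight above. The candidate names, quality scores, and follower counts are all made up for illustration; none of them come from the book.

```python
# Hypothetical data: "campaign_quality" is the thing the hiring manager
# actually cares about; "followers" is the cheap proxy she ranks by.
candidates = [
    {"name": "Ana", "campaign_quality": 9, "followers": 4000},
    {"name": "Ben", "campaign_quality": 6, "followers": 2500},
    {"name": "Cy", "campaign_quality": 2, "followers": 300},
]


def top_by_proxy(pool):
    """Return the candidate with the most followers (the proxy)."""
    return max(pool, key=lambda c: c["followers"])


def top_by_quality(pool):
    """Return the candidate with the best real work (the hidden target)."""
    return max(pool, key=lambda c: c["campaign_quality"])


def buy_followers(candidate, extra):
    """Return a copy of the candidate with purchased followers added."""
    gamed = dict(candidate)
    gamed["followers"] += extra
    return gamed


# Before anyone games the system, the proxy and the target agree.
honest_pick = top_by_proxy(candidates)["name"]

# Once word leaks out, the weakest candidate buys 10,000 followers.
gamed_pool = [candidates[0], candidates[1], buy_followers(candidates[2], 10_000)]
gamed_pick = top_by_proxy(gamed_pool)["name"]

print(honest_pick, gamed_pick, top_by_quality(gamed_pool)["name"])
```

In this toy setup the proxy picks the strongest candidate until it is gamed, and then it picks the weakest one, which is the "false positive" the highlight describes.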

Winning athletic programs, it turns out, are the most effective promotions for some applicants. LOCATION: 878

The problem isn’t the U.S. News model but its scale. It forces everyone to shoot for exactly the same goals, which creates a rat race—and lots of harmful unintended consequences. LOCATION: 887

As the rankings grow, so do efforts to game them. LOCATION: 945

And whether or not it was the case, they had the perception that others were cheating. So preventing the students in Zhongxiang from cheating was unfair. In a system in which cheating is the norm, following the rules amounts to a handicap. Just ask the Tour de France cyclists who were annihilated for seven years straight by Lance Armstrong and his doping teammates. LOCATION: 969

The victims, of course, are the vast majority of Americans, the poor and middle-class families who don’t have thousands of dollars to spend on courses and consultants. They miss out on precious insider knowledge. The result is an education system that favors the privileged. It tilts against needy students, locking out the great majority of them—and pushing them down a path toward poverty. It deepens the social divide. LOCATION: 991

All of those sound like worthy goals, to be sure, but every ranking system can be gamed. And when that happens, it creates new and different feedback loops and a host of unintended consequences. It’s easy to raise graduation rates, for example, by lowering standards. Many students struggle with math and science prerequisites and foreign languages. Water down those requirements, and more students will graduate. But if one goal of our educational system is to produce more scientists and technologists for a global economy, how smart is that? It would also be a cinch to pump up the income numbers for graduates. All colleges would have to do is shrink their liberal arts programs, and get rid of education departments and social work departments while they’re at it, since teachers and social workers make less money than engineers, chemists, and computer scientists. But they’re no less valuable to society. LOCATION: 1006

This establishes a powerful basis for legitimate ad campaigns, but it also fuels their predatory cousins: ads that pinpoint people in great need and sell them false or overpriced promises. LOCATION: 1060

They find inequality and feast on it. The result is that they perpetuate our existing social stratification, with all of its injustices. The greatest divide is between the winners in our system, like our venture capitalist, and the people his models prey upon. LOCATION: 1061

Anywhere you find the combination of great need and ignorance, you’ll likely see predatory ads. LOCATION: 1063

If people are anxious about their sex lives, predatory advertisers will promise them Viagra or Cialis, or even penis extensions. If they are short of money, offers will pour in for high-interest payday loans. If their computer is acting sludgy, it might be a virus inserted by a predatory advertiser, who will then offer to fix it. LOCATION: 1064

When it comes to WMDs, predatory ads practically define the genre. They zero in on the most desperate among us at enormous scale. LOCATION: 1067

Vatterott College, a career-training institute, is a particularly nasty example. A 2012 Senate committee report on for-profit colleges described Vatterott’s recruiting manual, which sounds diabolical. It directs recruiters to target “Welfare Mom w/Kids. Pregnant Ladies. Recent Divorce. Low Self-Esteem. Low Income Jobs. Experienced a Recent Death. Physically/Mentally Abused. Recent Incarceration. Drug Rehabilitation. Dead-End Jobs—No Future.” LOCATION: 1082

Why, specifically, were they targeting these folks? Vulnerability is worth gold. It always has been. LOCATION: 1086

But for-profit colleges hunt in the opposite direction. They’re more likely to be targeting people in the poorest zip codes, with special attention to those who have clicked on an ad for payday loans or seem to be concerned with post-traumatic stress. (Combat veterans are highly recruited, in part because it’s easier to get financing for them.) LOCATION: 1125

One lead generator, Salt Lake City–based Neutron Interactive, posted fake jobs at websites like Monster.com, as well as ads promising to help people get food stamps and Medicaid coverage, according to David Halperin, a public policy researcher. Using the same optimization methods, they would roll out loads of different ads, measuring their effectiveness for each demographic. The purpose of these ads was to lure desperate job seekers to provide their cell phone numbers. LOCATION: 1173

the search engine on the website is engineered to direct poor students toward for-profit universities. Once a student has indicated in an online questionnaire that she’ll need financial aid, the for-profit colleges pop up at the top of her list of matching schools. LOCATION: 1184

Along come the for-profit colleges with their highly refined WMDs to target and fleece the population most in need. They sell them the promise of an education and a tantalizing glimpse of upward mobility—while plunging them deeper into debt. They take advantage of the pressing need in poor households, along with their ignorance and their aspirations, then they exploit it. And they do this at great scale. This leads to hopelessness and despair, along with skepticism about the value of education more broadly, and it exacerbates our country’s vast wealth gap. LOCATION: 1221

If you just think about where people are hurting, or desperate, you’ll find advertisers wielding their predatory models. One of the biggest opportunities, naturally, is for loans. Everyone needs money, but some more urgently than others. These people are not hard to find. The neediest are far more likely to reside in impoverished zip codes. And from a predatory advertiser’s perspective, they practically shout out for special attention with their queries on search engines and their clicks on coupons. LOCATION: 1230

While looking at WMDs, we’re often faced with a choice between fairness and efficacy. LOCATION: 1403

Our legal traditions lean strongly toward fairness. The Constitution, for example, presumes innocence and is engineered to value it. LOCATION: 1403

So the system sacrifices enormous efficiencies for the promise of fairness. The Constitution’s implicit judgment is that freeing someone who may well have committed a crime, for lack of evidence, poses less of a danger to our society than jailing or executing an innocent person. LOCATION: 1406

WMDs, by contrast, tend to favor efficiency. LOCATION: 1408

So fairness isn’t calculated into WMDs. And the result is massive, industrial production of unfairness. If you think of a WMD as a factory, unfairness is the black stuff belching out of the smoke stacks. It’s an emission, a toxic one. The question is whether we as a society are willing to sacrifice a bit of efficiency in the interest of fairness. LOCATION: 1413

It’s a tough case to make, similar in many ways to the battles over wiretapping by the National Security Agency. LOCATION: 1418

a crucial part of justice is equality. And that means, among many other things, experiencing criminal justice equally. People who favor policies like stop and frisk should experience it themselves. Justice cannot just be something that one part of society inflicts upon the other. LOCATION: 1431

And what would those criteria be? That looked like the easy part. St. George’s already had voluminous records of screenings from the previous years. The job was to teach the computerized system how to replicate the same procedures that human beings had been following. As I’m sure you can guess, these inputs were the problem. The computer learned from the humans how to discriminate, and it carried out this work with breathtaking efficiency. LOCATION: 1703

This is, of course, the nature of capitalism. For companies, revenue is like oxygen. It keeps them alive. From their perspective, it would be profoundly stupid, even unnatural, to turn away from potential savings. That’s why society needs countervailing forces, such as vigorous press coverage that highlights the abuses of efficiency and shames companies into doing the right thing.

NOTE: Why metrics should come in opposite pairs, e.g. quality and quantity; speed and quality


In other words, the modelers for e-scores have to make do with trying to answer the question “How have people like you behaved in the past?” when ideally they would ask, “How have you behaved in the past?” LOCATION: 2116

Even with the Affordable Care Act, which reduced the ranks of the uninsured, medical expenses remain the single biggest cause of bankruptcies in America. LOCATION: 2160

Such mistakes are learning opportunities—as long as the system receives feedback on the error. In this case, it did. But injustice persists. When automatic systems sift through our data to size us up for an e-score, they naturally project the past into the future. As we saw in recidivism sentencing models and predatory loan algorithms, the poor are expected to remain poor forever and are treated accordingly—denied opportunities, jailed more often, and gouged for services and loans. It’s inexorable, often hidden and beyond appeal, and unfair. LOCATION: 2245

They urgently require the context, common sense, and fairness that only humans can provide. However, if we leave this issue to the marketplace, which prizes efficiency, growth, and cash flow (while tolerating a certain degree of errors), meddling humans will be instructed to stand clear of the machinery. LOCATION: 2254

The move toward the individual, as we’ll see, is embryonic. But already insurers are using data to divide us into smaller tribes, to offer us different products and services at varying prices. Some might call this customized service. The trouble is, it’s not individual. The models place us into groups we cannot see, whose behavior appears to resemble ours. Regardless of the quality of the analysis, its opacity can lead to gouging. LOCATION: 2370

At the same time, surveillance will change the very nature of insurance. Insurance is an industry, traditionally, that draws on the majority of the community to respond to the needs of an unfortunate minority. In the villages we lived in centuries ago, families, religious groups, and neighbors helped look after each other when fire, accident, or illness struck. In the market economy, we outsource this care to insurance companies, which keep a portion of the money for themselves and call it profit. LOCATION: 2468

As insurance companies learn more about us, they’ll be able to pinpoint those who appear to be the riskiest customers and then either drive their rates to the stratosphere or, where legal, deny them coverage. This is a far cry from insurance’s original purpose, which is to help society balance its risk. In a targeted world, we no longer pay the average. Instead, we’re saddled with anticipated costs. Instead of smoothing out life’s bumps, insurance companies will demand payment for those bumps in advance. This undermines the point of insurance, and the hits will fall especially hard on those who can least afford them. LOCATION: 2472
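
NOTE: A rough arithmetic sketch of the difference between paying the average and paying your own anticipated cost. The three customers and their expected yearly costs are hypothetical numbers chosen for illustration, not figures from the book.

```python
# Hypothetical expected yearly claim cost per customer, in dollars.
expected_costs = {"low_risk": 500, "medium_risk": 2000, "high_risk": 9500}


def pooled_premium(costs):
    """Traditional pooling: everyone pays the average expected cost."""
    return sum(costs.values()) / len(costs)


def targeted_premiums(costs):
    """Fully targeted pricing: each customer pays their own expected cost."""
    return dict(costs)


pooled = pooled_premium(expected_costs)
targeted = targeted_premiums(expected_costs)

for name, own_cost in targeted.items():
    print(f"{name}: pooled {pooled:.0f}, targeted {own_cost}")
```

Under these assumptions the insurer collects the same total either way. What changes is who carries it: the high-risk customer goes from paying the average to paying the whole anticipated cost up front.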

In The Selling of the President, which followed Richard Nixon’s 1968 campaign, the journalist Joe McGinniss introduced readers to the political operatives working to market the presidential candidate like a consumer good. By using focus groups, Nixon’s campaign was able to hone his pitch for different regions and demographics. LOCATION: 2706

The scoring of individual voters also undermines democracy, making a minority of voters important and the rest little more than a supporting cast. LOCATION: 2833

As is often the case with WMDs, the very same models that inflict damage could be used to humanity’s benefit. Instead of targeting people in order to manipulate them, it could line them up for help. In a mayoral race, for example, a microtargeting campaign might tag certain voters for angry messages about unaffordable rents. But if the candidate knows these voters are angry about rent, how about using the same technology to identify the ones who will most benefit from affordable housing and then help them find it?

NOTE: Of course we (as humans) don’t want to deal with angry people. The tech really does scale up our prejudice and meanness.


The quiet and personal nature of this targeting keeps society’s winners from seeing how the very same models are destroying lives, sometimes just a few blocks away. LOCATION: 2875

Gay rights benefited in many ways from market forces. There was a highly educated and increasingly vocal gay and lesbian talent pool that companies were eager to engage. So they optimized their models to attract them. But they did this with the focus on the bottom line. Fairness, in most cases, was a by-product. At the same time, businesses across the country were starting to zero in on wealthy LGBT consumers, offering cruises, happy hours, and gay-themed TV shows. While inclusiveness no doubt caused grumbling in some pockets of intolerance, it also paid rich dividends. LOCATION: 2902

Dismantling a WMD doesn’t always offer such obvious payoff. While more fairness and justice would of course benefit society as a whole, individual companies are not positioned to reap the rewards. For most of them, in fact, WMDs appear to be highly effective. Entire business models, such as for-profit universities and payday loans, are built upon them. And when a software program successfully targets people desperate enough to pay 18 percent a month, those raking in the profits think it’s working just fine. LOCATION: 2906

NOTE: Well, what is the metric you are using to track how society benefits?

Injustice, whether based in greed or prejudice, has been with us forever. And you could argue that WMDs are no worse than the human nastiness of the recent past. In many cases, after all, a loan officer or hiring manager would routinely exclude entire races, not to mention an entire gender, from being considered for a mortgage or a job offer. Even the worst mathematical models, many would argue, aren’t nearly that bad. LOCATION: 2924

But human decision making, while often flawed, has one chief virtue. It can evolve. As human beings learn and adapt, we change, and so do our processes. Automated systems, by contrast, stay stuck in time until engineers dive in to change them. If a Big Data college application model had established itself in the early 1960s, we still wouldn’t have many women going to college, because it would have been trained largely on successful men. If museums at the same time had codified the prevalent ideas of great art, we would still be looking almost entirely at work by white men, the people paid by rich patrons to create art. The University of Alabama’s football team, needless to say, would still be lily white. LOCATION: 2927

NOTE: Is this true? After all, the algorithm is designed by humans and can be updated by them.

Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit. LOCATION: 2932

Clearly, the free market could not control its excesses. So after journalists like Ida Tarbell and Upton Sinclair exposed these and other problems, the government stepped in. It established safety protocols and health inspections for food, and it outlawed child labor. With the rise of unions, and the passage of laws safeguarding them, our society moved toward eight-hour workdays and weekends off. These new standards protected companies that didn’t want to exploit workers or sell tainted foods, because their competitors had to follow the same rules. And while they no doubt raised the costs of doing business, they also benefited society as a whole. Few of us would want to return to a time before they existed. LOCATION: 2943

Though economists may attempt to calculate costs for smog or agricultural runoff, or the extinction of the spotted owl, numbers can never express their value. And the same is often true of fairness and the common good in mathematical models. They’re concepts that reside only in the human mind, and they resist quantification. And since humans are in charge of making the models, they rarely go the extra mile or two to even try. It’s just considered too difficult. But we need to impose human values on these systems, even at the cost of efficiency. LOCATION: 2977

If you consider mathematical models as the engines of the digital economy—and in many ways they are—these auditors are opening the hoods, showing us how they work. This is a vital step, so that we can equip these powerful engines with steering wheels—and brakes. LOCATION: 3035

The real-name policy is admirable in many ways, not least because it pushes users to be accountable for the messages they post. But Facebook also must be accountable to all of us—which means opening its platform to more data auditors. LOCATION: 3041

The government, of course, has a powerful regulatory role to play, just as it did when confronted with the excesses and tragedies of the first industrial revolution. It can start by adapting and then enforcing the laws that are already on the books. LOCATION: 3043

First, we need to demand transparency. Each of us should have the right to receive an alert when a credit score is being used to judge or vet us. And each of us should have access to the information being used to compute that score. If it is incorrect, we should have the right to challenge and correct it. LOCATION: 3050

Next, the regulations should expand to cover new types of credit companies, like Lending Club, which use newfangled e-scores to predict the risk that we’ll default on loans. They should not be allowed to operate in the shadows. LOCATION: 3053

If we want to bring out the big guns, we might consider moving toward the European model, which stipulates that any data collected must be approved by the user, as an opt-in. It also prohibits the reuse of data for other purposes. LOCATION: 3063

The opt-in condition is all too often bypassed by having a user click on an inscrutable legal box. But the “not reusable” clause is very strong: it makes it illegal to sell user data. This keeps it from the data brokers whose dossiers feed toxic e-scores and microtargeting campaigns. LOCATION: 3065

Finally, models that have a significant impact on our lives, including credit scores and e-scores, should be open and available to the public. Ideally, we could navigate them at the level of an app on our phones. In a tight month, for example, a consumer could use such an app to compare the impact of unpaid phone and electricity bills on her credit score and see how much a lower score would affect her plans to buy a car. The technology already exists. It’s only the will we’re lacking. LOCATION: 3068

While Big Data, when managed wisely, can provide important insights, many of them will be disruptive. After all, it aims to find patterns that are invisible to human eyes. The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions. LOCATION: 3093

Like many responsible models, the slavery detector does not overreach. It merely points to suspicious places and leaves the last part of the hunt to human beings. Some of the companies find, no doubt, that the suspected supplier is legit. (Every model produces false positives.) That information comes back to Made in a Free World, where Bernstein can study the feedback. LOCATION: 3108

Another model for the common good has emerged in the field of social work. LOCATION: 3111

NOTE: I guess the problem with all these good works is that we don’t know their goodness metric. How do we know they are winning? Also, why do social-sector people always lag behind businesses? I think they need a metric we can rally around.

They found a number of markers for abuse, including a boyfriend in the home, a record of drug use or domestic violence, and a parent who had been in foster care as a child. If this were a program to target potential criminals, you can see right away how unfair it could be. Having lived in a foster home or having an unmarried partner in the house should not be grounds for suspicion. What’s more, the model is much more likely to target the poor—and to give a pass to potential abuse in wealthy neighborhoods. Yet if the goal is not to punish the parents, but instead to provide help to children who might need it, a potential WMD turns benign. It funnels resources to families at risk. LOCATION: 3115

But as I’ve tried to show throughout this book, these models are constructed not just from data but from the choices we make about which data to pay attention to—and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral. LOCATION: 3128

If we back away from them and treat mathematical models as a neutral and inevitable force, like the weather or the tides, we abdicate our responsibility. LOCATION: 3130

Generally speaking, the job of algorithmic accountability should start with the companies that develop and deploy the algorithms. They should accept responsibility for their influence and develop evidence that what they’re doing isn’t causing harm, just as chemical companies need to provide evidence that they are not destroying the rivers and watersheds around them. LOCATION: 3201

That’s not to say algorithms should be universally outlawed or forced to be open source, but it does mean that the burden of proof rests on companies, which should be required to audit their algorithms regularly for legality, fairness, and accuracy. LOCATION: 3204

Let’s reframe the question of fairness: instead of fighting over which single metric we should use to determine the fairness of an algorithm, we should instead try to identify the stakeholders and weigh their relative harms. LOCATION: 3227

In the case of the recidivism risk algorithm, we’d need to compare the harm of a false positive—someone who is falsely given a high-risk score and unjustly imprisoned—against the harm of a false negative—someone who is falsely let off the hook and might commit a crime. LOCATION: 3229

It’s important to note, as we endeavor to understand relative harms, that they are entirely dependent on context. LOCATION: 3240

For example, if a high-risk score for a given defendant qualified him for a reentry program that would help him find a job upon release from prison, we’d be much less worried about false positives. Or in the case of the child abuse algorithm, if we are sure that a high-risk score leads to a thorough and fair-minded investigation of the situation at home, we’d be less worried about children unnecessarily removed from their parents. In the end, how an algorithm will be used should affect how it is constructed and optimized. LOCATION: 3241
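
NOTE: A minimal sketch of weighing relative harms, assuming we can put a rough weight on each kind of error. The risk scores, outcomes, thresholds, and harm weights below are all invented for illustration; the book does not give a formula like this.

```python
# Hypothetical risk scores and whether each person later reoffended.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
reoffended = [False, False, True, False, False, True, False, True, True]
thresholds = [0.25, 0.55, 0.85]


def confusion_counts(scores, outcomes, threshold):
    """Count (false positives, false negatives) when flagging scores >= threshold."""
    false_pos = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, outcomes) if s < threshold and y)
    return false_pos, false_neg


def total_harm(scores, outcomes, threshold, harm_fp, harm_fn):
    """Weighted harm: each error type is multiplied by its assumed cost."""
    false_pos, false_neg = confusion_counts(scores, outcomes, threshold)
    return false_pos * harm_fp + false_neg * harm_fn


def least_harmful_threshold(scores, outcomes, thresholds, harm_fp, harm_fn):
    """Pick the candidate threshold with the lowest weighted harm."""
    return min(
        thresholds,
        key=lambda t: total_harm(scores, outcomes, t, harm_fp, harm_fn),
    )


# Context A: a high score means prison, so a false positive is very costly.
punitive = least_harmful_threshold(scores, reoffended, thresholds, harm_fp=5, harm_fn=1)

# Context B: a high score means a job program, so a false positive is cheap.
supportive = least_harmful_threshold(scores, reoffended, thresholds, harm_fp=1, harm_fn=5)

print(punitive, supportive)
```

With the same scores and the same people, the least harmful cutoff moves from the strictest threshold to the most lenient one once the consequence of a high score changes from punishment to help. That is the sense in which how an algorithm is used should shape how it is optimized.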

we must also create standards for monitoring algorithms once they’ve been installed, to make sure they are functioning as intended. And although this might sound obvious, we have recently witnessed the result of algorithms that have been given far too much blind trust. LOCATION: 3246

When algorithmic systems like the Midas system contain fatal flaws, whether intentional or not, they end up being worse than the human systems they’ve replaced. And, as we’ve seen repeatedly throughout the book, the resulting pain is not distributed equally, but is rather borne by society’s most vulnerable citizens. The very people that cannot afford to hire fancy lawyers have to go head-to-head with the machine. It’s not a fair fight, and examples like this make a clear case for placing the burden of proof on those designing and implementing the algorithms. LOCATION: 3257

we need to be skeptical about where our data comes from and whether it reflects real life in all its complexity. LOCATION: 3283

In the case of predictive policing, that would mean refusing to conflate arrest data with crime data. LOCATION: 3284

It will be a group effort, and we’ll need as many lawyers and philosophers as engineers, LOCATION: 3311