The Role of Data in Sorting Humanity

From a story by Chris Wiggins and Matthew L. Jones headlined “From Eugenics to Targeted Advertising: The Dark Role of Data in Sorting Humanity”:

When the gentleman-​scholar Francis Galton inspected his fellow men of Victorian Britain, he found them wanting: “We want abler commanders, statesmen, thinkers, inventors, and artists,” he wrote in an article called “Hereditary Talent and Character.” “The natural qualifications of our race are no greater than they used to be in semi-​barbarous times,” even though “the conditions amid which we are born are vastly more complex than of old.” Modern civilization was all too much. “The foremost minds of the present day seem to stagger and halt under an intellectual load too heavy for their powers.”

Genius was needed, but was in short supply. Education would never be enough, for there were simply not enough gifted men and women, not enough born geniuses, to confront the complexity of the times. England needed more geniuses, more people of extraordinary talent. They needed, Galton decided, to be bred.

The writings of Galton’s illustrious if infamous cousin, Charles Darwin, offered a way forward. In his Origin of Species, Darwin used the example of human breeding of domestic animals, such as show pigeons and pedigree dogs, to motivate his account of natural selection. Just as human breeders select features they desire, certain features of animals are selected for, as it were, over time in particular environmental niches.

Humans, in Galton’s estimation, underestimated their own power to affect their species. “The power of man over animal life,” Galton explained, “is enormously great. It would seem as though the physical structure of future generations was almost as plastic as clay, under the control of the breeder’s will.” Not just physical traits, but equally the mind could be altered: “It is my desire to show more pointedly than—​so far as I am aware—​has been attempted before, that mental qualities are equally under control.”

Galton soon coined the term “eugenics” to describe the conscious effort to improve the quality of human beings—​and national races of human beings in particular. Eugenics quickly became central to many left- and right-​wing political programs across Europe, the United States, and the world. Racist to its core, to be sure, Galton’s primary focus nevertheless was class. His suggestions often were whimsical, especially in comparison with the forced sterilizations and genocides associated with eugenic programs to come outside of Britain:

Let us, then, give reins to our fancy, and imagine a Utopia—or a Laputa, if you will—in which a system of competitive examination for girls, as well as for youths, had been so developed as to embrace every important quality of mind and body, and where a considerable sum was yearly allotted to the endowment of such marriages as promised to yield children who would grow into eminent servants of the State.

Unlike many philosophers and economists of his time, Galton was fundamentally anti-​egalitarian. “I object to pretensions of natural equality,” he wrote. “I have no patience with the hypothesis occasionally expressed. . . . ​that babies are born pretty much alike, and that the sole agencies in creating differences between boy and boy, and man and man, are steady application and moral effort.”

All people were not created equal, Galton insisted, and not all market agents had comparable mental capacities. Liberal political thought and liberal economics were just wrong in his estimation.

We associate eugenics and scientific racism with the far right, with Nazis. Things were otherwise around 1900. Many progressives as well as conservatives up to World War II saw science as capable of improving the human lot by improving the human race; indeed, one proponent noted, belief in eugenics offered “a perfect index of one’s breadth of outlook and unselfish concern for the future of our race.”

Statistical sciences were to replace the bigotries of old with evidence-​based new sciences of human improvement: accounts of natural human hierarchies that moved easily from description to prescription.

To improve the species, Galton needed to explore the wellsprings of talent and human excellence. Nurture was no explanation. Using biographical dictionaries of great men and women, Galton began investigating the density of talent and genius within families. In his long study Hereditary Genius of 1869, Galton studied prominent families and compared historical states with those around the earth. Despite the large number of his cases, his approach was intuitive and anecdotal. He generally argued that modern peoples were all lesser than ancient Greeks and that non-​European peoples—​he called them “races”—​lesser than European ones.

While the approach in his book is largely anecdotal, Galton drew on Quetelet’s normal curve to support his new ideas of ranking people and races. Quetelet used the normal curve to understand the qualities of a group as a whole. Galton used the same curve to understand variation within a group. Quetelet might seek the mean stature of Englishmen. Galton sought to understand the extremes of stature. His quarry was talent, not height, but he applied the same tools to both.

As the French sociologist Alain Desrosières explains, Galton used the normal curve as “a law of deviation allowing individuals to be classified, rather than as a law of errors.” What astronomers saw as errors to be eliminated, Galton saw as individuals to be ranked and classified. Every child getting test scores listing their percentile performance lives in the world Galton helped create.

And yet there was a major sticking point to all this investigation of excellence in distinguished families. Extremely tall people had tall children, but, on the average, those children were not as tall as their parents, reverting toward a population average height. Similar observations describe a wide range of human and animal traits. For breeders of animals—​human or otherwise—​this was a puzzle, one that would limit attempts to breed supposedly superior human beings. How to understand it? The answer would come from Galton’s reworking of Quetelet’s enthusiastic applications of the normal curve.

Why do offspring of tall parents tend not to be as tall as their parents, and more generally why do the attributes of a human group stay nearly constant over time? Galton came to explain both phenomena through what he called “regression,” mathematically capturing the “tendency of that ideal mean filial type to depart from the parent type, ‘reverting’ towards what may be roughly and perhaps fairly described as the average ancestral type.”

With his statistical investigation, he discovered a powerful mathematical relationship between the amount of reversion of offspring and the extent of their parents’ deviation from the mean. He not only showed the relationship to be linear, but also undertook what we would call today, thanks to Galton, the linear regression applied to the data, finding the coefficients of a simple linear equation like y = mx+b.
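The fitting of y = mx+b described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights are synthetic, generated by the script itself, and the population mean, the slope of 0.6, and the noise levels are hypothetical values chosen for the example, not Galton's measurements. The fit uses ordinary least squares, the modern textbook method, which is not necessarily the procedure Galton followed.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Synthetic data (hypothetical values, NOT Galton's records).
rng = random.Random(0)
POP_MEAN = 68.0    # assumed population mean height, in inches
TRUE_SLOPE = 0.6   # assumed strength of inheritance; any value below 1 gives reversion

parents = [rng.gauss(POP_MEAN, 2.0) for _ in range(2000)]
children = [
    POP_MEAN + TRUE_SLOPE * (p - POP_MEAN) + rng.gauss(0.0, 1.5)
    for p in parents
]

m, b = fit_line(parents, children)

# Predicted average height of children whose parent is 74 inches tall.
tall_parent = 74.0
predicted_child = m * tall_parent + b

print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print(f"parent {tall_parent:.0f} in -> predicted child {predicted_child:.1f} in")
```

Because the fitted slope comes out below 1, the predicted child of a very tall parent lands between the parent's height and the population mean, which is the reversion the passage describes.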

Galton was modeling facets of the process of generation, so his initial work with reversion involved only treating the parental heights as the x’s and only the children’s heights as the y’s, for he was looking at a unidirectional biological process. But he soon realized that his process of regression could be detached from its biological mooring and used on a vast array of data. In investigating the process of “reversion,” Galton had unknowingly hit on a much broader concept, namely that of statistical regression.

Galton did far more than introduce a powerful new approach to modeling data and making predictions from data. Quetelet studied society. Galton studied individuals in a distribution. He wanted better techniques to know and rank individuals and to know and rank races. In studying relations between pairs of attributes, such as the height of parent and child, Galton also introduced “co-​relation,” or as we would now term it: the correlation.
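To make "co-relation" concrete, here is a minimal sketch of a correlation coefficient in Python. It uses the product-moment formula later associated with Karl Pearson, which is an assumption about what best illustrates the idea, not a reconstruction of Galton's own calculation. The five parent and child heights are invented for the example.

```python
import math


def correlation(xs, ys):
    """Product-moment correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented illustrative heights in inches (not historical data).
parent_heights = [64, 66, 68, 70, 72]
child_heights = [66, 66, 68, 69, 71]

r = correlation(parent_heights, child_heights)
print(f"r = {r:.3f}")
```

Unlike the regression of child on parent, the coefficient is symmetric: swapping the two lists gives the same value, which is part of why the measure travels so easily beyond questions of heredity.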

While governments were producing an ever-​increasing number of statistics, they failed to accumulate enough of the data that most interested Galton—​detailed investigations of the “chief physical characteristics” of a wide selection of the population, qualities such as “Keenness of Sight; Color-​Sense; Judgment of Eye; Hearing; Highest Audible Note; Breathing Power; Strength of Pull and Squeeze; Swiftness of Blow; Span of Arms; Height, standing and sitting; and Weight.” So challenging was collecting this data that Galton set up an Anthropometric Laboratory at the International Health Exhibition of 1884 in South Kensington.

The laboratory measured 9,337 people in 17 ways. He explained that “periodical measurements” would be useful to families in tracking their individual development, and to “discover the efficiency of the nation as a whole and in its several parts.” Such records “enable us to compare, schools, occupations, residences, races, &c.” The data produced would continue to be studied well into the twentieth century. Galton’s anthropometry, historian of psychology Kurt Danziger explains, “defin[ed] individual performances as an expression of innate biological factors, thereby sealing them off from any possibility of social influence.”

Galton’s style enabled a dramatic new approach to understanding human differences. Following Quetelet, analysis of data could reveal the commonalities and range of quantifiable human behavior and attributes. And following Galton, each individual could be placed and ranked within those ranges: the top 5 percent, the bottom 10 percent. Inspired by Galton’s work in observing large numbers of human beings, mental tests, for example, emerged from the effort to place each person amid the range of measured human capacities. And entire sciences of examining large numbers of “subjects” in statistical ways emerged in its wake.
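Placing an individual "within those ranges" amounts to what is now called a percentile rank. The sketch below shows one common definition, the share of the group scoring strictly below a given score. The function name and the ten test scores are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def percentile_rank(score, population):
    """Percentage of `population` scoring strictly below `score`."""
    if not population:
        raise ValueError("population must not be empty")
    ordered = sorted(population)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


# Hypothetical test scores for a group of ten people.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank = percentile_rank(95, scores)
print(f"a score of 95 is above {rank:.0f} percent of the group")
```

The number says nothing about the person in isolation. It exists only relative to everyone else who was measured, which is the point the passage makes about Galton's approach.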

“A new method for justifying psychological knowledge claims had become feasible” with the work of Galton and his intellectual successor Karl Pearson, explains the historian Danziger. “To make interesting and useful statements about individuals it was not necessary to subject them to intensive experimental or clinical exploration. It was only necessary to compare their performance with that of others, to assign them a place in some aggregate of individual performances.”

And it didn’t take long for such an approach to become big business. While pioneers like Galton struggled to get data at an adequate scale, a vast appetite for such inquiries would soon open, especially in the United States after the First World War.

Above all, Galton revealed how surveying a mass of people makes recognizing—​and targeting—​the individual possible. Lots of data about lots of people allows scientists, marketers, militaries, spies to better know you—​and target you. We live in such a world, where our individuality is quantified in reference to all other users of the internet, and where ad-​serving algorithms exploit this quantification of difference to compete for our attention.

From How Data Happened: A History from the Age of Reason to the Age of Algorithms.

Chris Wiggins is an associate professor of applied mathematics at Columbia University and the New York Times’s chief data scientist.

Matthew L. Jones is a professor of history at Columbia University and has been a Guggenheim Fellow.
