6 February 2021

The Number Bias (2)

Tag(s): Politics & Economics, History, Pedantry

Last week I blogged on an excellent book on the importance of numbers, The Number Bias by Dutch econometrician Sanne Blauw.[i] I explained how often an abstract concept is converted into numbers and then consolidated into one number, with potentially misleading consequences. This week I became aware of a particularly significant example of this in understanding the pandemic. Huge emphasis is placed on the “R number”. Most people would probably understand that this is a measure of how many people, on average, a person with Covid-19 is likely to infect. It is one of the key factors in determining whether lockdown can be lifted. But how many of us know how it is calculated? Instead of being derived from some data-intensive formula, it’s the product of negotiation.

Every week a group of academics from 11 different institutions meet online, each making a case for what he or she suspects the reproduction rate to be. These judgement calls can vary widely and are then manipulated into a sort of average. R is therefore an educated guess. No other leading nation lets it dictate policy in this way, but having told us from the beginning that they follow the science, our ministers have settled on this one so-called scientific metric to shut the critics up. But there is nothing scientific about taking an educated guess and giving it far more relevance than it deserves.

Similarly, Boris has been reported as saying that the new variant of the coronavirus is “more deadly”, with the mortality rate being “30%” higher. The survival rate with the new variant appears to drop from 99% to 98.7%, a difference which, on the relatively small numbers involved, may well be insignificant, though a 1.3% mortality rate is indeed 30% higher than 1.0%! But there is a problem with this analysis of mortality. Every day we are told how many deaths were reported that day. Even this gets mis-reported, as many politicians and journalists will say how many people died of the disease that day. No, it is how many deaths were reported that day. The death may have been a few days before. And the actual definition of a reported death is one where the person had had a positive test for Covid-19 in the previous 28 days. That is not the same as saying the cause of death was Covid-19. By this definition the UK has the highest rate of death per head of population of all countries in the world with a population of over 20 million. But is this likely? Certainly, I would expect the UK to have the highest rate of death among large countries in Europe because we know from independent research conducted before the pandemic that our Health Service was the least prepared of any country in Europe. But compared with developing countries? Age would be a factor, as in poorer countries, in Africa for example, life expectancy is considerably lower, and we know that most deaths from Covid-19 are among the elderly. Still, it is unsatisfactory to report a death from Covid-19 without showing the true cause.

In my blog The Merchants of Doubt, 11th February 2012,[ii] I discussed the way the tobacco industry colluded to mislead the public about the causation of lung cancer by smoking. They paid scientists and PR executives to use doubt as their product. I based my blog on the book Merchants of Doubt by Naomi Oreskes and Erik Conway,[iii] which Ms Blauw also uses along with many other sources in her chapter entitled Smoking Causes Lung Cancer (But Storks Do Not Deliver Babies).

In 1953 Ernest Wynder and colleagues published the results of an experiment involving painting tar from cigarettes on the shaved backs of mice. Only 10% of the mice were still alive after 20 months while no incidence of cancer was found in the non-tarred control group. Shares in leading US tobacco companies like Philip Morris & Co suddenly plunged in value. So, in January 1954 the big tobacco manufacturers launched the Tobacco Industry Research Committee. In full page advertisements in over 400 newspapers, they assured the public that their products were not harmful.

The same year Darrell Huff published How to Lie with Statistics which became one of the most popular books about numbers ever. Huff was no statistician, but a journalist with an irrepressible curiosity. He pointed out many of the tricks that crooks, and dishonest politicians, use to mislead, but he also pointed out another classic mistake: confusing correlation with causality. For example, there may be a link between babies and storks. “Big houses attract big families, and big houses have more chimney pots on which storks may nest.” But obviously the babies are not delivered by black and white birds.

This causality error is particularly prevalent in news stories about food and drink. When as a news consumer, you can no longer have total confidence in journalists and scientists, how can you separate fact from fiction? How do you know whether smoking causes lung cancer, for instance?

Correlations may just be a coincidence; there may be a factor missing that would explain the causation; or there may be a reverse causal relationship. Often reporting is based on relative risk rather than absolute risk. The eating of processed meat was reported as leading to a 20% increase in bowel cancer, but what this meant was an increase from 5% of the population to 6%, so in absolute terms just one more person in a hundred would get bowel cancer.
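For readers who like to check the arithmetic, here is a minimal sketch in Python of the difference between relative and absolute risk. The function name `risk_change` and the labels are my own invention for illustration; the only figures used are the ones quoted in this blog (bowel cancer rising from 5% to 6%, and mortality rising from 1.0% to 1.3%).

```python
def risk_change(baseline: float, new: float) -> tuple[float, float]:
    """Return (relative increase, absolute increase in percentage points).

    Both risks are given as fractions, e.g. 0.05 for 5%.
    """
    if baseline <= 0:
        raise ValueError("baseline risk must be positive")
    relative = (new - baseline) / baseline    # the headline-friendly figure
    absolute_points = (new - baseline) * 100  # the change per hundred people
    return relative, absolute_points


# Illustrative labels; the numbers are those quoted in the text.
examples = {
    "processed meat and bowel cancer": (0.05, 0.06),
    "new variant mortality": (0.010, 0.013),
}

for label, (baseline, new) in examples.items():
    relative, points = risk_change(baseline, new)
    print(f"{label}: {relative:.0%} relative, {points:.1f} percentage points absolute")
```

With these inputs the relative increases come out at 20% and 30%, while the absolute increases are only 1.0 and 0.3 percentage points. Both descriptions are arithmetically correct; the question is which one the headline writer chooses to show you.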

Incredibly Darrell Huff testified in 1965 to the American Congress in a hearing about cigarette advertising and packaging. The last thing you should do, he said, was to confuse the correlation between smoking and bad health with causality. It seems extraordinary to me that the senators and congressmen on the panel would not be suspicious of an author of a best-selling book telling you how to lie with statistics. But then perhaps some of them represented areas where tobacco was grown, or cigarettes manufactured. And in those days the media relied heavily on tobacco advertising. Huff later admitted that he just did it for the money, as did all the scientists and PR executives who lied about the causes of lung cancer.

One of the most dangerous ideas of our time is Big Data. Numbers are important but more important is to understand them and how they are collected and analysed. Increasingly that is being done through the internet and automatically. We standardise and collect more than ever. Per minute, Google performs 3.6 million searches, YouTube plays more than 4 million videos, and Instagram users post almost 50,000 photos on the platform.

Algorithms decide which search results you get on Google, which posts you see on Facebook and who pops up on your dating app. An algorithm is simply a number of steps you take to reach a particular goal. This might be a credit score, but this is a prediction, not a fact. It’s like the mistaken interpretation of intelligence quotients that I discussed in my previous blog. It is trying to make an intangible concept into mathematical reality.

Big data can have murky origins. Companies no longer think they must talk to you to find out about you. It is all on the internet. But much of the stuff on the internet is junk. Between 2009 and 2010 there appeared to be 17,000 pregnant men in the UK. The code that registered their medical treatment had been mixed up with that of an obstetrics procedure. An obvious error like that probably had no harmful consequences, but the American Federal Trade Commission noted in 2012 that in its sample a quarter of all people found errors in their credit reports from one of the three big bureaus. For many that meant they had wrongfully paid a higher interest rate on loans. And worse than this is the hacking of this data and its sale to fraudsters.

And then there is Facebook. In 2015 Facebook secured a patent to use your social network for calculating credit scores. The rationale behind this? If your friends have a bad credit history, you too can probably not be trusted with a loan. We are back in the region of confusing correlation with causality, and indeed of prejudice.

The numbers that should have captured reality have replaced it. People with particular characteristics find it more difficult to get loans than others, which lands them in poverty more quickly, making it even harder for them to get a loan, which accelerates their poverty, and so on. Algorithms like this become a self-fulfilling prophecy.

Each algorithm tries to optimise something. YouTube, for example, wants you to carry on watching for as long as possible, because that brings in revenue via advertisements. Whether a clip is truthful is of less importance. The Guardian reported that the platform recommended videos describing the earth as flat or revealing that Michelle Obama is a man. “On YouTube fiction is outperforming reality,” their researcher said.

Algorithms will never be objective, however reliable the data may be and however advanced artificial intelligence becomes. When we forget this concern, we leave the moral decisions to people who happen to have a talent for computers. And, while they are programming, they will decide what is good and what is bad.

I have blogged before about the importance of numeracy and on my idea that all politicians and journalists should take a basic course in statistics before being allowed to make political decisions or judgements. Ms Blauw used to think similarly but now argues that it is more complicated than that. Yes, it’s true that a large proportion of the general public lack confidence in their ability to handle numbers but there is also a question of psychology, gut feelings if you like.

There are American conservative farmers who deny the existence of climate change, but who take all kinds of measures to protect their business against the effects of a changing climate. This seems irrational but much can be at stake if you alter your convictions. The farmer who suddenly believes in climate change is given the cold shoulder by his family and friends. The truth will have to wait.

If you encounter a number, don’t just stop and accept it, but go and explore. Search, on- or offline, for people who look at the number from a different angle. Accept uncertainty and watch out for conflicts of interest. Ask yourself the following six questions:

Who is the messenger?
What do I feel?
How has it been standardised?
How has the data been collected?
How has the data been analysed?
How have the numbers been presented?

P.S. In my previous blog on this subject I referred to a reference in the book to the quotation "Not everything that can be counted counts. Not everything that counts can be counted" rumoured to be by Albert Einstein. One of my regular correspondents has kindly informed me that it was by William Bruce Cameron in "Informal Sociology: a casual introduction to sociological thinking" 1963, Random House. New York.

[i] I am grateful to a regular correspondent who sent me the link to a TED talk by Ms Blauw which I thoroughly recommend. https://www.sanneblauw.com/contact

[ii] The Merchants of Doubt, 11th February 2012 https://davidcpearson.co.uk/blog.cfm?blogID=180

[iii] Merchants of Doubt, Naomi Oreskes and Erik Conway, Bloomsbury, London, 202
