Showing posts with label statistics. Show all posts

Saturday, 8 August 2020

Dropping the past

 




Stephen Jones recently put out his list of the best rugby union teams of all time. Like a lot of these 'best of' lists, it doesn't seem to be based on much more than Jones' impressions and memories. His list doesn't go back further than the 1970s. Can we do better?


To start with we'll need to think about what we mean by 'the best.' Does that mean the 'with the best record'? In that case we'll need to bear in mind how much teams played - winning 15 matches in a year is harder than winning 5. Do we mean 'the best relative to contemporary rivals?' If so we'll want to have a sense of how strong the teams of various different eras were. Or do we just mean 'the best at rugby'? The problem with that approach is that it's no fun, since the pro teams of today would clearly have destroyed the amateur sides of yesteryear. 


And anyway, rugby teams play against other teams around at the same time, not against teams from 50 years ago. Supremacy in the present is the name of the game. So let's go with 'best relative to their contemporary rivals.' Note that if we're really focused on who's the best - as in, most likely to win - we'll have to be disciplined and not care so much about who has the most iconic players, who won the most memorable series, who had the most positive cultural impact, etc. Those things might be more important in saying which teams were the greatest, but not so much which were the best.


The next thing to think about is what we mean by a 'team.' Teams change over time. There's probably some form of fancy analysis that could be done tracking the similarity of teams over time, but I don't know how to do it. And some teams are clearly more stable than others. Touring teams and tournament sides probably produce the most similar lineups, since they're drawn from a squad that's brought together for a particular period of time. 


So, for what it's worth, here's my list. 


1. New Zealand 2011-17. A 100% record in tests in 2013, 17 consecutive victories up to June 2014, 1st place in the Rugby Championship six times, plus a couple of World Cups. Thrashed the Springboks 57-0 in 2017.


2. South Africa 1949-52. Won the test series against the All Blacks 4-0, and then went on their own tour of the five nations, beating them all (including Scotland 44-0) and losing only 1 of their 31 matches overall.


3. New Zealand 1905-6, the original All Blacks. Toured Britain, France, and North America winning 35 out of 36, scoring 976 points and conceding 59. 


4. England 2002-3. They didn't win the 2002 Six Nations, but won the Grand Slam in 2003. Beat the Springboks 53-3 at home and the Wallabies and All Blacks both home and away. Beat Australia again in the final to take the World Cup.


5. The 1924-25 All Blacks, dubbed 'The Invincibles' after winning all of their 32 matches (including one against each of the home nations). Points for: 838. Against: 116.


6. The 1937 Springboks, also dubbed 'The Invincibles,' slightly less deservingly, after suffering only two losses on a 29-match tour of Australia and New Zealand. 


7. New Zealand in the late 60s. A series victory against the Springboks in 1965 kickstarted a 17-match winning streak that was ended only in 1969 by Wales. 


8. South Africa 1995-1998. After the World Cup victory they lost a test series at home to the All Blacks for the first time in 1996, but they then swept the Tri-Nations in 1998, winning 17 consecutive matches. 
 
9. Australia 1999-2001. Two Tri-Nations victories following on from the 1999 World Cup. 


10. Wales in the 70s. Won 7 Five Nations championships including 3 Grand Slams. Lost both tests against NZ in 1969, and could only draw against South Africa the following year. Lost to NZ again in 1978. Formed the core of the British Lions team that won the test series in NZ in 1971.


Friday, 19 June 2020

Beta's colander


This is a classic titanic post, because (as so often) I don't really know what I'm talking about. Through the lockdown I made a half-hearted attempt to learn R, the leading statistical programme. As usual, what caught my eye was an ancient Greek. 

Specifically, Eratosthenes of Cyrene. Eratosthenes is one of the great figures of Hellenistic Alexandria. He was head of the library there, the highest intellectual position of the day. He's best remembered today for estimating the circumference of the earth to an astounding degree of accuracy, but he was also a polymath interested in literature, music, and, as we'll see, mathematics. He was called 'Beta' by his peers, not because he was on Reddit, but because he was the second best at everything.

Not too far into my studies in R, I came across Eratosthenes' sieve, a simple algorithm that's often used as an exercise for coding students. What follows is an attempt to unpack one way of coding Eratosthenes' sieve in R, with some help from a friend and some code I found online. I struggled with it a bit myself, which should hopefully put me in the ideal position to explain, since I can remember what confused me (everything), and nothing about it seems obvious to me. 

I'll start with the sieve itself. The point of it is to find all the prime numbers up to a certain point (let's say, up to 100). Starting with 2, what you do is to move through the rest of the numbers crossing out all of the multiples of 2 (since if they're multiples of any other number higher than 1 they can't be prime). Then you do the same with 3, then with the next number that hasn't been crossed out (5, since 4 has already gone as a multiple of 2), and so on, until you can't do it anymore, since all the multiples of the number you've reached have already been crossed out (as multiples of an earlier number). Then you can look at all the numbers that haven't been crossed out. Those are your primes.




I guess it makes sense as an exercise for beginner coders because it's a pretty easy-to-understand algorithm: a set of steps for a brain or computer to work through to get a set of outputs (here, prime numbers).


So - as previously advertised - below is one way of getting the R programme to become, for a thrilling and infinitely repeatable moment, the head librarian of Alexandria. After the code itself I'll break it all up into bits, accompanying each bit with some comments that hopefully a) are correct and b) help you understand what's going on. Code is bold, the commentary (a highly Alexandrian form) not. The commentary is talking to R and telling it what to do (imperative mood).



sieve <- function(n) {
  if (n < 2) return(NULL)
  a <- rep(T, n)
  a[1] <- F
  for(i in seq(n)) {
    if (a[i]) {
      j <- i * i
      if (j > n) return(which(a))
      a[seq(j, n, by=i)] <- F
    }
  }
}

sieve <- function(n) { 
Make "sieve" a function that does the below to a number we'll call "n." (This just names our algorithm and makes it a function, a way of doing things to what's in the brackets that follow).

if (n < 2) return(NULL)
If n is smaller than 2 then spit out NULL. (In other words, refuse to do this computation if it's on a number of numbers less than 2).

a <- rep(T, n)
Make a vector called "a" and store in it n repetitions, all marked "true." (T stands for 'true.' This is like writing out the numbers in the grid above. Saying they're all true effectively means we're starting with the assumption that they're all prime.)
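To make the "vector of trues" idea a bit more concrete, here is a small sketch (not part of the original code) that builds such a vector for n = 10, crosses a few entries out by hand, and then asks which positions are still true. The variable name survivors is made up for this example; which() is the same function that turns up later in the sieve.

```r
# Start with every position marked TRUE ("assumed prime"), as the sieve does.
n <- 10
a <- rep(TRUE, n)

# Cross out a few positions by hand: 1, plus the even numbers above 2.
a[c(1, 4, 6, 8, 10)] <- FALSE

# which() reports the positions that are still TRUE.
survivors <- which(a)
print(survivors)
```

Note that 9 survives here only because nothing has crossed out the multiples of 3 yet; that's the job of the loop.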

a[1] <- F 
Store the first item as false in "a." (This gets rid of 1 immediately, since 1 doesn't count as a prime.)

for(i in seq(n)) { 
Start with i = 1, iterating the code between here and the closing }, incrementing by one each time, until i = n. (Note the opening curly bracket. seq(n) is just the sequence of whole numbers from 1 to n. The code within the for loop, contained between the curly brackets, will run a number of times equal to the number of elements in seq(n), starting with i = 1, and with the value of i increasing by one each time it runs.)
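As a quick illustration of what the for loop walks through, here is a minimal sketch with n = 5. The vector visited is a made-up name that just records each value i takes.

```r
n <- 5

# seq(n) is the vector of whole numbers from 1 up to n.
print(seq(n))

# The loop body runs once for each element of seq(n), with i taking each value in turn.
visited <- c()
for (i in seq(n)) {
  visited <- c(visited, i)
}
print(visited)
```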

if (a[i]) { 
If a[i] is true, the code within the ensuing curly brackets will execute. If a[i] is false, it won't. (So if an individual number is false, it won't run the code. At the start that's just 1, as specified above, so this stops R running the algorithm with the number 1. That's actually important, because if it did it would cross off all the numbers and we wouldn't get any primes; another way of looking at this is that being a multiple of 1 doesn't mean a number isn't a prime, and the algorithm needs to recognize this. Later on, the same check also skips numbers like 4 that have already been crossed out as multiples of an earlier number.)

j <- i * i
This finds the square of i and stores it in variable j. Remember we're in the for loop, with i incrementing each time the loop repeats, and that this line only runs when a[i] is true. So it's skipped for i = 1; the first time it runs, i = 2 and j = 4; the next, i = 3 and j = 9; and so on.

 if (j > n) 
If (and only if) the square of i is greater than the number of elements in the entire sequence...(i.e. that number times itself is larger than e.g. 100...)

return(which(a)) 
Return the sequential positions of all those numbers in a which are true (T). (The idea here is that if we've reached a number whose square is greater than the number of numbers in the sequence, e.g. 100, then we've found all the primes already. This isn't just an assumption that happens to be true: any non-prime number up to n has to have a factor no bigger than the square root of n, so it will already have been crossed out as a multiple of something smaller.)
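One way to convince yourself that stopping at the square root is safe is to check it by brute force. The sketch below is not part of the sieve: smallest_factor is a hypothetical helper written just for this check, and it tries every divisor in turn, so it's slow but easy to follow. For n = 100 it finds the smallest factor of every number and confirms that, among the non-primes, none needs a factor bigger than 10.

```r
n <- 100

# Smallest divisor of x that is bigger than 1 (helper written for this check only).
smallest_factor <- function(x) {
  for (d in 2:x) {
    if (x %% d == 0) return(d)
  }
}

factors <- sapply(2:n, smallest_factor)

# A number is non-prime exactly when its smallest factor is smaller than itself.
is_composite <- factors < 2:n

# The largest "smallest factor" any non-prime up to n needs.
largest_needed <- max(factors[is_composite])
print(largest_needed)   # 7, comfortably below sqrt(100) = 10
```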

a[seq(j, n, by=i)] <- F 
Mark as F (false, i.e. not prime - in other words, cross out) every ith number in a from the jth to the nth (the nth being the last). ('by' sets the step size, so seq(j, n, by=i) counts up from j in steps of i; since j is i times i, every number it lands on is a multiple of i, and thus isn't prime.) This line is the heart and soul of the code, the steely essence of Beta's colander.
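To see the crossing-out line at work, here is a stripped-down trace of the same loop for n = 30. It's a sketch rather than the function itself: it runs outside a function, so it uses break where the original uses return, and the list crossed is a made-up name that records which numbers each round removes.

```r
n <- 30

# Counting up from 9 in steps of 3 lands on the multiples of 3.
print(seq(9, n, by = 3))   # 9 12 15 18 21 24 27 30

# The same idea inside the loop, keeping a record of each round.
a <- rep(TRUE, n)
a[1] <- FALSE
crossed <- list()
for (i in seq(n)) {
  if (a[i]) {
    j <- i * i
    if (j > n) break
    crossed[[as.character(i)]] <- seq(j, n, by = i)
    a[seq(j, n, by = i)] <- FALSE
  }
}

print(crossed)   # rounds for 2, 3 and 5 only; 7 * 7 is already past 30
primes <- which(a)
print(primes)
```

Some numbers get crossed out more than once (30 turns up in all three rounds), which is harmless: marking something false twice leaves it false.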

OK, I'm still not sure I understand all of that, so if you want to try to help out in the comments be my guest. 

Saturday, 16 May 2020

Wealth and health


One of the tropes of the current crisis has been that to re-open the economy is to sacrifice lives on the altar of profits. It's also been widely pointed out that this is a false dichotomy. In fact, wealth and health tend to go together. Economic downturns lead to deaths as predictably as viruses. In this post I just want to re-state this view one more time, since I think it's a crucial one to grasp if we want to react sensibly to this (or any other) crisis.

As the above graph suggests, there's a positive association between longevity and GDP. GDP and child mortality seem to be inversely related. Deaths from the five most lethal infectious diseases have declined as the global economy has grown.

Correlations like this aren't a slam-dunk case that wealth causes health. Studies that have looked at the association in detail have found it to be a slightly complicated one. But even if the exact causal mechanisms at various stages of growth can be difficult to disentangle, the basic picture seems clear: wealth and health tend to go hand in hand. That's the case not only if you look at individual countries through time, but also if you look at the set of countries at a particular point in time and compare the well-being of people in richer and poorer parts of the world (even controlling for other factors).

Moreover, unlike in the case of spurious associations (scores for M. Night Shyamalan's films going down on Rotten Tomatoes in line with newspaper sales, for example), it's not hard to think of reasons why these two variables might be linked, and why the wealth of a country might help its people be healthier. Richer countries can give more funding to health services. They can invest in better-quality housing, safer infrastructure, and a more comprehensive social safety net. Their citizens are wealthier, and they can spend more money on their well-being.

So far we've been looking at the positive side of the story, with greater wealth being associated with better health. But there's also a dark side to the association, with poverty being associated with disease and shorter life-spans. You can see this effect with economic downturns even in the rich world: opioid deaths rose by 85% in parts of the US where car factories had closed down (and here too it's easy to think of how this might have happened, with unemployment leading to despair and addiction). A 1% rise in the unemployment rate makes working-age men 6% more likely to die of any cause.

And that's in the rich world. Economic growth is even more vital to the developing world, since increases in wealth just make the citizens of rich countries even healthier, whereas people in poor countries live much closer to death and disaster. They're highly dependent on trade and exchange with the rich world. It's no surprise that UNICEF is now predicting a 45% rise in child mortality because of lockdown-related disruptions.

The relationship between economic downturns and health does have its complications. Some studies suggest fewer people die in the actual course of recessions than normal (although more people die of particular causes, like suicide), but there are lasting health costs over the longer term. So even if the coronavirus lockdowns are followed by a V-shaped recovery, with locked-up demand immediately bursting out again, we might expect our health (on average) to be slightly worse than it otherwise would have been over the next few decades.

There probably are a few people out there who would put profit for themselves ahead of other people's lives. But most of the people raising warnings about the lockdowns are probably just trying to draw attention to the harm we can do to ourselves if we do too much damage to the economy. Whether Covid-19 is dangerous enough to justify the public-health costs associated with the economic downturns that are now starting to bite - that's a different question, and one that's best left for another day (and maybe even another website). But it's not a simple question.

'Your money or your life?' isn't a question that, in ordinary circumstances, anyone would want to be asked. But it's actually a much easier question to answer than the one many countries are faced with now.






Saturday, 4 April 2020

What are my chances like?

One evening in that bygone age known as a few weeks ago I was at a dinner party, and the conversation turned to this new virus everyone was making such a fuss about, the coronavirus (or the novel coronavirus known as Covid-19, to be completely accurate). Like a lot of other people, I'd looked at this breakdown by age groups of people who were known to have died from the virus in China. So I told the other people at the dinner party, 'In our age group, we have a 0.2% chance of death. You probably have about as much a chance of dying going scuba-diving or sky-diving - maybe even riding a motorcycle. We're just not that good at comparing risk.'

I was dead right about that last point, and I even managed to exemplify it in the examples I gave, which were, as I later discovered, well off the mark. It turns out that the 2-in-1000 chance you probably have of dying if you get coronavirus before your forties is much higher than the 1-in-about-34 000 chance you have of dying while scuba diving, or the roughly 1-in-100 000 chance you take skydiving. As for riding a motorcycle, it's only if you were doing it in a race that you'd approach the kind of danger you'd face as a thirty-year-old with coronavirus - with a 1 in 1000 chance, you'd only be about half as likely to die as a thirty-year old with Covid-19.
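Since "1 in N" odds and percentages are hard to compare by eye, here is a small sketch that puts the figures quoted above on the same footing. The numbers are the ones from this post; the variable names are made up for the example.

```r
# Figures quoted in the post, written as "1 in N" odds of dying.
one_in <- c(scuba_diving = 34000, skydiving = 100000, motorcycle_race = 1000)

# The 0.2% figure for under-40s who catch the virus.
covid_under_40 <- 0.002

# Convert the odds to probabilities, then see how many times bigger the Covid figure is.
risk <- 1 / one_in
ratio <- covid_under_40 / risk

print(round(100 * risk, 4))   # each risk as a percentage
print(round(ratio))           # 68, 200 and 2 times the respective risks
```

On these figures the 0.2% chance is about 68 times the scuba-diving risk, 200 times the skydiving risk, and twice the risk of a motorcycle race.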

So what kind of thing does give you a 0.2% chance of dying? And, come to that, what sort of activity puts you at something less than a 1% chance of dying, which is starting to look like the best bet for the overall case fatality rate for Covid-19?

The best match for a 2-in-1000 chance (at least on this pretty well-sourced chart I found online) seems to be hang-gliding, whose 1-in-560 risk makes it just slightly less deadly than that. Four bouts as a boxer will get you somewhere near the same amount of risk.

(I'm assuming the chance accumulates when it comes to activities like this, rather than decreasing as it does in the case of multiple coin flips. Dying of a head injury is unfortunately something that increases the more times you get punched, whereas what side a coin lands on isn't affected by what side it fell on last time.)
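For what it's worth, here is a minimal sketch of how risk stacks up over repeated outings under the simplest possible assumption: that each outing carries the same risk and that the outings don't affect each other. That's a simplification (it ignores cumulative damage from earlier bouts), and the per-bout figure below is a hypothetical placeholder, not a number from the chart.

```r
# Chance of dying at least once over k outings, if each outing carries
# the same independent risk p (an assumption made for this sketch).
cumulative_risk <- function(p, k) {
  1 - (1 - p)^k
}

p_per_bout <- 0.0005   # hypothetical per-bout risk, NOT a figure from the post
k <- 4

total <- cumulative_risk(p_per_bout, k)
print(total)            # a shade under k * p
print(k * p_per_bout)   # the simple "just add them up" estimate
```

For small risks like these, simply adding the per-outing risks together gives almost exactly the same answer, so four outings at 1 in 2000 each come to roughly 2 in 1000.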

The closest comparison for the less than 1% risk that a random person with Covid-19 has is Formula 1 racing, which 99 out of every 100 people are also going to survive. (Of course, those figures likely apply to experienced race-car drivers - if you just jumped into a McLaren and had a go at the next Grand Prix your chances of survival would probably be lower than that.) Something that has about twice the risk of getting Covid-19 is base-jumping, which has about a 2% fatality rate.

So, if you're under 40, your chances of dying from Covid-19 if you get it are about the same as if you went hang-gliding once (probably assuming you've done it a few times before), or took part in four boxing matches (again, assuming you're not completely untrained). Across all age-groups, getting Covid-19 is about half as likely to kill you as jumping off a cliff or tall building (with a parachute), and presents you with something very like the risk that Lewis Hamilton faces every time he competes.

In conclusion, I clearly under-estimated the risk that even a 0.2% chance of death represents. It's a significantly bigger risk than something like going deep under water with an air-tank strapped to your back. Having said that, that doesn't necessarily mean that it's very dangerous - it may just mean that scuba-diving was even less of a risk than I'd previously thought it was. We don't call off motorcycle racing or Formula 1 because 0.1 or 1% of the competitors are probably going to die (although we have now called them off to stop the spread of Covid-19, interestingly).

Of course, it needs to be noted that these figures represent (the best guess at) the risk for someone who actually contracts the virus, and lots of people won't. (How many will and won't get it over the next few months is something else the experts aren't quite sure about, but I don't think anybody's predicting a 100% infection rate.) It's also worth bearing in mind that these are average figures, and your risk will be higher or lower depending on your other risk factors (especially if you already have other medical conditions).

But, on the whole, it's interesting to think that the overall chance of dying if you get Covid-19 is somewhere between a motorcycle racer's and a base-jumper's. Motorcycle racing and base-jumping aren't activities that we ban, though they're also things that you'd probably want to avoid, unless you were a bit foolhardy. And they, of course, don't really increase the death-risk for anybody else (except maybe rivals you might take out in a crash during your Superbike race).