Tuesday, 12 February 2008

Some statistics: mortality in reverse

A warning to readers: this is a somewhat morbid discussion. I guess a consultancy at a life insurance company in 2006 has left its mark.

I was recently given a database of about 500 deaths covering one district over 17 years in the 1920s and 1930s. I found 2 people on the list who were from one family I'm researching. Only 2 people for an area that included that family's home shtetl! What was going on? Could I extract any more information from the database?

The average age at death was about 65 (excluding deaths of infants less than 12 months old). As a rough approximation, the overall annual death rate should be about the reciprocal of this: 1/65, or about 1.5% annually. This is a fairly crude basis: any actuary interested should let me know how this ought to be done.

Over 17 years this means that about 17 x 1.5% = 25.5% of adults might be expected to die. So for any number of deaths, one might expect that they come from a group about 4 times that number. So 2 deaths in a family means that the size of the family should be about 8.
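The arithmetic so far can be written out as a short Python sketch. The function and variable names are my own inventions for illustration, and the calculation keeps the same crude assumptions as the text: a flat 1.5% annual death rate, simply multiplied by 17 years with no compounding.

```python
# Crude estimate: a flat annual death rate, scaled linearly over the period.
ANNUAL_DEATH_RATE = 0.015   # roughly 1/65, rounded as in the text
YEARS = 17

# Fraction of adults expected to die over the whole period (25.5%).
period_death_fraction = ANNUAL_DEATH_RATE * YEARS


def expected_family_size(deaths):
    """Family size that would be expected to produce this many deaths."""
    return deaths / period_death_fraction


for deaths in (2, 5, 10, 20):
    print(deaths, round(expected_family_size(deaths)))
```

Dividing by 25.5% is the same as multiplying by a little under 4, which is where the "about 4 times" rule of thumb comes from.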

But this is only a statistical expectation. The real number could be higher (the family were healthy or wealthy) or lower (the family were sickly or poor), since there is a strong correlation between wealth and mortality. What could the range be? Statisticians use confidence intervals to define a range of likely results. I determined a 90% confidence range, using a roundabout method. I concluded that 2 deaths might mean between 2 and about 22 in the family.
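One way such a range could be computed is sketched below in Python. This is not necessarily the roundabout method used above. It assumes that each family member independently has a 25.5% chance of dying during the 17 years (a binomial model), and it keeps every family size for which the observed number of deaths is not in the extreme 5% at either end. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def family_size_interval(deaths, p=0.255, level=0.90, n_max=1000):
    """Range of family sizes consistent with the observed deaths.

    Assumption: deaths are independent, each with probability p over the
    period. n_max is just a search cap chosen for this sketch.
    """
    tail = (1 - level) / 2
    sizes = range(deaths, n_max)
    # Smallest family for which seeing this many deaths or more is plausible.
    lower = next(n for n in sizes if 1 - binom_cdf(deaths - 1, n, p) >= tail)
    # Largest family for which seeing this few deaths or fewer is plausible.
    upper = max(n for n in sizes if binom_cdf(deaths, n, p) >= tail)
    return lower, upper


print(family_size_interval(2))  # (2, 22)
```

Under these assumptions 2 deaths gives a range of 2 to 22, in line with the figures above. A family that was richer or poorer than the district average would have a different value of p, which shifts the whole range.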

I then worked out the same estimates for other numbers of deaths over 17 years, using the same overall death rate. My estimates are below:

No of deaths    Expected family size    Lower limit    Upper limit

 5              20                       9             36
10              39                      25             58
20              78                      61             98


Is this useful information? It told me at least that with high probability very few people in the target family had stayed in the home shtetl over this period - probably around 8 and almost certainly fewer than 22. This suggests that my failure to find significant numbers of people from this family in the home district in the interwar period and in Holocaust records (compared to the hundreds elsewhere at the time) is not a result of poor research technique - almost everyone had gone.

If anyone can suggest a better actuarial approach to this problem, please let me know.