Technological solutions such as machine learning and statistical modeling are limited by the quality of the data on which they are based (a point of scientific method typically overlooked by programmers and armchair epidemiologists :) ). In this case, the tables of coronavirus statistics are practically useless without an understanding of how they were derived.
Numerology: mistaking a numeral, a squiggle on a screen or a piece of paper, for a fact of the world.
APR. 4, 2020, AT 1:11 PM
Coronavirus Case Counts Are Meaningless*
*Unless you know something about testing. And even then, it gets complicated.
If you’re a regular reader of FiveThirtyEight, you’re probably used to looking at data in sports — where basically everything that happens on a basketball court or a baseball diamond is recorded — or in electoral politics, when polls (in theory, anyway) survey a random sample of the population. COVID-19 statistics, especially the number of reported cases, are not at all like that. The data, at best, is highly incomplete, and often the tip of the iceberg for much larger problems. And data on tests and the number of reported cases is highly nonrandom. In many parts of the world today, health authorities are still trying to triage the situation with a limited number of tests available. Their goal in testing is often to allocate scarce medical care to the patients who most need it — rather than to create a comprehensive dataset for epidemiologists and statisticians to study.
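To make the point about nonrandom testing concrete, here is a rough sketch in Python. It is not from the article: the function `reported_cases`, the two regions, and every number in it are hypothetical, chosen only to show how two places with the same number of true infections could report very different case counts depending on how many tests they run and whom they test.

```python
def reported_cases(infected, tests, positive_share):
    """Confirmed cases under a toy model (an assumption, not real data).

    infected:        true number of infected people (unknown in practice)
    tests:           number of tests actually performed
    positive_share:  fraction of tests that come back positive; higher when
                     tests are reserved for the sickest patients
    """
    # You cannot confirm more cases than there are infected people.
    return min(infected, round(tests * positive_share))


# Two hypothetical regions with the SAME true number of infections.
TRUE_INFECTED = 10_000

# Region A: tests are scarce and triaged to severe patients.
region_a = reported_cases(TRUE_INFECTED, tests=2_000, positive_share=0.5)

# Region B: ten times as many tests, spread over a broader population.
region_b = reported_cases(TRUE_INFECTED, tests=20_000, positive_share=0.2)

print("Region A reported cases:", region_a)
print("Region B reported cases:", region_b)
print("Share of true infections detected in A:", region_a / TRUE_INFECTED)
print("Share of true infections detected in B:", region_b / TRUE_INFECTED)
```

In this toy setup Region B reports four times as many cases as Region A even though the underlying outbreak is identical, which is why a raw case count says little without knowing the testing behind it.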