RANDOM SAMPLE BLUES

Posted in Rants on October 2nd, 2016 by Ed

The constant complaints from the right about the inaccuracy of polling reflect a deep-seated distrust of Math and Facts and Science that happens to disappear completely whenever the polls say they're doing well. It has been especially amusing this year to listen to their conspiracy theories, since anyone with even a passing understanding of the process (and challenges) of conducting polling today would realize that if anything the polls are likely to oversample the most likely Trump supporters.

I've always defended, and will continue to defend, the fundamental reliability of the polling. Is every polling firm totally above board and in compliance with AAPOR standards in every single poll they conduct? Of course not. There is always going to be a "pay the piper" element to polling conducted on behalf of media outlets, parties, campaigns, and politically active groups. But taken as a whole, polling gets it right. Since 2000, polling has done an almost eerily good job of predicting election outcomes provided – and this is an important caveat – we are willing to accept "too close to call" as a valid result. One of the reasons that aggregate polls have become so popular as a predictive tool on sites like 538, Real Clear Politics, and Pollster.com is simple: they're really quite accurate.

Most polling agencies now incorporate some measure of online-administered responses in their samples, but telephone polling of both landline and cell users remains the backbone of the industry. And polling industry people will tell you that random samples do not materialize from random phone dialing. In theory they should, but here's the thing: while any phone number has an equal chance of being dialed by a random number generator, there is a bias in who actually 1) answers the phone and 2) stays on the line to participate once they realize that this is either telemarketing or a survey. One of the reasons pollsters have such a hard time getting young people, low income people, blacks and Hispanics, and other demographic groups in their samples is that those people are the least likely to stay on the phone and participate even if the survey team manages to reach them. Younger and cell-only users are unlikely to even answer when they see an unrecognized number.
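
To put rough numbers on that, here is a minimal Python sketch of how equal dialing odds plus unequal response rates produce a lopsided sample. Every figure in it (the age groups, the population shares, the response rates) is a made-up assumption for illustration, not data from any real poll.

```python
# All numbers are hypothetical and chosen only to illustrate the mechanism.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
response_rate = {"18-34": 0.03, "35-54": 0.06, "55+": 0.15}


def expected_sample_share(population_share, response_rate):
    """Expected share of each group among completed interviews when every
    number is equally likely to be dialed but groups respond at different rates."""
    completes = {
        group: population_share[group] * response_rate[group]
        for group in population_share
    }
    total = sum(completes.values())
    return {group: completes[group] / total for group in completes}


sample_share = expected_sample_share(population_share, response_rate)
for group, share in sample_share.items():
    print(f"{group}: {population_share[group]:.0%} of population, "
          f"{share:.0%} of completed interviews")
```

With these assumed rates the oldest group ends up as well over half of the completed interviews despite being about a third of the population, which is the skew described above.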

So, to produce random samples, survey researchers resort to a lot of…statistical adjustments. Weights are given to respondents until the sample roughly approximates the target population. Firms differ in whether their "ideal" sample is the population as a whole or their best guess at the demographics of the electorate that actually shows up to vote (different guesses about which groups of people will turn out and in what numbers are one of the key reasons that polls often vary slightly among firms). They get away with this largely because 1) young people generally do not vote, so getting a sample with few young voters is survivable, 2) African-Americans and Hispanics are pretty monolithic in their Democratic preferences, so a small number can be weighted to represent a larger population without much error, and 3) old white people are both the most likely to respond when phoned and the most likely to vote.
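
For the curious, the simplest version of that weighting can be sketched in a few lines of Python. This is a one-variable post-stratification example under assumptions I am inventing for illustration: the age groups, the sample counts, the target shares, and the support rates for a hypothetical candidate are all made up. Real firms weight on several variables at once and layer likely-voter models on top, so treat this as a sketch of the idea and not as any firm's actual method.

```python
def poststratification_weights(sample_counts, target_share):
    """Per-respondent weight for each group: target share / observed sample share."""
    n = sum(sample_counts.values())
    return {
        group: target_share[group] / (sample_counts[group] / n)
        for group in sample_counts
    }


def weighted_estimate(sample_counts, support_rate, weights):
    """Weighted fraction of the sample supporting a (hypothetical) candidate."""
    numerator = sum(
        sample_counts[g] * weights[g] * support_rate[g] for g in sample_counts
    )
    denominator = sum(sample_counts[g] * weights[g] for g in sample_counts)
    return numerator / denominator


# Hypothetical sample of 1,000 completed interviews, skewed toward older people.
sample_counts = {"18-34": 110, "35-54": 255, "55+": 635}
# Hypothetical target: the pollster's guess at the electorate's age mix.
target_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
# Hypothetical support for the candidate within each group.
support_rate = {"18-34": 0.35, "35-54": 0.45, "55+": 0.55}

weights = poststratification_weights(sample_counts, target_share)
unweighted = weighted_estimate(
    sample_counts, support_rate, {g: 1.0 for g in sample_counts}
)
weighted = weighted_estimate(sample_counts, support_rate, weights)

print(f"weights: { {g: round(w, 2) for g, w in weights.items()} }")
print(f"unweighted support: {unweighted:.1%}, weighted support: {weighted:.1%}")
```

In this toy setup each young respondent counts for nearly three people and each older respondent for about half of one, and the estimate moves several points once the weights are applied. The choice of `target_share` is where the "best guess at the electorate" comes in: change that guess and the weighted number changes with it.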

Why is it that the majority of people responding – not being called, but actually responding – to polling calls are old? Well. They're more likely to be home when a pollster decides to call, especially after retirement. They're more likely to want to vent their opinions at someone. They're less likely to reflexively hang up on an obvious cold call because they developed their phone habits in an earlier era. And, I'm sorry to say, a lot of them are just lonely and want to talk to someone. The challenge of polling in this era certainly is not "How do we get enough likely Trump supporters in our sample, especially white ones over 55?" The challenge is getting anything other than that.

In that sense, right wingers who complain about polling are correct: the unweighted sample drawn by polling firms is not truly representative of the demographics of the nation or even of the electorate. It is, if anything, over-representing the people who are most likely to be conservatives today.