Maintaining Optimism in the Face of Reality. Occasional observations on the state of the world, society, business and politics. Usually anchored by facts, always augmented by opinion.
"Report says Problems Led to Skewed Polling Data," reports the NY Times. The exit poll results had the biggest partisan skew since 1988, the first year they started keeping track. That's a nice way to put it; saying the polls were the least accurate since 1988 would seem to mean the same thing, and would be a much clearer way of putting it. Subscribers to the results, like the Washington Post's managing editor, are griping about them; the organization conducting the polls is blaming subscribers for leaking the data early. Of course, in my opinion, the job of a polling organization is to provide information, not raw data. Because of the enormous skew in the early exit poll results, Tony Blair probably had a fitful night of sleep, having gone to bed thinking John Kerry would win the election. (Which just goes to show you that if you think things are bad, maybe they'll seem better after a good night of sleep.)
The whole challenge of exit polling for the election borders on insurmountable, yet despite the problems, there is an obsessive desire to have it: one more number, one more seemingly objective quasi-fact that appears not to need interpretation (even though it very much requires it). For the media, exit polls are content, just like the various "instant polls" that were run during the debates; even as the media said the instant polls were horribly biased, they didn't stop showing them. The polls livened up the screen too much.
The truth is that most newscasters, and certainly most of the general population, are ill-equipped to interpret streaming, raw exit poll data. Do you know which demographic categories are more likely to vote early in the day versus late in the day? The first exit poll numbers I looked at (circulated almost instantly across the blogs after being leaked by someone within the press corps) were obviously meaningless: their demographic skews suggested that women voted at a significantly higher rate than men, and their candidate results, had they been accurate, would have contrasted sharply with the hundreds of state-level polls done leading up to the election. But in the rush to report, these sorts of details are missed, or really, ignored. And honestly, portions of the blogosphere accept any information, occasionally even satire, as fact if it squares with their beliefs (or desires).
There are several fundamental concerns with exit polling, especially as a tool for predicting the end-of-the-day election result, most particularly when one considers the voting patterns in the U.S. and the nature of this election, which, more than red state versus blue state, was very much urban versus suburban, exurban and rural. While a random-digit-dialing poll can easily sample people from the entire geography of a state, an exit poll cannot reach such a sample nearly so easily.
How many precincts can you practically survey when you need a human at each location? According to their FAQ, these exit polls covered 1,480 precincts across the United States. Rearrange those numerals to 4,180 and you have the total number of voting precincts in Minnesota alone. Ohio has 11,360 voting precincts, Florida 7,310, and Pennsylvania 9,442. California has 24,035.
If you had assigned a poll watcher to my precinct in Minneapolis and projected the state results from that, I'm sure Kerry would have been projected north of 85%. If you had put the poll watcher in a southwestern suburb of Minneapolis, it might have shown a 70% result for Bush. If you put one in a precinct along the Canadian border, it might be 50-50, or it might be 65-35 in either direction, depending on the precinct. Trying to model statewide voting behavior from a small selection of precincts is very challenging, both in selecting the precincts and in the potential magnitude of the errors.
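To put rough numbers on that "potential magnitude of errors," here is a minimal sketch of the textbook sampling-error calculation, using the Ohio figures cited in this post (about 2,020 interviews spread over 50 precincts). It swaps in the standard Kish design-effect approximation for cluster samples, deff = 1 + (m - 1) * rho, where m is interviews per precinct and rho measures how alike voters within one precinct are. The rho values are hypothetical, and NEP's own error calculations, which also involve stratification and weighting, are not described here.

```python
import math


def moe_srs(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a simple random sample of n voters."""
    return z * math.sqrt(p * (1 - p) / n)


def design_effect(m: float, rho: float) -> float:
    """Kish approximation: variance inflation from interviewing m voters per precinct."""
    return 1 + (m - 1) * rho


def moe_clustered(n: int, precincts: int, rho: float, p: float = 0.5) -> float:
    """Margin of error when n interviews are spread evenly over `precincts` sites."""
    m = n / precincts
    return moe_srs(n, p) * math.sqrt(design_effect(m, rho))


n, sites = 2020, 50  # Ohio figures cited in this post
print(f"Simple random sample of {n}: +/-{moe_srs(n):.1%}")
for rho in (0.02, 0.05, 0.10):  # hypothetical within-precinct similarity
    print(f"Clustered, rho={rho:.2f}: +/-{moe_clustered(n, sites, rho):.1%}")
```

The more uniformly a precinct votes, the larger rho gets and the fewer effective interviews the poll really has: at a hypothetical rho of 0.05, the 2,020 interviews behave roughly like 680 independent ones.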
It's kind of amazing that exit polls were placed at only 50 locations in Ohio. (Or so I infer from their planned placement at 50 locations, according to the Cincinnati Enquirer.) That's 50 out of 11,360 precincts: less than one-half of one percent.
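The coverage arithmetic is easy to check. A short sketch using only the precinct counts quoted in this post; the 50 Ohio sites are the Enquirer's planned figure, and the constant and function names are illustrative.

```python
# Precinct counts quoted in this post (2004).
PRECINCTS = {
    "Ohio": 11_360,
    "Florida": 7_310,
    "Pennsylvania": 9_442,
    "California": 24_035,
    "Minnesota": 4_180,
}
NATIONAL_EXIT_POLL_SITES = 1_480  # from the NEP FAQ, per this post


def coverage(sampled: int, total: int) -> float:
    """Fraction of precincts that had an exit poll interviewer."""
    return sampled / total


ohio = coverage(50, PRECINCTS["Ohio"])
print(f"Ohio: {ohio:.2%} of precincts")  # Ohio: 0.44% of precincts

# Even if every national site were placed in just these five states,
# they would reach only a small fraction of those states' precincts.
five_states = sum(PRECINCTS.values())
print(f"Five states: {coverage(NATIONAL_EXIT_POLL_SITES, five_states):.1%}")  # Five states: 2.6%
```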
In response to the question of how the lucky precincts get chosen:
The polling places were selected as a stratified probability sample of each state. The purpose of stratification is to group together precincts with similar vote characteristics. A recent past election was used to identify all the precincts as they existed for that election. The total vote in each precinct and the partisan division of the vote from this past race are used for the stratification. In addition, counties are used for stratifying the precincts. The total vote also is used to determine the probability of selection. Each voter in a state has approximately the same chance of being selected in the sample.
While the word "stratification" certainly adds to the apparent credibility of their methods, without knowing which "recent election" they used, or whether they used a uniform national race or selected state or local races, it is hard to judge whether they could reasonably stratify the precincts well enough to pull a decent sample from the strata. That is particularly true for analyzing a state as truly varied as Ohio from 50 precincts: 10 precincts from each of 5 strata, where they would be interviewing 2,020 individuals, or only about 40 per precinct on average. [CBS' exit poll results include the Geo Strata for each exit poll.] Also, "the total vote also is used to determine the probability of selection" suggests that logistical convenience encourages selecting precincts with a larger voting population, which would almost certainly imply a systematic bias towards doing exit polls in urban areas.
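For readers unfamiliar with the jargon, here is a rough sketch of what a "stratified probability sample" with selection driven by total vote can mean in practice. It is not NEP's actual procedure, which the FAQ does not spell out: the precinct data, the strata cutoffs and the function names are all hypothetical, and the selection step uses one standard technique, systematic sampling with probability proportional to size (PPS). Precincts are grouped by baseline partisan share, then drawn within each stratum with probability proportional to their past total vote.

```python
import random
from dataclasses import dataclass


@dataclass
class Precinct:
    name: str
    total_vote: int   # ballots cast in the baseline ("recent past") election
    dem_share: float  # baseline partisan split used for stratification


def stratify(precincts, cutoffs):
    """Group precincts into strata by baseline partisan share.

    cutoffs are ascending boundaries; k cutoffs give k + 1 strata.
    """
    strata = [[] for _ in range(len(cutoffs) + 1)]
    for p in precincts:
        index = sum(p.dem_share >= c for c in cutoffs)
        strata[index].append(p)
    return strata


def pps_systematic(units, n, rng):
    """Draw n units with probability proportional to total_vote.

    Lay the precincts end to end on a line whose length is the stratum's
    total vote, then take n equally spaced points after a random start.
    A precinct is chosen when a point falls inside its segment.
    """
    total = sum(u.total_vote for u in units)
    step = total / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    chosen, cumulative, i = [], 0, 0
    for u in units:
        cumulative += u.total_vote
        while i < n and points[i] < cumulative:
            chosen.append(u)
            i += 1
    return chosen


def inclusion_probability(p, stratum_total, n):
    """Chance that precinct p is drawn (valid while n * size <= stratum total)."""
    return n * p.total_vote / stratum_total


# Hypothetical state: 1,000 precincts with made-up sizes and partisan splits.
rng = random.Random(2004)
state = [Precinct(f"P{i}", rng.randint(200, 3000), rng.uniform(0.2, 0.8))
         for i in range(1000)]

# Five strata, ten precincts each: 50 sites, the Ohio arrangement the post infers.
strata = stratify(state, [0.35, 0.45, 0.55, 0.65])
sample = [p for s in strata for p in pps_systematic(s, 10, rng)]
print(len(sample), "precincts sampled")
```

The design choice the FAQ hints at shows up in `inclusion_probability`: a precinct's chance of selection grows with its size, but if roughly the same number of voters is interviewed at each site, a voter's overall chance is (n * size / total) * (interviews / size), which no longer depends on precinct size within a stratum.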
(If you care to read some more general puffery about the polling, read AP's chest-pounding discussion of just how great they are, and how great the exit polling will be. My favorite part is the telephone polling to estimate the absentee vote in various states. I had an issue with this when I first read of it in early October: "Since You're Not There, Can We Ask You Some Questions?" This is related to the military absentee vote as well: "Bush's Hidden Support in the One Real Polling Gap." An AP note on Nevada's exit poll methodology more openly acknowledges certain weaknesses; Nevada had exit polling at 40 locations.)
Since I live here, let's look at Minnesota to highlight some of the methodological challenges and flaws. Minnesota's exit polls (via CNN) show that 34% of the sample is from "smaller cities," 41% from suburbs, 23% from rural areas and 2% from "small towns." I use the scare quotes because I am not sure precisely where their breaks are. The thing is, though, while the Twin Cities metro area has about 3.3 million people, we have only two cities in Minnesota with populations over 100,000: Minneapolis and St. Paul. Rochester and Duluth (not part of the metro, if you don't know these parts) are both right around 90,000 each. The next six largest municipalities in the state are suburbs. (We are clearly not an annexation-oriented state.) [Factmonster and CityPopulation]
So the four largest cities in Minnesota total only about 832,000 people, or roughly 16% of our 5 million state population. If you subtract the populations of Minneapolis and St. Paul from the metro area population to get the size of our suburban population, you're left with about 2.6 million people, or over one-half of the state. So with 34% of the sample from cities, they are massively oversampling a group that is 16% of the population while significantly undersampling the suburbs (41% of the sample versus 52% of the actual population). I also suspect (but need to find their segmentation criteria) that small towns were under-represented while the rural population was similarly over-represented.
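A minimal sketch of the over- and undersampling arithmetic, using only the shares quoted in this post. "cities" here means the post's four largest cities; the exit poll's own "smaller cities" boundaries are unknown, so treat this as an approximation. The post-stratification weight (population share divided by sample share) is the standard correction a pollster could apply; whether NEP applied anything like it is not stated here.

```python
# Shares quoted in this post: Minnesota exit poll sample (via CNN) versus
# population estimates. Category definitions are assumptions, as noted above.
SAMPLE_SHARE = {"cities": 0.34, "suburbs": 0.41}
POPULATION_SHARE = {"cities": 0.16, "suburbs": 0.52}


def representation_ratio(group: str) -> float:
    """Sample share / population share: above 1 means over-represented."""
    return SAMPLE_SHARE[group] / POPULATION_SHARE[group]


def poststrat_weight(group: str) -> float:
    """Weight that rescales each interview back to population proportions."""
    return POPULATION_SHARE[group] / SAMPLE_SHARE[group]


for group in SAMPLE_SHARE:
    print(f"{group}: {representation_ratio(group):.2f}x represented, "
          f"weight {poststrat_weight(group):.2f}")
```

Cities come out at about 2.1 times their population share and the suburbs at about 0.79, so a suburban interview would need roughly 2.7 times the weight of a city interview to restore the population mix. This compares against population, not voters; turnout differences would shift the targets.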
Of course, you would be right to point out that I am comparing population figures to voter turnout. Possibly urban voters turn out at higher rates than suburban voters? (Surely you jest?) Quite the contrary, at least judging by the results in Hennepin County. Hennepin contains Minneapolis as well as a collection of first-, second- and third-ring suburbs, with demographics ranging from working class to extremely affluent (most of the suburbs are relatively affluent by national standards).
In any event, using the baseline of voters registered at 7AM on Election Day, Hennepin County had voter turnout of about 93%. Not bad. However, turnout in Minneapolis proper (the city) was only 84%. My precinct (which actually went for Kerry 86-13) had only 77% turnout, while my former precinct, in Linden Hills, was at 93%. Compare that with the suburban communities: the lowest suburban turnout in Hennepin County was 94%, ten solid points higher than Minneapolis proper, and aggregate voter turnout within the suburbs ran 98%. Again, don't read too much into these as absolute figures; however, I imagine that voting-age population figures by precinct would produce similar results, namely that suburban voters voted at roughly a 17% higher rate than urban voters.
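The paragraph above mixes two ways of comparing turnout: "ten solid points" is a difference in percentage points, while "17% higher" is a ratio of rates. A small sketch with the Hennepin County figures quoted above makes the distinction explicit; the figures are the post's, and the function names are illustrative.

```python
# Turnout figures quoted in this post (Hennepin County, 2004), measured
# against voters registered at 7AM on Election Day.
TURNOUT = {
    "Hennepin County": 0.93,
    "Minneapolis": 0.84,
    "lowest suburb": 0.94,
    "suburbs overall": 0.98,
}


def point_gap(a: str, b: str) -> float:
    """Absolute difference in percentage points."""
    return (TURNOUT[a] - TURNOUT[b]) * 100


def relative_rate(a: str, b: str) -> float:
    """How much higher a's turnout rate is than b's, as a fraction of b's."""
    return TURNOUT[a] / TURNOUT[b] - 1


print(f"{point_gap('lowest suburb', 'Minneapolis'):.0f} points")  # 10 points
print(f"{relative_rate('suburbs overall', 'Minneapolis'):.0%} higher")  # 17% higher
```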
Twin Cities locals may find it interesting that while 13th Ward Council Member Barret Lane takes some (very justified) pride in Ward 13 (of which Linden Hills is a part) having the highest voter turnout in Minneapolis, at 92% by the metrics used here, it still has lower turnout than any of the Hennepin County suburbs, including Brooklyn Center, Brooklyn Park, Crystal and Robbinsdale, as well as Minnetonka, Maple Grove and Plymouth.
If this is the case, it would seem the exit polls more than doubled the representation of the urban areas. But even if they corrected the balance, the suburbs by no means all vote the same way, so it is kind of a hopeless situation for exit polling, most especially on the predictive side of things. (Although they somehow make enough offsetting errors to get close: almost none of their subcategories correlate with the results you would expect from census data, other polling, etc., yet they still get within a point of the total number.)
The networks were alerted to the bias fairly early and thus held off calling any state that wasn't a foregone conclusion (e.g. Texas for Bush or NY for Kerry), which is probably for the best, given their 2000 debacle. (Although NBC and Fox upset some folks by calling Ohio for Bush and raining on their parade.) But what's the big deal? The nation knew the result (other than what Kerry might do when he woke up) by 2AM Central based on actual vote counts. If staying up until 3AM (worst case, if you are on the East Coast in an election this tight) once every 4 years to find out who got elected is just too much to ask, then maybe you should work on your patience. In 1968, we had to wait that long to find out whether it was Nixon or Humphrey.
So what good are exit polls?
Cynical Answer: they serve the same purpose as the statistics prattled about by baseball commentators in the huge gaps in the action. Honestly, "Bill, this batter is 2 for 7 against left-handed pitchers when playing at home," is really not much different from, for example, "Dan, Bush is leading Kerry 91% to 9% among white conservative protestants in Minnesota," or "Tom, 77% of the almost one-quarter of Minnesota voters who cite 'moral values' as their most important issue when voting are voting for Bush."
Hopeful Answer: they are useful for determining why people voted as they did and what the turnout in the election was really like. This is really the one chance to ask people right after the vote: "What did you do, and why?" This information can then inform policy makers, citizens and politicians about the nature of the electorate.
Official Answer (from their home page, in bold red type): "Exit Polls Provide Rich Content for News Outlets"
Assessment: The cynical answer seems to be correct.
However, let's at least consider the more hopeful possibility, that this information is extremely valuable for determining the opinion and sentiment of the electorate. Assuming they don't have a motherlode of better interview data they haven't let out already, they really seem to have squandered the opportunity.
I will address this in my next post. UPDATE: The next post took a little longer to put together, and unless I'm just seeing something that isn't there, my cynical answer was optimistic. "Was the NEP Exit Poll Designed to Find a Divided America?"
As a parting thought, via CNN, here is Dan Rather turning a phrase in 2000 about the exit poll system after that year's debacle:
"As far as I'm concerned," CBS anchor Dan Rather said to radio commentator Don Imus about VNS, "we have to knock it down to absolute ground zero, plow it under with salt, put a barbed-wire fence around it, quarantine it for a few years and start off with something new."
I would suggest that the "something new" could actually be reporting on the election in an entirely different way, one that doesn't involve trying to predict the outcome. Unfortunately, the networks were obviously too addicted to exit poll data as filler to actually "quarantine" the practice. Hell, CBS even used 3D mapping software to sex up the results this year.
The one positive thing this year is that there was more deliberation in the calling of races, even excessive deliberation near the end of the night. I think that showed some reserve by the networks. Really, I suspect most people don't care that much about calling the race if it's really tight: just report the real data as it becomes available, and viewers can watch the precinct results fill in. Of course, calling races has never been about doing the viewer a favor; it is purely about ego. As the CBS 2000 post-mortem cited below puts it, "Make no mistake. The Election Night broadcast occurs in a cauldron of competitive heat--heat that comes from within each individual and within each network, all burning to be the best and to be first." But it doesn't matter to people: election night reporting is more like a sportscast, and most people are not watching a particular network for the call, but for the commentary. Can't get enough of Dan Rather's colorful turns of phrase and metaphors? Watch CBS. Like Brokaw's dry humor? NBC it is. Being 30 seconds or even five minutes ahead on a call just doesn't matter much.
While equally accurate, weather predictions actually serve a purpose: I might grab an umbrella. Newscaster speculation about the outcome of a particular state, based on fewer than 3,000 interviews and made mere hours before that outcome can be known with almost absolute certainty, is of virtually no use whatsoever.
While the data neither produced any great early race calls for the networks nor allowed them to offer any really insightful analysis, they were at least good for some news; or the leaks through the blogosphere certainly were, in any case.
The San Francisco Chronicle wags its finger at the blogs:
[C]ontext was absent at many pseudo-news and blogger Web sites such as drudgereport.com and wonkette.com where early exit poll numbers -- derived from interviews and adjusted for factors such as poll location and the past performance of a precinct -- began appearing Tuesday afternoon.
The Chronicle doesn't even mention that the only way anyone outside the "professional" media could get the data was through a leak from an NEP subscriber, all of whom are major media establishments. The leak and the resulting brouhaha made a great story for the media, taking the place of the polling-station chaos story they seemed to be expecting (hoping for?). Candidly, I wonder if the media didn't leak the data in the hope of achieving exactly that result. Wonkette, after all, was sitting in the NBC studios that day, and despite her name, she lacks the expertise to make an effective judgment about the implications of using the data.
Update: John at Powerline shares my speculation that this may have been exactly what NBC wanted to happen, although he injects more of a politically motivated conspiracy into the goings-on. I don't think that is as likely as the desire to create some excitement and news to report once the networks realized that their exit poll data were not going to let them call key races until late in the evening, leaving them with a whole lot of potentially dead airtime (which they were still stuck with).
UPDATE2: The next post took a little longer to put together, and I will no longer question John's conspiracy theory about the early exit poll data leak. "Was the NEP Exit Poll Designed to Find a Divided America?"
Some source material links about the predictive failures of these polls:
CBS News on the "Shortcomings of Exit Polls," as well as their "Investigation, Analysis and Recommendations" (an 87-page PDF) for exit polls and calling races after the 2000 election.
The Washington Post reports "New Woes Surface in Use of Estimates"
Also: Knight Ridder via Seattle Times, Boston Globe, New York Post, New York Times (Nov 3), Reuters (which accurately files the story under "entertainment")
Blaming the bloggers: AP via Arizona Republic, Business Week, Newsday.
Saturday, November 06, 2004