by Andrew Sabisky
Mr. Sabisky is a forecaster on Good Judgment Open and a freelance writer. He has written extensively for the International Business Times – UK edition. Mr. Sabisky writes for the “Superforecasting in Action” blog to dive deeply into the perspective of an active GJ Open forecaster.
As Advent progresses, it seems correct, in this liturgical season of penitence and recollection, to think on our failings in this year of forecasting. We’re all adjusting to the election results, which most forecasters failed to anticipate. Thankfully, we have some good quality data at Good Judgment Open to evaluate how well our forecasters performed in predicting the election outcome.
(Note as you read that this is based on merely one forecast – albeit a very important one – and is best analyzed by comparing lots of forecasts over time. However, this is a great opportunity to compare different forecasts, see who did what, and take a look at some best practices.)
As an earlier post at the Washington Post’s Monkey Cage blog showed, while the Good Judgment crowd had Hillary Clinton as a consistent favorite, it also consistently gave Trump a higher probability than most other sources of such forecasts. We comprehensively beat the models of the Huffington Post and the Daily Kos (based on polling aggregation), also outperformed PredictWise (a model based on aggregating prediction markets), and very slightly beat Hypermind (a prediction market). It’s nice to be right, but when you can’t be right, it’s also pleasant to be the least wrong person in the room.
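How does one decide who "beat" whom? Good Judgment scores probability forecasts with the Brier score, the squared error between the probability assigned and the 0/1 outcome (lower is better). A minimal sketch, using invented placeholder probabilities rather than any source's actual published numbers:

```python
# Sketch: comparing probability forecasts with the Brier score.
# The probabilities below are illustrative placeholders, NOT the real
# forecasts published by any of the sources discussed in this post.

def brier_score(forecast_prob, outcome):
    """Squared error between a probability forecast and a 0/1 outcome.
    Lower is better; 0.0 is a perfect forecast."""
    return (forecast_prob - outcome) ** 2

# Hypothetical final probabilities assigned to the event that occurred.
forecasts = {
    "Source A": 0.30,
    "Source B": 0.15,
    "Source C": 0.02,
}

outcome = 1  # the event happened
ranked = sorted(forecasts.items(), key=lambda kv: brier_score(kv[1], outcome))
for source, p in ranked:
    print(f"{source}: Brier = {brier_score(p, outcome):.3f}")
```

A forecaster who gave the winner only a 2% chance takes a far larger Brier penalty than one who gave 30%, which is why being "least wrong" is worth something.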
So, how did our crowd do it? An interesting clue lies in analysis of the links forecasters left in their comments on their forecasts. The two most popular sites to link to were, by an overwhelming margin, FiveThirtyEight and RealClearPolitics. FiveThirtyEight was cited 528 times, and RealClearPolitics 320 times (848 citations of these two sites in total). By comparison, the next five most popular sites (the New York Times, Politico, the Washington Post, PredictWise, and CNN) were cited a combined total of only 795 times.
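A tally like this can be produced mechanically by extracting the URLs from forecasters' comments and counting their domains. A minimal sketch, with invented comment strings standing in for the real GJ Open data:

```python
# Sketch: counting which sites forecasters link to in their comments.
# The comment strings are made-up examples, not actual GJ Open comments.
import re
from collections import Counter
from urllib.parse import urlparse

comments = [
    "Polls tightening: http://fivethirtyeight.com/election-forecast/",
    "See the RCP average: http://www.realclearpolitics.com/epolls/latest_polls/",
    "538's model moved again: http://fivethirtyeight.com/features/some-post/",
]

url_pattern = re.compile(r"https?://\S+")

def domain(url):
    """Extract the host, dropping a leading 'www.' so variants collapse."""
    netloc = urlparse(url).netloc
    return netloc[4:] if netloc.startswith("www.") else netloc

domains = Counter(
    domain(url)
    for comment in comments
    for url in url_pattern.findall(comment)
)
print(domains.most_common())
```

Run over the full comment corpus, a counter like this yields exactly the kind of ranking described above.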
To explore the data visualizations from The Washington Post’s Monkey Cage election challenge on Good Judgment Open, prepared by the Data Face, please see below:
This, I suspect, reveals a strong preference from the crowd for data over commentary. FiveThirtyEight doesn’t contain much commentary, and its most valued content is always its numerical forecasts (which, like Good Judgment Open’s, had a stronger pro-Trump lean than most other sources of forecasts). RealClearPolitics does contain its own commentary, and also collates the best commentary of other sites, but again, its most valued product is its polling aggregation. Anecdotally, I can report that almost all links to RealClearPolitics were links to its polling aggregation pages. Our forecasters, therefore, didn’t seem to pay much attention to pundits or color commentary.
It should be noted that although the GJ Open crowd clearly weighted FiveThirtyEight’s forecasts highly, the crowd’s own forecast does not look especially like that of FiveThirtyEight. The crowd’s forecast on the Trump-Clinton horse race was remarkably stable, unlike that of FiveThirtyEight, which was notoriously spiky. The criticism to be made of the crowd is that it wasn’t updating properly in response to new information. The obvious defense is that an extraordinary quantity of new information in a presidential election is useless information, and that swings in the polls are largely driven by differential non-response. For instance, as Clinton gets a positive news cycle and Trump a bad one, Clinton supporters become more likely to pick up the phone to pollsters, and Trump supporters less likely to do so. The same is true in reverse; whenever Clinton’s email troubles reared their head, her supporters probably became more demoralized and less likely to respond.
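The differential non-response mechanism is easy to illustrate with a toy model: hold the electorate's true preferences fixed, vary only the rate at which each candidate's supporters answer pollsters, and watch the observed poll "swing" even though nobody has changed their mind. All numbers here are invented for illustration:

```python
# Toy model of differential non-response. The underlying preferences are
# fixed; only response rates move with the news cycle. All figures are
# invented for illustration, not estimates of the 2016 electorate.

TRUE_CLINTON_SHARE = 0.52  # fixed underlying two-party preference
TRUE_TRUMP_SHARE = 0.48

def observed_poll(clinton_response_rate, trump_response_rate):
    """Clinton's apparent two-party poll share, given only who answers."""
    c = TRUE_CLINTON_SHARE * clinton_response_rate
    t = TRUE_TRUMP_SHARE * trump_response_rate
    return c / (c + t)

# Neutral week: both camps respond equally, so the poll matches reality.
neutral = observed_poll(0.50, 0.50)

# Bad news cycle for Clinton: her demoralized supporters respond less
# often, and the poll swings sharply with zero actual opinion change.
bad_week = observed_poll(0.40, 0.50)

print(f"neutral week:  Clinton at {neutral:.1%}")
print(f"bad news week: Clinton at {bad_week:.1%}")
```

In this toy setup, a drop in one camp's response rate from 50% to 40% moves the observed margin by several points while the true numbers never budge, which is why a stable forecast can be a rational response to spiky polls.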
A substantial political science literature suggests that it is very hard to get voters to change their minds, and bearing in mind the issue of differential non-response, I think you have to favor a stable, consistent forecast (like the GJ crowd’s) over a spiky forecast like that of FiveThirtyEight. The crowd’s forecast can perhaps be read as saying “our collective opinion is that movement in the polls is very largely illusory. The hypothetical underlying true polling numbers are stable, but favor Clinton. However, because differential non-response is not the only source of polling error, we assign a substantial probability (between 25% and 35%) to a Trump victory.”
The FiveThirtyEight forecast can be read as saying “we ignore the issue of differential non-response, and take all swings in the polls at face value. The hypothetical underlying true polling numbers may be stable, or they may not be.” This is not really, in my opinion, a defensible basis for a forecast, given the strong theoretical and empirical reasons for incorporating the differential non-response issue into interpretations of polling swings. Perhaps it’s a more defensible basis for a “nowcast,” a view on what would happen were the election held today. But the two forecasts have to be read in slightly different ways, despite the fact that, on average, they probably performed quite similarly.
To sum up, there are strong hints in the data that the crowd’s relatively successful performance incorporated two great keys to good forecasting: it trusted hard data over pundit commentary, and it knew when to discard misleading new information. In 2017, we hope that you will profit from incorporating these vital tips into your own predictions with us at Good Judgment Open.