The Good Judgment Team Breaks Down What Makes a Good Forecasting Question & Why

Posted in GJ Open


“Before 1 January 2017, will Philippine President Rodrigo Duterte proclaim martial law?”

The above question, posed in our Good Judgment Open public forecasting tournament, seems fairly straightforward, as though it were taken directly from the headlines.

But in reality, much meticulous drafting and an intensive review process go on behind the scenes, involving several members of the GJ question team. There is both an art and a science to writing a good forecasting question so that the resulting forecasts are useful.

One of the key lessons GJ teaches in forecasting accuracy training is to do “post-mortem” analyses on forecasts. (Asking yourself, “Did I ace it or blow it, and why? Was I right for the wrong reasons? Wrong for the right reasons?”)

We do the same type of analyses for our question development, and are in fact in the middle of a review process right now. So, while we think through our most successful questions and our not-so-successful questions from the past quarter, we thought we’d share a bit about what we’re thinking about when we develop a question.

  • Good forecasting questions are relevant and newsworthy, attracting many forecasters who update their forecasts frequently and post many rationales and links to external sources along with their forecasts; after all, a good forecaster updates her judgment frequently.
  • They are also rigorous. We aim to write forecasting questions that pass what our cofounder Philip Tetlock (author of Superforecasting) calls the “Clairvoyance test”: if a true clairvoyant were asked the question, could they respond with the correct answer without having to ask “What do you mean by …?” Good questions also use terms that mean the same thing to most people, passing what we now call the “Tomato test,” after a Supreme Court case about whether tomatoes are a fruit or a vegetable. A good question shouldn’t elicit many requests from forecasters to clarify what we mean, and we’re tracking that data to see how we’ve done thus far. Of course, sometimes questions are so rigorously defined that their outputs become less meaningful. So, some forecasting questions on GJ Open are “fuzzier,” in order to ask about important events and situations that can’t be clearly defined in advance. We’re monitoring your engagement, clarification requests, and accuracy on these questions as well.
  • What about accuracy? A good forecasting question is not necessarily one on which most forecasters receive a very good score, nor is it a question where the crowd performs especially poorly. Overall, we’re looking at the distribution of accuracy across and within questions to discover whether the questions we’re asking can distinguish forecasters by their ability and provide an opportunity to learn and improve.

While we review our past quarter of questions, we’d like to hear your feedback. Which of our questions on Good Judgment Open were really solid, according to the criteria above? On what questions could we have done better? Do you have “questions about questions”? Share your thoughts in the comments box below, or on one of our social media pages; we’d love to hear them.

Reply on Facebook.

Reply on Twitter.

Reply on LinkedIn.


5 thoughts on “The Good Judgment Team Breaks Down What Makes a Good Forecasting Question & Why”

  1. I have a question about how certain questions can actually help determine if someone is a Superforecaster – take your example question, “Before 1 January 2017, will Philippine President Rodrigo Duterte proclaim martial law?” If I am guessing today (18 October 2016), there are still nearly 2.5 months before the end of the year, and I might guess 47% chance. Since I can change my answer, I might do that daily until on 31 December, I decide the answer is 1%. That doesn’t say too much about my forecast accuracy; rather, it only says I am rational about statistical probabilities. If Duterte proclaims martial law on 2 January, what have we learned?

    1. Hi Alan,

      I think the answer is in how Brier scores are calculated — “To determine your accuracy over the lifetime of a question, we calculate a Brier score for every day on which you had an active forecast, then take the average of those daily Brier scores and report it on your profile page.”

      More about Brier scores on our FAQ page on GJ Open —

      Does that help answer your question? Feel free to follow up with me via email.

      Gwyn (GJ Associate)
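[Editor's note: the daily-averaging scheme Gwyn quotes can be sketched in a few lines of Python. This is an illustrative sketch of averaging daily Brier scores, not GJ Open's actual implementation; the function names are our own.]

```python
def brier(probs, outcome_index):
    """Full Brier score: sum of squared errors across all answer bins.

    For a binary question, 0.0 is a perfect forecast and 2.0 is the worst.
    """
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(probs))

def question_score(daily_forecasts, outcome_index):
    """Average the Brier score over every day with an active forecast."""
    daily = [brier(probs, outcome_index) for probs in daily_forecasts]
    return sum(daily) / len(daily)

# Alan's scenario: hold [47% yes, 53% no] for 74 days, then drop to
# [1% yes, 99% no] on the final day. Martial law is NOT proclaimed
# (outcome_index=1, the "no" bin).
forecasts = [[0.47, 0.53]] * 74 + [[0.01, 0.99]]
score = question_score(forecasts, outcome_index=1)  # ~0.436
```

Because every active day is scored, the 74 days spent at 47% dominate the average; the last-minute move to 1% barely helps. A forecaster who reached high confidence earlier, for good reasons, earns a meaningfully lower (better) score.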

  2. There seems to be lots of disagreement about the Mosul question: when is the city considered “retaken”? What if one of the parties simply declares it so? It’s especially complicated because there’s no agreement on who is going to govern there afterward.

  3. Hello, I would add to the criteria above that questions about numerical values* should have many different ranges, so that the magnitude of a mistake is taken into account in a granular way.
    I don’t know if having a different number of possible outcomes makes the Brier scores of different questions less comparable, though.
    *(Such as the future value of the S&P500 or the question “How many civilian fatalities will ACLED record in the Democratic Republic of the Congo between 1 May 2016 and 31 December 2016?”)
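[Editor's note: the commenter's point about granularity can be seen with a quick sketch, assuming plain multinomial Brier scoring over unordered bins (GJ Open's actual treatment of ordered ranges may differ). With coarse bins and no ordering, a near miss and a far miss score identically.]

```python
def brier(probs, outcome_index):
    """Full Brier score across all bins: 0.0 is perfect, 2.0 is worst."""
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(probs))

# Hypothetical index question with four value ranges; the outcome
# lands in bin 1. One forecaster bets everything on the adjacent
# bin, another on a bin two ranges away.
near_miss = [1.0, 0.0, 0.0, 0.0]  # off by one range
far_miss  = [0.0, 0.0, 0.0, 1.0]  # off by two ranges

# Plain Brier scores both misses as 2.0, ignoring how far off each was.
same = brier(near_miss, 1) == brier(far_miss, 1)  # True
```

Adding more, narrower ranges (or scoring ordered bins cumulatively) lets the score distinguish a forecaster who was nearly right from one who was far off.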

Leave a Reply

Your email address will not be published. Required fields are marked *