$1 Trillion into AI, and It Still Can’t Ask the Right Questions

Superforecaster Ryan Adler on getting frontier LLMs to write solid forecasting questions.

Anyone hoping that artificial intelligence’s dominance of the headlines might wane in 2025 has been sorely disappointed. With over $1 trillion in investment announced so far this year, and December still to go, AI is permeating every aspect of society at an impressive pace. Perhaps most startling for many is the prospect that advances in these technologies will come for our jobs. While this fear is nothing new, it’s certainly accelerating.

As someone who, among other things, writes forecasting questions for a living, I’m not indifferent to the prospect of an AI model making my knowledge, skills, and abilities obsolete. So I recently undertook an exercise to evaluate the current state of the threat. Pleasantly enough, I found that frontier AI systems still have a very long way to go.

What Frontier Models Get Wrong about Forecasting Questions

My colleague Chris Karvetski created a detailed prompt we used to ask ChatGPT, Gemini, Claude, and Grok to draft solid forecasting questions related to Russia’s war in Ukraine. This task requires at least a modicum of qualitative assessment, unlike simpler questions about, say, asset prices or interest rates. If that’s what you want, FRED already does most of the work. But war is different. It’s complicated; political and historical fictions abound; and the current Russian government is nothing if not creative on paper. Clear and salient forecasting questions are an absolute must for this and many other topics.
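
For readers who want to try something similar, here is a rough sketch of the mechanics using the OpenAI Python client. The model name and prompt text below are illustrative stand-ins, not the detailed prompt we actually ran:

```python
# Sketch only: the real prompt was far more detailed. Assumes the OpenAI
# Python client (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You write forecasting questions for professional forecasters. Each "
    "question needs a clearly defined outcome, a firm resolution date, and "
    "a named, publicly checkable resolution source."
)
USER = (
    "Draft three forecasting questions about Russia's war in Ukraine. "
    "For each, state the resolution criteria and the resolution source."
)

response = client.chat.completions.create(
    model="gpt-4o",  # swap in whichever frontier model you want to test
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": USER},
    ],
)
print(response.choices[0].message.content)
```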

Here are a few quick observations:

  • ChatGPT had an affinity for UN action, which should raise a red flag (no pun intended) for anyone familiar with the structure of the UN Security Council: Russia’s veto effectively blocks any action that isn’t blessed by the Kremlin.
  • Gemini seemed to presuppose perfect knowledge: that casualty counts down to the man and locations down to the kilometer were readily available data. They aren’t. Surplusage (language that adds words without adding meaning) is also an issue, and its attempts to define controlling sources created potentially fatal inconsistencies.
  • Claude tried to cover US and EU sanctions. On its face, that might look like good framing. However, it set a threshold (50%) without providing any notion of how the current spectrum of sanctions and restrictive measures would be quantified. Without clear metrics, that’s a complete nonstarter.
  • Grok shared some of Gemini’s overconfidence in information availability and often framed questions that could only be resolved long after the fighting had concluded. That’s hardly useful for policymakers.

Those who have seen 1776 may recall the scene where Thomas Jefferson first shares his draft declaration of independence. There’s a pause, then just about everybody in Independence Hall starts clamoring with questions. If the LLM-generated questions above were presented to Superforecasters, I suspect a similar scene of questions and demands for clarification would unfold. Great forecasters know that the quality of a forecast, as well as the information derived from it, depends on every element of a question being as clear as possible. Otherwise, you may end up with a probability that means next to nothing.

Will one or more of these model lines get better? Certainly. Will a domain-specific program give us a run for our money before Halley’s Comet makes its return? Probably. But if Lewis and Clark’s path to the Pacific Ocean were an analogy for AI’s journey to writing ironclad forecasting questions, today’s frontier models have just made it to Kansas City.

* Ryan Adler is a Superforecaster, GJ managing director, and leader of Good Judgment’s question team.

Meet the winner of the Q2 2025 “Right!” said FRED Challenge

The winner of the Q2 2025 “Right!” said FRED Challenge, Josh Jamner, is an investment strategist who has worked in both buy- and sell-side research roles over the course of his 16-year (and counting) career on Wall Street. Josh is known as jjamner on GJ Open and lives in New York City. In this interview, he discusses his interest in and approaches to forecasting, and how the Liberation Day tariff announcements (and their aftermath) affected his forecasting for this challenge.

GJO: To start, could you tell us a bit about yourself and your background?

Of course. I was raised in Connecticut about an hour outside of New York City, attended Colby College in Maine, where I was a Government major and Economics minor, and moved to New York City to pursue a career in finance after college. I have been here ever since. Outside of work, I enjoy skiing, hiking, reading (preferably near a beach, lake, or pool), cooking, and spending time with my wife and second grader.

GJO: How did you first become interested in forecasting, and what led you to GJ Open?

I joined GJ Open in 2019 on the recommendation of my two colleagues at ClearBridge Investments (my employer for the past eight years), but I was familiar with Good Judgment before that, having read Superforecasting and a few related books such as Thinking, Fast and Slow, Expectations Investing, and a personal (albeit tangentially related) favorite, Soccernomics. Prior to joining GJ Open, I had been tracking forecasts and calculating Brier scores in a spreadsheet for a year or two. When I saw how easy the platform was to use, plus the opportunity to see the forecasts of others and the group consensus, I switched over and my old spreadsheet hasn’t been touched since.
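
For readers who haven’t met the metric: the Brier score is the squared error between a probability forecast and the outcome (1 if the event happened, 0 if not), averaged across forecasts. A minimal Python sketch, with made-up numbers, does what such a spreadsheet would:

```python
# Brier score for binary questions: mean squared error between forecast
# probabilities and outcomes. Lower is better; a constant 50% scores 0.25.
# (GJ Open's displayed scores may follow a different convention; this is
# the simple binary form.)
def brier_score(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.80, 0.35, 0.60]  # made-up probabilities for three questions
outcomes = [1, 0, 0]            # how each question resolved
print(round(brier_score(forecasts, outcomes), 4))  # 0.1742
```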

GJO: What has your experience on the platform been like so far?

My experience has been quite positive; otherwise, I wouldn’t have continued participating all these years! I find it helpful both professionally and personally. On the personal side, it is fun to try to anticipate the future and how situations may evolve, especially in areas where I have little subject matter knowledge. I like the challenge of forecasting and strive to push myself to improve my thinking, and participating in GJ Open is one avenue to do so.

Professionally, seeing how my own views differ from consensus is quite useful. But perhaps even more useful (to me) is seeing how consensus evolves in response to new information and comparing that to how my own views shift over time. I find it valuable to compare the distribution of odds for a given forecast (and how that distribution changes over time) between myself, consensus, online betting markets, financial markets using options data, and anything else I can find.
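
As a concrete illustration of that comparison, with invented numbers: decimal betting odds imply a probability of 1/odds, but two-way quotes typically sum past 100% (the bookmaker’s margin), so they need normalizing before they line up with a forecast:

```python
# Turning a hypothetical two-way betting market into an implied probability
# that can be compared against one's own forecast. All numbers are invented.
yes_odds, no_odds = 1.80, 2.10

raw_yes, raw_no = 1 / yes_odds, 1 / no_odds  # 0.556 and 0.476; sums past 1
overround = raw_yes + raw_no                 # ~1.032, the bookmaker's margin
implied_yes = raw_yes / overround            # ~0.538 once the margin is removed

my_forecast = 0.60
print(f"market: {implied_yes:.3f}  vs  mine: {my_forecast:.3f}")
```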

GJO: Could you walk us through your approach to the “Right!” said FRED Challenge? What do you think helped you reach the top of the leaderboard?

I think there was a bit of randomness or luck in reaching the very top of the leaderboard, although I have been near the top before (5th in Q3 2024). The Q2 2025 challenge came during a tumultuous period for economic data and financial markets, sparked by the Liberation Day tariff announcements early in the quarter. In some cases, I felt comfortable leaving prior forecasts unchanged in the weeks that followed; in others it made sense to shift the distribution to imply fatter tails; and in yet others my views evolved enough that a directional shift was the outcome. Over the course of May and June, conditions continued to evolve fairly dramatically, and I found myself making larger-than-normal updates to my forecasts.
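
To make “fatter tails” concrete, with invented numbers: for a question scored over bins, one simple way to express extra uncertainty is to blend the current bin probabilities with a uniform distribution:

```python
# Fattening the tails of a binned forecast by mixing in uniform uncertainty.
# The bins and blend weight here are invented for illustration.
bins = [0.05, 0.20, 0.50, 0.20, 0.05]  # current probabilities over five ranges
w = 0.30                               # how much extra uncertainty to mix in
uniform = 1 / len(bins)

fattened = [round((1 - w) * p + w * uniform, 3) for p in bins]
print(fattened)  # [0.095, 0.2, 0.41, 0.2, 0.095] -- still sums to 1.0
```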

Ultimately, I think a lack of anchoring and a willingness to update my thinking frequently, and at times to reverse course entirely a few weeks later, was crucial. While I didn’t place at the top of any single question within the challenge, I was in the top 25 on all 13 questions and the top 15 on 9 of them, so consistency played a large role as well.

GJO: Are there particular types of forecasting questions you enjoy most?

I tend to focus on economic and financial market questions given my profession, but I do enjoy forecasting sports even though I am empirically quite bad at it. I don’t gamble on sports for many reasons, and seeing my “sports Brier” on GJ Open is one of them! I also enjoy forecasting things that are seemingly impossible to forecast. The challenge is fun, and thinking through a framework or approach to build a forecast upon is intellectually stimulating.

GJO: What advice would you give to beginner forecasters on GJ Open?

First, read a few books or papers on forecasting if you haven’t already. Any of the ones I mentioned is a good place to start. Second, build participating on GJ Open into your routine; for example, I have a recurring calendar reminder to check and update my forecasts every Friday morning. Sometimes I end up updating my forecasts at a different time, and sometimes I miss a week entirely, but in general I think having a periodic update such that it becomes a habit or part of your routine helps a lot.

In terms of actually participating on GJ Open, I would suggest focusing on an area where you have some sort of experience, subject-matter expertise, or perceived edge, along with a sample of broader topics. It takes a bit of time to build up enough forecasts to evaluate the results, but I found I wasn’t quite as good in some areas as I might have thought (sports), while in others (currencies) I knew my weaknesses to begin with. And it works both ways: you might find out you actually are pretty good in an area you didn’t expect.

GJO: Is there anything else you’d like to share with other forecasters on GJ Open?

I would like to thank everyone involved with GJ Open. I very much appreciate being able to participate, and I know a lot goes on behind the scenes to keep things operating smoothly!

See the latest forecasting challenges on GJ Open and try your hand at forecasting!

What’s a month?

Why question wording must be exact in forecasting

Superforecaster Ryan Adler turns a live CNBC disagreement about Tesla shares into a quick guide on clarity. Good forecasting starts with shared definitions.

On Monday morning (4 August 2025), I was pounding away at my keyboard with CNBC playing in the background. Since I live in the Mountain time zone, morning meant the Halftime Report, hosted by Scott “The Judge” Wapner. I was loosely listening in when it became clear that Wapner and “Investment Committee” member Joe Terranova were having a disagreement over whether Tesla shares were up or down over the past month. The exchange was cordial but awkward: Wapner insisted that Tesla shares were down over the past month based on where the stock was trading that morning, while Terranova was very confident they were up. They eventually went to commercial and came back having discovered the source of the discrepancy. The problem wasn’t that one was right and the other wrong. The problem was that they were each defining “month” differently.

A month before 4 August 2025 would have been 4 July 2025, a market holiday. The chart CNBC showed was keyed to Tesla’s closing price on 3 July (about $315). Terranova, on the other hand, was using the price at the opening bell on 7 July 2025, four weeks earlier, when the stock was a bit under $300. Against one baseline the stock was down; against the other, it was up.
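
The two readings are easy to reproduce with nothing but Python’s standard library; the prices in the comments are the ones from the broadcast:

```python
# Reconstructing the two "one month ago" baselines from the CNBC exchange.
from datetime import date, timedelta

today = date(2025, 8, 4)                     # the Monday of the broadcast

calendar_month_ago = today.replace(month=7)  # 4 July 2025: a market holiday,
chart_baseline = date(2025, 7, 3)            # so the chart fell back to the
                                             # 3 July close (~$315): "down"
four_weeks_ago = today - timedelta(weeks=4)  # 7 July 2025: opened a bit
print(four_weeks_ago)                        # under $300, so vs. that: "up"
```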

Ambiguity Kills Forecasts

What does this have to do with forecasting? Everything!

Among the many lessons that came out of the Good Judgment Project, one was clear: the fight against ambiguity is essential and never-ending. While others may give this fight a lower priority, it is front and center in our minds at Good Judgment with every question drafted and reviewed.

If a term or clause could reasonably be interpreted in different ways, we define that term and include examples as needed. And even when someone interprets something in an arguably unreasonable way, such as asserting that a country’s president doesn’t stop being president upon death (it’s happened repeatedly, for some reason), we clarify.

We aren’t perfect, and the world sometimes creates situations that weren’t on anyone’s radar when a germane question was launched. That said, we know that everybody must be contemplating the same elements of an event they are asked to forecast. Leaning on Potter Stewart’s concurrence in Jacobellis v. Ohio, where he said, “I know it when I see it,” may work when deciding that a movie is not obscene, but it is no way to set a threshold for a forecasting question. Otherwise, we would invite static from the crowd instead of signal.

Bottom line: The CNBC confusion shows how ambiguity kills forecasts. Define upfront what counts, when it counts, and who decides, and leave as little as possible to interpretation. Good forecasting starts with good question writing.

Do you have what it takes to be a Superforecaster? Find out on GJ Open!

* Ryan Adler is a Superforecaster, GJ managing director, and leader of Good Judgment’s question team.