Malcolm Murray is an AI risk management expert who qualified as a Superforecaster in Season 3 of the Good Judgment Project. As Chapter Lead for the International AI Safety Report, he is a core part of the team leading the work of 100 global AI experts. As Research Lead at SaferAI, he provides policy recommendations and risk management tools to governments and AI companies. Originally from Sweden, he is currently based in Abu Dhabi. He is a Research Affiliate with the Centre for the Governance of AI and a Chartered Financial Analyst (CFA), and he holds an MBA from INSEAD.
GJ: Could you please tell us how you first became a Superforecaster?
I learned about the Good Judgment Project from a David Brooks column in the New York Times. I registered, participated in Season 3, scored in the top 2%, and so became a Super. I was a Super in Season 4, and then there was a conference in Berkeley with Philip [Tetlock], Terry [Murray], and everybody. As the GJP transitioned into Good Judgment Inc, I kept forecasting. It suits me very well. I’m a classic news junkie. I read everything. It’s fun to apply that, because otherwise you’re not going to discuss, say, the latest situation in North Korea with your friends.
GJ: You wear several hats: Superforecaster, Chapter Lead for the International AI Safety Report, Research Lead at SaferAI. How has your Superforecasting work informed your approach to AI safety?
The Superforecasting work is important for every other piece of work I do. When it comes to the future of AI and the effects it will have on society, I look across the board at all these different AI risks, and a lot of that is forecasting. Both risk management and forecasting are about looking into the future: thinking about uncertainties and looking at what would make an event more or less likely.
Then, very practically, I’ve been doing quantitative risk assessment for AI since I moved full-time into this field about two and a half years ago. We’ve been running Delphi studies where I’ve taken a lot of principles from Superforecasting, both the principles from Philip Tetlock’s work and the best practices I’ve picked up myself. We make sure that the participants work in a group, so that they can discuss their estimates with each other. They are asked to write good rationales that the others can read and then update their views. I’ve been thinking about having dedicated red teams and blue teams, like we have in Good Judgment. I haven’t tested that yet, but we did have one Superforecaster join a session to red-team the thinking of AI experts in cyber risk and biological risk.
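For readers unfamiliar with the mechanics, a minimal sketch of the kind of two-round Delphi aggregation Murray describes might look like the following. The participants, probabilities, and rationales here are hypothetical, and the median is just one reasonable aggregation choice.

```python
# A minimal sketch of a two-round Delphi aggregation. All names, numbers,
# and rationales are hypothetical, for illustration only.
from statistics import median

# Round 1: each participant submits a probability and a written rationale.
round_one = {
    "expert_a": {"p": 0.10, "rationale": "Base rate for similar incidents is low."},
    "expert_b": {"p": 0.45, "rationale": "Recent capability evals shift the picture."},
    "expert_c": {"p": 0.25, "rationale": "Real-world adoption friction slows impact."},
}

# Rationales are shared with the whole group before the next round.
for name, entry in round_one.items():
    print(f"{name}: p={entry['p']:.2f} | {entry['rationale']}")

# Round 2: participants revise after reading each other's reasoning.
round_two = {"expert_a": 0.15, "expert_b": 0.35, "expert_c": 0.25}

# Aggregate with the median, which is robust to outlier estimates.
print(f"Group estimate: {median(round_two.values()):.2f}")
```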
We also published a paper at the end of last year on risk modeling in AI, where any Superforecaster, or any reader of Phil Tetlock, would recognize components such as thinking probabilistically. This is such a useful skill.
And then, being a Superforecaster is a badge of respect. You are a part of a bigger group, and having done this for more than a decade gives you additional credibility and respectability.
GJ: Over a year ago, you wrote that “when it comes to AI evolution, prediction seems dead.” But you were careful to distinguish prediction from forecasting. Could you elaborate on that distinction?
It’s a key difference. The AI field has always been full of people making predictions in a way that is very different from Superforecasting. When you look at early artificial intelligence back in the 1950s, the participants, including some great thinkers, predicted they would solve artificial general intelligence in a summer. Geoffrey Hinton in 2016 said that we should stop training radiologists because AI would be able to do their work fully in five years. And here we are, 10 years later. Yes, there are parts of the job that AI can do better than humans, but only a small part, right?
Superforecasters approach forecasts analytically, in a robust way: what is the base rate, and what are the various factors that influence it? Prediction is definitely not that, and I think in AI, there’s no point in making random predictions. But there is still a great need for detailed and elaborate forecasts.
“In AI, there’s no point in making random predictions. But there is still a great need for detailed and elaborate forecasts.”
Superforecaster Malcolm Murray

GJ: Early on, Superforecasters were criticized for being too skeptical about rapid AI progress. But lately, many AI experts have been revising their timelines outward. You spoke about this to the Guardian recently. Were the Superforecasters right all along?
In the Existential Risk Persuasion Tournament (XPT) that the Forecasting Research Institute ran, Superforecasters and experts were polled on AI capabilities and timelines. The Superforecasters were much more skeptical of rapid progress and gave much lower estimates on various benchmarks. At the time, it felt as if the Superforecasters weren’t fully open to the possibility that these things could happen. But what we are seeing is that, yes, on paper, in the lab, in controlled experiments, AI has superhuman capabilities. In the real world, though, where there’s so much inertia and resistance, the Superforecasters were much more on the money in terms of actual adoption and the actual impact on the job market.
GJ: Given that AI capabilities and the timelines are so volatile, where do you see Superforecasters adding most value today? And how do you see AI and Superforecasters working together?
If we look at where Superforecasters add value compared to other AI researchers, as I alluded to earlier, Superforecasters bring analytical and robust thinking, such as looking at base rates. The problem with AI progress, of course, is that we don’t always know which base rate is right. Nate Silver captures this with his Technological Richter Scale, the TRS. I think Superforecasters will always have an edge in terms of looking at the bigger picture and not getting stuck in a single worldview.
I see this in my Delphi studies with experts in biological weapons, for example. They’ve thought about the subject so much that they’re biased toward ascribing a higher probability to certain pathways just because they’ve thought so deeply about them. So I think Superforecasters will continue to have an edge there.
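To make the base-rate point concrete, here is a toy calculation, with hypothetical numbers, showing how the same piece of evidence lands very differently depending on which reference class you start from:

```python
# A toy illustration (hypothetical numbers) of base-rate sensitivity:
# the same evidence, applied to two reference classes, yields very
# different forecasts.

def update(base_rate: float, likelihood_ratio: float) -> float:
    """Bayesian update in odds form: posterior odds = prior odds * LR."""
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Evidence judged four times more likely if the event is going to happen.
evidence_lr = 4.0

# Reference class 1: treat AI like a "normal" technology (low base rate).
print(f"{update(0.05, evidence_lr):.2f}")  # ~0.17

# Reference class 2: treat AI as transformative (higher base rate).
print(f"{update(0.30, evidence_lr):.2f}")  # ~0.63
```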
And then there’s the forecasting question formulation, which, as we know, is a whole art in itself. Here Good Judgment has really led the way.
On human-AI collaboration, we’ve always used AI at Good Judgment. Already in the early days of the Good Judgment Project, there were optimization algorithms that helped improve performance. As a Superforecaster, I love having these tools at my disposal, both for double-checking and pressure-testing my own thinking and for more frequent updating.
GJ: One thing that sets Superforecasters apart is their willingness to update their views and learn from their mistakes. In the space of AI, what has been your biggest update?
The extreme jaggedness of AI. Like many others, I had the mistaken assumption around 2022-23 that AI would continue improving more or less uniformly across domains. Instead, AI is now completely superhuman in coding, math, and science, but it still fails at the most basic tasks. I can give it a website and say, “Tell me X,” and it claims it can’t find the website, or it just makes something up. So you have this extremely strange, alien intelligence: a two-year-old’s performance versus superhuman performance from the same model. That’s been a big update for me, and as a Superforecaster, I definitely tried to make that update as soon as it became clear to me that this was the case.
GJ: You’ve argued that risk management should build resilience. What does a resilience-focused AI policy look like from your point of view?
I was a Chapter Lead on the International AI Safety Report, which comes out today [3 February 2026]. It has a new section in my chapter on resilience. We’re recognizing that resilience is vital because there’s no stopping AI. Even if Anthropic, Google DeepMind, and OpenAI stopped today, the open-source models are only a few months behind, and they cannot be withdrawn. So risk management cannot be the only solution anymore. What we need is to build societal resilience: the ability to withstand and the ability to adapt. That means stronger cybersecurity, education on misinformation and disinformation, and targeted measures like strengthening security at biological weapons facilities. So, resilience is a broad concept, but it’s really, really important.
GJ: Let’s get back to forecasting. What’s your advice for people who want to improve their skills today?
The classic things apply. First of all, practice. It’s very easy now. From Good Judgment Open to prediction markets, there’s no shortage of opportunities. There are tools to pre-register your beliefs, because it’s so easy to misremember in retrospect what you actually thought. Practicing and being honest with yourself is key.
Same thing with calibration. It’s hard to achieve, but there are tools to practice it, and it makes a night-and-day difference when you do.
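As one illustration of what such tooling can look like, here is a minimal sketch, with hypothetical data, that scores a log of pre-registered forecasts for calibration and computes the Brier score:

```python
# A minimal sketch of checking a pre-registered forecast log, using
# hypothetical data. Each record is (stated probability, outcome 0/1).
from collections import defaultdict

forecasts = [
    (0.9, 1), (0.9, 1), (0.9, 0),  # questions forecast at 90%
    (0.6, 1), (0.6, 0), (0.6, 1),  # questions forecast at 60%
    (0.2, 0), (0.2, 0), (0.2, 1),  # questions forecast at 20%
]

# Brier score: mean squared error between probability and outcome
# (lower is better; always guessing 50% scores 0.25).
brier = sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")

# Calibration: within each stated-probability bucket, compare the claimed
# probability with the observed frequency of the event.
buckets = defaultdict(list)
for p, o in forecasts:
    buckets[p].append(o)

for p in sorted(buckets):
    hits = sum(buckets[p]) / len(buckets[p])
    print(f"claimed {p:.0%} -> observed {hits:.0%}")
```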
And, obviously, read widely. In terms of filtering signal from noise, find a few sources that you trust. Find your favorite Substack on a subject, make sure you read that religiously, and then dip your toes in more partisan takes. Avoid your own echo chambers.