A Breakthrough or a Blunder?
In October 2025, OpenAI sparked excitement when a series of social media posts suggested that GPT-5 had solved ten long-standing Erdős math problems and made progress on eleven more. The discussion began with researcher Sebastien Bubeck noting that GPT-5 had identified a previously overlooked solution to one problem; it grew through posts by collaborator Mark Sellke and OpenAI VP Kevin Weil and spread rapidly across social media. But within days, mathematician Thomas Bloom, who runs the Erdős Problems website, clarified the situation: GPT-5 hadn't solved the problems from scratch. It had uncovered existing solutions buried in the academic literature, a genuinely useful capability that was misinterpreted once context was lost in a chain of quote-tweets. The episode revealed a gap between AI's real capabilities and how they're sometimes presented.
This wasn't a case of bad faith. OpenAI's researchers were genuinely thrilled about GPT-5's ability to sift through vast mathematical databases, finding answers that even experts had overlooked. The way the news was shared, especially through quote-tweets that lost context, made it sound like the AI had conjured new proofs. The backlash was swift, with Google DeepMind's Demis Hassabis calling it embarrassing and Meta AI's Yann LeCun delivering sharp critiques. The incident underscores a bigger question: how do we talk about AI's role in research without inflating expectations?
AI's Real Power in Research
Despite the miscommunication, GPT-5's feat was no small thing. The model's ability to navigate citation networks and pinpoint solutions across decades of mathematical papers shows a leap in literature search capabilities. In tests, GPT-5 scored 94.6 percent on AIME 2025 math problems and 88.4 percent on graduate-level science questions, strong evidence that it can handle complex technical reasoning. Beyond math, it's aiding fields like materials science and climate forecasting, where sifting through data quickly can shave months off research timelines.
Take Google DeepMind's AlphaFold as an example. By predicting protein structures, it solved a decades-old biological puzzle, earning Nobel recognition in 2024. AlphaFold's team clearly communicated its scope: the AI didn't invent new biology but accelerated existing research. This clarity helped scientists embrace the tool. OpenAI's stumble shows that even powerful AI needs precise framing to avoid skepticism from researchers who value accuracy above hype.
Why Words Matter in AI's Rise
The Erdős mix-up highlights a tricky balance. AI companies like OpenAI, Google DeepMind, and Meta are racing to outdo each other, with billions poured into data centers and talent recruitment. In 2025 alone, Meta planned to spend up to 72 billion dollars on AI infrastructure. This competition drives innovation but also tempts bold claims that can erode trust. When OpenAI's posts suggested GPT-5 was rewriting math history, they overlooked how such statements could mislead researchers, journalists, or even investors who rely on accurate capability assessments.
The fallout wasn't just a PR hiccup. Enterprises adopting AI need clear benchmarks before they trust systems in fields like law or engineering, and GPT-5 has shown promise on tasks spanning more than 40 occupations. Missteps like this fuel doubts, slowing adoption. Mathematicians, including Thomas Bloom, emphasized that finding existing solutions, while useful, isn't the same as generating original proofs. Clearer communication could have turned this into a win, showcasing AI's role as a research partner rather than a standalone genius.
Lessons From the Math Mishap
Looking at another case, NASA's use of AI for exoplanet detection offers a model for success. By analyzing telescope data, AI identified patterns humans might miss, speeding up discoveries, and the researchers never claimed the system had uncovered new physics. The team presented results as a collaboration between human expertise and AI tools, avoiding the hype that tripped up OpenAI. This approach built trust, letting scientists focus on the findings rather than debating the tool's role.
The Erdős incident teaches a clear lesson: honesty about what AI can and can't do builds credibility. OpenAI's researchers later clarified that GPT-5 only found existing solutions, but the initial framing caused a stir that could have been avoided. As AI takes on bigger roles in research, from astronomy to medicine, companies need to prioritize precision in their claims. This means working with domain experts, like mathematicians or biologists, to vet announcements before they hit social media.
Navigating AI's Future in Science
AI's potential to accelerate science is undeniable. Tools like GPT-5 can comb through literature faster than any human, uncovering insights that might take years to find otherwise. But as the industry grows, with electricity demands projected to hit 4 percent of U.S. usage by 2028, the stakes are high. Overhyping capabilities risks not just credibility but also public trust in a field that's already hard to understand.
Moving forward, collaboration could be the answer. Partnerships between AI developers, academics, and regulators can set standards for clear communication. The Erdős incident shows that even a well-meaning post can spark confusion if it lacks context. By grounding claims in verified results, like AlphaFold's precise protein predictions or NASA's data-driven discoveries, AI can cement its place as a trusted partner in science. The challenge now is to keep the excitement alive without letting hype outpace reality.