OpenAI's ChatGPT 5.5 Pro, released on April 23, 2026, solved a long-standing open problem in combinatorial number theory in under an hour, with experts calling its proof "original and ingenious". It is reshaping how mathematicians work, raising questions about authorship, and academic publishing.
Sometime in the weeks before its public launch on April 23, 2026, an AI model did something that mathematicians had been unable to do for years. In roughly the time it takes to drink a cup of coffee, OpenAI's ChatGPT 5.5 Pro solved an open problem in combinatorial number theory — a field concerned with the deep structural properties of numbers and sets. The proof it produced was reviewed by human experts and described as not merely correct, but genuinely creative. That single episode has since set off a much larger conversation: about what artificial intelligence can now do independently, what that means for the scientists who have dedicated their careers to this work, and whether the institutions of academic research are prepared for what comes next.
The problem at the centre of this story was posed by mathematician Mel Nathanson, and concerns the estimation of the upper bound of the diameter of sum sets — a concept in additive number theory. Prior to the AI's involvement, the best-known result established that this upper bound grew exponentially, a finding by MIT undergraduate student Isaac Rajagopal. The challenge for subsequent researchers was to find a tighter, more efficient ceiling.
The demonstration was conducted by Professor Timothy Gowers of the University of Cambridge, a Fields Medal winner, who guided the model through the problem over approximately one hour. The process was not a simple query-and-answer exchange but an iterative exploration. After about 16 minutes, ChatGPT 5.5 Pro produced a construction yielding a quadratic upper bound for a specific case of the problem — a result Professor Gowers assessed as the best possible for that scenario, meaning the AI had already advanced beyond the existing state of knowledge. From there, the model continued refining its approach, correcting itself and ultimately achieving a polynomial upper bound — a significantly more efficient and elegant solution than the previously known exponential one.
The proof was reviewed by Isaac Rajagopal himself, who assessed it as "almost certainly correct" in its core ideas. More significantly, he described the central insight as "original and ingenious" — language that places the AI's contribution firmly in the category of creative mathematical thinking, not rote computation. Professor Gowers identified several sophisticated capabilities at work: the model constructed original proof ideas outside of a complete theoretical framework, identified concise arguments that human researchers had overlooked, and drew on techniques from other areas of combinatorics to strengthen parts of the proof — a form of cross-disciplinary thinking long considered a distinctly human trait.
|
|
"The central idea behind the proof was original and ingenious" — Isaac Rajagopal, MIT, reviewing the AI-generated solution to the Nathanson problem. |
In a separate, more striking instance, the model addressed a related problem involving Rajagopal's own work — this time entirely without human guidance. After approximately 17 minutes of internal processing, it generated a complete solution and produced a full LaTeX-formatted preprint in just over 31 minutes. The entire research cycle, from identifying the problem to producing a publication-ready document, was compressed into half an hour. Experts compared the significance of the result to publishing a major peer-reviewed journal article or completing a doctoral thesis.
The Nathanson problem is the most dramatic example, but it is not an isolated one. ChatGPT 5.5 Pro has been acknowledged in a range of academic papers across diverse mathematical domains — from magnitude homology and Hamilton decompositions of directed tori to conjectural formulas for Macdonald indices and revisions of Litvak's conjecture on Gaussian minima. In each case, the model's role varied: generating proof strategies, exploring constructions, surfacing connections between fields, and even proofreading manuscripts.
One particularly striking example of the model's value came in the Litvak conjecture study, where the authors noted that a crucial connection to the Fejes Tóth zone conjecture was entirely unknown to them — until they asked the AI, which surfaced it immediately. This ability to bridge disparate branches of mathematics functions as a powerful accelerator, helping researchers make connections that might otherwise take years to identify through conventional literature review.
|
Role |
Example Contribution |
Research Area |
|
Proof Strategy |
Generated part of proof strategy for Theorem 4.5 |
Magnitude Homology |
|
Proof Exploration |
Proof exploration, exposition, and selector design |
Hamilton Decompositions |
|
Conjecture Suggestion |
Suggested bosonic formula for Macdonald indices |
Conjectural Fermionic Formulas |
|
Literature Connection |
Surfaced link to Fejes Tóth zone conjecture, unknown to authors |
Litvak's Conjecture Revision |
|
Manuscript Assistance |
Proofread manuscript for minor errors |
Magnitude Homology |
On standardised benchmarks, the model also posted strong numbers. It scored 81.2 on the AIME 2025 mathematics test, up from 65.4 for the previous version, and achieved a record 82.7% on the Terminal-Bench 2.0 coding benchmark — figures that establish a solid quantitative foundation beneath the more qualitative breakthroughs described above.
For the mathematicians watching these developments, the conversation is less about replacement and more about redefinition. The emerging picture is one in which human researchers transition from the painstaking manual work of proof-finding toward a higher-level role: framing ambitious questions, providing conceptual direction, critically evaluating machine-generated output, and interpreting results within the broader context of their fields. In this model, the AI handles the heavy computational and exploratory lifting; the human provides the strategic vision and the quality control.
Professor Gowers has proposed a new benchmark for what might constitute a meaningful human contribution in this environment: "proving something an LLM cannot." The implication is significant. As AI tools become more capable, many of the incremental steps and routine explorations that currently form the bread and butter of graduate student work may become automatable. The focus of human intellectual effort would shift — from asking "how do we solve this?" to asking "what should we ask the machine to solve next?"
This transition, while potentially liberating for senior researchers, raises real concerns for early-career academics. The smaller, incremental publications that form the typical foundation of a young mathematician's career may be among the first casualties of AI-driven acceleration, forcing early-stage scholars to compete for ever-more-ambitious problems simply to remain relevant.
The capabilities of ChatGPT 5.5 Pro have arrived ahead of the institutions designed to accommodate them. Preprint repositories such as arXiv currently refuse to accept AI-authored content — meaning that a mathematically valid, expert-reviewed breakthrough can exist without any formal channel for dissemination or peer review within the established academic system. Professor Gowers has proposed a practical solution: a dedicated repository for AI mathematical achievements, reviewed by human mathematicians, which would allow machine-generated insights to be formally vetted, credited, and integrated into the scientific record.
Reliability is a second fault line. The same studies that document the model's successes also flag its failures. In one instance, ChatGPT 5.5 Pro successfully constructed a valid proof for a differential equation on its first attempt — yet repeatedly produced other proofs that incorrectly assumed the very conjecture it was supposed to prove, treating an unproven claim as if it were already established fact. This is a known risk with large language models: the generation of plausible-sounding but logically flawed reasoning, commonly called hallucination. In a high-stakes research environment, such errors could propagate through the literature if not caught by rigorous human review. The AI is a powerful and sometimes unpredictable collaborator, not an infallible oracle.
The question of authorship and credit remains unresolved. When an AI independently solves a major problem, who owns that discovery? Current practice — acknowledging the model's contribution in paper appendices — is a pragmatic stopgap, but it is unlikely to hold as AI becomes a more central participant in research. New norms, new policies, and potentially new legal frameworks will be needed to define what intellectual contribution means in an era when machines can participate in the highest forms of creative thought.