GPT-5: OpenAI’s Flagship Model Brings PhD-Level Intelligence to the Masses

94.6% on advanced math, 74.9% on real-world coding: Why GPT-5's boring name hides radical ambition

BY Clinton Stark 08.07.2025 Modified date: 11.18.2025

Screenshot of ChatGPT model menu showing GPT-5, GPT-5 Thinking, and GPT-5 Pro options. — OpenAI has simplified ChatGPT’s model menu to just GPT-5, GPT-5 Thinking, and GPT-5 Pro.

Life in the world of AI moves at a different speed. And that speed would be: extremely fast.

Just two days after OpenAI surprised the industry with its first open-weight models since GPT-2 in 2019, the company has dropped another bombshell: GPT-5, its newest flagship model that CEO Sam Altman calls “the smartest model we’ve ever built.”

Subscribers logging into ChatGPT today will notice a dramatically simplified model menu:

GPT-5 (Flagship model)
GPT-5 Thinking (Get more thorough answers)
GPT-5 Pro (Research-grade intelligence)

Gone are the alphabet soup options like o4-mini-high (the one I most often used for generating scripts and code for Stark Insider and AI projects), replaced by what OpenAI calls a more intuitive experience. Behind the scenes, a sophisticated router automatically switches between models based on query complexity—no more guessing which model to pick for your task.
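OpenAI has not published how its router decides, so the following is only a toy sketch of the idea: score a query with a crude heuristic and hand back a model label. The hint words, the word-count threshold, and the function name `route_query` are all my own assumptions for illustration, not anything from OpenAI.

```python
from dataclasses import dataclass

# Hypothetical phrases that suggest a query needs multi-step reasoning.
REASONING_HINTS = ("prove", "debug", "step by step", "derive", "optimize")


@dataclass(frozen=True)
class Route:
    model: str   # menu label the query is sent to
    reason: str  # why the heuristic chose it


def route_query(query: str, long_query_words: int = 60) -> Route:
    """Pick a model label with a toy heuristic (not OpenAI's actual logic)."""
    text = query.lower()
    hits = [hint for hint in REASONING_HINTS if hint in text]
    if hits:
        return Route("GPT-5 Thinking", f"reasoning hint: {hits[0]}")
    if len(text.split()) > long_query_words:
        return Route("GPT-5 Thinking", "long query")
    return Route("GPT-5", "short, simple query")


print(route_query("What is the capital of France?"))
print(route_query("Debug this stack trace from my WordPress plugin"))
```

A production router would presumably rely on learned signals rather than keyword matching, but the shape is the same: one entry point, several models behind it.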

The Numbers Game

With ChatGPT approaching 700 million weekly users, Altman’s prediction that “this is something a billion+ people will benefit from” doesn’t seem quite so far-fetched. The adoption curve continues its hockey-stick trajectory, with enterprises and individuals alike integrating AI into daily workflows at unprecedented rates.

The benchmarks tell an impressive story: GPT-5 scores 94.6% on AIME 2025 mathematics problems without tools, 74.9% on SWE-bench Verified for coding tasks, and 46.2% on HealthBench Hard. The Pro version pushes even further, hitting 88.4% on the notoriously challenging GPQA benchmark.

Altman describes the progression in relatable terms: “GPT-3 sort of felt like talking to a high school student… GPT-4 felt like you’re talking to a college student. GPT-5 is the first time that it really feels like talking to a PhD-level expert.”

The Open Question

While GPT-5 represents OpenAI’s closed-model prowess, the near-simultaneous release of open-weight models acknowledges a reality many in the AI community have long understood: different use cases demand different solutions. For enterprises handling sensitive data, researchers needing uncensored access, or developers wanting complete control, open models running on private infrastructure remain essential. Privacy concerns and the need for specialized fine-tuning ensure the open-source ecosystem will continue thriving alongside commercial offerings.

What Does “GPT” Stand For?

GPT stands for Generative Pre-trained Transformer.

Generative – It can create text, images, or other outputs rather than just analyze.
Pre-trained – The model is trained on massive datasets before being fine-tuned for specific tasks.
Transformer – The neural network architecture introduced in the 2017 Google paper “Attention Is All You Need,” which revolutionized natural language processing.

This naming convention began with GPT-1 in 2018 and continues through GPT-5 today.

As an aside for Claude users like myself: Claude’s name is a nod to Claude Shannon, the American mathematician and electrical engineer widely regarded as the “father of information theory.”

My experience with private, open LLMs running on consumer hardware has been impressive. Free tools such as LM Studio and Ollama make it quite easy for us tinkerers to download and test these freely available models. Granted, they don’t quite reach the lofty heights of expertise that the large, reportedly trillion-parameter models such as GPT-4 and now GPT-5 can, but they are becoming increasingly efficient and smart.

And keep in mind: open LLMs only became something most of us experimented with and talked about in 2023!
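For fellow tinkerers, here is a minimal sketch of what talking to a local model through Ollama can look like from Python. It assumes Ollama is running on its default local port and that a model has already been pulled; the model tag `gpt-oss:20b` and the helper names are my own choices for illustration, so swap in whatever you have installed.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gpt-oss:20b") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "gpt-oss:20b") -> str:
    """Send the prompt to the local Ollama server and return its reply text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask_local_model("Summarize this article in one sentence.")` only works while the Ollama server is running; nothing leaves your machine, which is the whole point of going local.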

The Healthcare Hazard

Altman’s claim that GPT-5 is “the best model in the world at healthcare” should make everyone pause. As one concerned commenter noted, “saying things like being the best at healthcare is just dangerous.” (The Verge)

This isn’t hyperbole. When AI confidently provides medical advice with the authority of a PhD but the understanding of a sophisticated pattern matcher, real harm becomes possible. The model may score well on HealthBench Hard, but that’s a far cry from understanding the nuanced, life-or-death decisions actual healthcare requires.

GPT-5 Accuracy: Humanity’s Last Exam (Full Set)

Bar chart comparing GPT-5, GPT-4o, and other AI models on Humanity’s Last Exam, showing accuracy scores with and without thinking or tools. — GPT-5 Pro leads the field on Humanity’s Last Exam, scoring 42% with tools, ahead of GPT-5, OpenAI o3, and GPT-4o. Source: OpenAI

Model & Setup	Accuracy (pass@1)
GPT-5 Pro (Python + search with blocklist)	42.0%
GPT-5 Pro (no tools)	30.7%
GPT-5 (Python + search with blocklist)	35.2%
GPT-5 (no tools)	24.8%
ChatGPT Agent (browser + computer + terminal)	6.3%
ChatGPT Agent (no tools)	41.6%
OpenAI o3 (Python + browser)	23.0%
OpenAI o3 (no tools)	24.3%
OpenAI o3 (no tools, alt run)	14.7%
Deep Research (Python + browser)	26.6%
GPT-4o (no tools)	5.3%
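To make the effect of tools concrete, here is a quick back-of-the-envelope sketch in Python. It uses only the four GPT-5 rows from the table above; the dictionary layout and the function name `tool_uplift` are my own illustration.

```python
# Accuracy (pass@1, %) copied from the GPT-5 rows of the table above.
SCORES = {
    "GPT-5 Pro": {"tools": 42.0, "no_tools": 30.7},
    "GPT-5": {"tools": 35.2, "no_tools": 24.8},
}


def tool_uplift(scores: dict[str, float]) -> float:
    """Percentage-point gain from enabling Python + search."""
    return round(scores["tools"] - scores["no_tools"], 1)


for model, scores in SCORES.items():
    print(f"{model}: +{tool_uplift(scores)} points with tools")
```

Both models gain roughly 10 to 11 percentage points once Python and search are switched on.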

Key Takeaways

The Good: GPT-5 delivers on its promise of being faster, smarter, and more reliable. The unified interface with automatic model routing removes friction for everyday users. “Software on demand” feels tantalizingly close—during demos, GPT-5 generated fully functional websites with hundreds of lines of code in seconds.

The Reality Check: As AI researcher Gary Marcus notes, this took “almost 3 years, many billions of dollars” and remains “part of the pack, not a giant leap forward.” Competitors such as Anthropic’s Claude keep pace, ensuring healthy competition.

The Practical: For developers, three API tiers (GPT-5, GPT-5 mini, and GPT-5 nano) offer flexibility for different use cases and budgets. New personality themes (Cynic, Robot, Listener, Nerd) add customization without complexity.

And The Pivot to Pragmatism: Altman’s framing is telling. In his words, “the main thing we pushed for is real-world utility and mass accessibility/affordability.” This represents a significant shift from OpenAI’s previous “AGI at any cost” messaging. They’re now optimizing for the billion users who’ve “only used models like GPT-4o” rather than pushing the bleeding edge.

Final Thoughts: Still the Go-To, But…

OpenAI remains the default choice for many. It’s like the AI equivalent of “nobody ever got fired for buying IBM.” The integration is seamless, the ecosystem mature, and now with GPT-5, the intelligence genuinely impressive. Though, selfishly, I wish they would add MCP so I can add Notion, which works so well on Claude (when the service is not down).

The landscape has evolved. While I, and others, still turn to Claude for polishing scripts and code (especially for WordPress, Ubuntu, and AI projects), GPT serves as my daily driver. It’s fast, reliable, and often surprisingly funny and entertaining. The new personality options only enhance this. Having dry wit in my traits customization for GPT keeps it fun while I work.

The race toward AGI continues, with GPT-5 representing another significant waypoint rather than the destination. As Altman himself admits, it’s still “missing something quite important”: the ability to continuously learn from deployment.

But for the billion users he hopes to reach? GPT-5 delivers exactly what most need: an AI assistant that finally feels like talking to an expert rather than an eager but occasionally confused intern. In the compressed timeline of AI development, that’s no small achievement.

And, let’s be frank: we can all use fewer emojis these days!

The future arrives unevenly distributed, as William Gibson might say. With GPT-5, it’s arriving a little more evenly for everyone.