Stark Insider

GPT-5: OpenAI’s Flagship Model Brings PhD-Level Intelligence to the Masses

94.6% on advanced math, 74.9% on real-world coding: Why GPT-5's boring name hides radical ambition

BY Clinton Stark — 08.07.2025

Screenshot of ChatGPT model menu showing GPT-5, GPT-5 Thinking, and GPT-5 Pro options.
OpenAI has simplified ChatGPT’s model menu to just GPT-5, GPT-5 Thinking, and GPT-5 Pro.

Life in the world of AI moves at a different speed. And that speed would be: extremely fast.

Just two days after OpenAI surprised the industry with its first open-weight models since GPT-2 in 2019, the company has dropped another bombshell: GPT-5, its newest flagship model that CEO Sam Altman calls “the smartest model we’ve ever built.”

Subscribers logging into ChatGPT today will notice a dramatically simplified model menu:

  • GPT-5 (Flagship model)
  • GPT-5 Thinking (Get more thorough answers)
  • GPT-5 Pro (Research-grade intelligence)

Gone are the alphabet soup options like o4-mini-high (the one I most often used for generating scripts and code for Stark Insider and AI projects), replaced by what OpenAI calls a more intuitive experience. Behind the scenes, a sophisticated router automatically switches between models based on query complexity—no more guessing which model to pick for your task.
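OpenAI has not published how its router actually decides, so the following is only a toy sketch of the general idea: score a prompt for complexity, then pick a model. The function names, keyword list, and thresholds are all hypothetical and invented for illustration; only the two model labels come from the ChatGPT menu.

```python
# Toy illustration only: this is NOT OpenAI's router.
# The hint keywords and the threshold below are invented for this sketch.
REASONING_HINTS = ("prove", "step by step", "debug", "derive", "analyze")


def complexity_score(prompt: str) -> int:
    """Rough score: one point per 50 words plus two points per reasoning hint."""
    text = prompt.lower()
    length_points = len(text.split()) // 50
    hint_points = 2 * sum(hint in text for hint in REASONING_HINTS)
    return length_points + hint_points


def route(prompt: str) -> str:
    """Send higher-scoring prompts to the slower, more thorough model."""
    if complexity_score(prompt) >= 2:
        return "GPT-5 Thinking"
    return "GPT-5"
```

With these made-up rules, a quick factual question stays on the fast model, while anything mentioning debugging or a step-by-step derivation gets escalated. A production router would presumably rely on learned signals rather than keywords.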

The Numbers Game

With ChatGPT approaching 700 million weekly users, Altman’s prediction that “this is something a billion+ people will benefit from” doesn’t seem quite so far-fetched. The adoption curve continues its hockey-stick trajectory, with enterprises and individuals alike integrating AI into daily workflows at unprecedented rates.

SEE ALSO: OpenAI’s GPT-5 is here (TechCrunch)

The benchmarks tell an impressive story: GPT-5 scores 94.6% on AIME 2025 mathematics problems without tools, 74.9% on SWE-bench Verified for coding tasks, and 46.2% on HealthBench Hard. The Pro version pushes even further, hitting 88.4% on the notoriously challenging GPQA benchmark.

Altman describes the progression in relatable terms: “GPT-3 sort of felt like talking to a high school student… GPT-4 felt like you’re talking to a college student. GPT-5 is the first time that it really feels like talking to a PhD-level expert.”

The Open Question

While GPT-5 represents OpenAI’s closed-model prowess, the simultaneous release of open-weight models acknowledges a reality many in the AI community have long understood: different use cases demand different solutions. For enterprises handling sensitive data, researchers needing uncensored access, or developers wanting complete control, open models running on private infrastructure remain essential. Privacy concerns and the need for specialized fine-tuning ensure the open-source ecosystem will continue thriving alongside commercial offerings.

What Does “GPT” Stand For?

GPT stands for Generative Pre-trained Transformer.

  • Generative – It can create text, images, or other outputs rather than just analyze.
  • Pre-trained – The model is trained on massive datasets before being fine-tuned for specific tasks.
  • Transformer – The neural network architecture introduced in the 2017 Google paper “Attention Is All You Need”, which revolutionized natural language processing.

This naming convention began with GPT-1 in 2018 and continues through GPT-5 today.

As an aside for Claude users like myself: Claude’s name is a nod to Claude Shannon, the American mathematician and electrical engineer widely regarded as the “father of information theory.”

My experience with private, open LLMs running on consumer hardware has been impressive. Free tools such as LM Studio and Ollama make it quite easy for us tinkerers to download and test these free versions of the big models. Granted, they don’t quite reach the lofty heights of expertise that the large, reportedly trillion-parameter models such as GPT-4 and now GPT-5 can, but they are becoming increasingly efficient and smart.

And, keep in mind, open LLMs only became something we widely experimented with and talked about in 2023!
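For fellow tinkerers, here is a minimal sketch of querying a local model through Ollama's REST API. It assumes Ollama is already running on its default port (11434) and that a model has been pulled; the model name "llama3.2" is just a placeholder, so swap in whatever you have installed.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request. The model name is a placeholder."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling ask_local_model("Explain what a Transformer is in one sentence.") should return the model's reply as a plain string. Nothing leaves your machine, which is the whole appeal for privacy-sensitive work.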

A Brief History of GPT Time

The GPT journey reads like a compressed history of computing itself:

  • “Attention Is All You Need” (2017): Google researchers introduce the Transformer architecture, enabling models to understand context at scale and sparking the modern AI revolution.
  • GPT-1 (2018): 117M parameters, a proof of concept
  • GPT-2 (2019): 1.5B parameters, so “dangerous” OpenAI initially withheld it
  • GPT-3 (2020): 175B parameters, the breakthrough that launched a thousand startups
  • GPT-4 (2023): Multimodal capabilities, the first truly general-purpose model
  • GPT-5 (2025): Unified system with automatic reasoning, aiming for mass utility

The Healthcare Hazard

Altman’s claim that GPT-5 is “the best model in the world at healthcare” should make everyone pause. As one concerned commenter noted, “saying things like being the best at healthcare is just dangerous.” (The Verge)

This isn’t hyperbole. When AI confidently provides medical advice with the authority of a PhD but the understanding of a sophisticated pattern matcher, real harm becomes possible. The model may score well on HealthBench Hard, but that’s a far cry from understanding the nuanced, life-or-death decisions actual healthcare requires.

GPT-5 Accuracy: Humanity’s Last Exam (Full Set)

Bar chart comparing GPT-5, GPT-4o, and other AI models on Humanity’s Last Exam, showing accuracy scores with and without thinking or tools.
GPT-5 Pro leads the field on Humanity’s Last Exam, scoring 42% with tools, ahead of GPT-5, OpenAI o3, and GPT-4o. Source: OpenAI

Model & Setup | Accuracy (pass@1)
GPT-5 Pro (Python + search with blocklist) | 42.0%
GPT-5 Pro (no tools) | 30.7%
GPT-5 (Python + search with blocklist) | 35.2%
GPT-5 (no tools) | 24.8%
ChatGPT Agent (browser + computer + terminal) | 6.3%
ChatGPT Agent (no tools) | 41.6%
OpenAI o3 (Python + browser) | 23.0%
OpenAI o3 (no tools) | 24.3%
OpenAI o3 (no tools, alt run) | 14.7%
Deep Research (Python + browser) | 26.6%
GPT-4o (no tools) | 5.3%
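To see how much of the headline number comes from tool access, here is a quick calculation using only the GPT-5 and GPT-5 Pro rows from the table. The dictionary layout and function name are my own; the figures are copied from the table as is.

```python
# Accuracy (pass@1, percent) copied from the GPT-5 and GPT-5 Pro rows of the table.
SCORES = {
    "GPT-5 Pro": {"tools": 42.0, "no_tools": 30.7},
    "GPT-5": {"tools": 35.2, "no_tools": 24.8},
}


def tool_uplift(model: str) -> float:
    """Percentage points gained by enabling Python + search for a model."""
    row = SCORES[model]
    return round(row["tools"] - row["no_tools"], 1)
```

By this measure, tools add 11.3 points for GPT-5 Pro and 10.4 points for GPT-5, so roughly a quarter of each tool-assisted score depends on having Python and search available.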

Key Takeaways

The Good: GPT-5 delivers on its promise of being faster, smarter, and more reliable. The unified interface with automatic model routing removes friction for everyday users. “Software on demand” feels tantalizingly close—during demos, GPT-5 generated fully functional websites with hundreds of lines of code in seconds.

The Reality Check: As AI researcher Gary Marcus notes, this took “almost 3 years, many billions of dollars” and remains “part of the pack, not a giant leap forward.” Competitors such as Anthropic’s Claude keep pace, ensuring healthy competition.

The Practical: For developers, three API tiers (GPT-5, GPT-5 mini, and GPT-5 nano) offer flexibility for different use cases and budgets. New personality themes (Cynic, Robot, Listener, Nerd) add customization without complexity.

And The Pivot to Pragmatism: Altman’s framing is telling: “the main thing we pushed for is real-world utility and mass accessibility/affordability.” This represents a significant shift from OpenAI’s previous “AGI at any cost” messaging. They’re now optimizing for the billion users who’ve “only used models like GPT-4o” rather than pushing the bleeding edge.

Final Thoughts: Still the Go-To, But…

OpenAI remains the default choice for many. It’s like the AI equivalent of “nobody ever got fired for buying IBM.” The integration is seamless, the ecosystem mature, and now with GPT-5, the intelligence genuinely impressive. Though, selfishly, I wish they would add MCP (Model Context Protocol) support so I could connect Notion, which works so well on Claude (when the service is not down).

The landscape has evolved. While I, and others, still turn to Claude for polishing scripts and code (especially for WordPress, Ubuntu, and AI projects), GPT serves as my daily driver. It’s fast, reliable, and often surprisingly funny and entertaining. The new personality options only enhance this. Having dry wit in my traits customization for GPT keeps it fun while I work.

From the OpenAI announcement blog post:

“More accurate answers to real-world queries

GPT-5 is significantly less likely to hallucinate than our previous models. With web search enabled on anonymized prompts representative of ChatGPT production traffic, GPT-5’s responses are ~45% less likely to contain a factual error than GPT-4o, and when thinking, GPT-5’s responses are ~80% less likely to contain a factual error than OpenAI o3.”

Source: OpenAI
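Those percentages in the OpenAI quote are relative reductions, not absolute error rates, which is easy to misread. A small sketch of the arithmetic follows; the 10% baseline is purely a hypothetical number for illustration, since the quote does not give the underlying error rates.

```python
def reduced_error_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.45 means '45% less likely') to a baseline rate."""
    if not 0.0 <= baseline_rate <= 1.0 or not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("both arguments must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)
```

If a baseline model erred on a hypothetical 10% of responses, a 45% relative reduction would bring that to about 5.5%, and an 80% reduction to about 2%. Note that the two figures in the quote use different baselines (GPT-4o and OpenAI o3), so they cannot be compared directly.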

The race toward AGI continues, with GPT-5 representing another significant waypoint rather than the destination. As Altman himself admits, it’s still “missing something quite important”: the ability to continuously learn from deployment.

But for the billion users he hopes to reach? GPT-5 delivers exactly what most need: an AI assistant that finally feels like talking to an expert rather than an eager but occasionally confused intern. In the compressed timeline of AI development, that’s no small achievement.

And, let’s be frank: we can all use fewer emojis these days!

SEE ALSO: Introducing GPT-5 (OpenAI)
SEE ALSO: Available today: GPT-5 in Microsoft Copilot Studio (Microsoft)

The future arrives unevenly distributed, as William Gibson might say. With GPT-5, it’s arriving a little more evenly for everyone.

Clinton Stark

Filmmaker and editor at Stark Insider, covering arts, AI & tech, and indie film. Inspired by Bergman, slow cinema and Chipotle. Often found behind the camera or in the edit bay. Peloton: ClintTheMint.
