Stanford’s 2026 AI Index: Where AI Actually Stands (report)

KEY POINTS:

Capability is accelerating, not plateauing. SWE-bench coding scores jumped from 60 to nearly 100 percent in a single year, organizational adoption hit 88 percent, and generative AI reached 53 percent of the population faster than either the PC or the internet.
The frontier is jagged. The same models that win gold at the International Mathematical Olympiad read analog clocks correctly only 50.1 percent of the time. Headline benchmarks are a poor proxy for how a model will behave on the work you actually care about.
Safety, policy, and education are falling behind capability. Documented AI incidents rose from 233 to 362 year over year, only half of U.S. middle and high schools have AI policies, and experts and the public disagree by 50 points on whether AI will help people do their jobs.

The ninth edition of Stanford’s AI Index Report landed this week, and the headline from co-chairs Yolanda Gil and Raymond Perrault sets the tone. “The data does not point in a single direction,” they write. “It reveals a field that is scaling faster than the systems around it can adapt.”

Produced by the Stanford Institute for Human-Centered Artificial Intelligence, the 2026 AI Index spans nine chapters covering research and development, technical performance, responsible AI, economics, science, medicine, education, policy, and public opinion. It is one of the few comprehensive data products on AI that is not produced by a lab with a stake in the outcome, which is why governments, businesses, and newsrooms cite it. Here are the findings that stood out to us, with some context on what they mean for people using these tools every day.

Capability is still accelerating, not plateauing

The narrative that AI progress has hit a wall does not survive contact with the data. On SWE-bench Verified, a benchmark where models have to resolve real GitHub issues, scores climbed from 60 percent to nearly 100 percent in a single year. Frontier models now meet or exceed human baselines on PhD-level science questions, multimodal reasoning, and competition mathematics. Gemini Deep Think earned a gold medal at the International Mathematical Olympiad.

Adoption tracks capability. Organizational adoption reached 88 percent in 2025, and four out of five university students now use generative AI for coursework. When teams actually absorb AI into their workflows, the org dynamics start to shift in ways the adoption number alone doesn’t capture. Stanford notes that generative AI hit 53 percent population-level adoption within three years, which is faster than either the personal computer or the internet.

Industry is doing the work. More than 90 percent of notable frontier models released in 2025 came from private companies rather than academic labs.

The U.S.-China gap has effectively closed

Number of notable AI models by select geographicareas, 2003–25 Source: Epoch AI, 2026 | Chart: 2026 AI Index report — Number of notable AI models by select geographic areas, 2003–25
Source: Epoch AI, 2026 | Chart: 2026 AI Index report

The report confirms something that was becoming visible through the year. American and Chinese labs have been trading the performance lead. DeepSeek-R1 briefly matched the top U.S. model in February 2025. As of March 2026, Stanford measures Anthropic’s leading model at just 2.7 percent ahead of the best Chinese model on their basket of benchmarks.

The strengths are split. The U.S. still produces more top-tier models and higher-impact patents. China leads in publication volume, citations, patent output, and industrial robot installations. South Korea stands out for innovation density, leading the world in AI patents per capita.

The United States hosts 5,427 AI data centers, more than ten times any other country, and consumes more energy to run them than any other country.

On infrastructure, the picture is more lopsided. The United States hosts 5,427 AI data centers, more than ten times any other country, and consumes more energy to run them than any other country. One Taiwanese foundry, TSMC, fabricates nearly every leading AI chip, a supply chain dependency that policymakers have been watching closely. A TSMC expansion in the United States began operations in 2025.

The jagged frontier keeps getting more jagged

This is the finding that most deserves to travel beyond the AI press. The same models that win gold at the International Mathematical Olympiad can correctly read an analog clock only 50.1 percent of the time. AI agents went from 12 percent to roughly 66 percent task success on OSWorld, a benchmark that tests general computer use across operating systems, but they still fail about one out of every three structured tasks.

Stanford’s framing is useful here. We do not have generally reliable AI. We have AI that is superhuman in narrow benchmarked domains and unreliable in others, sometimes within the same conversation. We have written about what divergence looks like in practice when you run more than one agent on the same task. The practical implication is that headline benchmark scores are a poor proxy for how a model will behave on a task you actually care about. Evaluate before you deploy, or better, evaluate two models side by side on your own work and see where they diverge.

Responsible AI reporting is lagging capability

Almost every frontier lab reports capability benchmark results. Reporting on responsible AI benchmarks, which cover safety, bias, and harmful output rates, remains spotty. Documented AI incidents rose to 362 in 2025, up from 233 in 2024. Stanford also cites recent research showing that improving one responsible AI dimension can degrade another, so the tradeoffs are real and not always intuitive.

For teams building AI into their workflows, this is an argument for keeping a human in the loop on anything consequential, and for treating model updates as events that require fresh evaluation rather than drop-in replacements.

Adoption is outpacing policy, especially in education

Over 80 percent of U.S. high school and college students now use AI for school-related tasks. Only half of middle and high schools have AI policies in place, and just 6 percent of teachers say those policies are clear. The gap matters because even when AI helps, it also changes the shape of the work, which is a harder thing to teach. That mismatch is creating confusion that students and teachers are left to navigate on their own.

New AI PhDs in the U.S. and Canada rose 22 percent between 2022 and 2024

Outside the classroom, AI engineering skills are accelerating fastest in the United Arab Emirates, Chile, and South Africa. New AI PhDs in the U.S. and Canada rose 22 percent between 2022 and 2024, though that cohort is heading into academia rather than industry.

The economic data is equally striking. U.S. private AI investment reached $285.9 billion in 2025, more than 23 times China’s $12.4 billion in private investment, although government-guided funds make China’s total harder to measure from the outside. The estimated value of generative AI tools to U.S. consumers reached $172 billion annually by early 2026, and the median value per user tripled between 2025 and 2026. Many of those tools are free at the point of use.

A worrying drop in U.S. talent attraction

One number buried in the Economy chapter is worth sitting with. The count of AI researchers and developers moving to the United States has dropped 89 percent since 2017, with an 80 percent decline in the last year alone. The U.S. still leads on capital and entrepreneurship, with 1,953 newly funded AI companies in 2025, more than ten times the next closest country. But the talent flows have reversed.

Experts and the public are reading this moment differently

When asked about AI’s impact on how people do their jobs, 73 percent of experts expect a positive impact. Only 23 percent of the public agrees. That is a 50-point gap, and similar splits show up on questions about the economy and medical care.

Trust in institutions to regulate AI is fragmented. Among surveyed countries, the United States reported the lowest level of trust in its own government to regulate AI, at 31 percent. Globally, the EU is trusted more than the United States or China to regulate AI effectively.

What this means for how you use AI right now

A few things stand out as practical takeaways if you are reading the report as a user rather than a policymaker.

Build AI fluency deliberately. Adoption is racing ahead of formal education, and the people who know how to work with these tools, test their output, and recover from their mistakes have a real advantage. That is true at every stage of a career, not just the start.

For a worked example of what that looks like as a home setup, see how we built the IPE as our own AI command center.

Set policies in your household and at work. If your kids are using AI for school and the school does not have a clear policy, you need one at home. The same goes for teams that have quietly adopted AI without rules about when not to use it, what not to share with it, and how to disclose its use to colleagues and customers.

Treat benchmarks with caution. A model that aces graduate-level science can still misread a clock or confidently cite a paper that does not exist. If you are putting AI into a workflow that matters, evaluate it on your actual work, not on someone else’s benchmark suite.

Watch the responsible AI reporting, not just the capability reporting. The gap between the two is where incidents come from. When a new model launches, the safety card matters as much as the benchmark chart.

Download: Stanford 2026 AI Report

423 pages (PDF)

The 2026 AI Index Report is available for free from Stanford HAI. It is long, it is densely sourced, and the public dataset is downloadable for your own analysis. If you work with AI tools or make decisions about them, it is worth the time.