What Stanford's Latest Data Reveals About Europe's AI Position
IN BRIEF: Stanford HAI's 2026 AI Index documents a field where capability is accelerating faster than the benchmarks designed to measure it, where the US-China performance gap has effectively closed, and where transparency from frontier labs is declining precisely as their systems grow more consequential. For European policymakers and technologists, the report surfaces uncomfortable truths: the continent remains absent from the top tier of model development, talent flows are shifting, and the environmental costs of AI infrastructure are mounting faster than governance frameworks can adapt.
The numbers in this report demand more than reading – they demand response. That conversation happens May 19 in Vienna at Human x AI Europe, where the people shaping Europe's AI trajectory will be in the room.
The Transparency Paradox
A striking finding anchors the Research and Development chapter of this year's report: industry produced over 90% of notable AI models in 2025, yet the most capable systems are now the least transparent. Training code, parameter counts, dataset sizes, and training duration are no longer disclosed for several of the most resource-intensive systems, including those from OpenAI, Anthropic, and Google.
This creates a measurement problem that compounds over time. Parameter counts have held near 1 trillion for three years – but that stability may be an artifact of non-disclosure rather than a technical plateau. Training compute, which researchers can estimate independently through inference costs and energy consumption, has continued to rise.
For regulators attempting to calibrate risk thresholds or procurement officers evaluating vendor claims, this opacity is not merely inconvenient. It undermines the evidentiary basis for policy. The EU AI Act's tiered approach assumes that certain characteristics – compute thresholds, capability profiles, deployment contexts – can be verified. When frontier labs stop publishing the relevant data, verification becomes guesswork.
The US-China Convergence
The report's Technical Performance chapter documents what many suspected but few had quantified: the US-China AI model performance gap has effectively closed. US and Chinese models have traded places at the top of performance rankings multiple times since early 2025. In February 2025, DeepSeek-R1 briefly matched the top US model. As of March 2026, the top US model leads by 2.7% – a gap that has fluctuated over the past year while remaining in the single digits.
The Arena Leaderboard, which rates models through human voting, shows six organizations clustered within 79 Elo points: Anthropic (1,503), xAI (1,495), Google (1,494), OpenAI (1,481), Alibaba (1,449), and DeepSeek (1,424). Competitive pressure is shifting from raw capability toward cost, reliability, and domain-specific performance.
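To put that 79-point spread in context, Elo differences map to expected head-to-head preferences via the standard logistic formula. A minimal sketch, using the leaderboard ratings above (the formula is the conventional Elo expectation, not something the report itself computes):

```python
# Expected win probability under the standard Elo model:
# P(A beats B) = 1 / (1 + 10 ** (-(Ra - Rb) / 400))

def win_prob(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B in a head-to-head vote."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / 400.0))

# Top of the cluster (Anthropic, 1503) vs. bottom (DeepSeek, 1424):
print(f"{win_prob(1503, 1424):.1%}")  # ~61% -- far from a decisive edge
```

A 61% expected preference between the highest- and lowest-rated of the six organizations illustrates just how tight the cluster is: even the largest gap in the group is closer to a coin flip than to dominance.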
China's research output reinforces this picture. According to the report's key takeaways, China leads in publication volume, citations, and patent grants, while the US retains higher-impact patents and produced 50 notable models in 2025 to China's 30. China's share of the top 100 most-cited AI papers grew from 33 papers in 2021 to 41 in 2024.
Europe appears in neither column. The continent's absence from frontier model development is not new, but the closing US-China gap makes it more consequential. A bipolar AI landscape offers different strategic options than a unipolar one – and Europe's leverage depends on whether it can offer something neither pole provides.
The Jagged Frontier
R&D World's analysis of the report highlights what researchers call the "jagged frontier" – AI systems that achieve gold-medal performance on mathematical olympiads while failing to reliably read analog clocks.
Gemini Deep Think scored 35 points at the 2025 International Mathematical Olympiad, working end-to-end in natural language within the 4.5-hour time limit. On ClockBench, the top model read analog clocks correctly 50.1% of the time, compared with 90.1% for humans.
This unevenness has direct implications for deployment decisions. The report notes that AI models are expanding into professional domains – tax, mortgage processing, corporate finance, legal reasoning – with performance ranging from 60% to 90% across evaluations, and the top 15 models separated by as little as 3 percentage points on each benchmark. These domains, which demand both high competency and reliability, remain a challenge.
For public sector technologists evaluating AI procurement, the jagged frontier means that benchmark performance in one domain provides limited signal about reliability in adjacent tasks. A system that excels at document classification may fail at the edge cases that matter most for administrative fairness.
Infrastructure and Environmental Costs
The report documents AI's physical footprint with unusual precision. Global AI compute capacity has grown 3.3x per year since 2022, reaching 17.1 million H100-equivalents. Nvidia accounts for over 60% of total compute, with Google and Amazon supplying much of the remainder.
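To get a sense of what 3.3x annual growth means, the 17.1 million H100-equivalents can be projected back to a 2022 baseline. A rough sanity check, assuming the rate held over a three-year window (the window length is an assumption, not stated in the report):

```python
# Back-project 2025 capacity at a constant 3.3x/year growth rate.
CAPACITY_2025 = 17.1e6   # H100-equivalents (from the AI Index)
GROWTH = 3.3             # annual multiplier (from the AI Index)
YEARS = 3                # assumed window: 2022 -> 2025

implied_2022 = CAPACITY_2025 / GROWTH ** YEARS
print(f"Implied 2022 capacity: ~{implied_2022 / 1e6:.2f}M H100-equivalents")
```

Under those assumptions, global capacity would have stood at roughly half a million H100-equivalents in 2022 – a thirty-five-fold expansion in three years.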
The United States hosts 5,427 data centers – more than ten times any other country. A single company, TSMC (Taiwan Semiconductor Manufacturing Company), fabricates almost every leading AI chip, making the global AI hardware supply chain dependent on one foundry in Taiwan. TSMC's US expansion began operating in 2025, but the concentration risk remains.
Environmental costs are scaling with infrastructure. In 2025, Grok 4's estimated training emissions reached 72,816 tons of CO₂ equivalent – roughly the emissions from driving 17,000 cars for one year. AI data center power capacity rose to 29.6 GW, comparable to New York state at peak demand. Annual GPT-4o inference water use alone may exceed the drinking water needs of 12 million people.
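The car-equivalence figure above can be checked with simple division. A quick sketch (the implied per-car figure is close to commonly cited annual passenger-car emissions, but the exact conversion factor behind the report's comparison is an assumption):

```python
# Implied per-car annual emissions behind the "17,000 cars" comparison.
TRAINING_EMISSIONS_T = 72_816   # Grok 4 training, tonnes CO2e (from the AI Index)
CARS = 17_000                   # equivalent cars driven for one year (from the AI Index)

per_car = TRAINING_EMISSIONS_T / CARS
print(f"Implied emissions per car: {per_car:.2f} t CO2e/year")
```

The implied figure of roughly 4.3 tonnes of CO₂e per car per year is in the range typically cited for an average passenger vehicle, which suggests the report's comparison uses a conventional conversion factor.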
For European policymakers navigating the intersection of AI strategy and climate commitments, these numbers create a constraint that US and Chinese competitors face less acutely. Europe cannot simply replicate the hyperscaler buildout model without confronting its own decarbonization targets.
The Talent Drain Accelerates
Perhaps the most consequential finding for long-term competitiveness: the number of AI researchers and developers moving to the United States has dropped 89% since 2017. The decline is accelerating – down 80% in the last year alone.
The US remains home to more AI talent than any other country, but it is attracting new talent at the lowest rate in over a decade. Switzerland and Singapore lead the world in AI researchers and developers per capita.
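Taken together, the two decline figures imply that most of the drop is recent. A quick check, treating the reported percentages as exact:

```python
# If inflows are down 89% vs. 2017 and fell 80% in the last year alone,
# the prior year's level (relative to 2017) follows by division.
remaining_now = 1 - 0.89        # today: 11% of the 2017 level
last_year_factor = 1 - 0.80     # the last year multiplied inflows by 0.20

prior_year = remaining_now / last_year_factor
print(f"Implied level one year ago: {prior_year:.0%} of 2017")  # 55%
```

In other words, inflows one year ago would still have been at roughly half their 2017 level; the bulk of the collapse happened in the most recent year, consistent with the report's claim that the decline is accelerating.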
This shift creates an opening, but only if European institutions can convert it into sustained capacity. Talent that no longer flows to the US does not automatically flow to Europe. It may stay in place, move to Singapore, or exit the field entirely.
What This Means for European Strategy
The 2026 AI Index does not prescribe policy. It provides the evidentiary base from which policy must be built. Several implications emerge:
Transparency as leverage. If frontier labs are abandoning disclosure, Europe's regulatory framework could become a forcing function – requiring transparency as a condition of market access. This only works if enforcement mechanisms are credible.
Efficiency over scale. The report notes that OLMo 3.1 Think 32B, with roughly one-ninetieth the parameters of Grok 4, achieves comparable results on several benchmarks through pruning, deduplication, and curation alone. Europe's compute constraints may be less binding than assumed if efficiency gains continue.
Deployment as differentiation. The US leads in model development; China leads in research volume and industrial robotics. Europe's comparative advantage may lie in deployment pathways – turning AI capabilities into accountable public services, procurement frameworks that work, and governance models that other democracies can adopt.
The data is clear. The question is whether European institutions can move at the speed the data demands.
Frequently Asked Questions
Q: What percentage of notable AI models were produced by industry in 2025?
A: According to the 2026 AI Index Report, industry produced over 90% of notable AI models in 2025, with the most capable models now being the least transparent about their training parameters and methods.
Q: How has the US-China AI performance gap changed?
A: The gap has effectively closed. As of March 2026, the top US model leads by only 2.7%, and US and Chinese models have traded the top position multiple times since early 2025.
Q: What is the "jagged frontier" in AI capabilities?
A: The jagged frontier describes AI systems that achieve expert-level performance in some domains (like mathematical olympiads) while failing at seemingly simpler tasks (like reading analog clocks at 50.1% accuracy versus 90.1% for humans).
Q: How much has global AI compute capacity grown?
A: Global AI compute capacity has grown 3.3x per year since 2022, reaching 17.1 million H100-equivalents, with Nvidia accounting for over 60% of total compute.
Q: What are the environmental costs of training frontier AI models?
A: Grok 4's estimated training emissions reached 72,816 tons of CO₂ equivalent. AI data center power capacity rose to 29.6 GW, and annual GPT-4o inference water use may exceed the drinking water needs of 12 million people.
Q: How has AI talent migration to the United States changed?
A: The number of AI researchers and developers moving to the US has dropped 89% since 2017, with an 80% decline in the last year alone – the lowest attraction rate in over a decade.