Farkesli
2026-05-14
Startups & Business

AI IQ Rankings Spark Heated Debate: New Website Claims to Measure Machine Intelligence on Human Scale

New AI IQ website scores over 50 AI models on human IQ scale, sparking praise for clarity and criticism for oversimplification. Experts warn of dangerous illusions.

AI IQ Website Ignites Controversy Over Measuring Machine Intelligence

A new website, aiiq.org, is claiming to assign IQ scores to over 50 of the world's most advanced AI language models, drawing both enthusiastic support and fierce criticism from technologists and researchers. The site, launched by entrepreneur Ryan Shea, plots models on a standard bell curve, making complex AI performance instantly comparable—but experts warn the single-number metric is dangerously oversimplified.

Image source: venturebeat.com

“This is super useful,” wrote technology commentator Thibaut Mélen on X. “Much easier to understand model progress when it’s mapped like this instead of another giant leaderboard table.” Business strategist Brian Vellmure agreed, saying “This is helpful. Anecdotally tracks with personal experience.”

But the backlash came swiftly. “It’s nonsense. AI is far too jagged. The map is not the territory,” posted AI Deeply, an AI commentary account, echoing a concern shared by many researchers about the illusion of precision from a single score.

Background: How the AI IQ Score Is Calculated

AI IQ was created by Ryan Shea, an engineer who co-founded the blockchain platform Stacks and was an early investor in unicorns including OpenSea and Mercury. Shea holds a Bachelor of Science in Mechanical Engineering from Princeton University.

The methodology groups 12 benchmarks into four reasoning dimensions: abstract, mathematical, programmatic, and academic. The composite IQ is a straight average of these four scores, each derived from hand-calibrated difficulty curves that compress ceilings for easier benchmarks and retain higher ceilings for harder, less gameable ones.

Abstract reasoning uses ARC-AGI-1 and ARC-AGI-2, which test pattern recognition. Mathematical reasoning includes FrontierMath, AIME, and ProofBench. Programmatic reasoning draws from Terminal-Bench 2.0, SWE-Bench Verified, and SciCode. Academic reasoning uses Humanity's Last Exam, CritPt, and GPQA Diamond.
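The scoring described above can be sketched in a few lines. This is a minimal illustration, not the site's actual code: the benchmark groupings and the straight four-way average come from the methodology as reported, while the hand-calibrated difficulty curves are not public, so the dimension scores below are hypothetical inputs.

```python
# Sketch of AI IQ's composite score: a straight average of four
# dimension scores (abstract, mathematical, programmatic, academic).
# Benchmark groupings are as reported; the per-benchmark calibration
# curves are not reproduced here.

DIMENSIONS = {
    "abstract": ["ARC-AGI-1", "ARC-AGI-2"],
    "mathematical": ["FrontierMath", "AIME", "ProofBench"],
    "programmatic": ["Terminal-Bench 2.0", "SWE-Bench Verified", "SciCode"],
    "academic": ["Humanity's Last Exam", "CritPt", "GPQA Diamond"],
}

def composite_iq(dimension_scores: dict) -> float:
    """Straight average of the four dimension scores, per the article."""
    return sum(dimension_scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Hypothetical dimension scores for one model:
scores = {"abstract": 110, "mathematical": 125,
          "programmatic": 118, "academic": 119}
print(composite_iq(scores))  # 118.0
```

Because the composite is an unweighted mean, a model that is exceptional in one dimension and weak in another lands in the middle of the scale — which is precisely the "jaggedness" critics say a single number hides.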

Supporters See Clarity; Critics See Misleading Simplicity

Enterprise technologists praise the visualizations for making a complex market legible, and the interactive charts at aiiq.org have circulated widely on social media this past week.

However, researchers argue that reducing a model’s sprawling capabilities to a single IQ number creates a dangerous illusion. AI Deeply’s criticism that “the map is not the territory” highlights the risk of oversimplifying AI’s jagged skills. The debate underscores a fundamental divide in how AI progress should be communicated.

What This Means for AI Evaluation and the Tech Industry

For businesses and investors trying to choose among dozens of rapidly improving models, the AI IQ site offers a quick shortcut—but one that may obscure crucial trade-offs. A high overall IQ doesn’t guarantee excellence in every task, and a low score might miss specialized strengths.

This controversy signals a growing need for standardized, nuanced AI benchmarks that stakeholders can trust. As AI models become more integrated into critical systems, the pressure to simplify comparisons will only increase—alongside the risks of relying on a single number that may not reflect real-world performance.

The debate at aiiq.org is likely to intensify as more models are added and the methodology is scrutinized. For now, the site has already achieved something rare: making the arcane world of AI benchmarks a topic of public conversation.