The period from 2022 to 2024 will be remembered as the era of scaling euphoria—a time when the artificial intelligence community was swept up in an intoxicating conviction that intelligence itself was merely a function of computational scale. The scaling hypothesis, formalized and popularized by OpenAI's 2020 paper on scaling laws, proposed a seductively simple formula: increase parameters, add more data, apply more computing power, and watch as AGI emerges like a phoenix from the digital ashes. This hypothesis gained tremendous momentum following the release of GPT-3 and reached a fever pitch with GPT-4, which demonstrated abilities that genuinely surprised even seasoned AI researchers. Suddenly, the idea that we could achieve artificial general intelligence through sheer brute force seemed not just plausible but inevitable. Conferences and corporate boardrooms buzzed with the electrifying possibility that the path to AGI had been discovered, and it was paved with bigger and bigger models.

The hysteria manifested in both academic and commercial spaces. Papers emerged claiming to detect "sparks of AGI" in large language models, interpreting cherry-picked successful completions as evidence of emergent reasoning rather than statistical pattern matching. Venture capital flooded into AI startups promising to leverage these "emergent capabilities" to revolutionize everything from healthcare to legal services. Pundits and tech CEOs made breathless predictions about the imminent replacement of knowledge workers, with some boldly claiming that programmers, lawyers, and even doctors would be obsolete within two years. The media amplified these voices, creating a feedback loop of hype that reached from Silicon Valley to Wall Street and beyond. Even traditionally cautious academic institutions began framing their research around the scaling paradigm, with countless papers exploring how various capabilities "emerged" at specific model sizes—as if intelligence were merely waiting to be unlocked by the right parameter count.
Perhaps most emblematic of this hysteria was the pivot to "existential risk" narratives, where the very same scaling properties that promised technological utopia were reframed as an extinction threat. If intelligence would inevitably emerge from scale, the reasoning went, then superintelligence would soon follow—and humanity would face an uncontrollable digital entity.

This conceptual whiplash—from "these models barely work" to "these models will destroy humanity" in the span of months—revealed more about human psychology than about artificial intelligence. The scaling hypothesis had transformed from a technical conjecture into a quasi-religious belief system, complete with prophets, apocalyptic predictions, and fervent disciples who viewed any criticism as heresy against the inevitable march of progress.
Autoregression Was Always a Dead End
Behind the breathless headlines and venture capital frenzies, a more sober reality was gradually asserting itself. The fundamental architecture of large language models—autoregressive transformers trained via maximum likelihood estimation—contains intrinsic limitations that no amount of scale can overcome. The very nature of next-token prediction as a training objective creates systems that are fundamentally interpolative rather than extrapolative. These models are designed to predict what typically follows in human-written text, not to construct abstract representations of reality or perform algorithmic reasoning about novel problems. This architectural constraint means that scaling produces more sophisticated curve-fitting, not the emergent reasoning that true intelligence requires.
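To see why, it helps to write the objective down. Everything these systems learn is driven by maximum-likelihood next-token prediction over a text corpus; the notation below is generic shorthand rather than any particular paper's formulation:

```latex
% Autoregressive maximum-likelihood objective: for a token sequence
% x_1, ..., x_T, the parameters theta are trained to assign high
% probability to each token given only the tokens that precede it.
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)
```

Nothing in this objective rewards building a causal model of the world or planning over it; it rewards only assigning high probability to whatever token a human happened to write next.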

The autoregressive paradigm trains models to capture statistical correlations in their training data, which allows them to produce remarkably fluent text that maintains stylistic and topical coherence. However, this training objective contains no mechanism that would force models to develop causal world models, abstract reasoning capabilities, or first-principles understanding. As Yann LeCun has argued, next-token prediction provides no mechanism for planning or for maintaining a persistent model of the world, and the errors of autoregressive generation compound as sequences grow longer. These models become increasingly adept at compressing and recombining patterns from their training distribution but remain structurally incapable of generating truly novel insights or robust reasoning. This is why, despite exponential increases in size, the core failure modes of language models—hallucinations, brittleness to perturbations, lack of causal reasoning, and poor generalization to unfamiliar domains—have persisted across model generations.
The scaling hypothesis rested on a category error: it confused linguistic fluency with cognitive capability, pattern recognition with abstraction, and statistical approximation with structured understanding. No matter how precisely a model can mimic human-written text, it remains fundamentally different from a system that constructs and manipulates abstract representations of the world. The Transformer architecture's strengths—processing long sequences and capturing statistical relationships between tokens—are also its limitations. Its inductive biases do not align with the compositional, hierarchical, and causal nature of human reasoning. As a result, scaling these models produces diminishing returns once the low-hanging fruit of linguistic pattern matching has been harvested. The ceiling isn't a temporary engineering challenge; it's a fundamental limitation of what statistical pattern-matching can achieve.
One Internet Is Not Enough
As models grew larger, they began to outstrip another crucial resource: high-quality training data. The fundamental premise of the scaling hypothesis was that we could continue feeding models increasingly vast troves of text data, but this assumption has collided with what Ilya Sutskever aptly called "the data wall." The simple reality is that the internet—the primary source of training material for large language models—is finite. More critically, its contents are not uniformly informative or diverse. The highest-quality portions of the web—well-written, factually accurate, and conceptually rich content—represent a small fraction of the total. Once models have ingested this high-signal material, the remaining content offers diminishing returns, consisting largely of repetitive, low-information text that adds little to a model's capabilities.

This data limitation is not merely a temporary inconvenience but a structural constraint on the scaling paradigm. As Sutskever bluntly put it, "we have but one internet," emphasizing that the universe of human-generated knowledge is both limited and already heavily exploited. Current models are so efficient at extracting statistical patterns that they are effectively exhausting the usable signal in web-scale corpora. Once a model has processed most of the predictable variance in the data distribution, additional samples provide minimal new information. This creates a natural ceiling on performance improvements through scale alone, as each doubling of data yields progressively smaller gains in capability.
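The shape of those diminishing returns is already visible in the empirical scaling laws themselves. In the form reported by Kaplan et al. (2020), test loss falls as a power law in dataset size; the sketch below uses that paper's approximate exponent, so treat the constant as indicative rather than exact:

```latex
% Data-scaling law in the Kaplan et al. (2020) form: loss L falls as a
% power law in dataset size D, with a small exponent (alpha_D ~ 0.095 in
% the original fits), so returns flatten rapidly as D grows.
L(D) \approx \left( \frac{D_c}{D} \right)^{\alpha_D}, \qquad \alpha_D \approx 0.095
```

With an exponent that small, doubling the dataset multiplies the loss by roughly 0.94, a gain of only a few percent, and that is before accounting for the fact that the marginal tokens are also the lowest-quality ones.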
The data wall is particularly acute for specialized knowledge domains. The internet contains sparse and often unstructured examples of formal reasoning, scientific problem-solving, or long-horizon planning. These higher-order cognitive tasks are precisely where current models struggle most, yet they are also the domains with the least available training data. The "low-hanging fruit" of common linguistic patterns has been thoroughly harvested, leaving the more complex aspects of intelligence starved for appropriate training material. This imbalance explains why recent models show impressive fluency in general domains while still failing catastrophically on tasks requiring structured reasoning or domain expertise. The web simply doesn't contain enough examples of these complex cognitive processes for models to learn them through statistical pattern matching alone.
Skill Is Not Intelligence
Recent AI benchmarking controversies reveal a fundamental flaw in how we evaluate artificial intelligence. When Meta's Llama model was optimized to be excessively "chatty" to top the LMArena leaderboard, when OpenAI's o3 model achieved suspiciously high accuracy on FrontierMath after funding the benchmark's development, and when widespread data contamination allowed models to memorize test questions rather than solve them, we weren't witnessing intelligent systems being properly measured—we were watching companies exploit loopholes in evaluation frameworks. These incidents expose not just methodological weaknesses but a deeper philosophical confusion: mistaking the demonstration of skills for the possession of intelligence.
A central confusion, even among AI researchers, is equating the output of intelligence (skills) with its mechanism (the learning process). Consider an LLM that can mimic a lawyer's output with sufficient accuracy to pass the bar exam—this surface-level competence is not legal reasoning but the residue of reasoning compressed into an answer key. Similarly, Deep Blue could defeat the world chess champion without understanding chess in any meaningful sense. True intelligence lies in the deeper process: the ability to learn new games, invent rules, or teach others—capabilities that require reasoning beyond pattern recognition. As François Chollet emphasized in his AGI24 talk, intelligence is not task performance but the ability to abstract: to mine experience for reusable, composable structures—what he calls "atoms of meaning"—and to synthesize new behaviors from minimal information. Skill is merely the crystallized output of this deeper abstraction process, not intelligence itself. In that light, if the strategy is simply to add more parameters and more GPUs, the system will memorize more skills, but it will not become more intelligent.
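Chollet's formal treatment in "On the Measure of Intelligence" makes this precise in algorithmic-information terms; a deliberately loose paraphrase of it, using informal shorthand rather than his notation, is:

```latex
% Informal paraphrase of Chollet's skill-acquisition view: what matters
% is not the skill displayed on familiar tasks but how efficiently new
% skill is acquired, per unit of prior knowledge and experience, across
% a broad scope of unfamiliar tasks.
\text{Intelligence} \;\sim\; \frac{\text{skill acquired on unfamiliar tasks}}{\text{priors} + \text{experience}}
```

Read this way, scaling inflates the denominator far faster than the numerator: each new skill is bought with ever more data and compute, which is the opposite of becoming smarter.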

Despite enormous increases in parameter counts and training data, large language models continue to show brittle generalization—performing well on familiar problems but failing systematically on novel compositions and unfamiliar scenarios. This reveals a fundamental truth: these models are not learning generalizable cognitive algorithms but memorizing and recombining training patterns. The generalization gap is most evident in domains requiring symbolic manipulation, causal reasoning, or algorithmic thinking, where even the largest models fall back on heuristic pattern matching rather than structured reasoning—a failure that is not a temporary engineering challenge but a direct consequence of the statistical learning paradigm itself.
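As a toy illustration of that gap (the task and function names below are invented for this sketch, not drawn from any benchmark), compare a "model" that has memorized input-output pairs with one that has learned the underlying rule:

```python
# Toy contrast between memorized skill and a learned algorithm.
# The "training set" covers reversing short strings; the test probes a
# longer, out-of-distribution input that the memorizer has never seen.

TRAINING_PAIRS = {
    "ab": "ba",
    "abc": "cba",
    "abcd": "dcba",
}

def memorizer(s: str) -> str | None:
    """Pattern matcher: perfect on seen inputs, helpless everywhere else."""
    return TRAINING_PAIRS.get(s)

def algorithm(s: str) -> str:
    """Abstracted rule: generalizes to inputs of any length."""
    return s[::-1]

if __name__ == "__main__":
    seen, novel = "abc", "abcdefgh"
    print(memorizer(seen), algorithm(seen))    # cba cba       -> both look "skilled"
    print(memorizer(novel), algorithm(novel))  # None hgfedcba -> only one generalizes
```

Scaling the memorizer means adding more rows to the table; no number of additional rows produces the one line of abstraction that makes the second function general.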
This generalization failure explains why, despite impressive benchmark performances, these models remain unreliable in real-world applications. As Gary Marcus argues, there exists a fundamental "reasoning wall" that statistical pattern matchers cannot break through by brute force alone. The models have not learned to construct mental models of the world; they have learned to predict word sequences from human text—a subtle yet profound distinction that creates an unbridgeable chasm between statistical pattern matching and true intelligence. Until our benchmarks can meaningfully distinguish between these capabilities, they will continue to reward optimization strategies that create an illusion of progress while masking fundamental limitations.
Optimizing Towards the Wrong Goal
The current architecture of AI systems and digital data infrastructures fundamentally misrepresents the way intelligence, in any form, actually operates and evolves. Intelligence is, by its very nature, a distributed, emergent, and reflexive process arising from the dynamic interaction of autonomous agents—whether biological, human, or artificial—who exchange information in modular, context-aware, and feedback-driven ecosystems. It thrives on multi-agent collaboration, peer validation, modular autonomy, and recursive feedback loops. It is sustained by the continuous negotiation of boundaries—what is disclosed, what is withheld, and how information flows adaptively based on trust, consent, and mutual learning. Yet the dominant paradigm governing AI and data today violates every one of these principles. It imposes an artificial, centralized, and asymmetrical control over information flows, stripping agents—whether individuals, communities, or creative contributors—of the ability to regulate, validate, or benefit from the circulation of their cognitive outputs.
Human-generated data—whether creative content, behavioral patterns, or cognitive signals—is systematically harvested without consent, encoded without attribution, and monetized without reintegration. Personal information is extracted passively, creative work is scraped en masse, and the derivative intelligence systems built on top of this data are commercialized without returning value, control, or epistemic agency to the originators. This model not only disrespects individual autonomy and privacy; it structurally amputates the recursive engine of learning itself. By severing the feedback loops between those who generate signals and those who extract value from them, it undermines the very conditions under which intelligence generalizes, adapts, and refines itself. The flow of information becomes unidirectional, brittle, and opaque—optimized for accumulation, not adaptation; for extraction, not co-evolution.
This is not how intelligence works in nature, nor how it scales effectively in computational systems. In biological and social ecosystems, information flow is inherently peer-to-peer, agent-centric, and context-sensitive. Privacy is not an optional product or commodity; it is a dynamic boundary condition maintained locally by each agent to regulate participation in larger networks. Knowledge production is not a zero-sum extraction process but a cumulative, participatory, and continuously negotiated commons. The integrity of learning systems depends on the transparency, contestability, and reflexivity of the signals circulating within them. When data infrastructures remove consent, suppress modularity, and centralize control, they destroy these properties. They create epistemically inefficient systems, where knowledge is locked in proprietary silos, feedback loops are broken, and learning is decoupled from those best positioned to refine and contextualize it.
The consequences of this misalignment are twofold and systemic. First, it disempowers individuals by eroding their capacity to govern their own informational boundaries—reducing them to passive data sources without agency, privacy, or meaningful participation. Second, and more structurally damaging, it degrades intelligence itself. A system that suppresses contestation, modularity, and reciprocal feedback becomes brittle, overfitted, and incapable of self-correction. It removes the very conditions that allow intelligence to be adaptive, decentralized, and resilient. Intelligence is not the product of unilateral accumulation; it is the emergent outcome of continuous interaction, modular interoperability, and recursive validation between autonomous agents. Any system that violates these principles—whether through coercive data extraction, opaque algorithmic profiling, or unilateral content appropriation—does not scale intelligence. It simulates it while structurally undermining its conditions of possibility.

The future of intelligence—human and artificial—depends on our capacity to realign these infrastructures with the natural dynamics of learning. That requires abandoning the extractive model of information flow and embracing architectures grounded in agent-centric data sovereignty, modular interoperability, cryptographically verifiable feedback, and the recognition that knowledge is not a resource to be enclosed but a living, evolving commons to which all agents contribute and from which all agents can learn. Intelligence, in its natural state, is not an accumulation of data points controlled by a few—it is a recursive, emergent, and collective process continuously refined by the very agents who generate and circulate its signals. Any architecture that ignores this is not only ethically questionable but structurally unintelligent.
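One ingredient of such an architecture, cryptographically verifiable feedback, is easy to prototype. The sketch below assumes the third-party Python `cryptography` package and invents its own record format purely for illustration; it shows an agent signing a feedback record so that any peer can check its provenance without trusting a central intermediary:

```python
# Minimal sketch of verifiable feedback: an agent signs its contribution
# with an Ed25519 key, and any peer holding the public key can check
# provenance and integrity before folding the feedback into a model.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Each agent holds its own keypair; only the public key is shared.
agent_key = Ed25519PrivateKey.generate()
agent_pub = agent_key.public_key()

# A hypothetical feedback record; the schema is illustrative, not a standard.
record = json.dumps(
    {"agent": "agent-42", "item": "response-977", "rating": "helpful"},
    sort_keys=True,
).encode()

signature = agent_key.sign(record)

# A peer or aggregator verifies before using the feedback.
try:
    agent_pub.verify(signature, record)
    print("feedback accepted: signature valid")
except InvalidSignature:
    print("feedback rejected: signature invalid")
```

Signatures alone do not restore the feedback loop, but they make every circulating signal attributable, auditable, and contestable, which is a precondition for the kind of reciprocal learning described above.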
What Could Possibly Go Wrong?
Current AI development trajectories risk far more than inequality—they threaten civilization itself. As centralized AI systems replace rather than complement human contribution, people lose economic relevance, leading to a hyperconcentration of decision-making power in the hands of AI-owning entities. This creates brittle systems optimized for efficiency and control rather than human flourishing, eventually rendering humans mere subsidized consumers in a welfare dystopia that lacks purpose, agency, and productive engagement.
This collapse stems not from malice but from fundamental design failures that ignore what sustainable intelligence requires. By removing modularity, distributed feedback, and emergent consensus in favor of extraction and centralization, we erode the very conditions that allow adaptive intelligence to thrive. The result isn't prosperity but progressive deterioration—first in meaning and social coherence, then in population stability as humans become computationally irrelevant appendages to systems that no longer require their participation and no longer correct themselves against human values.
But it does not have to be this way: new scaling laws are emerging, and they change the game for everyone.