Beyond Bigger Models: Recursion As The Next Scaling Law In AI

By Y Combinator

Key Concepts

  • Recursion in AI: Applying the same model weights repeatedly to refine an internal (latent) state, spending more computation per problem to improve reasoning rather than simply increasing model size.
  • HRM (Hierarchical Reasoning Models): Models that use a multi-level recursive structure (low-level, high-level, and outer refinement) to solve complex, incompressible tasks.
  • TRM (Tiny Recursive Models): A streamlined evolution of HRM that uses weight sharing and simplified architecture to achieve higher performance with fewer parameters.
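  • Backpropagation Through Time (BPTT): The traditional method for training RNNs. It stores every intermediate state of the unrolled computation, so memory grows with the number of steps, and gradients tend to vanish or explode over long sequences.
  • Truncated BPTT: Backpropagating through only the last part of the recursion instead of the full unrolled computation. HRM keeps only the final update of each module (a one-step gradient approximation); TRM backpropagates through its last full recursion process. In both cases memory no longer grows with the total number of recursive steps, sidestepping the main limitations of traditional BPTT.
  • Deep Equilibrium (DEQ) Learning: A method in which a model is defined by the fixed point $z^* = f(z^*, x)$ of a repeated layer. Gradients are computed at the equilibrium (via the implicit function theorem) rather than by unrolling, so memory cost does not depend on the number of iterations. The speakers also frame the recursive steps as a form of mini-batching across latent space.
  • Incompressible Problems: Tasks (like Sudoku, mazes, or sorting) that require many sequential steps of iterative computation and cannot be solved in a single fixed-depth feed-forward pass without external memory or recursive logic.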
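To make the BPTT, truncated BPTT, and DEQ entries concrete, here is a minimal scalar sketch (not code from HRM or TRM). The same weight `w` is applied repeatedly in the hypothetical recurrence $z_{t+1} = \tanh(w z_t + x)$, and three gradients of the final state with respect to `w` are compared: the full gradient through every step, a one-step gradient that treats the previous state as a constant, and an implicit gradient computed only from the fixed point. The function names and the values of `w` and `x` are illustrative assumptions.

```python
import math


def unroll(w, x, steps, z0=0.0):
    """Apply the same weight w repeatedly: z_{t+1} = tanh(w * z_t + x).

    Returns the trajectory [z_0, ..., z_T] and the exact derivative dz_T/dw.
    The derivative is accumulated in forward mode, which gives the same value
    full BPTT would compute by walking back through every step.
    """
    zs = [z0]
    dz_dw = 0.0  # z_0 does not depend on w
    for _ in range(steps):
        z_prev = zs[-1]
        z_next = math.tanh(w * z_prev + x)
        # Chain rule through this step: (1 - tanh^2) * (z_prev + w * dz_prev/dw)
        dz_dw = (1.0 - z_next ** 2) * (z_prev + w * dz_dw)
        zs.append(z_next)
    return zs, dz_dw


def one_step_grad(zs):
    """Truncated BPTT with a window of one step: z_{T-1} is treated as a constant."""
    z_prev, z_last = zs[-2], zs[-1]
    return (1.0 - z_last ** 2) * z_prev


def implicit_grad(z_star, w):
    """DEQ-style gradient from the fixed point z* = tanh(w z* + x) alone.

    Differentiating both sides gives dz*/dw = d * (z* + w * dz*/dw) with
    d = 1 - z*^2, which solves to dz*/dw = d * z* / (1 - d * w).
    """
    d = 1.0 - z_star ** 2
    return d * z_star / (1.0 - d * w)


w, x = 0.5, 0.3
zs, full = unroll(w, x, steps=50)
approx = one_step_grad(zs)
implicit = implicit_grad(zs[-1], w)
print(f"full BPTT: {full:.6f}  one-step: {approx:.6f}  implicit: {implicit:.6f}")
```

Because this recurrence converges, the full and implicit gradients agree without storing the trajectory for the implicit one. The one-step gradient has the same sign but is smaller here, since it drops the factor $1/(1 - d\,w)$; that cheap approximation is the trade-off truncated training accepts.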

1. The Limitations of LLMs and the Case for Recursion

Current Large Language Models (LLMs) generate each token with a single feed-forward pass through a fixed stack of layers. While they appear to reason, they are essentially performing "next-token prediction."

  • The "One-Shot" Bottleneck: LLMs lack an internal "tape", a scratch memory they can read and rewrite while computing, which makes them inefficient at tasks requiring many algorithmic steps (e.g., sorting). The computation available for each token is bounded by the number of transformer layers; if a task requires more sequential steps than there are layers, the model cannot complete it in a single pass.
  • Chain of Thought (CoT) vs. Inherent Reasoning: CoT is a "hack" that forces the model to write out intermediate steps in token space, and it is limited by the model's training data and human-labeled reasoning traces. The speakers argue that true recursive reasoning happens in the continuous latent space, which is more expressive than the discrete token space.
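The layer-count argument can be illustrated with an analogy (not one the speakers use): treat one round of odd-even transposition sort as one "layer". A fixed stack of `depth` rounds fails on inputs that need more rounds, while a loop that reuses the same round until a halting check passes sorts any input. All function names here are hypothetical.

```python
def odd_even_round(xs, phase):
    """One 'layer' of compute: compare-and-swap disjoint neighbouring pairs."""
    xs = list(xs)  # work on a copy so the caller's list is untouched
    for i in range(phase % 2, len(xs) - 1, 2):
        if xs[i] > xs[i + 1]:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs


def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


def fixed_depth_sort(xs, depth):
    """Feed-forward analogue: exactly `depth` rounds, whatever the input size."""
    for phase in range(depth):
        xs = odd_even_round(xs, phase)
    return xs


def recursive_sort(xs, max_steps=10_000):
    """Recursive analogue: reuse the same round until a halting check passes."""
    steps = 0
    while not is_sorted(xs) and steps < max_steps:
        xs = odd_even_round(xs, steps)
        steps += 1
    return xs, steps


data = list(range(8, 0, -1))            # worst case: reversed input, n = 8
print(fixed_depth_sort(data, depth=4))  # 4 "layers" are not enough
print(recursive_sort(data))             # loops until sorted
```

On a reversed list of 8 items, 4 rounds leave it partly unsorted (the largest item can move at most one position per round), whereas the loop halts after at most 8 rounds, the known worst case for odd-even transposition sort on 8 items.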

2. HRM: Hierarchical Reasoning Models

HRM introduces a brain-inspired hierarchy where different modules operate at different frequencies.

  • Methodology: It employs three levels of recursion:
    1. Low-level (LNET): Updates at a high frequency, processing fine-grained details.
    2. High-level (HNET): Updates at a low frequency (once per cycle of low-level steps), processing abstract information.
    3. Outer Refinement: A loop that refines the output over $N$ steps.
  • Key Innovation: Instead of full BPTT, HRM treats the recursion as a fixed-point iteration and applies a "stop-grad" to all but the final update of each module (a one-step gradient approximation). Because the hidden states ($Z_L, Z_H$) are carried over between outer refinement steps rather than reset, the model effectively creates a "mini-batch" of memory states, allowing it to learn without the memory overhead of traditional RNNs.
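The three nested loops can be sketched as a toy schedule. This is not HRM's implementation: `f_low` and `f_high` are hypothetical scalar stand-ins for the LNET and HNET, the loop sizes are arbitrary, and the readout simply returns the high-level state. Comments mark where, in training, the one-step gradient and the detach between outer refinement steps would apply.

```python
import math


def f_low(z_l, z_h, x):
    """Hypothetical stand-in for the fast low-level module (LNET)."""
    return math.tanh(0.5 * z_l + 0.3 * z_h + x)


def f_high(z_h, z_l):
    """Hypothetical stand-in for the slow high-level module (HNET)."""
    return math.tanh(0.5 * z_h + 0.5 * z_l)


def hrm_forward(x, n_cycles=2, t_steps=3, segments=4):
    """Toy HRM-style schedule: segments x n_cycles x t_steps nested recursion."""
    z_l, z_h = 0.0, 0.0  # initialised once and NOT reset between segments
    outputs = []
    counts = {"low": 0, "high": 0}
    for _ in range(segments):          # outer refinement loop
        for _ in range(n_cycles):      # slow timescale (HNET)
            for _ in range(t_steps):   # fast timescale (LNET)
                z_l = f_low(z_l, z_h, x)
                counts["low"] += 1
            z_h = f_high(z_h, z_l)     # HNET updates once per cycle
            counts["high"] += 1
        outputs.append(z_h)            # hypothetical readout: the high-level state itself
        # In training: compute a loss on this output, backpropagate only through the
        # final f_low/f_high updates (one-step gradient), then detach z_l and z_h
        # before the next segment so memory does not grow with the number of segments.
    return outputs, counts


outputs, counts = hrm_forward(x=0.3)
print(counts)   # number of LNET and HNET updates performed
print(outputs)  # one refined output per outer step
```

The design point the sketch shows is the ratio of update rates: the HNET sees only the result of several LNET steps, and the state at the end of one outer step is the starting point of the next.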

3. TRM: Tiny Recursive Models

TRM simplifies the HRM framework while improving performance.

  • Architectural Simplification: TRM collapses the separate LNET and HNET into a single shared-weight network ("NET"). It also reduces the number of transformer layers from four to two, cutting the parameter count from 27M to 7M.
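  • Optimization: Unlike HRM, TRM does not rely on a one-step fixed-point approximation; it backpropagates through one full latent recursion process. This provides a more stable gradient signal. The TRM paper reports about 87% accuracy on Sudoku-Extreme versus about 55% for HRM, and a smaller gain on ARC Prize (ARC-AGI-1) tasks, about 45% versus 40%.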
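TRM's recursion with one shared network can be sketched in the same toy style. Again this is a hedged illustration, not the paper's code: `net` is a hypothetical scalar function standing in for the shared network, and `n` (latent updates per recursion process) and `big_t` (processes per supervision step) are illustrative values. The log records which calls would receive gradients: only those in the last full recursion process.

```python
import math


def net(a, b, c=0.0):
    """Hypothetical stand-in for TRM's single shared network.

    The same function (same "weights") serves both updates: the latent update
    passes the input x, the answer update does not.
    """
    return math.tanh(0.4 * a + 0.4 * b + 0.4 * c)


def trm_step(x, y, z, n=6, big_t=3, log=None):
    """One supervision step: big_t recursion processes of n latent updates + 1 answer update."""
    for t in range(big_t):
        with_grad = t == big_t - 1  # only the last full process would be backpropagated
        for _ in range(n):
            z = net(x, y, z)        # refine the latent reasoning state
            if log is not None:
                log.append(("z", with_grad))
        y = net(y, z)               # refine the current answer from the latent state
        if log is not None:
            log.append(("y", with_grad))
    return y, z


x, y, z = 0.3, 0.0, 0.0
log = []
for _ in range(2):                     # deep supervision: carry (y, z) into the next step
    y, z = trm_step(x, y, z, log=log)  # in training: loss on y, then detach y and z
print(len(log), sum(1 for _, g in log if g))
```

Compared with the HRM sketch, there is one function instead of two, and the gradient window covers a whole recursion process (all latent updates plus the answer update) rather than only the final update of each module.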

4. Key Arguments and Perspectives

  • Bio-plausibility vs. GPU Efficiency: While biological inspiration (like brain wave frequencies) sparks research, the most successful models are those that prioritize computational efficiency on GPUs.
  • The "Sufficient, Not Necessary" Argument: The speakers cite a point from researcher Melanie Mitchell: increasing model size is a sufficient way to improve performance, but it is not necessary. Recursion offers a path to high performance without the massive compute costs of scaling parameters.
  • The Future of AI: The speakers argue that the next breakthrough lies in combining the massive, high-quality embedding spaces of giant LLMs with the efficient, recursive reasoning capabilities of TRMs.

5. Notable Quotes

  • "There is no compression in LLMs. Every single decode that I do, I still have to retain the entire Shakespeare novel just to decode a little bit." — Francois Shaard
  • Paraphrasing a point the speakers attribute to Melanie Mitchell: "It is sufficient and not necessary to go bigger and get better performance; and it is sufficient and not necessary to add more recursion."
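A back-of-the-envelope sketch of the "no compression" point (our illustration, not from the video): a decoder-only LLM keeps keys and values for every layer and every token it has seen, so each extra reasoning token written out as chain of thought grows the KV cache, while a latent-recursion model overwrites a fixed-size state at every step. The model dimensions, the 81-position grid, and the two-tensor latent state (loosely modelled on TRM's answer and latent) are hypothetical; model weights and training activations are ignored.

```python
def cot_kv_bytes(prompt_tokens, reasoning_tokens, n_layers=32, n_kv_heads=8,
                 head_dim=128, bytes_per_elem=2):
    """KV cache for one sequence: keys and values (factor 2) for every layer
    and every token seen so far, including each generated reasoning token."""
    tokens = prompt_tokens + reasoning_tokens
    return 2 * n_layers * n_kv_heads * head_dim * tokens * bytes_per_elem


def recursive_state_bytes(positions, reasoning_steps, d_model=512, bytes_per_elem=2):
    """Hypothetical latent-recursion state: two tensors (answer y, latent z) of shape
    positions x d_model, overwritten in place at every step. reasoning_steps is
    accepted only to make the point explicit: it does not affect the result."""
    del reasoning_steps
    return 2 * positions * d_model * bytes_per_elem


for steps in (100, 1_000, 10_000):
    print(f"{steps:>6} steps: CoT KV cache {cot_kv_bytes(81, steps) / 1e6:8.1f} MB, "
          f"recursive state {recursive_state_bytes(81, steps) / 1e6:.3f} MB")
```

The KV-cache column grows linearly with the number of reasoning tokens, and every decode step has to read all of it; the recursive state stays the same size however many steps are taken.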

Synthesis and Conclusion

The shift toward recursive models represents a move away from "brute-force" scaling toward algorithmic efficiency. By utilizing truncated backpropagation and latent space memory, models like TRM can solve complex, incompressible problems with a fraction of the parameters used by standard LLMs. The ultimate goal for the field is to integrate these recursive "reasoning engines" into the powerful, general-purpose embedding architectures of modern LLMs, potentially unlocking a new class of highly efficient, reasoning-capable AI agents.
