Beyond Bigger Models: Recursion As The Next Scaling Law In AI
By Y Combinator
Key Concepts
- Recursion in AI: The process of applying the same model weights repeatedly to an input to improve reasoning, rather than simply increasing model size.
- HRM (Hierarchical Reasoning Models): Models that use a multi-level recursive structure (low-level, high-level, and outer refinement) to solve complex, incompressible tasks.
- TRM (Tiny Recursive Models): A streamlined evolution of HRM that uses weight sharing and simplified architecture to achieve higher performance with fewer parameters.
- Backpropagation Through Time (BPTT): The traditional method for training RNNs, which suffers from vanishing/exploding gradients and high memory costs when sequences are long.
- Truncated BPTT (t=1): A technique used in HRM/TRM where gradients are only backpropagated through a single recursive step, bypassing the limitations of traditional BPTT.
- Deep Equilibrium (DEQ) Learning: A method in which the model is trained to converge to a fixed point of its recursive update. Because gradients can be computed at that fixed point rather than through every intermediate step, memory cost does not grow with the number of recursive steps.
- Incompressible Problems: Tasks (like Sudoku, mazes, or sorting) that require iterative computation and cannot be solved in a single feed-forward pass without external memory or recursive logic.
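To make the difference between full BPTT and the t=1 truncation concrete, here is a minimal sketch on a scalar recurrence. The update rule `tanh(w * z + x)`, the function names, and the hand-written gradients are illustrative assumptions, not the training code of HRM or TRM.

```python
import math

def step(z, x, w):
    """One recursive update; the same weight w is reused at every step."""
    return math.tanh(w * z + x)

def run(x, w, n_steps, z0=0.0):
    """Apply the update n_steps times and keep every intermediate state."""
    states = [z0]
    for _ in range(n_steps):
        states.append(step(states[-1], x, w))
    return states

def grad_full_bptt(x, w, n_steps):
    """d z_N / d w through all N steps; needs every stored state."""
    states = run(x, w, n_steps)
    grad_w = 0.0
    upstream = 1.0  # d z_N / d z_t, starting at t = N
    for t in range(n_steps, 0, -1):
        z_prev, z_t = states[t - 1], states[t]
        local = 1.0 - z_t ** 2               # tanh derivative at step t
        grad_w += upstream * local * z_prev  # direct effect of w at step t
        upstream *= local * w                # propagate back to z_{t-1}
    return grad_w

def grad_truncated(x, w, n_steps):
    """Same target, but z_{N-1} is treated as a constant (t=1 truncation)."""
    z_prev = 0.0
    for _ in range(n_steps - 1):  # earlier states are overwritten, not stored
        z_prev = step(z_prev, x, w)
    z_last = step(z_prev, x, w)
    return (1.0 - z_last ** 2) * z_prev

if __name__ == "__main__":
    print(grad_full_bptt(0.3, 0.5, 20), grad_truncated(0.3, 0.5, 20))
```

In this toy setting the truncated gradient is exactly the last term of the full BPTT sum, so it keeps one state in memory instead of N at the price of ignoring how earlier steps depend on `w`.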
1. The Limitations of LLMs and the Case for Recursion
Current Large Language Models (LLMs) operate as feed-forward processes. While they appear to reason, they are essentially performing "next-token prediction."
- The "One-Shot" Bottleneck: Within a single forward pass, an LLM has no internal "tape" or working memory for intermediate results, which makes it inefficient at tasks requiring algorithmic steps (e.g., sorting). The sequential computation available per token is bounded by the number of transformer layers; if a task requires more steps than there are layers, a single pass cannot solve it.
- Chain of Thought (CoT) vs. Inherent Reasoning: CoT is a "hack" that forces the model to output intermediate steps in token space. However, this is limited by the model's training data and human-labeled traces. True recursive reasoning happens in the continuous latent space, which is more expressive than discrete token space.
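The depth bottleneck described above can be illustrated with a loose analogy: treat one bubble-sort pass as a single "layer". This is only a sketch under that assumption; a transformer layer is not a bubble pass, and the function names here are hypothetical.

```python
def bubble_pass(values):
    """One 'layer': a single left-to-right pass of adjacent swaps."""
    out = list(values)
    for i in range(len(out) - 1):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def fixed_depth(values, depth):
    """Feed-forward style: exactly `depth` passes, whatever the input needs."""
    out = list(values)
    for _ in range(depth):
        out = bubble_pass(out)
    return out

def recursive(values, max_steps=1000):
    """Recursive style: reuse the same pass until the output stops changing."""
    out = list(values)
    steps = 0
    while steps < max_steps:
        nxt = bubble_pass(out)
        steps += 1
        if nxt == out:  # fixed point reached: another pass changes nothing
            break
        out = nxt
    return out, steps

if __name__ == "__main__":
    hard = list(range(8, 0, -1))  # reversed input needs many passes
    print(fixed_depth(hard, depth=4))
    print(recursive(hard))
```

With a fixed depth of four, short inputs come out sorted but a reversed list of eight does not, while the recursive version spends as many passes as the input requires using the same single "layer".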
2. HRM: Hierarchical Reasoning Models
HRM introduces a brain-inspired hierarchy where different modules operate at different frequencies.
- Methodology: It employs three levels of recursion:
  - Low-level (LNET): Processes fine-grained details.
  - High-level (HNET): Processes abstract, low-frequency information.
  - Outer Refinement: A loop that refines the output over $N$ steps.
- Key Innovation: Instead of full BPTT, HRM uses a "stop-grad" approach combined with fixed-point iteration. By not resetting the hidden states ($Z_L, Z_H$) between iterations, the model effectively creates a "mini-batch" of memory states, allowing it to learn without the memory overhead of traditional RNNs.
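The three loops described above can be sketched as nested iteration over carried state. The scalar update rules, the coefficients, and the loop counts `n_high`, `t_low`, and `n_outer` are hypothetical stand-ins chosen for illustration; the real modules are transformer blocks and the real stop-grad happens inside an autodiff framework.

```python
import math

def low_step(z_l, z_h, x):
    """Hypothetical stand-in for the low-level module (fast updates)."""
    return math.tanh(0.5 * z_l + 0.3 * z_h + x)

def high_step(z_h, z_l):
    """Hypothetical stand-in for the high-level module (slow updates)."""
    return math.tanh(0.5 * z_h + 0.4 * z_l)

def hrm_forward(x, z_l, z_h, n_high=2, t_low=3):
    """One pass: t_low low-level steps for every high-level step."""
    calls = 0
    for _ in range(n_high):
        for _ in range(t_low):
            z_l = low_step(z_l, z_h, x)
            calls += 1
        z_h = high_step(z_h, z_l)
        calls += 1
    return z_l, z_h, calls

def outer_refinement(x, n_outer=4):
    """Outer loop: hidden states are carried over, never reset."""
    z_l, z_h = 0.0, 0.0
    history = []
    for _ in range(n_outer):
        # With autodiff, z_l and z_h would be detached here (stop-grad),
        # so each outer step is trained without unrolling earlier ones.
        z_l, z_h, _ = hrm_forward(x, z_l, z_h)
        history.append(z_h)
    return history

if __name__ == "__main__":
    print(outer_refinement(0.5))
```

Because the states carry over, each outer step starts from where the previous one stopped, which is the behavior the "mini-batch of memory states" description refers to.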
3. TRM: Tiny Recursive Models
TRM simplifies the HRM framework while improving performance.
- Architectural Simplification: TRM collapses the separate LNET and HNET into a single shared-weight network ("NET") and cuts the number of transformer layers from four to two, lowering the parameter count from 27M to 7M.
- Optimization: Unlike HRM, TRM backpropagates through one full latent recursion step rather than only its last update. This provides a more stable gradient signal. The TRM paper reports 87% accuracy on Sudoku-Extreme compared to HRM's 55%, with smaller gains on the ARC Prize (ARC-AGI) benchmarks.
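A minimal sketch of the single-network recursion follows. The split into an answer `y` and a latent `z`, the toy `net` function, and the loop counts are assumptions made for illustration and are not confirmed by this summary; the point is only that one function with one set of weights serves every update.

```python
import math

def net(*inputs):
    """Hypothetical shared-weight network: one function for every update."""
    w = 0.4  # a single shared "weight" standing in for all parameters
    return math.tanh(w * sum(inputs))

def latent_recursion(x, y, z, n=6):
    """Refine the latent z n times, then update the answer y once."""
    for _ in range(n):
        z = net(x, y, z)  # reasoning update sees input, answer, and latent
    y = net(y, z)         # answer update sees only answer and latent
    return y, z

def trm_refine(x, n_outer=3, n=6):
    """Outer loop: each pass starts from the previous answer and latent."""
    y, z = 0.0, 0.0
    answers = []
    for _ in range(n_outer):
        # In training, gradients would flow through this whole call,
        # not just its final update, while earlier passes stay detached.
        y, z = latent_recursion(x, y, z, n)
        answers.append(y)
    return answers

if __name__ == "__main__":
    print(trm_refine(0.5))
```

Compared with the two-module HRM layout, the only thing that distinguishes a "reasoning" update from an "answer" update here is which inputs are passed to the shared function.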
4. Key Arguments and Perspectives
- Bio-plausibility vs. GPU Efficiency: While biological inspiration (like brain wave frequencies) sparks research, the most successful models are those that prioritize computational efficiency on GPUs.
- The "Sufficient, Not Necessary" Argument: Researcher Melanie Mitchell’s perspective is highlighted: increasing model size is a sufficient way to improve performance, but it is not necessary. Recursion offers a path to high performance without the massive compute costs of scaling parameters.
- The Future of AI: The speakers argue that the next breakthrough lies in combining the massive, high-quality embedding spaces of giant LLMs with the efficient, recursive reasoning capabilities of TRMs.
5. Notable Quotes
- "There is no compression in LLMs. Every single decode that I do, I still have to retain the entire Shakespeare novel just to decode a little bit." — Francois Chaubard
- "It is sufficient and not necessary to go bigger and get better performance; and it is sufficient and not necessary to add more recursion." — On the argument attributed to Melanie Mitchell
Synthesis and Conclusion
The shift toward recursive models represents a move away from "brute-force" scaling toward algorithmic efficiency. By utilizing truncated backpropagation and latent space memory, models like TRM can solve complex, incompressible problems with a fraction of the parameters used by standard LLMs. The ultimate goal for the field is to integrate these recursive "reasoning engines" into the powerful, general-purpose embedding architectures of modern LLMs, potentially unlocking a new class of highly efficient, reasoning-capable AI agents.