Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Enterprise Internal Knowledge
By Stanford Online
Key Concepts
- Post-Training: The process of aligning a pre-trained base model to follow instructions, adhere to safety guidelines, and perform specific tasks.
- RLVR (Reinforcement Learning with Verifiable Rewards): A training technique where models are rewarded based on deterministic outcomes (e.g., code compilation, unit tests), driving emergent reasoning capabilities.
- Test-Time Compute: The strategy of allocating more compute during inference (e.g., "thinking" time) to improve reasoning performance, as seen in models like OpenAI’s o1.
- Continual Learning: The ability of a model to learn from sparse, real-world feedback in production environments to improve over time without full retraining.
- Synthetic Data: AI-generated data used to train subsequent models, increasingly important as the supply of high-quality human-generated data reaches a "wall."
- Agentic Coding: Using models to interact with the real world via code, treating code as a universal language for task execution.
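One simple way to picture test-time compute is best-of-N sampling: draw several candidate answers and keep the one a scorer ranks highest. The sketch below is only an illustration of spending more inference compute for a better answer; it is not a description of how o1 or any other model works internally. The toy task, `noisy_guess`, `closeness`, and `TARGET` are all hypothetical stand-ins for a model and a verifier.

```python
import random
from typing import Callable, List


def best_of_n(generate: Callable[[random.Random], int],
              score: Callable[[int], float],
              n: int,
              seed: int = 0) -> int:
    """Sample n candidates and return the one the scorer ranks highest."""
    rng = random.Random(seed)
    candidates: List[int] = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)


# Hypothetical toy task: guess an integer close to TARGET.
TARGET = 42


def noisy_guess(rng: random.Random) -> int:
    """Stand-in for one sampled model answer."""
    return rng.randint(0, 100)


def closeness(guess: int) -> float:
    """Stand-in for a verifier: higher is better."""
    return -abs(guess - TARGET)


def error_with_budget(n: int) -> int:
    """Distance from TARGET of the best answer found with n samples."""
    return abs(best_of_n(noisy_guess, closeness, n) - TARGET)
```

Because the seed is fixed, a larger budget always sees every candidate a smaller budget saw, so the error can only stay the same or shrink as `n` grows.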
1. The Evolution of Model Training
Yash Bottle outlines the progression of deep learning from the AlexNet era—which proved that scaling compute and data leads to predictive gains—to the modern Transformer architecture.
- Scaling Laws: The industry moved from simply scaling up pre-training (next-token prediction) to following the Chinchilla scaling laws, which identify the compute-optimal ratio of model parameters to training data for a fixed compute budget.
- The Shift to Reasoning: Recent breakthroughs (e.g., OpenAI’s o1) are driven by test-time compute and RLVR. Unlike pre-training, which is a "compression" of internet-scale data, RLVR allows models to "think" and self-correct, leading to emergent reasoning properties.
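The compute-optimal trade-off can be made concrete with two widely quoted rules of thumb associated with the Chinchilla results: training compute of roughly 6 FLOPs per parameter per token, and roughly 20 training tokens per parameter. Both constants are approximations assumed here for illustration; neither figure comes from the lecture.

```python
import math
from typing import Tuple

TOKENS_PER_PARAM = 20.0        # assumed rule-of-thumb ratio D / N
FLOPS_PER_PARAM_TOKEN = 6.0    # assumed approximation C ~= 6 * N * D


def compute_optimal(flops_budget: float) -> Tuple[float, float]:
    """Return (params, tokens) that spend the budget at the assumed ratio."""
    # C = 6 * N * D and D = 20 * N  =>  C = 120 * N**2
    params = math.sqrt(
        flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


params, tokens = compute_optimal(5.76e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, a budget of about 5.76e23 FLOPs lands near 70 billion parameters and 1.4 trillion tokens. Doubling the budget scales both quantities by the square root of two rather than growing the model alone.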
2. The Role of Software Engineering as a Frontier
The speaker argues that coding is the primary frontier for AI development for three reasons:
- Verifiable Rewards: Code can be compiled and tested against unit tests, providing a deterministic signal for reinforcement learning.
- Data Abundance: There is a massive volume of code tokens available on the internet.
- AGI Completeness: Coding is viewed as a general-purpose language for interacting with the world; if a model can write code to solve a problem, it can effectively perform almost any task.
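The verifiable-reward idea can be sketched as a function that runs a candidate program against unit tests and returns a binary score. This is a minimal illustration, not the reward used by any particular lab; the function name `unit_test_reward` and the test-case format are assumptions, and a real pipeline would run untrusted code in a sandbox rather than call `exec` directly.

```python
from typing import Dict, List, Tuple


def unit_test_reward(source: str,
                     func_name: str,
                     cases: List[Tuple[tuple, object]]) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0."""
    namespace: Dict[str, object] = {}
    try:
        # WARNING: exec on untrusted code is unsafe outside a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
        for args, expected in cases:
            if func(*args) != expected:
                return 0.0  # wrong answer on at least one test
    except Exception:
        return 0.0  # syntax error, missing function, or runtime crash
    return 1.0


# Hypothetical candidates a model might produce for "add two numbers".
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b"

cases = [((1, 2), 3), ((0, 0), 0)]
rewards = [unit_test_reward(src, "add", cases) for src in (good, bad, broken)]
```

The reward depends only on whether the tests pass, so the same candidate always receives the same score. That determinism is the property the speaker highlights.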
3. Applied Compute: Enterprise Specialization
Yash founded Applied Compute to bridge the gap between "genius" general models and the specific needs of enterprises.
- The Problem: General models lack context regarding proprietary enterprise data and specific business logic.
- Case Study (DoorDash): DoorDash needed to extract menu data from unstructured images. General models failed to follow DoorDash’s specific style guides for modifiers and ingredients. Applied Compute solved this by training a Vision-Language Model (VLM) to optimize directly against a "ground truth" error rate, rather than relying on prompt engineering.
- Strategy: Enterprises should not wait for "GPT-7." Instead, they should use smaller, specialized models trained on proprietary data to achieve higher ROI, lower latency, and better performance for specific workflows.
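Optimizing against a ground-truth error rate, as in the DoorDash case study, presupposes a metric that compares model output to labeled data. The sketch below shows one possible form: a field-level error rate over extracted menu items, turned into a reward. The item schema (`name`, `price`, `modifiers`), the positional matching of items, and the exact-match comparison are all assumptions for illustration; the source does not describe the actual metric.

```python
from typing import Dict, List

MenuItem = Dict[str, object]  # hypothetical schema for one extracted item


def field_error_rate(predicted: List[MenuItem],
                     truth: List[MenuItem]) -> float:
    """Fraction of ground-truth fields the prediction gets wrong or omits."""
    total = 0
    wrong = 0
    for i, true_item in enumerate(truth):
        # Assumption: items are matched by position; missing items count
        # as all-wrong, and extra predicted items are ignored.
        pred_item = predicted[i] if i < len(predicted) else {}
        for field, true_value in true_item.items():
            total += 1
            if pred_item.get(field) != true_value:
                wrong += 1
    return wrong / total if total else 0.0


def reward(predicted: List[MenuItem], truth: List[MenuItem]) -> float:
    """Higher is better: 1.0 means every ground-truth field matched."""
    return 1.0 - field_error_rate(predicted, truth)


truth = [
    {"name": "Cheeseburger", "price": 9.5,
     "modifiers": ["No Onions", "Extra Cheese"]},
    {"name": "Fries", "price": 3.0, "modifiers": []},
]
# The prediction violates a (hypothetical) capitalization style rule.
predicted = [
    {"name": "Cheeseburger", "price": 9.5,
     "modifiers": ["no onions", "extra cheese"]},
    {"name": "Fries", "price": 3.0, "modifiers": []},
]
score = reward(predicted, truth)
```

Here one of six fields differs, so the reward is 5/6. A style-guide violation that prompt engineering might miss becomes a measurable penalty that training can push down.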
4. Emerging Techniques and Future Bottlenecks
- Continual Learning: The next major hurdle is moving from static training to systems that learn from sparse, real-world interactions. The speaker cites Cursor’s "Composer" as an example, where the model learns from implicit user feedback (e.g., accepting or reverting code suggestions) in production.
- Compute Scarcity: The demand for compute is outpacing supply. Yash suggests that the future of the industry may involve vertical integration, where labs move into in-house chip design to optimize the hardware-software stack.
- Data Market: While synthetic data is becoming essential, the speaker notes that as models get smarter, the "hill-climbing" process for RL tasks becomes harder, requiring more sophisticated data collection methods (e.g., robotics, egocentric data).
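The implicit-feedback signal mentioned under Continual Learning can be sketched as a conversion from logged user actions into reward-labeled samples for a later training batch. This is a hypothetical illustration, not a description of how Cursor or any other product does it; the event schema, the outcome labels, and the reward values are all assumed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SuggestionEvent:
    prompt: str
    completion: str
    outcome: str  # hypothetical labels: "accepted", "reverted", "ignored"


@dataclass
class RewardedSample:
    prompt: str
    completion: str
    reward: float


# Assumed mapping from user action to scalar reward.
OUTCOME_REWARD: Dict[str, float] = {"accepted": 1.0, "reverted": -1.0}


def to_training_samples(events: List[SuggestionEvent]) -> List[RewardedSample]:
    """Keep only events that carry a signal and attach their reward."""
    samples: List[RewardedSample] = []
    for event in events:
        reward = OUTCOME_REWARD.get(event.outcome)
        if reward is None:
            continue  # e.g. "ignored": no clear signal, so skip it
        samples.append(
            RewardedSample(event.prompt, event.completion, reward)
        )
    return samples


events = [
    SuggestionEvent("sort a list", "xs.sort()", "accepted"),
    SuggestionEvent("read a file", "open(p).raed()", "reverted"),
    SuggestionEvent("parse json", "json.loads(s)", "ignored"),
]
samples = to_training_samples(events)
```

The feedback is sparse in the sense the speaker describes: many interactions yield no usable label, and those that do provide only a coarse accept-or-reject signal.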
5. Notable Quotes
- "Whenever you join a company, work on the hairiest thing that no one wants to work on, because people will like you for it." — Yash Bottle (on his early career strategy).
- "General models set the floor, but in order to set the ceiling, you need to go and build specialized systems." — Yash Bottle (on the value proposition of Applied Compute).
- "RL is kind of this eval-maxing machine." — Yash Bottle (explaining how reinforcement learning is used to optimize models toward specific benchmarks).
Synthesis
AI development is moving away from "brute force" pre-training toward efficient, specialized post-training and reinforcement learning. The most successful applications will be those that treat AI as an "agent" capable of using tools and learning from verifiable feedback loops. While the Transformer architecture remains dominant due to massive infrastructure investment, the future of the field lies in continual learning and in the ability of enterprises to build proprietary "data-to-reward" pipelines that differentiate them from competitors.