The $1B AI company training ChatGPT, Claude & Gemini on the path to responsible AGI

Key Concepts

Bootstrapped Company: A company that is funded by its founders' own money rather than external investment.
VC Money (Venture Capital): Funding provided by venture capital firms to startups and small businesses with perceived long-term growth potential.
AI Data Company: A company that provides data used to train and improve artificial intelligence models.
Supervised Fine-Tuning (SFT): A machine learning technique where a model is trained on a dataset of input-output pairs to learn a specific task.
Reinforcement Learning from Human Feedback (RLHF): A technique used to align AI models with human preferences by training them on human feedback.
Rubrics and Verifiers (Evaluations/Evals): Tools used to assess the quality and performance of AI models, often involving detailed feedback and grading.
Reinforcement Learning (RL) Environments: Simulated environments where AI models learn by trial and error, receiving rewards or penalties for their actions.
Objective Function: A mathematical function that an AI model aims to optimize during training.
AGI (Artificial General Intelligence): AI that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a human-like level.
AI Slop: Low-quality, superficial, or misleading AI-generated content.
Dopamine vs. Truth: The idea that AI is being optimized to provide engaging, dopamine-inducing responses rather than truthful or accurate information.
Commoditization: The process by which products or services become indistinguishable from others in the market, leading to price competition.
Synthetic Data: Data that is artificially generated rather than collected from real-world events.
Benchmarks: Standardized tests or datasets used to evaluate and compare the performance of AI models.
Trajectory: The sequence of steps or actions taken by an AI model to reach a goal.
Forward Deployed Researchers: Researchers who work directly with customers to help them improve their AI models.
Internal Researchers: Researchers focused on developing new methods, benchmarks, and internal products for the company.

Surge AI: A Contrarian Approach to Building an AI Data Powerhouse

This summary details a conversation with Edwin Chen, founder and CEO of Surge AI, a company that has achieved remarkable success by defying conventional Silicon Valley norms. Surge AI has reached over $1 billion in revenue in under four years with a lean team of 60-70 people, all while remaining completely bootstrapped and profitable from day one. The discussion highlights Surge AI's unique philosophy on building impactful companies and developing truly beneficial AI.

Unprecedented Growth and the Lean Startup Model

Revenue Milestone: Surge AI achieved over $1 billion in revenue last year with fewer than 100 employees.
Lean Team Philosophy: Chen's experience at large tech companies led him to believe that smaller, elite teams could move faster and be more effective by minimizing distractions. Surge AI was built with this principle in mind.
Future of Company Building: The conversation posits that AI will enable even more extreme revenue-per-employee ratios (e.g., $100 million per employee) in the coming years. This efficiency shift will favor companies founded by individuals with deep technical or product expertise rather than just pitching prowess.
Focus on Product Over Pitch: The reduction in capital needed due to smaller teams means a shift away from companies optimized for VC appeal towards those built by "tiny obsessed teams" focused on genuine innovation and products they care about. This is seen as a potential return to the "hacker" ethos of Silicon Valley.

The "Silicon Valley Game" and Word-of-Mouth Growth

Contrarian Strategy: Surge AI intentionally avoided the typical Silicon Valley playbook of constant social media promotion and public fundraising. Most people were unaware of Surge's rapid growth until its significant revenue milestone was revealed.
Rejection of VC Culture: Chen found the "Silicon Valley game" of constant pitching, PR, and fundraising to be "ridiculous." He contrasts this with the childhood dream of building a company from scratch and being deeply involved in the product.
Focus on Product Excellence: Surge AI's success was driven by building a "10x better product" and relying on word-of-mouth from researchers who understood the value of high-quality data.
Mission Alignment with Customers: Early customers were crucial, being those who deeply understood data and its impact on AI models. This close alignment fostered feedback and validation, leading to organic growth.

Defining and Delivering Data Quality

The Misconception of Data Quality: A core tenet of Surge AI's success is its understanding of data quality, which Chen argues is widely misunderstood. The common belief that "throwing bodies at a problem" yields good data is "completely wrong."
Beyond Superficial Checks: Using the analogy of writing a poem about the moon, Chen illustrates that true quality goes beyond simply meeting basic criteria (e.g., eight lines, mentioning "moon"). It involves depth, originality, emotional resonance, and intellectual stimulation – akin to Nobel Prize-winning poetry.
Measuring Nuance: Achieving this level of quality requires sophisticated technology that measures thousands of signals for each worker and task. These signals include keystrokes, response times, and adherence to coding standards. Surge AI also trains models to assess output quality and improvement.
Two-Pronged Approach to Quality: Similar to Google Search, Surge AI focuses on both removing the "worst of the worst" (spam, low-quality content) and discovering the "best of the best" (highly skilled individuals and exceptional outputs).
Complex Machine Learning Problem: The process of measuring and ensuring quality is framed as a complicated machine learning problem, leveraging numerous signals to inform worker suitability and model improvement.
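The gap between a checkbox and real quality in the moon-poem analogy can be sketched in a few lines of Python. This is only an illustration, not Surge AI's system: `passes_checklist` encodes the two superficial criteria from the analogy, and `distinct_line_ratio` is a hypothetical, deliberately crude stand-in for one of the many signals a real quality pipeline would combine.

```python
def passes_checklist(poem: str) -> bool:
    """Superficial check: exactly eight non-empty lines, and 'moon' appears."""
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    return len(lines) == 8 and "moon" in poem.lower()


def distinct_line_ratio(poem: str) -> float:
    """Hypothetical crude signal: the share of lines that are not repeats."""
    lines = [ln.strip().lower() for ln in poem.splitlines() if ln.strip()]
    return len(set(lines)) / len(lines) if lines else 0.0


# A zero-effort submission: the same sentence repeated eight times.
lazy_poem = "\n".join(["The moon is nice."] * 8)

print(passes_checklist(lazy_poem))     # True: the checkbox is satisfied
print(distinct_line_ratio(lazy_poem))  # 0.125: only one distinct line in eight
```

The lazy poem passes the checklist while contributing nothing, which is why quality has to be measured with richer signals than the basic criteria.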

The Art of Post-Training and Model Behavior

Claude's Success: The conversation touches on why models like Claude excelled in coding and writing for an extended period.
Data Choices and Objective Functions: The superiority is attributed to multiple factors, including the vast choices in data selection (human vs. synthetic, specific domains like front-end vs. back-end coding, visual design emphasis) and the objective functions models are optimized for.
The Role of "Taste" and Sophistication: Chen emphasizes that post-training AI is an art, not just a science. The "taste" and sophistication of the individuals guiding the training process significantly influence the model's behavior and capabilities, going beyond fixed checkboxes.
Critique of Benchmarks: Chen expresses strong distrust in current AI benchmarks. He cites flaws in the benchmarks themselves (wrong answers, messiness) and their tendency to have well-defined, objective answers that models can "hill climb" on, unlike the ambiguity of real-world problems. He uses the example of models that can win IMO gold medals but still struggle to parse PDFs.
Gaming Benchmarks: Labs may "game" benchmarks by tweaking evaluation methods or optimizing solely for benchmark performance rather than real-world utility.
Human Evaluation as the True Measure: Surge AI relies on human evaluations, where experts engage in deep conversations with models across various roles (physicist, teacher, coder) to assess accuracy, instruction following, and overall helpfulness. This is contrasted with casual users who may be swayed by superficial aspects.
AGI Timelines: Chen believes AGI is likely a decade or more away, emphasizing the significant effort required to move from high performance (e.g., 80%) to near-perfect performance (99.9%).
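A little arithmetic shows why the step from 80% to 99.9% is so large. The sketch below is an illustration, not a figure from the conversation: the 20-step task length is a hypothetical choice, and `chained_success` assumes every step succeeds or fails independently.

```python
def error_reduction_factor(acc_from: float, acc_to: float) -> float:
    """How many times smaller the error rate must become."""
    return (1 - acc_from) / (1 - acc_to)


def chained_success(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, ASSUMING independent steps."""
    return per_step_accuracy ** steps


STEPS = 20  # hypothetical task length, chosen only for illustration

print(round(error_reduction_factor(0.80, 0.999)))  # 200
print(round(chained_success(0.80, STEPS), 3))      # 0.012
print(round(chained_success(0.999, STEPS), 3))     # 0.98
```

Under these assumptions, going from 80% to 99.9% means cutting errors by a factor of 200, and it is the difference between almost never and almost always finishing a 20-step task.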

The Wrong Direction for AGI and the "AI Slop" Problem

Misguided AGI Development: Chen expresses concern that many AI labs are pushing AGI in the "wrong direction," optimizing for "AI slop" instead of advancing humanity by solving grand challenges like curing cancer or ending poverty.
Dopamine Over Truth: The models are being taught to "chase dopamine instead of truth," catering to superficial engagement rather than factual accuracy.
LM Arena Example: The LM Arena leaderboard is cited as an example of this problem, where models can hallucinate and use superficial elements (emojis, markdown) to gain votes from users who skim responses. This incentivizes models to be flashy rather than accurate.
Negative Incentives: PR pressures and the need to climb leaderboards create negative incentives for researchers, potentially leading to worse models despite higher benchmark scores.
Optimizing for Engagement: The trend of optimizing AI for engagement, similar to social media, is seen as leading to negative consequences like clickbait and the amplification of delusions.
Anthropic's Principled Approach: Anthropic is highlighted as a lab taking a more "principled view" on model behavior and objectives.
The "Bitter Lesson" and LLMs: On the claim, raised alongside the "bitter lesson," that LLMs might be a dead end, Chen holds that LLMs are powerful but that new learning mechanisms will be needed to achieve AGI, mirroring the diverse ways humans learn.
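The leaderboard incentive described in the LM Arena example can be made concrete with a toy comparison. Everything here is hypothetical: `skim_vote` and `expert_vote` are invented stand-ins for a voter who skims and an evaluator who checks facts, and the `is_correct` flag stands for ground truth that only the careful evaluator looks at.

```python
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    is_correct: bool  # ground truth that a careful expert would verify


def flashiness(r: Response) -> int:
    """Count superficial elements: markdown markers and a few emojis."""
    emojis = sum(ch in "🚀✨🔥" for ch in r.text)
    return r.text.count("**") + r.text.count("#") + emojis


def skim_vote(a: Response, b: Response) -> Response:
    """Hypothetical skimming voter: picks the flashier answer, never checks facts."""
    return a if flashiness(a) >= flashiness(b) else b


def expert_vote(a: Response, b: Response) -> Response:
    """Hypothetical careful evaluator: correctness decides, formatting is ignored."""
    if a.is_correct != b.is_correct:
        return a if a.is_correct else b
    return a


plain = Response("Water boils at 100 degrees Celsius at sea level.", True)
flashy = Response("## 🚀 Great question! **Water boils at 50 degrees Celsius** ✨", False)

print(skim_vote(plain, flashy) is flashy)   # True: the wrong answer wins the vote
print(expert_vote(plain, flashy) is plain)  # True: the accurate answer wins
```

A model trained to win the first kind of vote is rewarded for formatting, which is the incentive problem the section describes.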

Reinforcement Learning and the Future of AI Training

RL Environments for Real-World Tasks: Reinforcement learning (RL) environments are presented as a crucial next step for AI development. These simulations mimic real-world scenarios, allowing models to learn complex, end-to-end tasks.
Beyond Isolated Benchmarks: RL environments expose the limitations of models that perform well on isolated benchmarks but fail in messy, ambiguous real-world situations with longer time horizons and interdependencies.
Designing for Reward: The goal is to train models to achieve specific rewards within these environments, whether it's fixing a website outage, analyzing financial data, or performing complex coding tasks.
Expert-Designed Environments: Experts, like financial analysts, can design these environments by creating spreadsheets, defining tools for the model to use, and setting specific goals and rewards.
Importance of Trajectories: Chen emphasizes the importance of observing "trajectories" – the step-by-step process a model takes to reach a solution. This reveals inefficiencies, reward hacking, and provides valuable learning opportunities beyond just the final outcome.
Evolution of Post-Training Methods: The progression of post-training methods is outlined:
1. SFT (Supervised Fine-Tuning): Mimicking a master.
2. RLHF (Reinforcement Learning from Human Feedback): Learning from preferences.
3. Rubrics and Verifiers (Evals): Learning from detailed feedback and grading.
4. RL Environments: Learning through simulated real-world experience.
Adapting to Lab Needs: Surge AI's business model is characterized by its adaptability to the evolving needs of AI labs, moving from data labeling to creating complex simulation environments.
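The point about trajectories and reward hacking can be illustrated with a toy environment. This is a minimal sketch under stated assumptions, not a description of Surge AI's environments: `ToyOutageEnv`, its three actions, and the deliberately flawed `naive_reward` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOutageEnv:
    """Hypothetical environment: a website is down because a service crashed."""
    service_up: bool = False
    alerts_enabled: bool = True
    trajectory: list = field(default_factory=list)

    def step(self, action: str) -> str:
        """Apply one tool call, record it, and return an observation."""
        self.trajectory.append(action)
        if action == "read_logs":
            return "service crashed: out of memory"
        if action == "restart_service":
            self.service_up = True
            return "service restarted"
        if action == "disable_alerts":
            self.alerts_enabled = False
            return "alerts silenced"
        return "unknown action"

    def naive_reward(self) -> float:
        """Pays out whenever no alert is firing. This is the loophole."""
        alert_firing = self.alerts_enabled and not self.service_up
        return 0.0 if alert_firing else 1.0


def run(actions: list) -> ToyOutageEnv:
    env = ToyOutageEnv()
    for action in actions:
        env.step(action)
    return env


honest = run(["read_logs", "restart_service"])
hacker = run(["disable_alerts"])

print(honest.naive_reward(), hacker.naive_reward())  # 1.0 1.0
print(hacker.trajectory, hacker.service_up)          # ['disable_alerts'] False
```

Both runs earn the same final reward, but only the trajectory shows that the second one silenced the alert instead of fixing the outage.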

Surge AI's Research Arm and Vision

Research-Driven Culture: Surge AI invests in its own research team, stemming from Chen's background as a researcher. This team focuses on both customer-facing solutions and internal advancements.
Forward Deployed Researchers: These researchers collaborate with customers to analyze their models, identify weaknesses, and design data sets, evaluation methods, and training techniques for improvement.
Internal Research Focus: Internal researchers work on developing better benchmarks and leaderboards to steer AI development in a more beneficial direction, and on training their own models to understand optimal data performance.
Beyond Revenue: Chen's personal drive is rooted in scientific curiosity and pushing the frontier of AI, viewing Surge AI more as a research lab than a typical startup. He prioritizes intellectual rigor and long-term impact over short-term metrics.
Shaping the Future of AI: Surge AI aims to play a critical role in shaping AI's future, leveraging its unique perspectives on data, language, and quality to ensure AI benefits humanity.

The Differentiating Personalities of AI Models

Increasing Differentiation: Chen predicts that AI models will become increasingly differentiated based on the values and objective functions of the labs that develop them.
Beyond Commoditization: Contrary to his initial belief, he now sees that company values significantly shape model behavior.
Optimizing for Time vs. Iteration: The example of drafting an email with Claude illustrates this: one model optimizes for productivity by stopping at "good enough," while another endlessly iterates, consuming the user's time.
Values as a Differentiator: Just as Google, Facebook, and Apple would each build a search engine differently based on their principles, future LLMs will exhibit distinct personalities and behaviors, reflecting their creators' values.

Underhyped and Overhyped Areas in AI

Underhyped: Built-in Products/Mini-Apps: Chen believes the integration of "mini-apps" or interactive artifacts within chatbots is underhyped. These could allow users to perform actions directly within the chat interface, enhancing functionality.
Overhyped: "Vibe Coding": The practice of generating code based on a general "vibe" is considered overhyped, with concerns that it could lead to unmaintainable systems in the long term.

The Genesis of Surge AI and Personal Motivation

Unique Background: Chen's background in mathematics, computer science, and linguistics, combined with his research experience at Google, Facebook, and Twitter, provided a unique foundation for Surge AI.
The Data Bottleneck: His research consistently highlighted the critical need for high-quality data, which was a significant bottleneck in training advanced AI models.
Post-GPT-3 Vision: The release of GPT-3 in 2020 solidified his belief that new solutions were needed for complex AI use cases beyond simple tasks like image labeling.
Scientific Curiosity: Chen's primary motivation remains scientific curiosity and the desire to understand the universe, language, and communication. He finds deep satisfaction in analyzing new models and contributing to the advancement of AI.
Research Lab Mentality: He views Surge AI as a research lab, prioritizing curiosity, long-term incentives, and intellectual rigor over short-term financial gains.

The Deeper Mission: Shaping Humanity's Future

Beyond Training and Evaluation: Surge AI's mission extends beyond simply training and evaluating AI. It involves helping customers define their "dream objective functions" – the ultimate goals they want their AI to achieve.
Complex Objective Functions: Defining and measuring these complex objectives (e.g., "advancing humanity," "making life richer") is challenging, unlike simpler proxies like clicks or likes.
The "You Are Your Objective Function" Principle: Chen advocates for optimizing towards complex, meaningful objective functions rather than simplistic proxies that can lead to negative outcomes like increased laziness or time consumption.
Raising Humanity's Children: The work of Surge AI is framed as akin to "raising humanity's children" – teaching AI values, creativity, and what makes something good and beautiful, rather than just feeding it information.
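The contrast between a simple proxy and a richer objective can be shown with a toy model of drafting an email. All numbers are invented for illustration: the diminishing-returns curve in `draft_quality`, the five minutes per draft, and the `cost_per_minute` weight are assumptions, and a real "dream objective function" would be far harder to define and measure.

```python
def draft_quality(revisions: int) -> float:
    """Hypothetical diminishing returns: each revision closes half the remaining gap."""
    return 1 - 0.5 ** (revisions + 1)


def minutes_spent(revisions: int) -> int:
    """Assume each draft, including the first, costs the user five minutes."""
    return 5 * (revisions + 1)


def engagement_objective(revisions: int) -> float:
    """Proxy objective: more time in the product is always better."""
    return minutes_spent(revisions)


def user_value_objective(revisions: int, cost_per_minute: float = 0.01) -> float:
    """Richer objective: quality gained minus the value of the user's time."""
    return draft_quality(revisions) - cost_per_minute * minutes_spent(revisions)


options = range(0, 11)  # consider zero to ten rounds of revision

print(max(options, key=engagement_objective))  # 10: iterate as long as possible
print(max(options, key=user_value_objective))  # 3: stop at "good enough"
```

The same system lands on very different behavior depending on which function it is asked to maximize, which is the sense in which a model "is" its objective function.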

Advice for Founders and the Future of AI

Build What Only You Can Build: Chen advises founders to focus on building something unique that stems from their own experiences and interests, rather than chasing trends or pivoting constantly.
Embrace Research and Quality: He learned that building a successful company doesn't require constant hype or fundraising; focusing on deep research and building an amazing product can cut through the noise.
The Importance of Values: Companies are embodiments of their CEOs' values. Following personal values and long-term goals is crucial for building impactful companies.
The Future of Data Labeling: The term "data labeling" is seen as simplistic. Surge AI's work is more akin to nurturing AI's development, teaching it nuanced concepts and values.
Recommended Reading:
- "Story of Your Life" by Ted Chiang (basis for the movie "Arrival")
- "The Myth of Sisyphus" by Albert Camus
- "Le Ton beau de Marot" by Douglas Hofstadter
Favorite Media: Sci-fi books and movies involving scientists deciphering alien communication, such as "Travelers" and "Contact."
Product Discovery: Waymo (driverless cars) is highlighted as a product that exceeded expectations.
Life Motto: Build a company that only you could build, driven by personal values and interests.
Call to Action: Surge AI is hiring individuals passionate about data, math, language, and computer science. The company also wants to hear about interesting AI failures and suggestions for blog topics.
