Inside the expert network training every frontier AI model

Key Concepts

Data Labeling: The process of identifying and labeling raw data (text, images, audio, etc.) to train AI models.
Pre-training: The initial phase of training an AI model on a massive dataset of general knowledge.
Post-training: The phase of refining an AI model's capabilities in specific domains using high-quality, targeted data.
Reinforcement Learning from Human Feedback (RLHF): A post-training technique where human preferences are used to guide the model's learning process.
Fine-tuning: A post-training technique where a pre-trained model is further trained on a smaller, domain-specific dataset.
Trajectories: A comprehensive record of a user's interaction with a system, including screen recordings, mouse movements, and voiceovers.
Rubrics: A set of criteria used to evaluate the quality of a model's output, often used in auto-evaluation systems.
ASI (Artificial Superintelligence): A hypothetical level of AI that surpasses human intelligence in all aspects.
Moat: A competitive advantage that protects a company from competitors.
CAC (Customer Acquisition Cost): The cost of acquiring a new customer.
LTV (Lifetime Value): The total revenue a customer is expected to generate over their relationship with a company.

What is Data Labeling and Why is it Valuable?

Training Models: Training a model involves pre-training and post-training.
Pre-training: Involves feeding the model a vast amount of information, like the entire internet, to learn knowledge, facts, and reasoning.
Post-training: Focuses on improving the model's capabilities in specific areas like coding, math, law, and finance by collecting high-quality data.
Asymptotic Gains: About 18-24 months ago, gains from pre-training started to diminish as models had absorbed most of the internet's knowledge.
Shift to Post-training: Labs shifted focus to post-training, augmenting and improving data across various disciplines.
Scientific Process: Research teams run experiments to test hypotheses on how to improve the model, collecting small data pieces to validate their ideas.
Data Types: Post-training data can include reinforcement learning environments, trajectories, audio, multimodal data, text-based prompt-response pairs, and preference ranking data (RLHF).
Demand: There's high demand to stay at the frontier of model development, driving the need for high-quality post-training data.

Types of Post-Training

Reinforcement Learning from Human Feedback (RLHF): Human raters rank candidate responses, and the model learns from which responses are preferred.
Fine-tuning: Training models on specific datasets to improve performance in particular tasks.
Trajectories: Collecting data on how users interact with tools and solve problems, including screen recordings, mouse movements, and voiceovers.
Expertise: Models have become good enough that generalist labelers are no longer needed. What model builders need now is experts.
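
Preference ranking data can be pictured as follows. This is a minimal sketch, not any lab's actual format: the names PreferencePair and ranking_to_pairs are hypothetical, and it assumes a rater has ordered the candidate responses from best to worst. Each ranking is expanded into (chosen, rejected) pairs, a common shape for preference data.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PreferencePair:
    """One human preference: `chosen` was ranked above `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-to-worst ranking into pairwise preferences.

    combinations() keeps the input order, so the first element of
    each pair is always the higher-ranked response.
    """
    return [
        PreferencePair(prompt, better, worse)
        for better, worse in combinations(ranked_responses, 2)
    ]


# Hypothetical example: three responses ranked by an expert.
pairs = ranking_to_pairs(
    "Explain why the sky is blue.",
    ["response A", "response B", "response C"],
)
for pair in pairs:
    print(pair.chosen, ">", pair.rejected)
```

A ranking of n responses yields n(n-1)/2 pairs, so a single expert judgment over a few candidates produces several training comparisons.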

Handshake's Unique Proposition

Engaged Audience: Handshake has a large, engaged audience of 18 million professionals, including 500,000 PhDs and 3 million master's students.
Hyper-targeting: The platform can hyper-target users based on their academic knowledge (chemistry, math, physics, biology, coding).
Untapped Knowledge: Handshake can access parts of human knowledge that haven't been available on the internet.
Expert Focus: The company focuses on experts, as models now require specialized knowledge rather than generalist input.
Economically Valuable Capabilities: Model builders focus on economically valuable areas like advanced STEM, science, math, accounting, law, medicine, and finance.
Data Creation: Handshake creates new data and identifies model weaknesses.
Breaking Models: PhDs can identify where models fail in reasoning or ground truth, which is difficult for the average person.

What Data Labelers Actually Do

GPQA Example: Labelers break the model, provide the correct answer, and provide step-by-step reasoning.
Model Failure: They prove where the model fails, even if it gets the right answer, by identifying incorrect steps.
Branding Experience: Handshake treats PhD students like experts, providing a community and instructional design team to help them use the tools.
Data Creation: Labelers create data, and Handshake assesses the data's quality and potential gain in specific areas.
Real-world Tool Use: Modelers want real-world tool use and trajectory-based data.
Education Example: A PhD in education works with models on educational design, identifying errors in their output and helping them understand what good educational design looks like.
Professional Domains: People narrate their step-by-step tool use, screen recording their mouse movements and problem-solving processes.
JSON Data: The output of this work is typically JSON data.
Rubrics: Models can act as judges, evaluating responses based on rubrics (e.g., what makes a good educational design or MRI result).
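
The workflow above (break the model, mark the faulty step, supply the correct answer and reasoning, then grade against a rubric) could produce records like the one sketched here. The field names, the ModelBreakRecord and Criterion classes, and the weighted scoring rule are all assumptions made for illustration; the source says only that the output is typically JSON and that rubrics are used by model judges.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReasoningStep:
    text: str
    is_correct: bool  # the expert's verdict on this step


@dataclass
class ModelBreakRecord:
    """Hypothetical schema for one expert-labeled model failure."""
    domain: str
    prompt: str
    model_answer: str
    model_steps: list
    expert_answer: str
    expert_steps: list

    def first_faulty_step(self):
        """Index of the first incorrect model step, or None if all pass."""
        for index, step in enumerate(self.model_steps):
            if not step.is_correct:
                return index
        return None

    def to_json(self):
        # asdict() recurses into the nested ReasoningStep objects.
        return json.dumps(asdict(self), indent=2)


@dataclass
class Criterion:
    name: str
    weight: float


def rubric_score(criteria, judgments):
    """Weighted share of rubric criteria judged as satisfied.

    `judgments` maps criterion name -> bool, e.g. from a model judge.
    Criteria missing from `judgments` count as not satisfied.
    """
    total = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if judgments.get(c.name, False))
    return earned / total


record = ModelBreakRecord(
    domain="physics",
    prompt="(expert-written question)",
    model_answer="(model's final answer)",
    model_steps=[
        ReasoningStep("sets up the problem", True),
        ReasoningStep("applies the wrong formula", False),
    ],
    expert_answer="(expert's final answer)",
    expert_steps=["set up the problem", "apply the correct formula"],
)
print(record.to_json())
print("first faulty step:", record.first_faulty_step())
```

Marking correctness per step is what lets a record capture the case where the model reaches the right answer through incorrect reasoning.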

Quality, Volume, and Speed

Model Builder Priorities: Model builders care about quality, volume, and speed.
Quality: High-quality data is essential to avoid training models with incorrect information.
Volume: Generating thousands of data pieces in advanced domains is crucial.
Speed: Quick turnaround is needed to test hypotheses and scale successful pipelines.
Assessment: Handshake uses technology to assess each data unit, with its own post-training teams and GPUs.
Collaboration: Handshake collaborates with researchers to share insights and improve models.

The Future of Work and AI

GDP Growth: The speaker believes AI will improve and accelerate human productivity and impact.
Employer Feedback: Employers aren't saying young people won't have jobs; instead, AI enables individuals to be more productive.
AI Native: Young people are at an advantage because they are AI native and understand how to leverage these tools.
Fellow Insights: Fellows bring insights into the classroom and use AI to advance their research.
Paid Learning: Fellows can earn high hourly rates ($150-$200) while learning and improving models.

Key Takeaways on Data Labeling

Human-in-the-Loop: Humans will be needed in the model training process for the next decade until full ASI is achieved.
Model Weaknesses: Experts are needed to identify and correct model weaknesses.
Evolving Data Types: The types of data needed will evolve, but human input will remain crucial.
Model Improvement: Today's model is the worst model you will ever use, because models are constantly improving.

Handshake's Competitive Advantage

Access to Audience: The only moat in human data is access to an audience.
Trust and Brand Affinity: Handshake has built a decade of trust with 18 million users.
Targeting: The platform can effectively target users based on academic performance and interests.
Marketplace Resonance: The competitive advantage of access to an audience resonates in the marketplace.

How Handshake's AI Business Emerged

Natural Extension: It's a natural extension of helping people start, restart, or jumpstart their careers.
Middleman Companies: Middleman companies were recruiting Handshake's PhDs and master's students.
Frustrating Experience: Users found the experience on other platforms frustrating (transactional, payment issues, drop-off).
Direct Outreach: Frontier labs started reaching out directly, cutting out the middleman.
Expert-First Platform: Handshake believed there needed to be an expert-first platform for advancing AI.
Timeline: The company entered the business in January, building the platform and monetizing relationships about 5 months later.
Current Status: Handshake is working with seven frontier labs and experiencing rapid growth.
Revenue: The AI business hit $50 million in revenue in four months and is on track to exceed $100 million in its first year.

Handshake's Original Business

Network: The original business is a network of 18 million students and alumni.
Revenue: The original business does about $200 million in revenue.
Unconnected Graph: Handshake is an unconnected graph, focusing on discovery and exploration rather than existing connections.
Social Platform: It's a social platform with groups, messaging, profiles, and short-form video.
Early Career Focus: It helps young people build confidence and find their first and second jobs.
Timeframe: The original business has been around for 10 years.

Building a New Business Within an Existing Company

Hard to Incubate: It's hard to incubate something new inside a large company.
Strategic Advantage: Handshake has a strategic advantage with no customer acquisition costs and high conversion rates.
Expert-Based Data: The market has shifted from low-cost generalists to experts, and Handshake is set up for expert-based data.
Noticing the Opportunity: The company noticed that model companies were approaching its users and that those users were having a hard time with other platforms.
Exploring the Idea: They explored the idea by working with middleman companies, seeing direct outreach from labs, and realizing they could build a better experience.
Learning Curve: They accelerated the learning curve by hiring expert firms and doing calls with researchers.
Focus: They focused on delivering high-quality data to one customer before expanding.
Separate Teams: Separate engineering, design, accounts, operations, and finance teams were created.
Dedicated People: People were dedicated to the new business with no responsibilities in the existing part of the business.
Metrics-Oriented: The new business was more focused on data, metrics, and rigor from an early stage.
Founder Mode: The CEO was heavily involved, taking the lead and not delegating.
Different Culture: There was a different culture with a 24/7 expectation and a focus on ownership.
Leave Nothing to Chance: The motto was "leave nothing to chance," emphasizing the importance of execution.

Key Elements for Success

Founder Mode: The CEO was heavily involved and dedicated.
Dedicated People: People were dedicated to the new business with no other responsibilities.
Separate Teams: Separate teams were created for engineering, design, etc.
Metrics-Based Cadence: A metrics-based cadence was used to track progress.
Entrepreneurial Talent: The company hired entrepreneurs and people comfortable with ambiguity.
Upfront Communication: They were upfront about the chaotic nature of the new business.
Ownership: They emphasized ownership and celebrated impact.
Trust: They focused on building trust in the quality and consistency of their data.

The Future of Data and Models

Evolving Data Types: The types of data needed will evolve (CAD files, scientific tool use data, multimodal data).
Synthetic Data: Synthetic data has a role to play, but it won't dominate.
Value Extraction: There are billions of dollars of value to extract in following the frontier of AI development.

Advice for Entrepreneurs

Meaningful Work: Focus on doing something meaningful that helps people.
Societal Problem: Try to solve a societal problem that matters.

Lightning Round

Recommended Books: Zero to One by Peter Thiel, Shoe Dog, The Hard Thing About Hard Things.
Recent Movie/TV Show: Game of Thrones.
Favorite Product: The Snoo baby automated bassinet.
Life Motto: "Leave nothing to chance."
Early Hustle Story: Showering in Princeton's pool to save money.

Conclusion

Handshake's story is a remarkable example of how an established company can leverage its existing assets and expertise to capitalize on the opportunities presented by AI. By focusing on high-quality, expert-driven data labeling and fostering a culture of innovation and ownership, Handshake has successfully disrupted itself and positioned itself for continued growth and impact in the rapidly evolving AI landscape. The key takeaways from this story include the importance of focusing on meaningful work, building trust, and embracing a culture of continuous learning and adaptation.

Inside the expert network training every frontier AI model | Garrett Lord