How to use Karpathy's Autoresearch to 10x Claude

Key Concepts

Auto Research: A framework introduced by Andrej Karpathy that enables AI agents to autonomously optimize their own skills and processes through iterative testing.
Optimization Loop: A cyclical process where an AI establishes a baseline, generates a hypothesis, tests it, evaluates the outcome, and decides whether to keep or discard the change.
Sub-Agents: Specialized AI roles within the framework, including the Test Runner (executes the task) and the LLM Judge (evaluates performance when deterministic scripts are insufficient).
Deterministic vs. Subjective Criteria: The distinction between hard, rule-based constraints (e.g., character counts) and creative/nuanced requirements (e.g., tone of voice, storytelling).
Human-in-the-Loop: A safety mechanism where users review or approve AI-driven changes before they are finalized.

1. The Auto Research Framework

The framework is designed to make AI agents self-improving. By applying it to specific "skills" (such as LinkedIn writing, email sequences, or knowledge management), users can move beyond static prompts to dynamic, evolving systems.

Mechanism: The system uses a loop:
1. Baseline Establishment: A sub-agent runs tests to score the current performance.
2. Hypothesis Generation: The main agent proposes a change to improve the score.
3. Execution: A test runner applies the change.
4. Evaluation: Either a script (deterministic) or an LLM Judge (subjective) determines whether the change improved the score relative to the baseline.
5. Decision: The system keeps or discards the change based on the evaluation.
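
The five steps above can be sketched as a small keep-or-discard loop. This is a minimal illustration under stated assumptions, not the framework's actual code: `run_loop`, `propose`, and `evaluate` are hypothetical names, the "skill" is reduced to a draft post, and the proposals are hard-coded edits standing in for hypotheses an agent would generate.

```python
from typing import Callable


def run_loop(
    skill: str,
    propose: Callable[[str, int], str],
    evaluate: Callable[[str], float],
    max_iterations: int = 10,
) -> tuple[str, float, list[str]]:
    """Keep a proposed change only if it scores strictly higher than the current best."""
    best, best_score = skill, evaluate(skill)      # 1. baseline
    log: list[str] = []
    for i in range(max_iterations):
        candidate = propose(best, i)               # 2. hypothesis
        candidate_score = evaluate(candidate)      # 3-4. run and evaluate
        if candidate_score > best_score:           # 5. decision
            best, best_score = candidate, candidate_score
            log.append("keep")
        else:
            log.append("discard")
    return best, best_score, log


# Hypothetical stand-in for the evaluator: counts how many binary checks pass.
def evaluate(text: str) -> float:
    first_line = text.split("\n", 1)[0]
    checks = [len(first_line) < 136, "* " not in text]
    return float(sum(checks))


# Hypothetical stand-in for the main agent: alternates between two fixed edits.
def propose(text: str, i: int) -> str:
    if i % 2 == 0:
        first, _, rest = text.partition("\n")
        return first[:135] + ("\n" + rest if rest else "")
    return text.replace("* ", "- ")


draft = "x" * 150 + "\n* point one\n* point two"
best, best_score, log = run_loop(draft, propose, evaluate, max_iterations=4)
print(best_score, log)
```

In this toy run the first two proposals each fix one failing check and are kept; the next two change nothing, so they are discarded because the score does not improve.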

2. Real-World Applications

LinkedIn Content: Optimizing hooks (under 136 characters), ensuring bullet points use hyphens, and incorporating personal stories.
Knowledge Management: Optimizing "CLAUDE.md" (second brain) for better folder routing, wiki-link creation, and information retrieval.
Marketing/Sales: Automating A/B testing for landing page copy (H1 tags) and email subject lines to improve Click-Through Rates (CTR) and open rates.

3. Methodology for Effective Optimization

To ensure the AI improves rather than degrades, the following best practices are recommended:

Criteria Formulation: Criteria must be binary (True/False). Avoid vague goals; use specific constraints (e.g., "The first line must be under 136 characters" instead of "Make the hook short").
Variable Isolation: Test only one variable per criterion. If a goal requires multiple conditions, split them into separate, testable criteria.
Iteration Limits: Limit loops to 5–10 iterations. The speaker notes that performance can degrade if the AI is allowed to iterate indefinitely (e.g., beyond 15 iterations).
Data-Driven Optimization: Feed the AI real-world data (e.g., top-performing vs. worst-performing posts) to help it identify patterns and generate more effective hypotheses.
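
As an illustration of binary, single-variable criteria, the sketch below splits a vague goal ("short hook, clean bullets") into two separate True/False checks. The names `Criterion`, `report`, and `pass_rate` are assumptions made for this example and are not taken from the framework.

```python
from typing import Callable, NamedTuple


class Criterion(NamedTuple):
    name: str
    check: Callable[[str], bool]  # returns True or False, never a graded score


def first_line_under_136(text: str) -> bool:
    return len(text.split("\n", 1)[0]) < 136


def bullets_use_hyphens(text: str) -> bool:
    # Fails if any line starts with a non-hyphen bullet marker.
    return not any(
        line.lstrip().startswith(("*", "•")) for line in text.splitlines()
    )


# One variable per criterion, each independently testable.
CRITERIA = [
    Criterion("first line under 136 characters", first_line_under_136),
    Criterion("bullets use hyphens", bullets_use_hyphens),
]


def report(text: str) -> dict[str, bool]:
    """Per-criterion result, so a failed run shows exactly which rule broke."""
    return {c.name: c.check(text) for c in CRITERIA}


def pass_rate(text: str) -> float:
    results = report(text)
    return sum(results.values()) / len(results)


post = "A short hook\n- first point\n* second point"
print(report(post))
print(pass_rate(post))
```

Keeping each check separate means a drop in the pass rate can be traced to a single rule, which is harder when several conditions are bundled into one criterion.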

4. The Three-Level Framework for Subjective Tasks

For creative tasks like copywriting, the speaker suggests a tiered approach to criteria:

Hard Rules: Clear-cut, deterministic best practices (e.g., character limits, sentence length).
Pattern-Based Nuance: Using an LLM Judge to ensure the output matches a specific tone or follows a framework (e.g., PAS or AIDA).
Real-World Feedback: Using scheduled tasks to scrape performance data and adjust the strategy weekly based on actual engagement metrics.
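
A rough sketch of how the first two levels might be combined: deterministic hard rules act as a cheap gate, and the LLM Judge is consulted only for outputs that pass it. The judge here is a hypothetical stub (`stub_judge`), since the source names no specific model API; a real judge would send the text and a yes/no question to an LLM. The third level, real-world feedback, is not covered by this sketch.

```python
from typing import Callable


def hard_rules_pass(text: str) -> bool:
    """Level 1: deterministic checks only."""
    first_line = text.split("\n", 1)[0]
    return len(first_line) < 136


def evaluate(text: str, judge: Callable[[str, str], bool], questions: list[str]) -> dict:
    """Run the deterministic level first; ask the judge only if it passes."""
    if not hard_rules_pass(text):
        return {"level": 1, "passed": False, "failed": ["hard rules"]}
    failed = [q for q in questions if not judge(text, q)]  # Level 2
    return {"level": 2, "passed": not failed, "failed": failed}


# Hypothetical stub standing in for an LLM call that returns a binary verdict.
def stub_judge(text: str, question: str) -> bool:
    if "personal story" in question:
        return " I " in f" {text} "  # crude proxy for first-person narration
    return True


QUESTIONS = [
    "Does the post include a personal story?",
    "Does the post follow the PAS framework?",
]

too_long = evaluate("x" * 200, stub_judge, QUESTIONS)
impersonal = evaluate("Five tips for better hooks.\nUse numbers.", stub_judge, QUESTIONS)
personal = evaluate("Last year I lost a client.\nHere is why.", stub_judge, QUESTIONS)
print(too_long, impersonal, personal, sep="\n")
```

Phrasing each judge question as yes/no keeps the subjective level consistent with the binary-criteria rule from the methodology section.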

5. Implementation Steps

Setup: Download the Auto Research framework from the official GitHub repository.
Integration: Use platforms like Claude Code or Claude Cowork to give the AI access to the framework files.
Adaptation: Use an LLM to adapt the machine-learning-focused framework for specific business tasks or personal skills.
Automation: For advanced users, set up Scheduled Tasks to allow the AI to scrape performance data weekly, adjust variables, and iterate autonomously toward a specific goal (e.g., reaching 250 average likes per post).
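
The goal check inside such a scheduled task could look roughly like the sketch below. It assumes the performance data has already been collected into a mapping of post ID to like count; the scraping step, the function names, and the sample numbers are all invented for illustration.

```python
from statistics import mean

TARGET_AVG_LIKES = 250  # goal taken from the example above


def weekly_review(likes_per_post: list[int], target: float = TARGET_AVG_LIKES) -> dict:
    """Compare the week's average against the goal and report the remaining gap."""
    avg = mean(likes_per_post)
    return {
        "average_likes": avg,
        "goal_reached": avg >= target,
        "gap": max(0.0, target - avg),
    }


def split_top_and_worst(posts: dict[str, int], n: int = 2) -> tuple[list[str], list[str]]:
    """Pick the best and worst performers to feed back as pattern examples."""
    ranked = sorted(posts, key=posts.get, reverse=True)
    return ranked[:n], ranked[-n:]


# Made-up sample data; a real run would read scraped metrics instead.
posts = {"post-a": 310, "post-b": 120, "post-c": 240, "post-d": 90}

review = weekly_review(list(posts.values()))
top, worst = split_top_and_worst(posts)
print(review)
print(top, worst)
```

If `goal_reached` is false, the top and worst posts can be handed to the agent as input for the next round of hypotheses, which ties the scheduled loop back to the data-driven practice described in the methodology.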

6. Notable Quotes

"The skill or the process you're optimizing with Auto Research will only become as good as the criteria you define."
"The real unlock is using Auto Research specifically to optimize processes and skills that have some subjectivity or creativity embedded."

Synthesis

The Auto Research framework represents a shift from static AI assistance to autonomous AI optimization. By combining deterministic "hard rules" with LLM-based "subjective judges," users can refine complex, creative workflows. While the framework is powerful, its success relies heavily on the quality of the criteria defined and the ability to provide the AI with relevant, real-world performance data. Users are cautioned to start with clear, rule-based tasks before moving to autonomous, scheduled optimization loops to maintain control over the AI's output.