The Powerful Alternative To Fine-Tuning
By Y Combinator
Poetic: Recursively Self-Improving AI & The Future of LLM Applications
Key Concepts:
- Recursive Self-Improvement (RSI): AI systems improving their own intelligence, a core goal in AI research.
- Harness: A system built on top of Large Language Models (LLMs) to enhance performance beyond the base model’s capabilities.
- Poetic Meta System: Poetic’s proprietary RSI system that generates these “harnesses.”
- Frontier Models: Leading LLMs like GPT-4, Gemini, and Claude.
- Bitter Lesson: Rich Sutton’s observation that general methods which scale with compute and data tend, over time, to outperform hand-engineered solutions in machine learning.
- Context Engineering/Stuffing: Optimizing prompts and providing relevant information to LLMs to improve output.
- ARC-AGI (Abstraction and Reasoning Corpus) & Humanity’s Last Exam: Benchmarks used to evaluate AI reasoning and problem-solving abilities.
I. Introduction to Poetic & the Problem Space
Ian Fischer, co-founder and co-CEO of Poetic, discusses the company’s mission: building recursively self-improving AI reasoning systems for LLMs. He highlights a critical challenge for startups building on LLMs – the constant need to re-fine-tune models as new, more powerful frontier models are released. Fine-tuning is expensive (millions to hundreds of millions of dollars) and ultimately temporary, as each new model release renders previous efforts obsolete. Poetic aims to provide a solution that allows developers to consistently outperform “out-of-the-box” LLMs without the ongoing cost and effort of fine-tuning. Fischer emphasizes the importance of experimentation with AI, stating, “You should just try things and every day do something with AI.”
II. Poetic’s Approach: Recursive Self-Improvement & Harnesses
Poetic differentiates itself from traditional approaches like Reinforcement Learning (RL) and context engineering by focusing on recursive self-improvement. While others attempt RSI by retraining LLMs from scratch, a costly and time-consuming process, Poetic’s approach is described as significantly faster and cheaper. The core of Poetic’s technology is the “Poetic Meta System,” which automatically generates “harnesses”: systems composed of code, prompts, and data that sit on top of existing LLMs. According to Fischer, these harnesses consistently outperform the underlying models.
A key benefit is compatibility with future model releases. When a new frontier model is released, the same harness can be applied, providing an immediate performance boost without requiring any modifications or retraining. Fischer explains, “working with our systems means that I will always have the thing that is better than the thing that's out of box.” This effectively “vaccinates” users against the “bitter lesson” – the tendency for scaling to outperform hand-engineered solutions.
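The idea of a harness that carries over to a new model can be illustrated with a minimal sketch. Poetic’s actual harnesses are proprietary and not described in the source, so everything below is an assumption for illustration: the `Harness` class, the majority-vote strategy, and the two toy model functions are hypothetical stand-ins, and a real model would be an LLM API call rather than a local function.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# A "model" is any function from prompt text to completion text.
# This stands in for a real LLM API call, which the source does not specify.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Hypothetical harness: prompts plus code wrapped around a model."""
    prompt_templates: list[str]  # each contains a {question} placeholder

    def answer(self, model: Model, question: str) -> str:
        # Ask the same question through every prompt template...
        candidates = [
            model(t.format(question=question)).strip()
            for t in self.prompt_templates
        ]
        # ...and return the most common answer (simple majority vote).
        return Counter(candidates).most_common(1)[0][0]


def old_model(prompt: str) -> str:
    # Toy model: only correct when told to reason step by step.
    return "4" if "step by step" in prompt else "5"


def new_model(prompt: str) -> str:
    # Toy "next generation" model: correct regardless of prompt.
    return "4"


harness = Harness(prompt_templates=[
    "Think step by step. {question}",
    "Answer step by step, then give only the result. {question}",
    "{question}",
])

question = "What is 2 + 2?"
bare_answer = old_model(question)                    # model without the harness
old_answer = harness.answer(old_model, question)     # same model with the harness
new_answer = harness.answer(new_model, question)     # harness reused unchanged
```

The point of the sketch is only that the harness takes the model as a parameter, so swapping in a newer model requires no change to the harness itself.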
III. Performance & Validation: Benchmarks & Cost Efficiency
Poetic has demonstrated its capabilities through strong performance on challenging AI benchmarks.
- ARC-AGI-2: Poetic scored 54% versus 45% for Gemini 3 Deep Think, at a significantly lower cost ($32 per problem vs. $70+).
- Humanity’s Last Exam: On this benchmark, which is designed to be difficult even for PhDs, Poetic achieved 55%, surpassing Anthropic’s Claude Opus 4.6 (53.1%).
- Cost: The Humanity’s Last Exam run was completed for under $100,000, a fraction of the cost associated with training foundation models (hundreds of millions of dollars).
These results demonstrate Poetic’s ability to deliver substantial performance gains at a fraction of the cost of traditional methods. The company currently operates with a team of only seven research scientists and engineers.
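The ARC-AGI-2 comparison above can be worked through as simple arithmetic. This sketch uses only the figures quoted in this summary; the variable names are illustrative, and because the baseline cost is given as “$70+”, the resulting cost ratio is an upper bound rather than an exact value.

```python
# Figures quoted in the summary; "$70+" is treated as a lower bound on cost.
poetic_score, poetic_cost = 0.54, 32.0
baseline_score, baseline_min_cost = 0.45, 70.0

# Absolute accuracy difference, in fractional terms.
score_gain = poetic_score - baseline_score

# Poetic's per-problem cost as a share of the baseline's minimum cost.
# The true share is this value or lower, since the baseline costs $70 or more.
cost_ratio = poetic_cost / baseline_min_cost

print(f"Accuracy gain: {score_gain * 100:.0f} percentage points")
print(f"Cost per problem: at most {cost_ratio:.0%} of the baseline")
```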
IV. The Shift from Manual Optimization to Automated Systems
Fischer highlights a shift in AI development from manual prompt engineering and data collection to automated systems. Historically, developers would spend significant time collecting large datasets and fine-tuning models. Poetic’s approach automates this process, allowing the AI itself to understand the data, identify failure modes, and develop robust reasoning strategies. He notes that the prompts generated by Poetic’s system are often unconventional and wouldn’t necessarily be created by a human. This underscores the potential for AI to discover novel solutions beyond human intuition. He states, “historically in machine learning you always have to know your data set really well but now we're kind of outsourcing that to the AI itself.”
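The loop described here (run on data, find failures, revise the strategy) can be sketched as a generic greedy search over prompts. This is not Poetic’s method, which the source does not detail; it is a plain hill-climbing illustration, and `failures`, `improve`, `toy_model`, and `toy_propose` are hypothetical names. In a real system the proposal step would presumably be an LLM reading the failed examples, which is replaced here by a fixed rule.

```python
from typing import Callable

Model = Callable[[str], str]
Example = tuple[str, str]  # (question, expected answer)


def failures(model: Model, prompt: str, data: list[Example]) -> list[Example]:
    """Return the examples the model gets wrong under this prompt."""
    return [(q, a) for q, a in data if model(f"{prompt}\n{q}").strip() != a]


def improve(model: Model, propose: Callable[[str, list[Example]], str],
            prompt: str, data: list[Example], rounds: int = 3) -> str:
    """Greedy hill climbing: keep a proposed prompt only if it fails less."""
    best, best_fails = prompt, failures(model, prompt, data)
    for _ in range(rounds):
        if not best_fails:
            break  # nothing left to fix
        candidate = propose(best, best_fails)
        cand_fails = failures(model, candidate, data)
        if len(cand_fails) < len(best_fails):
            best, best_fails = candidate, cand_fails
    return best


def toy_model(prompt: str) -> str:
    # Toy model: echoes the last line, uppercased only if the prompt asks.
    question = prompt.splitlines()[-1]
    return question.upper() if "uppercase" in prompt else question


def toy_propose(prompt: str, failed: list[Example]) -> str:
    # Stand-in for an LLM that reads the failures and rewrites the prompt.
    return prompt + " Reply in uppercase."


data = [("abc", "ABC"), ("xyz", "XYZ")]
best = improve(toy_model, toy_propose, "Repeat the input.", data)
```

Accepting a candidate only when it reduces the failure count keeps the loop from drifting to a worse prompt, at the cost of possibly getting stuck; that trade-off is a property of this sketch, not a claim about Poetic’s system.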
V. Poetic vs. RL & the Future of AI Development
Poetic’s approach is presented as a distinct paradigm from Reinforcement Learning (RL). While RL has its place, Poetic’s recursive self-improvement system offers a different S-curve of improvement. Each model and the Poetic system itself will have its own S-curve, continually shifting higher as both the underlying models and the meta-system evolve. Fischer suggests this could ultimately lead to Artificial General Intelligence (AGI) or even superintelligence. He jokes that Poetic’s “stilts” might allow them to “hit the ceiling first.”
VI. From Portable to Poetic: Fischer’s Journey
Fischer recounts his journey from founding a mobile app development company (Portable), which was acquired by Google, to becoming an AI researcher at Google and DeepMind. He initially explored robotics but quickly became passionate about machine learning. This transition reflects a broader trend of engineers shifting towards AI and recognizing its transformative potential.
VII. Call to Action & Early Access
Poetic is currently offering early access to its platform. Startups and companies facing challenging AI problems are encouraged to sign up at poetic.ai to learn more and potentially collaborate. Fischer emphasizes that Poetic aims to empower developers to build on top of the latest LLMs and achieve results comparable to leading AI labs like Anthropic and Google.
Technical Terms:
- LLM (Large Language Model): A type of AI model trained on massive amounts of text data, capable of generating human-like text.
- YC (Y Combinator): A startup accelerator program.
- Fine-tuning: Adjusting the parameters of a pre-trained LLM using a smaller, task-specific dataset.
- Prompt Engineering: Designing effective prompts to elicit desired responses from LLMs.
- Compute: The computational resources required to train and run AI models.
- Open Weights Model: An LLM whose parameters are publicly available.
- Verification: The process of evaluating the accuracy of an AI model’s responses.
- Agentic Company: A company that leverages AI agents to automate tasks and processes.
- RNN (Recurrent Neural Network): A type of neural network often used for sequential data processing.
This summary aims to provide a detailed and accurate representation of the YouTube video transcript, preserving the original language and technical precision. It focuses on actionable insights and specific details, offering a comprehensive overview of Poetic’s technology and its potential impact on the future of AI development.