You can't just one shot it — Mehedi Hassan, Granola
By AI Engineer
Key Concepts
- Product Engineering: The intersection of software development and user experience, focusing on shipping features that solve specific user problems.
- LLM Tracing: The process of monitoring and logging the internal reasoning, tool calls, and data flow of Large Language Models to demystify "black box" behavior.
- Web Shell Architecture: Converting a desktop application (Electron) into a web-accessible format to enable rapid testing, CI/CD preview links, and parallel feature experimentation.
- IPC (Inter-Process Communication) Abstraction: Decoupling system-level APIs from the front-end to allow the application to run in both desktop and web environments.
- Feedback Loops: The iterative process of testing, observing, and refining AI outputs to ensure they align with user expectations.
1. Challenges in LLM Integration
The speaker highlights that while "one-shotting" (expecting a single prompt to produce the finished result, with no iteration) is easy to implement, it often fails in production environments.
- Generic Limitations: Simple chatbots often struggle with context, such as confusing professional meeting notes with personal interests (e.g., football coaching).
- Web Search Complications: Relying on built-in LLM web search tools is problematic due to:
  - Cost: Token usage can spike, making queries expensive at scale.
  - Reliability: External model updates can degrade search performance without warning.
  - Control: Developers lack granular control over how search results are retrieved and processed.
- Output Variability: Different user roles (Sales vs. Engineering vs. HR) require different output formats, which a single, static prompt cannot satisfy.
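The output-variability point can be illustrated with a small sketch in which the prompt is assembled per role instead of being one static string. The role names come from the summary above; the instruction texts, `ROLE_INSTRUCTIONS`, and `buildPrompt` are hypothetical and are not taken from the talk.

```typescript
// Roles mentioned in the summary; a real product would likely have more.
type Role = "sales" | "engineering" | "hr";

// Hypothetical role-specific formatting instructions (invented for illustration).
const ROLE_INSTRUCTIONS: Record<Role, string> = {
  sales: "Format as: customer pain points, objections, next steps with owners.",
  engineering: "Format as: decisions made, open technical questions, action items.",
  hr: "Format as: candidate strengths, concerns, recommended follow-up.",
};

const BASE_INSTRUCTION = "You write concise notes from a meeting transcript.";
const DEFAULT_INSTRUCTION = "Format as: key points and action items.";

// Builds the prompt from a shared base plus the instruction for the user's role.
// An unknown role falls back to a generic format rather than failing.
function buildPrompt(role: Role | undefined, transcript: string): string {
  const roleInstruction = role ? ROLE_INSTRUCTIONS[role] : DEFAULT_INSTRUCTION;
  return [BASE_INSTRUCTION, roleInstruction, "Transcript:", transcript].join("\n\n");
}
```

Keeping the role-specific part in a lookup table means each format can be iterated on independently, which fits the feedback-loop approach the talk argues for.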
2. Building Internal Tracing Tools
To move away from treating LLMs as "black boxes," the team at Granola built custom observability tools:
- Visibility: The tools provide a full audit trail of agent loops, including why specific tools were called, the reasoning process, and the associated costs.
- Accessibility: The UI is designed for non-engineers (Product, Data, CX teams), allowing them to debug failures without needing to query raw logs in services like CloudWatch.
- Methodology: By saving data to a database and wrapping the AI SDK, the team can inspect the entire lifecycle of a request, enabling them to identify exactly where an output went wrong and iterate accordingly.
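The wrap-and-save approach described above might look roughly like the following. This is a minimal sketch, not Granola's implementation: `TraceStore` is an in-memory stand-in for the database, `traced` wraps any async function rather than a specific SDK, and every type and name here is an assumption made for illustration.

```typescript
// Hypothetical trace record; field names are illustrative only.
type TraceStep = {
  kind: "llm_call" | "tool_call";
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  costUsd: number;
};

type Trace = { requestId: string; steps: TraceStep[] };

// In-memory stand-in for the database mentioned in the text.
class TraceStore {
  private traces = new Map<string, Trace>();

  record(requestId: string, step: TraceStep): void {
    const trace = this.traces.get(requestId) ?? { requestId, steps: [] };
    trace.steps.push(step);
    this.traces.set(requestId, trace);
  }

  get(requestId: string): Trace | undefined {
    return this.traces.get(requestId);
  }

  // Sums the recorded cost of every step in one request.
  totalCost(requestId: string): number {
    const steps = this.get(requestId)?.steps ?? [];
    return steps.reduce((sum, step) => sum + step.costUsd, 0);
  }
}

// Wraps an async function so each successful call is recorded as one step.
// The cost estimator is supplied by the caller; it defaults to zero.
function traced<I, O>(
  store: TraceStore,
  requestId: string,
  kind: TraceStep["kind"],
  name: string,
  fn: (input: I) => Promise<O>,
  estimateCostUsd: (input: I, output: O) => number = () => 0,
): (input: I) => Promise<O> {
  return async (input: I) => {
    const start = Date.now();
    const output = await fn(input);
    store.record(requestId, {
      kind,
      name,
      input,
      output,
      durationMs: Date.now() - start,
      costUsd: estimateCostUsd(input, output),
    });
    return output;
  };
}
```

Wrapping both the model call and each tool with `traced` under the same `requestId` yields an ordered list of steps that a non-engineer-facing UI could render. This sketch records only successful calls; capturing thrown errors would be a natural extension.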
3. Architectural Shift: Electron to Web Shell
A major hurdle for Granola was the friction of testing desktop-only features. To solve this, they transformed their Electron app into a web-compatible shell:
- The Process: They abstracted IPC APIs (system-level) and React APIs (routers, sessions) to fall back to web standards when the app is running in a browser.
- Benefits:
  - CI/CD Efficiency: Every Pull Request generates a preview link, allowing for rapid, parallel testing.
  - Automated Verification: Using tools like Cursor, the team can automate testing where the AI verifies its own work and uploads screenshots to PRs.
  - User Experience: This allows the team to test multiple variants of a feature in practice rather than relying on static Figma designs.
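The IPC abstraction described above can be sketched as a single interface with a desktop implementation and a browser fallback. This is an assumption-laden illustration: `PlatformBridge`, the channel names `settings:read` and `settings:write`, and the settings example itself are hypothetical. `IpcLike` mirrors the shape of Electron's `ipcRenderer.invoke`, and `StorageLike` mirrors the part of the Web Storage API used here.

```typescript
// Minimal shape of an IPC channel, similar to Electron's ipcRenderer.invoke.
interface IpcLike {
  invoke(channel: string, ...args: unknown[]): Promise<unknown>;
}

// Minimal subset of the Web Storage API used by the browser fallback.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// What the front end codes against, regardless of where it runs (hypothetical).
interface PlatformBridge {
  readonly kind: "desktop" | "web";
  readSetting(key: string): Promise<string | null>;
  writeSetting(key: string, value: string): Promise<void>;
}

// Desktop: forward every call over IPC to the main process.
function createDesktopBridge(ipc: IpcLike): PlatformBridge {
  return {
    kind: "desktop",
    readSetting: async (key) => (await ipc.invoke("settings:read", key)) as string | null,
    writeSetting: async (key, value) => {
      await ipc.invoke("settings:write", key, value);
    },
  };
}

// Web: fall back to a browser standard (here, a localStorage-like store).
function createWebBridge(storage: StorageLike): PlatformBridge {
  return {
    kind: "web",
    readSetting: async (key) => storage.getItem(key),
    writeSetting: async (key, value) => {
      storage.setItem(key, value);
    },
  };
}

// Picks the desktop bridge when an IPC channel exists, otherwise the web one.
function createBridge(env: { ipc?: IpcLike; storage: StorageLike }): PlatformBridge {
  return env.ipc ? createDesktopBridge(env.ipc) : createWebBridge(env.storage);
}
```

In a browser preview build, `createBridge({ storage: window.localStorage })` would return the web bridge, so the same React code can run from a PR preview link without the Electron main process. Passing the environment in explicitly also keeps the bridge easy to test.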
4. Key Arguments and Philosophy
- The "Tennis Game" Analogy: The speaker argues that successful AI product engineering is not about writing the perfect prompt once; it is about creating a tight feedback loop between the developer and the LLM.
- Product Philosophy: The core goal is to ensure AI features "don't get in the way." The product should feel like "magic" rather than a complex, unpredictable tool.
- Control over Convenience: While SaaS providers offer easy integrations, the team prioritizes building their own infrastructure (like the tracing tool) to maintain control over the user experience and cost.
5. Synthesis and Conclusion
The main takeaway is that the "magic" of a high-quality AI product is not found in a single, clever prompt, but in the engineering infrastructure surrounding the model. By building custom tracing tools to demystify LLM behavior and re-architecting their desktop app to support web-based testing, the Granola team has created a robust framework for rapid iteration. This approach allows them to maintain high conviction in their product releases, ensuring that the AI output is tailored to specific user needs while remaining performant and reliable.