I Built Music You Can Talk To

By Volo Builds

Building an AI-Powered Music Creation Tool with Strudel & Voice Control

Key Concepts:

  • Strudel: A live coding platform for creating music using code-like patterns.
  • OpenRouter: A service for accessing multiple AI models (such as GPT, Gemini, and Claude) through a single API.
  • Web Speech API: A browser API for speech recognition, used here to transcribe spoken commands into text.
  • Live Coding: The practice of writing and modifying code in real-time to generate music or visuals.
  • AI Prompt Engineering: The process of crafting effective prompts to guide AI models towards desired outputs.
  • Euclidean Rhythm: A rhythmic pattern that spreads a set number of hits as evenly as possible across a set number of steps; used in the example. (The transcript does not explain the specific rhythm in detail.)
  • Firebase: A backend-as-a-service platform (used initially, then replaced).
  • Cloudflare Workers KV: A lightweight key-value store used for storing song data.
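
As an illustration of the rhythm concept listed above: a Euclidean rhythm spreads a number of hits as evenly as possible over a number of steps. The following is a minimal TypeScript sketch of that idea. The names `euclid` and `render` are hypothetical and this is not Strudel's implementation; for some inputs the result may be a rotation of the pattern produced by other formulations.

```typescript
// Hypothetical helper for illustration; not taken from Strudel's source.
// Returns `steps` booleans containing `pulses` hits spread as evenly as possible.
function euclid(pulses: number, steps: number): boolean[] {
  const valid =
    Number.isInteger(pulses) &&
    Number.isInteger(steps) &&
    steps > 0 &&
    pulses >= 0 &&
    pulses <= steps;
  if (!valid) {
    throw new RangeError("expected integers with steps > 0 and 0 <= pulses <= steps");
  }
  const pattern: boolean[] = [];
  for (let i = 0; i < steps; i++) {
    // Step i is a hit when i * pulses has just crossed a multiple of steps.
    pattern.push((i * pulses) % steps < pulses);
  }
  return pattern;
}

// Render as text: "x" for a hit, "." for a rest.
function render(pattern: boolean[]): string {
  return pattern.map((hit) => (hit ? "x" : ".")).join("");
}

console.log(render(euclid(3, 8))); // x..x..x.
```

In Strudel's mini-notation the same idea is written inline, for example `"bd(3,8)"` for three kicks spread over eight steps.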

1. Project Overview & Motivation

The creator embarked on building an AI-powered music creation tool after being inspired by a YouTube video showcasing Strudel, a live coding platform. Initially intimidated by Strudel’s syntax, the creator envisioned a system where musical ideas could be expressed in natural language, with AI translating those ideas into functional Strudel code. The ultimate goal was voice control, allowing for real-time music manipulation through spoken commands. The project evolved from a proof-of-concept using Claude to a fully functional application built with Cursor, incorporating voice recognition and a shareable song feature. The creator notes a personal history with music, starting with guitar and transitioning to electronic music production, but finding time for it challenging with increasing responsibilities.

2. System Architecture & Workflow

The system operates on a four-stage process:

  1. Speech-to-Text: User input (voice commands) is transcribed into English text using the Web Speech API.
  2. AI Translation: The English text is sent to an AI model (initially Claude, later Gemini 2.5 Flash via OpenRouter) which translates it into Strudel code. This relies on a highly detailed prompt containing the Strudel syntax and desired musical actions.
  3. Strudel Execution: The generated Strudel code is executed by the Strudel platform, producing the corresponding audio output.
  4. Visualization: A HAL 9000-inspired visualization reacts to both the audio output and the voice input, providing visual feedback.
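
The four stages above can be sketched as a small orchestration class. This is a minimal sketch under stated assumptions, not the creator's code: `Stages`, `VoiceSession`, and `handleTranscript` are hypothetical names, and each stage is injected as a function so the flow can be exercised without a browser, an AI model, or Strudel.

```typescript
// All names here are hypothetical; the real app's structure is not shown in the video summary.
interface Stages {
  // Stage 2: AI turns an English command (plus the current code) into Strudel code.
  translate: (command: string, currentCode: string) => Promise<string>;
  // Stage 3: hand the code to the Strudel player.
  evaluate: (code: string) => Promise<void>;
  // Stage 4: notify the visualization about voice or audio activity.
  visualize: (event: { kind: "voice" | "audio"; detail: string }) => void;
}

class VoiceSession {
  private currentCode = "";
  private readonly stages: Stages;

  constructor(stages: Stages) {
    this.stages = stages;
  }

  // Called with the text produced by stage 1 (speech-to-text).
  async handleTranscript(transcript: string): Promise<string> {
    const command = transcript.trim();
    if (command === "") return this.currentCode; // ignore empty transcripts

    this.stages.visualize({ kind: "voice", detail: command });
    const nextCode = await this.stages.translate(command, this.currentCode);

    // If evaluation throws, currentCode is left unchanged.
    await this.stages.evaluate(nextCode);
    this.currentCode = nextCode;

    this.stages.visualize({ kind: "audio", detail: nextCode });
    return this.currentCode;
  }
}
```

Keeping the last successfully evaluated code in one place means a failed generation does not replace what is currently playing.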

The initial design considered using Firebase for backend storage, but this was replaced with Cloudflare Workers KV for its lightweight nature and integration with the existing Cloudflare infrastructure.
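
Storing a shareable song in a key-value store could look roughly like the sketch below. It assumes only a `get`/`put` pair of the kind a Workers KV binding provides; `KVLike`, `SharedSong`, `saveSong`, `loadSong`, the `song:` key prefix, and the in-memory `MemoryKV` stand-in are all hypothetical and are not taken from the project.

```typescript
// Minimal subset of a key-value binding (assumed shape: async get/put on string values).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical record stored for each shared song.
interface SharedSong {
  code: string;
  createdAt: string;
}

// Short random id for illustration only; not collision-safe.
function randomId(): string {
  return Math.random().toString(36).slice(2, 10);
}

async function saveSong(kv: KVLike, code: string, makeId: () => string = randomId): Promise<string> {
  const id = makeId();
  const song: SharedSong = { code, createdAt: new Date().toISOString() };
  await kv.put("song:" + id, JSON.stringify(song));
  return id; // the id would become part of the shareable link
}

async function loadSong(kv: KVLike, id: string): Promise<SharedSong | null> {
  const raw = await kv.get("song:" + id);
  return raw === null ? null : (JSON.parse(raw) as SharedSong);
}

// In-memory stand-in so the sketch runs without Cloudflare.
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    const value = this.store.get(key);
    return value === undefined ? null : value;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

Because a song is just a string of Strudel code, a single key per song is enough, which fits the lightweight storage described above.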

3. Strudel & Live Coding Fundamentals

Strudel is described as a live coding platform where music is created by writing code-like patterns. The transcript demonstrates basic Strudel operations:

  • Adding instruments (kick, snare, hi-hat, bass, chords).
  • Modifying instrument sounds (e.g., changing a snare).
  • Adjusting rhythmic patterns (e.g., double time breakdown).
  • Muting instruments.

The creator highlights the complexity of Strudel, noting that its patterns are hard to memorize, which motivates the natural language interface. A pre-existing Strudel pattern that generates jazzy chords, a bassline, and a beat is shown to demonstrate the platform's capabilities.
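
The operations listed in this section (adding, muting, and speeding up instruments) can be modeled as edits to a set of named tracks that is rendered to one Strudel expression. The sketch below is an assumption-laden illustration: `Track` and `renderStrudel` are hypothetical names, and the pattern strings are made-up examples rather than the patterns from the video. It relies only on Strudel's `stack`, `fast`, `s`, `note`, and `silence`.

```typescript
// Hypothetical model: each named track holds one Strudel pattern expression as text.
interface Track {
  pattern: string;
  muted: boolean;
}

// Renders the unmuted tracks into a single Strudel expression.
function renderStrudel(tracks: Map<string, Track>, doubleTime: boolean = false): string {
  const active = Array.from(tracks.values())
    .filter((track) => !track.muted)
    .map((track) => "  " + track.pattern);
  if (active.length === 0) return "silence"; // Strudel's empty pattern
  const stacked = "stack(\n" + active.join(",\n") + "\n)";
  // fast(2) plays the whole stack at twice the speed.
  return doubleTime ? stacked + ".fast(2)" : stacked;
}

// Illustrative patterns only.
const tracks = new Map<string, Track>([
  ["kick", { pattern: 's("bd*4")', muted: false }],
  ["snare", { pattern: 's("~ sd ~ sd")', muted: false }],
  ["hihat", { pattern: 's("hh*8")', muted: true }],
  ["bass", { pattern: 'note("e2(3,8)").s("sawtooth")', muted: false }],
]);

console.log(renderStrudel(tracks, true));
```

With this model, "mute the hi-hat" becomes a flag change and "double-time breakdown" becomes a single argument, which is the kind of edit the AI is asked to make directly in code.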

4. AI Integration & Prompt Engineering Challenges

The core challenge was enabling the AI to understand and generate valid Strudel code. This required extensive prompt engineering:

  • Initial Proof of Concept (Claude): Claude was used to demonstrate the feasibility of translating natural language into Strudel code. A prompt was crafted to instruct Claude to research Strudel and create a basic song. The resulting code, while functional, required refinement.
  • AI App Builders (Lovable) & Code Editors (Cursor): The creator initially attempted to use Lovable, an AI app builder, but found it insufficient for controlling the Strudel player. Cursor, an AI code editor, provided more granular control but still required significant manual intervention.
  • The "Thousands of Lines" Prompt: The final solution involved a massive, highly detailed prompt containing the entire Strudel documentation and specific instructions on code generation. This prompt was iteratively refined based on errors and unexpected outputs.
  • Accessing Documentation: A key obstacle was the AI’s inability to access Strudel’s documentation during code generation. The solution involved directly copy-pasting the documentation into the prompt.
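
Pasting the documentation into the prompt, as described above, amounts to building one large system message. The sketch below shows one way this could be assembled and how a fenced reply could be reduced to bare code. The wording of the instructions and the names `buildMessages` and `extractCode` are assumptions, not the creator's actual prompt.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the messages for one request; strudelDocs is the pasted documentation text.
function buildMessages(strudelDocs: string, currentCode: string, command: string): ChatMessage[] {
  const system = [
    "You translate plain-English music requests into Strudel code.",
    "Use only functions and syntax that appear in the documentation below.",
    "Reply with the complete updated code and nothing else.",
    "",
    "=== STRUDEL DOCUMENTATION ===",
    strudelDocs,
  ].join("\n");
  const user = [
    "Current code:",
    currentCode === "" ? "(empty)" : currentCode,
    "",
    "Request: " + command,
  ].join("\n");
  return [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
}

// Models often wrap code in a Markdown fence; keep only the code inside it.
const fence = "`".repeat(3);
const fencedBlock = new RegExp(fence + "[a-zA-Z]*\\n([\\s\\S]*?)" + fence);

function extractCode(reply: string): string {
  const match = reply.match(fencedBlock);
  return (match ? match[1] : reply).trim();
}
```

Sending the current code along with each request lets the model make incremental edits such as muting one instrument instead of regenerating the whole song.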

5. Implementation Details & Technologies

  • OpenRouter: Used to access various AI models (Gemini 2.5 Flash proved fastest) through a unified API.
  • Web Speech API: Enabled voice control by transcribing spoken commands into text.
  • Cloudflare Workers & KV: Used for deploying the application and storing song data (shareable links).
  • Cursor: The primary AI code editor used for building the application.
  • HAL 9000 Visualization: A custom visualization inspired by the HAL 9000 computer, reacting to both audio and voice input. This was achieved by overriding some of Strudel’s internal functions, a “hacky” but effective solution.
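
A request to OpenRouter's OpenAI-compatible chat completions endpoint could be prepared as in the sketch below. The helper names `buildOpenRouterRequest` and `readReply` are hypothetical, and the model slug `google/gemini-2.5-flash` is an assumption about how the model named above is identified; neither is confirmed by the video. The request is built separately from sending it so the sketch runs without network access.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the URL and fetch options for one chat completion request.
function buildOpenRouterRequest(
  apiKey: string,
  messages: ChatMessage[],
  model: string = "google/gemini-2.5-flash" // assumed slug
) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Reads the first choice from an OpenAI-style chat completion response body.
function readReply(body: unknown): string {
  const parsed = body as { choices?: { message?: { content?: unknown } }[] };
  const content = parsed?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("unexpected response shape");
  }
  return content;
}
```

In the app this would be used along the lines of `const { url, init } = buildOpenRouterRequest(key, messages)`, followed by `fetch(url, init)` and `readReply(await response.json())`. Because the model is a single parameter, switching models is a one-line change.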

6. Notable Quotes

  • “I thought I was going to be a rock star.” – Reflecting on the creator’s initial musical aspirations.
  • “It’s a lot more difficult than it would seem because the AI models don’t already know about the strudel syntax.” – Highlighting the challenge of AI understanding specialized code.
  • “I had to explain it to the AI in one of the longest prompts I’ve ever written. Seriously, it’s like thousands of lines long.” – Emphasizing the complexity of prompt engineering.
  • “When you’re coding with AI, make sure to keep an eye open to ensure that they can actually access the documentation for the libraries that you’re working with.” – A crucial lesson learned during development.
  • “It’s actually working end to end with the text.” – Expressing excitement upon achieving functional voice control.

7. Data & Statistics

  • The prompt used to guide the AI was described as “thousands of lines long.”
  • Gemini 2.5 Flash was identified as the fastest of the AI models tried for generating code.
  • The project utilized Cloudflare Workers KV as a lightweight database solution.

8. Demonstration & Functionality

The creator demonstrates the application’s functionality through a series of voice commands:

  • Removing all instruments except the hi-hat.
  • Adding a kick and snare.
  • Adding a bass playing E2 in a specific rhythm.
  • Modifying the bass to be “fat and resonant.”
  • Adding chords.
  • Muting instruments.
  • Initiating a double-time breakdown.
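
Each spoken command above reaches the app as a speech recognition result. The sketch below shows one way the finalized text could be pulled out of a Web Speech API `result` event. `ResultLike`, `ResultEventLike`, and `finalTranscript` are hypothetical names; the interfaces are a structural subset of the browser's event, declared here so the sketch runs outside a browser.

```typescript
// Structural subset of one recognition result: a final flag and the top alternative at index 0.
interface ResultLike {
  isFinal: boolean;
  0: { transcript: string };
}

// Structural subset of the event passed to a recognition "result" handler.
interface ResultEventLike {
  resultIndex: number;
  results: ArrayLike<ResultLike>;
}

// Joins the finalized phrases delivered by one event, skipping interim guesses.
function finalTranscript(event: ResultEventLike): string {
  const parts: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      parts.push(result[0].transcript.trim());
    }
  }
  return parts.join(" ");
}
```

In a browser, this would typically be called from the recognizer's `onresult` handler, with the returned text passed to the AI translation step only when it is non-empty.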

9. Conclusion & Takeaways

The project successfully demonstrated the feasibility of building an AI-powered music creation tool using Strudel and voice control. Key takeaways include:

  • Prompt engineering is critical: Effective prompts are essential for guiding AI models to generate desired outputs, especially in specialized domains like code generation.
  • AI access to documentation is crucial: Ensuring AI models can access relevant documentation is vital for accurate and reliable code generation.
  • Iterative development is key: The project involved significant iteration and refinement, particularly in prompt engineering and code integration.
  • Combining technologies can unlock new possibilities: Integrating Strudel, OpenRouter, the Web Speech API, and Cloudflare Workers enabled a unique and powerful music creation experience.
  • The potential for AI-assisted music creation is significant: The project showcases the potential for AI to democratize music production and empower users to express their musical ideas in new ways.

The final application allows users to create and share music through simple voice commands, offering a glimpse into the future of AI-assisted music creation.
