$6.6B AI CEO: How to Make Your First $10,000 with AI | Mati Staniszewski from ElevenLabs
By Silicon Valley Girl
Key Concepts
- ElevenLabs: A voice AI company valued at $6.6 billion, co-founded by Mati Staniszewski.
- Voice AI: Artificial intelligence focused on speech synthesis, recognition, and interaction.
- Voice Marketplace: An ecosystem where individuals can clone their voices and earn passive income from their usage.
- Voice Agents: AI-powered conversational interfaces designed to handle tasks like customer support, sales, and product navigation.
- Authenticated AI: AI-generated content or interactions that are verified as legitimate and permissioned, often through watermarking or device-level encoding.
- Impersonation Safeguards: Multi-layered systems designed to detect and prevent the misuse of voice cloning technology.
- Domain Expertise + AI: The combination of specialized knowledge in a particular field with AI tools, identified as a high-value skill for the future workforce.
- Black Forest Labs, Anthropic's Claude Code, Lovable: Recommended AI tools for image generation, coding assistance, and prototyping/go-to-market strategies, respectively.
- SMB Voice Agent Deployment: A significant entrepreneurial opportunity involving the deployment of AI voice agents for small and medium-sized businesses.
- User Problem Obsession: A core principle for entrepreneurship, emphasizing a deep understanding of user needs and burning problems.
ElevenLabs: A Leader in Voice AI and Its Impact
Mati, CEO and co-founder of ElevenLabs, highlights the company's rapid growth to a $6.6 billion valuation, which positions it as a leader in the voice AI space. ElevenLabs is actively shaping how people communicate, work, and generate income through its voice technology. A key innovation is its voice marketplace, which allows individuals to clone their voices and earn passive income. The company has already paid $5 million in royalties to its community of voice creators.
The Evolving Role of Voice in AI Interaction
The discussion emphasizes a significant shift in AI interaction, from text-based interfaces like ChatGPT (prominent in 2023) towards powerful voice capabilities. Mati asserts that voice will be one of the key interfaces to technology because it carries more information than text alone. Voice conveys emotion, inflection patterns, and even imperfections, which gives technology a richer input and makes the experience more pleasant for the user.
Voice AI in Business Transformation
Voice AI is transforming various business functions:
- Customer Support: Voice agents are replacing traditional IVR (Interactive Voice Response) systems, offering quicker, more understanding, and generally better customer interactions.
- Product Navigation: Voice agents act as "partner programmers" or "product persons," guiding users through product experiences, similar to how chat widgets once functioned.
- Sales (Inbound and Outbound): ElevenLabs uses its own voice agents internally to accelerate its sales pipeline. These agents help potential customers understand product offerings, pricing, and use cases, leading to self-qualification or faster conversion. For "business tier" clients, immediate conversion is possible, while "enterprise tier" still requires human KYC checks. The company has observed a significant increase in converted leads that would otherwise have taken weeks or months to acquire.
Setting Up Voice AI for Businesses
Implementing ElevenLabs' voice AI involves:
- Platform Registration: Businesses can register on the ElevenLabs platform.
- Agent Platform Offering: This abstracts the complexity of connecting speech, LLM (Large Language Model) elements, and text-to-speech, ensuring low latency and reliability.
- Business Logic Integration: Businesses must integrate their specific knowledge base, FAQs, and desired questions into the platform.
- Workflow Configuration: Users can set up predefined workflows (e.g., "if this happens, then this happens") for tasks like appointment scheduling, which can integrate with existing calendars.
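The "if this happens, then this happens" workflows described above can be pictured as an ordered list of rules, where the first rule whose condition matches decides what the agent does. The following is a minimal sketch of that idea only. It does not use the ElevenLabs platform or its API; the `Rule` class, the `call` dictionary fields, and the helper functions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One 'if this happens, then this happens' step (hypothetical structure)."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def book_appointment(call: dict) -> str:
    # A real deployment would write to the business's calendar here.
    return f"Booked {call['caller']} for {call['requested_slot']}"


def hand_off_to_staff(call: dict) -> str:
    # Fallback for anything the workflow does not cover.
    return f"Transferred {call['caller']} to a staff member"


# Rules are checked in order; the last one always matches.
RULES = [
    Rule(
        "schedule",
        lambda c: c["intent"] == "book" and c["requested_slot"] in c["open_slots"],
        book_appointment,
    ),
    Rule("fallback", lambda c: True, hand_off_to_staff),
]


def run_workflow(call: dict, rules: list = RULES) -> str:
    """Return the result of the first rule whose condition holds."""
    for rule in rules:
        if rule.condition(call):
            return rule.action(call)
    raise LookupError("no rule matched")
```

For example, a caller asking for an open slot would be booked, while a request for a slot that is already taken falls through to the staff hand-off rule. Keeping the rules as data makes it easy to add a new branch without touching the loop.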
Specific Use Cases and Features:
- Selling Courses: Voice agents can facilitate course sales, offering omnichannel solutions (sending links, email follow-ups) or embedding agents directly on websites to guide users through subscription and checkout processes.
- Multilingual Support: A key feature is the ability to switch languages while maintaining the user's cloned voice, which is particularly beneficial for language learning businesses.
- Cost: For small businesses, the cost is estimated to be in the hundreds of dollars per month, depending on volume.
- IP Calling Integration: The system integrates with existing telephone systems like Twilio, allowing businesses to use their current phone numbers.
- Overcoming Language Barriers: Voice AI can significantly reduce the barrier for non-native English speakers, as the AI does not "judge" accents, making phone interactions more comfortable. This opens possibilities for AI-powered language practice tools.
The Voice Marketplace: Earning Passive Income
ElevenLabs has created a voice marketplace where individuals can monetize their voices:
- Process: Users record approximately 30 minutes or more of their voice, go through an authentication flow, and create a "perfect replica" of their voice. This replica can speak in the original language and all other supported languages (currently 30 variations, soon 70).
- Monetization: If a user chooses to share their voice on the marketplace under specific conditions, they are paid royalties whenever their voice is used within the ElevenLabs ecosystem.
- Scale and Earnings: The marketplace hosts almost 10,000 shared voices. ElevenLabs initially paid $2 million in royalties and has now paid $5 million to the community. While average earnings are difficult to pinpoint, many creators earn "a few hundred per month." Success is often tied to community engagement and having a unique voice (e.g., a deep Spanish voice became a top 10 voice globally, even in English-speaking contexts).
Nuances and Future of Voice Cloning
- Cloning Challenges: Cloned voices may sound slightly different from specific video segments due to variations in intonation or emotional patterns within a scene, as the clone is an "average" of the overall recording.
- Future Solutions: ElevenLabs is working on features to precondition audio after a few seconds of video upload so that the clone better matches the nuances of a specific scene. Short-term fixes include re-generating audio or using shorter audio samples for cloning.
- Ubiquitous Digital Voice Agents: Mati predicts that within 2-3 years, everyone will have their own digital AI voice and an authenticated digital voice agent for tasks like booking appointments or managing personal data.
- Impersonation and Safeguards: Acknowledging that voice cloning will be used for malicious purposes, ElevenLabs implements safeguards such as traceability and content moderation. For the future, an ideal three-layered system is proposed:
  - Check for Human: Verify human presence through device encoding (e.g., calling from an authenticated phone).
  - Watermark Authenticated AI: Embed watermarks in AI-generated content from known, permissioned tools.
  - Default to AI: If content doesn't pass the first two layers, it is assumed to be AI and untrusted by default. This requires a mindset shift from "maybe this is AI" to "this is definitely AI, unless proven otherwise."
- Personalization and Selection: The future will see businesses serving different voice styles (e.g., slower, calmer for older demographics; quicker, more emotional for younger) based on customer data. Conversely, users will be able to pre-select their preferred AI voices for various services (e.g., a specific voice for banking or travel directions).
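The three-layered safeguard described in this section (check for a human, look for an authenticated-AI watermark, otherwise default to AI) amounts to a short decision function. The sketch below is illustrative only: the `Trust` labels and the two boolean inputs are assumptions, and it says nothing about how device authentication or watermark detection would actually be performed.

```python
from enum import Enum


class Trust(Enum):
    HUMAN = "verified human"
    AUTHENTICATED_AI = "authenticated AI"
    UNTRUSTED_AI = "assumed AI, untrusted"


def classify_audio(device_authenticated: bool, has_known_watermark: bool) -> Trust:
    """Apply the three layers in order; anything unverified defaults to AI."""
    # Layer 1: human presence confirmed through an authenticated device.
    if device_authenticated:
        return Trust.HUMAN
    # Layer 2: watermark from a known, permissioned AI tool.
    if has_known_watermark:
        return Trust.AUTHENTICATED_AI
    # Layer 3: no proof either way, so treat it as AI and do not trust it.
    return Trust.UNTRUSTED_AI
```

The ordering encodes the mindset shift mentioned above: the function never returns "probably human". Content is untrusted AI unless one of the first two checks proves otherwise.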
Founder's Perspective: Opportunities, Fears, and Responsibilities
Mati views the current AI shift as an "incredible opportunity," potentially larger than the internet's impact. ElevenLabs aims to lead the voice frontier, defining how the technology is used and creating value.
Key Responsibilities and Fears:
- Continued Innovation: Pushing the audio frontier in text-to-speech, speech-to-text, and music, striving to outperform even the largest AI labs.
- Risk Mitigation: Investing heavily in developing safeguards against the misuse of voice technology, such as impersonation.
- Economic Impact: Addressing how AI will change jobs. The goal is to enable people to be part of the disruption rather than just being displaced, exemplified by the voice ecosystem.
The Future of Work: AI and Job Transformation
Mati emphasizes that people who use AI will replace those who don't. The key message is to actively engage with AI tools to stay at the forefront.
- Jobs at Risk: Repetitive, manual, and recipe-based tasks (e.g., simple customer support, appointment taking, basic refunds) are most vulnerable.
- Valued Skills: Human expertise in complex problem-solving (e.g., debugging, nuanced customer issues) will become even more valued.
- Advice: Understand how AI works, become an expert, and combine domain expertise with AI for higher value and output. The creative space will also benefit from faster iterations and wider audience reach.
Recommended AI Tools
Beyond ElevenLabs, Mati recommends:
- Black Forest Labs: For advanced image generation, noting its realism and potential.
- Anthropic's Claude Code: Described as "incredible" for assisting engineers and even non-engineers in coding tasks.
- Lovable (or v0, Replit): For prototyping, client demonstrations, and go-to-market strategies. Mati uses it for both internal demonstrations and personal projects, like creating a story generator for his nieces.
Entrepreneurial Opportunities in Voice AI
Mati identifies a significant entrepreneurial opportunity, particularly for those seeking to earn around $10,000 per month without necessarily being coders:
- Market Gap: Deploying existing voice agents into small and medium-sized businesses (SMBs). Most SMBs are unaware of these possibilities.
- Process: Entrepreneurs can use self-serve platforms (like ElevenLabs) to configure voice agents and deploy them in specific domains.
- Examples:
  - Local Doctor's Offices/Dentists: Automating appointment scheduling to prevent missed calls and free up staff.
  - Local Mechanics: Handling appointment bookings.
- Value Proposition: These services can generate thousands to tens of thousands of dollars per month per business.
- Localization: The opportunity extends to non-English speaking countries where such services are often not localized.
Advice for New Entrepreneurs
- Deeply Understand the User and the Problem: Obsess over identifying a "burning problem" that people genuinely have. Mati shares ElevenLabs' origin: the founders initially aimed to fix poor Polish movie dubbing, where one flat voice reads all characters. They then discovered a more immediate need for voiceovers to repair lines or deliver content without speaking, especially among creators and book authors who struggled with expensive and time-consuming audiobook recordings. This led to a pivot away from their initial dubbing prototype.
- Carefully Select Co-founders and Early Team: These individuals will be crucial for success, culture, and long-term collaboration. Mati highlights his 15-year relationship with his co-founder and the high bar set by their early hires.
The Future of Language Learning
Mati believes people will still learn languages, but the primary purpose will shift from necessity to hobby or self-development. Like horse riding, it will become an enjoyable pursuit that broadens one's perspective and cultural understanding. While AI devices (headphones, Neuralink) will enable real-time translation, native speaking will likely remain superior because translation adds processing time. The need to learn a language just to interact across language barriers will diminish, but learning for personal enrichment and cultural immersion will persist, transforming the language learning industry.