Stop Typing. (Seriously)

By Prompt Engineering


Key Concepts

  • On-device Speech-to-Text: Performing voice transcription locally on a computer without relying on external APIs.
  • Large Language Model (LLM): AI models that understand and generate human language, used here to refine the accuracy and formatting of transcriptions.
  • Hands-Free Mode: Voice transcription toggled on and off with a hotkey, leaving the hands free while speaking.
  • Push-to-Talk Mode: Voice transcription activated while holding down a hotkey.
  • Speaker Diarization: Identifying and separating speech from multiple speakers within an audio file.
  • Right (App): The application developed by the speaker for fast, on-device voice-to-text transcription.

The Bottleneck of Typing & Introduction to "Right"

The speaker argues that typing speed is a significant productivity bottleneck and that typing, a century-old technology, is ripe for disruption. Rather than typing out every word by hand, the speaker advocates using voice-to-text technology, highlighting their own application, “Right.” Right is presented as a fast, on-device voice-to-text transcription app for macOS that is particularly effective on Apple silicon (M1 and later). Its key advantages are privacy and independence from external services: transcription runs locally, avoiding the data sharing that cloud-based alternatives involve, and the app is sold as a one-time purchase rather than a monthly subscription. As the speaker states, “You don't need any external API or service in order to be able to build realtime transcription services.”

Functionality & Modes of Operation

Right features a simple dashboard for configuring hotkeys, reviewing transcription history, and accessing basic statistics. Users can also define custom prompts to tailor transcriptions for specific applications, influencing formatting and tone. The application offers two primary modes:

  • Hands-Free Mode: Activated by pressing a hotkey once to start and again to stop, allowing continuous speech input while keeping the hands free. The speaker personally prefers this mode, stating, “it gives me the ability to use my hands and you probably notice I use my hands a lot.”
  • Push-to-Talk Mode: Activated by holding down a hotkey during speech, providing immediate text output upon release.
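The difference between the two modes comes down to how hotkey presses and releases are interpreted. The following is a minimal, hypothetical sketch (the `Mode` and `HotkeyRecorder` names are illustrative assumptions, not Right's actual implementation) showing that hands-free mode flips recording on each press, while push-to-talk records only while the key is held.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    HANDS_FREE = auto()    # press once to start, press again to stop
    PUSH_TO_TALK = auto()  # record only while the hotkey is held down


class HotkeyRecorder:
    """Hypothetical hotkey handler; not Right's actual implementation."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.recording = False

    def key_down(self) -> Optional[str]:
        if self.mode is Mode.HANDS_FREE:
            # Each press flips the recording state.
            self.recording = not self.recording
            return "start" if self.recording else "stop"
        if not self.recording:
            # Push-to-talk: the first press begins capture.
            self.recording = True
            return "start"
        return None  # Ignore auto-repeat events while the key is held.

    def key_up(self) -> Optional[str]:
        if self.mode is Mode.PUSH_TO_TALK and self.recording:
            # Releasing the hotkey ends capture, which would trigger transcription.
            self.recording = False
            return "stop"
        return None  # Hands-free mode ignores key releases.


if __name__ == "__main__":
    hands_free = HotkeyRecorder(Mode.HANDS_FREE)
    print([hands_free.key_down(), hands_free.key_up(),
           hands_free.key_down(), hands_free.key_up()])
    push_to_talk = HotkeyRecorder(Mode.PUSH_TO_TALK)
    print([push_to_talk.key_down(), push_to_talk.key_down(), push_to_talk.key_up()])
```

Because hands-free mode ignores key releases, the user can let go of the keyboard between the start and stop presses, which matches the speaker's reason for preferring it.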

The system operates in two stages: speech-to-text conversion, followed by processing through a Large Language Model (LLM) that refines the transcription and applies any custom prompts. The LLM stage is particularly useful when different prompts are assigned to different applications. Pressing Escape during a transcription brings up options to continue, pause, or cancel.
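The two-stage flow with per-application prompts can be sketched as follows. This is a hypothetical outline, not Right's code: the `transcribe` function, the `stt` and `llm` callables, `DEFAULT_PROMPT`, and the app names used as dictionary keys are all illustrative assumptions. It also includes an `enhanced` flag to show how skipping the LLM stage yields pure speech-to-text output.

```python
from typing import Callable, Dict

# Hypothetical stage signatures: a local speech-to-text model and an LLM.
SpeechToText = Callable[[bytes], str]
LLM = Callable[[str, str], str]  # (prompt, raw_text) -> refined text

DEFAULT_PROMPT = "Fix transcription errors and punctuation. Keep the wording."


def transcribe(
    audio: bytes,
    active_app: str,
    stt: SpeechToText,
    llm: LLM,
    app_prompts: Dict[str, str],
    enhanced: bool = True,
) -> str:
    """Two-stage sketch: raw speech-to-text, then optional LLM refinement."""
    raw = stt(audio)  # Stage 1: on-device transcription
    if not enhanced:
        return raw  # LLM stage skipped: pure speech-to-text output
    # Stage 2: use the custom prompt for the focused app, else a default.
    prompt = app_prompts.get(active_app, DEFAULT_PROMPT)
    return llm(prompt, raw)


if __name__ == "__main__":
    def fake_stt(audio: bytes) -> str:
        return "hey can we meet tomorrow"

    def fake_llm(prompt: str, text: str) -> str:
        return f"[{prompt}] {text.capitalize()}?"

    prompts = {"Slack": "Casual tone."}
    print(transcribe(b"", "Slack", fake_stt, fake_llm, prompts))
    print(transcribe(b"", "Mail", fake_stt, fake_llm, prompts, enhanced=False))
```

Passing the two stages in as callables keeps the sketch independent of any particular speech or language model; a real app would plug in its local models at those points.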

Technical Capabilities & Performance

Right functions across any macOS application accepting text input, eliminating the need for separate dictation tools. By default, the LLM corrects errors and adjusts tone, but this “enhanced mode” can be disabled for a faster, pure speech-to-text output. The speaker emphasizes the speed, noting “You’re going to see almost instantaneous output that is going to be shown to you which is kind of crazy.”

Beyond real-time transcription, Right also supports local audio file transcription, which the speaker claims is “almost 30 times faster than Whisper.” An optional 900MB download enables speaker diarization, a feature that identifies and separates speech from multiple speakers within an audio recording. All hotkeys are customizable to suit individual preferences.

Pricing, Trial & Future Development

Right is currently priced at $24.99 as a one-time purchase. A 3-day free trial is available, and viewers are offered a 25% discount with the code “prompt.” The speaker intends to dedicate significant time to further development, promising numerous updates and “really cool ideas” for the app. Feedback from the community is actively solicited, even from those who choose not to purchase the application. The speaker concludes, “I would love your feedback.”

Logical Connections

The video progresses logically from identifying a problem (slow typing speed) to presenting a solution (Right). It then details the application’s functionality, technical capabilities, and use cases, culminating in information about pricing, trial access, and future development plans. The discussion of modes (hands-free vs. push-to-talk) and features (LLM integration, speaker diarization) builds upon the core concept of on-device, real-time transcription.

Data & Statistics

  • Transcription Speed: Local file transcription is, according to the speaker, “almost 30 times faster than Whisper.”
  • File Size (Speaker Diarization): The speaker diarization feature requires an additional 900MB download.
  • Pricing: $24.99 one-time purchase (current price); 25% discount with the code “prompt”; 3-day free trial.

Synthesis/Conclusion

The core takeaway is that “Right” offers a compelling alternative to traditional typing and cloud-based voice-to-text services. By combining on-device processing with a Large Language Model, it delivers fast, accurate, and private transcription for macOS users. The application’s flexibility, customizable features, and focus on user feedback position it as a potentially disruptive tool for productivity. Its one-time pricing and local handling of data further differentiate Right from subscription-based competitors.
