How AI is revolutionizing sign language recognition with Sam Sepah and Thad Starner

By Google for Developers

Technology · AI · Education

Key Concepts

Augmented Reality, Wearable Computing, Sign Language Recognition, Accessibility Technology, Human-Computer Interaction, Language Deprivation, American Sign Language (ASL), Data Collection, AI Model Training, Kaggle Competitions, PopSign AI, Deaf Community, Inclusivity, User Studies, Real-time Translation, Assistive Technology.

Thad Starner's Journey into AI and Wearable Computing

Thad Starner's interest in AI began at age 14 after reading "Machines Who Think" by Pamela McCorduck. He aimed to become a professor teaching AI, a goal that eventually led him to Georgia Tech. Although he grew up in a rural area with limited technology, he was drawn to video games and discovered the book at the library. It inspired him to program agents that could assist in daily life.

In college, Starner focused on head-worn displays and wearable computers, creating AI glasses in 1993 that listened to conversations and displayed relevant past emails and discussions. He demonstrated this "remembrance agent" to Larry Page and Sergey Brin in 1998 at the New Paradigms in Computing conference. At the time, Page and Brin were graduate students working on Google, a project to improve web search results. Starner discussed the potential of having Google's search capabilities on an eyeball display for quick access to information during conversations.

Although offered the Google index for his wearable computer, Starner didn't join the company immediately. Later, after Android's release, he contacted Page and Brin, leading to his involvement with Google Glass. He collaborated with Sam Sepah on captioning for the deaf and hard of hearing, discovering their shared passion for American Sign Language. Starner's master's thesis was on recognizing ASL, and he had been working with the deaf community to improve sign language recognition.

Starner emphasizes the concept of AI as an extension of oneself, like a cyborg, enhancing independence, intelligence, and social grace. He uses Vuzix Z100 glasses at Georgia Tech for AI research, aiming to improve human-computer interaction. He envisions real-time language translation and seamless integration of technology into social interactions.

Early Google Glass prototypes were bulky, weighing a kilogram on the head and 10 kilograms in a backpack. Through development, the weight was reduced to 42 grams, lighter than normal eyeglasses. Starner highlights the importance of making technology unobtrusive, so it doesn't draw attention when used for captioning or other assistive purposes.

His initial interest in sign language recognition was driven by the need for a master's thesis. He applied hidden Markov models, learned at Bolt, Beranek and Newman (BBN), to sign language recognition, becoming the first to demonstrate phrase-level ASL recognition with standard cameras. However, he struggled to find a practical application until Harley Hamilton, a sign linguist, suggested using it to help hearing parents of deaf infants learn sign language.
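The talk does not detail the implementation, but the core of HMM-based recognition is decoding the most likely hidden-state sequence behind a stream of observations. As an illustrative sketch (assuming discrete observation symbols, not Starner's original feature set), a minimal Viterbi decoder looks like:

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Most-likely hidden-state path for a discrete-observation HMM.

    obs: sequence of observation indices
    log_start: (S,) log initial-state probabilities
    log_trans: (S, S) log transition probabilities
    log_emit:  (S, O) log emission probabilities
    """
    S = log_start.shape[0]
    T = len(obs)
    dp = np.full((T, S), -np.inf)       # best log-probability per (time, state)
    back = np.zeros((T, S), dtype=int)  # backpointers for path recovery
    dp[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        for s in range(S):
            scores = dp[t - 1] + log_trans[:, s]
            back[t, s] = np.argmax(scores)
            dp[t, s] = scores[back[t, s]] + log_emit[s, obs[t]]
    path = [int(np.argmax(dp[-1]))]     # trace back from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(dp[-1]))
```

In a phrase-level recognizer of this style, one HMM is trained per sign, and a candidate clip is scored against each model, with the highest-likelihood model winning.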

Starner notes that 95% of deaf infants are born to hearing parents, most of whom don't learn enough sign language to teach their children. This lack of language exposure can lead to short-term memory issues and hinder learning. He aims to provide sign language lessons to hearing parents through smartphone games, reducing the financial and time burden.

Sam Sepah's Advocacy for Accessibility and the Deaf Community

Sam Sepah was born hearing in Iran but became deaf at 14 months due to spinal meningitis. His highly educated parents decided to learn sign language and moved to Germany to provide him with better deaf education. They recognized the importance of early language acquisition and wanted to ensure he could attend college.

In Germany, Sepah attended a special school for the deaf that used sign language. His parents learned German and German Sign Language to communicate with him. He emphasizes the critical period of language acquisition from ages zero to five and the importance of parents communicating with their infants.

After two years in Germany, the family moved to Tucson, Arizona, where Sepah attended the Arizona School for the Deaf and Blind. This school provided a fully signing environment with deaf faculty and administrators, offering role models and resources.

Sepah's bachelor's degree is in world history with a minor in sociology, and his master's degree is in human resources management. He initially pursued film and animation but shifted to HR to work with people and improve their working experiences. He believes that creating a better workplace is essential, as people spend most of their waking hours at work.

Despite not being a STEM major, Sepah's understanding of technology helped him in HR roles at IBM, GE, and the National Institutes of Health. He was later invited to work with Google Glass on captioning, where he met Thad Starner.

Sepah was impressed by Starner's genuine understanding of deaf people and his willingness to incorporate feedback. During the pandemic, he participated in Google Glass testing, receiving the device in a "spy movie" scenario due to safety protocols. This experience solidified his desire to work with Starner.

Sepah highlights the importance of the Deaf community, defined by the use of American Sign Language. He emphasizes that learning sign language provides insight into the culture and community. He recounts an experience at a conference where his sign language demonstration was misunderstood by a deaf audience because they were expecting a spoken-English translation.

The Role of AI in Advancing Sign Language Recognition

AI has significantly enhanced sign language recognition through increased data collection. Starner notes that more progress has been made in the past two years than in the previous 30 due to collaborations with deaf project managers, colleagues, and student researchers.

A key insight came from a deaf student researcher, Max Shing Shangelia, who pointed out the shift towards one-handed signing on smartphones. This led to the use of high-resolution smartphone selfie cameras for data collection. Google's MediaPipe hand tracker enabled on-device processing.
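On-device pipelines like this typically feed a recognizer per-frame hand landmarks rather than raw pixels. As a hedged sketch (the exact PopSign preprocessing is not described in the talk), the 21 (x, y) landmarks a hand tracker such as MediaPipe emits might be normalized so the model sees the same sign regardless of where the hand sits in the selfie camera's frame:

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Make 21 (x, y) hand landmarks translation- and scale-invariant.

    landmarks: (21, 2) array of points from a hand tracker; in MediaPipe's
    convention, landmark 0 is the wrist.
    """
    pts = np.asarray(landmarks, dtype=float)
    pts = pts - pts[0]                            # move wrist to the origin
    scale = np.max(np.linalg.norm(pts, axis=1))   # largest wrist-to-point distance
    if scale > 0:
        pts = pts / scale                         # farthest landmark at unit distance
    return pts
```

With this normalization, the same handshape produces the same feature vector whether the hand is near or far from the camera, left or right of center.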

The team distributed Pixel 4a smartphones to collect data, creating databases 10 times larger than previous ASL datasets. Current sign language recognition efforts involve fewer than 2,000 hours of signing, compared to millions of hours for speech recognition. The goal is to accelerate sign language recognition by leveraging AI advancements in speech recognition.

Isolated signing was used for the PopSign AI game, which helps hearing parents learn vocabulary. A $100,000 Kaggle competition demonstrated the feasibility of running AI models on a smartphone. Fingerspelling was also addressed through a Kaggle competition with 3 million characters of data. A third Kaggle competition is planned for phrase-level recognition.

PopSign AI provides immediate feedback on sign accuracy, helping users learn at their own pace. The AI recognizer corrects hand placement and movement, preventing incorrect sign usage. The game is being adapted for Japanese, Indian, and Australian sign languages.
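Signing speed varies from learner to learner, so comparing a user's attempt against a reference clip frame by frame would be brittle. One classic, illustrative technique (not necessarily what the PopSign recognizer uses) is dynamic time warping, which aligns two feature sequences of different lengths:

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: sequences of equal-dimension feature vectors, e.g. one flattened,
    normalized set of hand landmarks per video frame. A smaller distance
    means a closer match, tolerant of differences in signing speed.
    """
    n, m = len(a), len(b)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])    # frame-to-frame distance
            dp[i][j] = cost + min(dp[i - 1][j],     # stretch sequence b
                                  dp[i][j - 1],     # stretch sequence a
                                  dp[i - 1][j - 1]) # advance both
    return dp[n][m]
```

Classification then reduces to nearest-neighbor matching: compare the learner's sequence against one reference per vocabulary item and pick the smallest distance.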

The team is developing toolkits for creating sign language games, including a Boggle game. The goal is to create a game-making platform that incorporates sign language recognition, making it easier for hearing parents to learn sign in their respective countries.

Sepah emphasizes the untapped market potential of sign language accessibility, noting that the sign language industry in the US is worth about $10 billion. He highlights the $13 trillion global disability spending market, encouraging game developers to make their products accessible.

The team is also working on phrases for game playing, such as instructing a hero character. This aims to improve short-term memory in deaf children who may have language deprivation issues.

Addressing the Divide Between the Deaf and Hearing Communities

Starner mentions SmartSignDictionary.org and the ASL dictionary on YouTube as resources for learning sign language. These platforms contain translations of English phrases and idioms into ASL.

Sepah emphasizes two workflows: teaching hearing people how to communicate with deaf individuals and making Google platforms more accessible to the deaf community. He notes that 70 million signers worldwide are not getting information as quickly as others.

The team is working to provide real-time information access for deaf consumers. They also recognize the importance of relationship-based communication between parents and children, and colleagues.

Search engine data reveals that the most common sign language search is for "I love you," highlighting the universal desire for connection.

Sepah acknowledges the challenges of learning about data sets, model training, and model tuning. He credits his experience in HR and organizational development for his ability to bring people together and motivate them towards a common goal.

Starner emphasizes Sepah's deep connection with the deaf community, which has facilitated collaborations with DPAN, NTID, and student researchers. He notes that 50% of his team members are deaf, which significantly changes the dynamics and perspectives.

What it Means to Be a Person of AI Today

Starner believes that being a person of AI means finding meaningful problems to improve the human condition and creating useful and usable interfaces for AI. He emphasizes the importance of combining interface and AI to make technology understandable and meaningful in daily life.

Sepah believes that being a person of AI means advocating for the inclusion of underrepresented communities and bridging the gap between available technology and their needs. He highlights the importance of applying AI to address the market needs of the signing community.

Conclusion

The conversation highlights the transformative potential of AI in bridging communication gaps between the deaf and hearing communities. Thad Starner's pioneering work in wearable computing and sign language recognition, combined with Sam Sepah's advocacy and deep connection to the deaf community, has created a powerful collaboration. Their efforts are focused on developing accessible technologies, promoting inclusivity, and improving the lives of deaf individuals worldwide. The development of PopSign AI and the ongoing Kaggle competitions demonstrate the practical applications of AI in sign language education and accessibility. Their commitment to this work serves as an inspiration and sets a high standard for future collaborations in the field.
