AI beyond language and vision | Paul Liang | TEDxMIT

By TEDx Talks


Multisensory Intelligence & the Future of AI: A Conversation with Paul

Key Concepts:

  • Multisensory Intelligence: The development of AI systems capable of perceiving and interacting with the world through all human senses (sight, sound, touch, smell, taste).
  • Moravec's Paradox: The counterintuitive discovery that tasks easy for humans (like grasping objects) are hard for AI, while tasks hard for humans (like complex calculations) are easy for AI.
  • Crossmodal Plasticity: The brain’s ability to reorganize and adapt following damage or loss of a sense, enhancing other senses.
  • Haptics: Technology that recreates the sense of touch through mechanical stimulation.
  • AGI (Artificial General Intelligence): Hypothetical AI with human-level cognitive abilities.
  • BCI (Brain-Computer Interface): Technology enabling direct communication between the brain and external devices.

I. The Limitations of Current AI & the Importance of Multisensory Input

The conversation begins with a discussion of the current state of AI, highlighting its limitations compared to human perception. While AI excels in narrow tasks like natural language processing and image recognition, it lacks the holistic, multisensory experience that defines human interaction with the world. The speaker, John, draws a parallel to the difference between experiencing a real strawberry (activating 255 taste buds) versus a synthetic strawberry flavor (activating only five), questioning whether reliance on artificial stimuli distorts our perception of reality. He emphasizes that humans perceive the world through a “congregation accumulation of all senses,” a capability currently far beyond AI. Paul, leading the “Multisensory Intelligence” group at MIT, agrees, stating that current AI systems perceive only a “very very narrow slice of the world.” He notes that while advancements are being made in vision and language, AI is significantly behind in replicating senses like touch, smell, and taste.

II. The Unique Potential of Smell & the Ability to "See the Past"

Paul focuses on the potential of AI to replicate the sense of smell, arguing it’s profoundly important for perception, connection, and memory. He envisions a future where people can share not just images of food, but also its aroma with loved ones. A key point is the unique ability of smell to evoke memories and provide information about the recent past. He states, “Smell is the only modality that can allow us to see the past. Using smell, you can detect what food and beverages were here 30 seconds or a couple minutes ago. You can detect how many people were in this room.” This is contrasted with vision, language, and sound, which cannot provide this temporal information. This unique characteristic makes AI-driven smell perception a particularly exciting area of research.

III. Academic Journey & the Rise of Multimodal Models

Paul details his academic path, beginning his PhD in 2018 during the proliferation of deep learning in computer vision and natural language processing. Initially indecisive about his research focus, he gravitated towards understanding human communication, recognizing its multifaceted nature involving verbal language, facial expressions, tone, and gesture. This led him to develop “multimodal models” combining natural language processing, visual processing, and speech processing. The development of large language models further propelled this research, creating momentum for multimodal AI.
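The core idea behind such multimodal models — encode each modality separately, then fuse the representations for a joint prediction — can be sketched in a few lines of NumPy. This is a minimal illustration of "late fusion" by concatenation; the embedding sizes, class count, and variable names are all hypothetical, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vectors standing in for the outputs of per-modality encoders
# (e.g., a language model, a vision model, a speech model).
text_emb = rng.normal(size=8)    # hypothetical text encoder output
image_emb = rng.normal(size=8)   # hypothetical vision encoder output
audio_emb = rng.normal(size=8)   # hypothetical speech encoder output

def late_fusion(*embeddings):
    """Concatenate per-modality embeddings into one joint representation."""
    return np.concatenate(embeddings)

fused = late_fusion(text_emb, image_emb, audio_emb)

# A single linear "head" maps the fused vector to scores over
# three hypothetical output classes.
W = rng.normal(size=(3, fused.size))
scores = W @ fused

print(fused.shape, scores.shape)  # (24,) (3,)
```

Real multimodal systems replace the random vectors with learned encoders and often use attention-based fusion rather than plain concatenation, but the shape of the pipeline — per-modality encoding followed by a shared joint representation — is the same.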

IV. Ranking the Senses & Exploring New Sensory Possibilities

The discussion shifts to ranking the senses in terms of AI development potential. Language, vision, and audio are considered the most readily achievable, due to the abundance of available data. Touch is identified as the next frontier, illustrated by Moravec's Paradox. The paradox explains why tasks intuitive for humans (like grasping) are difficult for AI, while tasks challenging for humans (like complex calculations) are easy. This difficulty stems from the intuitive, often indescribable nature of touch, making it hard for AI to learn. Paul's group is actively working on tactile sensing and haptic technology, including "haptic intuition gloves" designed to restore or enhance the sense of touch.

The conversation then explores the possibility of creating entirely new senses through technology, such as “seeing” Wi-Fi or heat signatures. Paul suggests these could be extensions of existing senses (like vision) or entirely new sensory modalities. He also references research at MGH involving a metal rod implanted in a brain, demonstrating the brain’s remarkable plasticity and ability to adapt to altered sensory input.

V. Augmented Intelligence, BCI, and Future Directions

The discussion expands to the concept of “augmented intelligence” – using AI to enhance human capabilities rather than replace them. Paul highlights the potential of AI to augment memory, providing access to vast amounts of information and contextualizing current decisions with past experiences. He also emphasizes AI’s potential in healthcare, analyzing physiological data from sensors to improve health and wellbeing.

Brain-Computer Interfaces (BCIs) are introduced as a potential future integration point for AI. While Paul believes BCIs are 10-20 years away from widespread implementation, he acknowledges their potential to create a powerful synergy between AI and the human brain. He stresses the importance of addressing ethical concerns related to robustness, privacy, and ensuring AI augmentation enhances rather than overrides human agency.

VI. Recruitment & the Vision of Multisensory Experiences

Paul concludes by outlining the qualities his group seeks in potential recruits: individuals who can bridge disciplines (AI, biology, chemistry, neuroscience, engineering) and bring new perspectives to the field. He describes an upcoming exhibition at MIT’s Media Lab where visitors can input a favorite memory and experience it through a fully immersive multisensory simulation – sight, smell, touch – powered by AI.

Notable Quotes:

  • John: “If a kid grows up having more Jolly Ranchers than strawberries, do they think the strawberry is really the Jolly Rancher taste?” – Illustrating the potential for artificial stimuli to distort perception.
  • Paul: “Smell is the only modality that can allow us to see the past.” – Highlighting the unique temporal information provided by smell.
  • Paul: “AI technically has unbounded memory.” – Emphasizing AI’s potential to augment human memory.

Technical Terms:

  • AGI (Artificial General Intelligence): AI with human-level cognitive abilities.
  • BCI (Brain-Computer Interface): Technology enabling direct communication between the brain and external devices.
  • Haptics: Technology recreating the sense of touch.
  • Multimodal Models: AI models that process and integrate information from multiple sensory inputs (e.g., text, images, audio).
  • Moravec's Paradox: The difficulty AI has with tasks easy for humans and vice versa.
  • Crossmodal Plasticity: The brain’s ability to reorganize after sensory loss.

Logical Connections:

The conversation flows logically from a broad discussion of AI's limitations to a focused exploration of multisensory intelligence. The discussion of Moravec's Paradox provides a framework for understanding the challenges of replicating human senses in AI. The exploration of new sensory possibilities builds upon the foundation of existing senses, suggesting potential avenues for future research. The concluding remarks emphasize the practical applications of this research and the importance of interdisciplinary collaboration.

Data & Research Findings:

  • The example of the strawberry vs. Jolly Rancher illustrates the difference in sensory activation between natural and artificial stimuli.
  • The discussion of the MGH brain implant case study highlights the brain’s plasticity.
  • The mention of AI’s success in standardized tests (SATs, law exams) demonstrates its proficiency in tasks with abundant data.

Conclusion:

This conversation underscores the critical need to move beyond narrow AI and develop systems capable of perceiving and interacting with the world in a more holistic, human-like manner. Paul’s research on multisensory intelligence, particularly his focus on smell and touch, represents a significant step towards this goal. The potential for AI to augment human senses, enhance memory, and improve healthcare is immense, but requires interdisciplinary collaboration and careful consideration of ethical implications. The future of AI lies not just in replicating intelligence, but in expanding our sensory capabilities and enriching our experience of the world.
