Accelerating human-centered AI + XR innovation with XR Blocks
By Chrome for Developers
Key Concepts
- XR Blocks: An open-source SDK (Software Development Kit) designed to accelerate AI and XR (Extended Reality) innovation by providing a unified framework for building AI-driven XR experiences on the web.
- Web AI: The integration of artificial intelligence capabilities within web applications.
- WebXR: A web standard that allows immersive experiences (virtual and augmented reality) to run directly in a web browser.
- AI + XR Innovation: The convergence of artificial intelligence and extended reality technologies to create new paradigms of computing and user experiences.
- On-device Machine Learning: Running machine learning models directly on the user's device (e.g., smartphone, XR headset) rather than relying on cloud servers, enhancing privacy and reducing latency.
- Perception: The ability of an XR system to understand and interpret the physical world, including depth, lighting, and geometry.
- Interaction: The ways in which users can engage with XR environments, including gestures, touch, and voice commands.
- Generative AI: AI models capable of creating new content, such as text, images, or 3D models.
- Node Graph Editor: A visual programming interface where developers connect different functional blocks (nodes) to create complex pipelines or applications.
- Low-level Integration: The process of manually connecting and configuring various hardware and software components in XR development, which can be complex and time-consuming.
- Abstraction: Hiding complex underlying details to provide a simpler, higher-level interface for developers.
- Interactive Primitives: Reusable building blocks for common XR interactions and functionalities.
- Simulator: A tool that mimics XR hardware and environmental conditions, allowing developers to test their applications without needing physical XR devices.
XR Blocks: Accelerating AI + XR Innovation
This presentation introduces XR Blocks, a new open-source framework developed by Google XR Labs to significantly accelerate the development of AI-driven Extended Reality (XR) experiences on the web. The framework aims to bridge the gap between the rapidly advancing AI ecosystem and the more fragmented XR landscape.
The Problem: The AI-XR Divide
While AI development benefits from mature frameworks such as JAX, TensorFlow, and PyTorch, and from shared benchmarks such as Hugging Face Arena, XR development often requires practitioners to manually integrate disparate low-level systems for perception, rendering, and interaction. This manual integration, together with frequent Unity version upgrades and the difficulty of migrating mobile AR interfaces to XR, creates a significant barrier to entry and slows innovation. The speaker notes that even a simple task, such as changing a cube's color via pinch, click, or touch gestures, can require over 200 lines of code in traditional web development; even with Unity, adapting to different headsets (Meta Quest, Apple Vision Pro, Android XR) remains complex and time-consuming.
XR Blocks: The Solution
XR Blocks aims to simplify XR development by providing a high-level abstraction of AI-driven XR paradigms. It offers a unified SDK for both desktop and Android XR platforms, focusing on:
- XR Realism: Features like depth-aware physics, geometry-aware occlusion, and lighting estimation enhance the visual fidelity and believability of XR experiences.
- XR Interaction: Enables custom gestures with on-device machine learning integration, allowing for intuitive interactions like touching and grabbing physical objects.
- AI + XR Integration: Facilitates the creation of XR applications with object understanding and proactive conversational agents.
The core philosophy of XR Blocks is to let creators focus on the core idea of their XR application rather than getting bogged down in low-level integration. The framework strives to achieve this with minimal code: the cube color-changing example requires only 39 lines of JavaScript, with the core logic under 15 lines. The same code can be deployed across desktop, mobile phones, and Android XR headsets.
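The presentation does not show that script itself, but the spirit of its brevity can be sketched as follows. Note that the helper names here (`makeCube`, `onSelect`, `fireSelect`) are illustrative assumptions, not the actual XR Blocks API:

```javascript
// Illustrative sketch only: a minimal "select changes the cube's color"
// core logic, in the spirit of XR Blocks' high-level abstraction.
// The helpers below are hypothetical, not the real SDK API.

// A tiny stand-in for a scene object with a color property.
function makeCube() {
  return { color: '#888888' };
}

// Hypothetical high-level helper: register one handler that fires for
// pinch, click, or touch alike (the unified "select" action).
function onSelect(target, handler) {
  target._onSelect = handler;
}

// Simulate any input modality reaching the object as a "select".
function fireSelect(target) {
  if (target._onSelect) target._onSelect(target);
}

// Core logic: a few lines instead of hundreds of lines of event plumbing.
const cube = makeCube();
onSelect(cube, (c) => {
  c.color = '#' + Math.floor(Math.random() * 0xffffff)
    .toString(16).padStart(6, '0');
});
```

The point is the shape of the code, not the specifics: one object, one handler, no per-modality event wiring.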
Key Features and Components
XR Blocks is built upon several foundational technologies and incorporates a range of subsystems:
- WebXR and three.js: Chosen as the framework's initial building blocks, leveraging web-based XR and a popular 3D graphics library.
- Gemini: Integrated for AI capabilities, enabling features like object recognition and conversational agents.
- Subsystems within the SDK:
- AI Module: For integrating machine learning models.
- Camera: For capturing visual input.
- Depth: For understanding the 3D structure of the environment.
- Lighting Estimation: For capturing ambient lighting conditions.
- Physics: For realistic object interactions.
- Sound: For spatial audio.
- Input: For handling user interactions (gestures, touch, etc.).
- Agent: For creating conversational AI agents.
- UX: For user experience elements.
- Effect: For visual effects.
- UI: For user interface elements.
- Simulator: A crucial component that simulates depth maps, lighting estimation, and hand gestures, allowing for efficient testing and development without physical XR hardware. Code tested in the simulator runs unchanged on actual Android XR headsets.
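The design idea behind the simulator can be sketched as a shared interface with interchangeable backends, so the same app code runs against simulated data on desktop and real sensors on a headset. The class and method names below are illustrative assumptions, not the actual XR Blocks API:

```javascript
// Illustrative sketch (not the real XR Blocks API): app code depends on
// one depth interface; the backend is either simulated data on desktop
// or real sensors on an Android XR headset.

// Simulated backend: returns a synthetic depth value in meters.
class SimulatedDepth {
  getDepthAt(x, y) {
    // A flat virtual wall 2 m away, regardless of pixel position.
    return 2.0;
  }
}

// On a real device this would query the headset's depth sensing;
// here it is just a placeholder exposing the same interface.
class DeviceDepth {
  constructor(source) { this.source = source; }
  getDepthAt(x, y) { return this.source(x, y); }
}

// App code is written once against the shared interface, which is why
// simulator-tested code can port to hardware unchanged.
function isWithinReach(depthProvider, x, y, reachMeters = 1.5) {
  return depthProvider.getDepthAt(x, y) <= reachMeters;
}
```

Swapping `SimulatedDepth` for `DeviceDepth` changes where the data comes from, not how the application logic reads it.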
Real-World Examples and Applications
The presentation showcases several compelling examples of what can be achieved with XR Blocks:
- Model Viewer: A tool to quickly wrap geometry primitives, 3D models, and 3D splatting instances for interactive viewing and manipulation in XR.
- Specialized UI Library: Features signed distance functions (SDFs) for high-quality text rendering and composable APIs for generative user interfaces, powered by LightRT.js.
- On-device Gesture Recognition: Creators can simulate hand gestures like "thumbs up" and "victory sign" to trigger machine learning models on-device, ensuring user privacy as no gesture data is sent to servers.
- Spatial Audio and Geometry-Aware Effects: Demonstrations include ring drops appearing on hands and other immersive visual effects.
- Object Recognition with Gemini: Users can point at objects in their environment, and Gemini can identify them and answer questions, such as "Where can I buy this coffee table?"
- Depth Sensing and Physics: Real-time demos on Android XR headsets show pinch-to-shoot functionality with colorful balls interacting realistically with the environment using on-device depth sensing.
- Gesture-Controlled Interactions: A thumbs up summons balloons, a victory sign summons colorful strips, and a dynamic wave gesture navigates between portals in a future photo application.
- AI-Powered Content Generation:
- Poem Generation: Using the see-through camera to recognize objects and prompt Gemini about them, for example asking about an item's calorie content.
- Art Gallery: An infinite gallery created with XR Blocks and Gemini Converse, where users can prompt for creation and navigate between art pieces.
- Procedural City Generation: Building a city by clicking and pinching on a virtual map.
- Bubble XR: A personal project where users can summon and dismiss bubbles using hand gestures.
Development Methodology and Vision
XR Blocks is envisioned as a collaborative effort, with the current release representing "halfway towards this roadmap." The project welcomes community contributions to complete the remaining pieces. The goal is to build a set of interactive primitives for XR on the web, unifying basic interactions such as hand pinch, mouse click, and screen touch under a single "select" action.
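The unification described above can be sketched as a small dispatcher: modality-specific adapters normalize pinch, click, and touch into one logical "select" event that app code subscribes to once. All names here are illustrative, not the actual XR Blocks API:

```javascript
// Sketch of the unified "select" idea (names are hypothetical): different
// input modalities funnel into one logical action, so application code
// registers a single handler regardless of device.

class SelectDispatcher {
  constructor() { this.handlers = []; }
  onSelect(handler) { this.handlers.push(handler); }
  // Called by modality-specific adapters below.
  dispatch(event) {
    for (const h of this.handlers) h(event);
  }
}

// Each adapter normalizes its native input into the shared event shape.
function fromPinch(dispatcher, hand) {
  dispatcher.dispatch({ kind: 'select', source: 'pinch', hand });
}
function fromClick(dispatcher, button) {
  dispatcher.dispatch({ kind: 'select', source: 'click', button });
}
function fromTouch(dispatcher, touchId) {
  dispatcher.dispatch({ kind: 'select', source: 'touch', touchId });
}
```

The `source` field is kept on the event so a handler can still specialize per modality when it needs to, without being forced to.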
The "north star" of XR Blocks is to turn ideas into reality at the speed of thought, with AI maximizing human creativity. The framework aims to make XR development as scalable and accessible as AI development.
Future Directions and Community Involvement
The team is open to extending XR Blocks to native C++ with OpenXR and to Unity, further broadening its reach. The project's roadmap and interaction paradigms are detailed in an accompanying arXiv paper. The ultimate vision is to make AI "saturated in XR," empowering everyone to unleash their creativity.
The presentation concludes with a thank you to all the contributors to XR Blocks over the past year and an invitation for continued listening, watching, and contributing to the project.