Google Photos Magic Editor: GenAI Under the Hood of a Billion-User App - Kelvin Ma, Google Photos

Google Photos Editing: Leveraging AI for Enhanced User Experience

Key Concepts:

Computational Photography
Machine Learning (ML)
Generative AI (GenAI)
Model Inference
Segmentation
Inpainting
TensorFlow Lite (LiteRT)
Edge TPU
Evals (Model Evaluation)
Latency
Ambiguous Problem Space
AI Engineering

1. Introduction to Google Photos and Editing Team

Google Photos is a home for memories with 1.5 billion monthly active users.
It offers auto backup and uses ML for indexing, OCR, and automated album creation.
The computational photography (editing) team started in 2018, focusing on using device compute for image edits.
The goal is to create great image edits for any image, regardless of capture device.

2. Computational Photography Explained

Traditional HDR requires multiple images at different exposures, a tripod, and Photoshop skills.
Computational photography uses ML to achieve similar results from a single image.
Google Photos leverages vertical integration (Pixel hardware, Edge TPU, internal research) for accelerated compute.
In 2018, building features required close collaboration with internal computer vision and ML researchers.

3. Tech Stack Overview

Clients: Android, iOS, Web.
Shared C++ library for on-device model inference.
Integration with research partners.
Model inference using TensorFlow Lite (now LiteRT).

4. Early Editing Features (2018-2021)

Post-Capture Segmentation: Adding bokeh to the background of portraits after the photo is taken.
Portrait Lighting: Fixing lighting issues like washed-out faces or shadows.
Magic Eraser: Removing unwanted objects or people from the background.

5. Deep Dive: Post-Capture Segmentation

Uses a U-Net convolutional neural network for single-subject portrait segmentation.
Separates the foreground (subject) from the background.
ML provides better performance compared to traditional computer vision.
A model always returns a result, even for inputs it handles poorly, which is both a pro and a con.
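
As a rough illustration of how a foreground mask turns into a background blur, the sketch below blends a sharp image with a blurred copy of itself. It is a minimal sketch, not the Google Photos implementation: the names box_blur and apply_background_blur are hypothetical, the box blur is a stand-in for a real bokeh kernel, and the mask is assumed to be the per-pixel output of a segmentation model (1 = subject, 0 = background).

```python
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Average each pixel with its neighbors (stand-in for a real bokeh kernel)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            # Clamp the window at the image borders.
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def apply_background_blur(img: Image, mask: Image, radius: int = 1) -> Image:
    """Blend per pixel: mask=1 keeps the sharp pixel, mask=0 uses the blurred one."""
    blurred = box_blur(img, radius)
    return [
        [m * p + (1.0 - m) * b for p, b, m in zip(row, brow, mrow)]
        for row, brow, mrow in zip(img, blurred, mask)
    ]
```

Because the blend trusts the mask completely, any segmentation error (for example, a missed hair strand) shows up directly as a blurred or sharp patch in the wrong place.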

Challenges:

Model size (10MB was considered large).
Model management and IP protection.
Evals: Building and maintaining benchmarks to ensure model quality and prevent regressions.
Imperfect segmentation (e.g., fine hair strands not captured accurately).
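
One common way to build a segmentation benchmark is to score predicted masks against hand-labeled masks with intersection over union (IoU) and block a model update if the score drops. The sketch below assumes that approach; the source does not say which metric the team uses, and the function names and the 0.01 tolerance are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = subject, 0 = background


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average IoU over a benchmark set."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


def passes_regression_gate(candidate: float, baseline: float,
                           tolerance: float = 0.01) -> bool:
    """Accept a new model only if it is not worse than the baseline by more than tolerance."""
    return candidate >= baseline - tolerance
```

Running such a gate on every model change is one way to catch regressions before they reach users.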

Solutions:

Post-model image understanding to refine segmentation (e.g., identifying and following hair strands).

6. Magic Eraser (2021)

A system of models: distractor detection, segmentation, inpainting.
Custom GL rendering for seamless visualization and animation.
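
The "system of models" idea can be sketched as three stages chained together, where each stage consumes the previous stage's output. This is only an illustrative sketch: EraserPipeline and the toy_ stage functions are hypothetical, and the toy stages (a brightness threshold and a mean fill) are simple stand-ins for the real learned models, which the source does not describe in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image, values in [0, 1]
Mask = List[List[int]]           # 1 = pixel to erase
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


@dataclass
class EraserPipeline:
    detect: Callable[[Image], List[Box]]     # stage 1: find distractors
    segment: Callable[[Image, Box], Mask]    # stage 2: pixel mask per distractor
    inpaint: Callable[[Image, Mask], Image]  # stage 3: fill the masked pixels

    def run(self, img: Image) -> Image:
        for box in self.detect(img):
            mask = self.segment(img, box)
            img = self.inpaint(img, mask)
        return img


def toy_detect(img: Image) -> List[Box]:
    """Toy stand-in: treat very bright pixels as one distractor."""
    hits = [(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v > 0.9]
    if not hits:
        return []
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return [(min(ys), min(xs), max(ys) + 1, max(xs) + 1)]


def toy_segment(img: Image, box: Box) -> Mask:
    """Toy stand-in: mask the bright pixels inside the detected box."""
    top, left, bottom, right = box
    return [
        [1 if (top <= y < bottom and left <= x < right and v > 0.9) else 0
         for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]


def toy_inpaint(img: Image, mask: Mask) -> Image:
    """Toy stand-in: fill masked pixels with the mean of the unmasked pixels."""
    keep = [v for row, mrow in zip(img, mask) for v, m in zip(row, mrow) if not m]
    fill = sum(keep) / len(keep) if keep else 0.0
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]
```

A pipeline built as EraserPipeline(toy_detect, toy_segment, toy_inpaint) can have any stage swapped out independently, which also means an error in an early stage propagates to every later one.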

Challenges:

Increased model size (hundreds of megabytes).
More obvious failure cases (e.g., attempting to inpaint in the foreground).

7. Learnings from Early ML Features

Pros:

Great capabilities not achievable with traditional image understanding.
Easy-to-use features.
Consistent, predictable latency on a given device, since inference runs locally.

Cons:

Unpredictability of edge cases.
Slower iteration due to the nature of ML development.
Hard deadlines for product launches (e.g., Pixel launches) require careful planning and model iteration.

8. The Rise of Generative AI (2022-2023)

The emergence of models like DALL-E and ChatGPT shifted focus to GenAI.
This led to the development of the new Magic Editor experience, built on state-of-the-art models.
Magic Editor also addresses the poor discoverability of individual editing features.

9. Magic Editor: A Grounded Approach

Product goal: Be more grounded and avoid generating unrealistic or out-of-place content.
Prompting: Use specific prompts to guide the AI.
User interaction: Allow users to select and visualize what they want to edit, acting as a co-editor.
Model choice: Use the best models available, even if that requires server-side processing.

10. Server-Side Challenges

Server capacity planning (TPUs, GPUs).
Latency due to network quality and distance to data centers.
Testing becomes more difficult due to large model sizes.
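
To see why network quality matters for a server-side edit, it helps to add up the pieces of one request: round-trip time, image upload, inference, and result download. The sketch below is a back-of-the-envelope model under stated assumptions; RequestProfile, its fields, and any numbers plugged into it are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    rtt_ms: float         # network round trip to the data center
    uplink_mbps: float    # client upload bandwidth, megabits per second
    downlink_mbps: float  # client download bandwidth, megabits per second
    upload_mb: float      # size of the image sent, megabytes
    download_mb: float    # size of the edited image returned, megabytes
    inference_ms: float   # server-side model time


def transfer_ms(size_mb: float, mbps: float) -> float:
    """Time to move size_mb megabytes over a link of mbps megabits per second."""
    return size_mb * 8.0 / mbps * 1000.0


def end_to_end_ms(p: RequestProfile) -> float:
    """Total user-visible latency for one server-side edit."""
    return (
        p.rtt_ms
        + transfer_ms(p.upload_mb, p.uplink_mbps)
        + p.inference_ms
        + transfer_ms(p.download_mb, p.downlink_mbps)
    )
```

For example, with an assumed 50 ms round trip, a 2 MB image on an 8 Mbps uplink and 16 Mbps downlink, and 1000 ms of inference, end_to_end_ms returns 4050.0, of which only about a quarter is model time.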

11. Addressing the Ambiguous Problem Space

The "solved problem" fallacy: Just because a model works in some cases doesn't mean it's ready for production.
Constrain the problem being solved to ensure reliability.
Prompt engineering: Extract the user's intent and deliver the desired result, rather than relying on users to write detailed prompts.
Reduce ambiguity across all functions (product, research, UX, engineering).

12. Specific Use Cases for Magic Editor

Moving objects: Seamlessly relocate objects within the image.
Reimagining scenes: Change the background to be more exciting.
Improved erasing: Erase objects and their reflections for a more natural result.

13. Trust and Safety

Preventing deepfakes and being responsible with AI.
Managing expectations: Acknowledge that AI will not be perfect and requires ongoing improvement.
Addressing the ambiguity of human language in prompts.

14. Learnings on Working with AI

AI engineering is software engineering with ML on top.
ML adds randomness, and the engineer's job is to reduce it.
Build and use evals to measure model performance.
Replace large models with smaller, faster, and more efficient models.
Faster iteration leads to better product improvements.
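
Evals are what make it safe to swap a large model for a smaller one: once quality is a number, the replacement becomes a constrained choice. The sketch below shows one possible selection rule, picking the fastest candidate that still clears a quality bar. The Candidate type, the function name, and the idea of a single scalar eval score are simplifying assumptions, not the team's actual process.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    eval_score: float  # higher is better, measured on a fixed benchmark
    latency_ms: float  # lower is better


def pick_fastest_acceptable(candidates: List[Candidate],
                            min_score: float) -> Optional[Candidate]:
    """Return the lowest-latency candidate whose eval score meets the bar, if any."""
    acceptable = [c for c in candidates if c.eval_score >= min_score]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: c.latency_ms)
```

If no candidate clears the bar, the function returns None, which signals that the smaller models need more work before they can replace the current one.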

15. Future Directions

Rebuilding the Google Photos editor from the ground up to be AI-first.
Surfacing relevant edits based on image content.
Combining AI with deterministic tools for better edits.
Potential for on-device models to become as powerful as previous server-side models.

16. Conclusion

The Google Photos editing team is leveraging AI, particularly generative AI, to create powerful and easy-to-use editing features. While challenges exist around model size, latency, and the inherent unpredictability of AI, the team is focused on reducing ambiguity, building robust evaluation systems, and iterating quickly to deliver valuable experiences to users. The future of Google Photos editing is AI-first, with the goal of meeting users where they are and providing intelligent assistance to enhance their memories.