Recsys Keynote: Improving Recommendation Systems & Search in the Age of LLMs - Eugene Yan, Amazon

By AI Engineer


Summary of YouTube Video: Recommendation Systems and Language Models

Key Concepts:

  • Semantic IDs
  • Data Augmentation (Synthetic Data)
  • Unified Models
  • Cold Start Problem
  • Popularity Bias
  • Cold-start Coverage
  • Cold-start Velocity
  • Two-Tower Network
  • Multi-modal Content Embeddings
  • Lightweight Classifier
  • Exploratory Search
  • Unified Ranker
  • Unified Embeddings

1. Semantic IDs: Addressing Cold Start and Sparsity

  • Problem: Hash-based item IDs don't encode item content, leading to cold start problems (new items require relearning) and sparsity (tail items have insufficient interactions). Recommendation systems become popularity-biased.
  • Solution: Semantic IDs that incorporate multimodal content.
  • Example: Kuaishou (Short Video Platform):
    • Challenge: Learning from hundreds of millions of daily uploaded short videos.
    • Approach: Trainable multimodal semantic IDs.
    • Model: Standard two-tower network (user tower and item tower).
      • User Tower: Standard sequence of IDs and user ID.
      • Item Tower: Incorporates content input (visual, video descriptions, audio).
        • Visual: ResNet
        • Video Descriptions: BERT
        • Audio: VGGish
      • Key Technique: Concatenating the content embeddings and clustering them into cluster IDs (e.g., 1,000 cluster IDs for 100 million videos via k-means clustering).
      • Trainable vs. Non-Trainable Embeddings: The concatenated content embeddings stay frozen (non-trainable); the cluster IDs map into a trainable embedding table.
      • Mechanism: The model encoder learns to map content space (via cluster IDs) to behavioral space.
    • Results:
      • Outperformed hash-based IDs on clicks and likes.
      • Increased cold-start coverage by 3.6%.
      • Increased cold-start velocity (the number of new videos reaching a view threshold).
    • Benefits: Addresses cold start, enables recommendations to understand content.
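The semantic-ID idea above can be sketched in a few lines. This is a hedged, illustrative toy (the centroids, dimensions, and names are made up, not Kuaishou's system): frozen multimodal content embeddings are clustered offline, and each item's cluster ID then indexes a trainable embedding table in the item tower, so a brand-new item with zero interactions inherits behavioral signal from similar older items.

```python
import random

random.seed(0)

def assign_cluster(content_embedding, centroids):
    """Map a frozen content embedding to its nearest cluster ID."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda c: sq_dist(content_embedding, centroids[c]))

# Toy setup: 4 centroids in a 3-d content space (the real system clusters
# hundreds of millions of videos into ~1,000 clusters).
centroids = [[0, 0, 0], [1, 1, 1], [5, 5, 5], [9, 9, 9]]

# Trainable embedding table keyed by cluster ID; in training it is updated by
# the recommender's loss, here it is just randomly initialized.
cluster_embedding_table = {
    c: [random.gauss(0, 0.1) for _ in range(8)] for c in range(len(centroids))
}

# A brand-new item has no interaction history, but its content embedding
# (e.g. concatenated ResNet visual, BERT text, and VGGish audio features)
# still maps to a cluster whose embedding was learned from similar items.
new_item_content = [4.8, 5.1, 5.0]
cid = assign_cluster(new_item_content, centroids)
item_semantic_embedding = cluster_embedding_table[cid]
print(cid)  # nearest centroid is [5, 5, 5] -> cluster 2
```

Because the cluster ID (not the raw hash ID) is what gets a learned embedding, new uploads land in an already-trained region of the embedding table, which is the cold-start benefit described above.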

2. Data Augmentation with Language Models: Generating High-Quality Data at Scale

  • Challenge: Acquiring good quality data at scale is essential for search and recommendation systems, especially metadata for search (query expansion, synonyms, spellchecking). Human annotation is costly.
  • Solution: Using Language Models (LLMs) for synthetic data and labels.
  • Example 1: Indeed (Job Recommendations):
    • Problem: Poor job recommendations led to user churn (unsubscribes). Explicit negative feedback (thumbs down) is sparse, and implicit feedback is imprecise.
    • Approach: Lightweight classifier to filter bad recommendations.
    • Process:
      1. Expert Labeling: Experts labeled job recommendations and user pairs (resume data, activity data).
      2. Prompting Open LLMs (Mistral, Llama 2): Poor performance; models couldn't understand resume and job description context.
      3. Prompting GPT-4: High precision and recall (90%), but too costly and slow (32 seconds).
      4. Prompting GPT-3.5: Poor precision (63%), meaning 37% of the recommendations it filtered out were actually good.
      5. Fine-tuning GPT-3.5: Improved precision, but still too slow (6.7 seconds).
      6. Distilling a Lightweight Classifier: Trained on fine-tuned GPT-3.5 labels. Achieved high performance (0.86 AUROC) and real-time filtering latency.
    • Outcome:
      • Reduced bad recommendations by 20%.
      • Application rate increased by 4%.
      • Unsubscribe rate decreased by 5%.
    • Key Insight: Quality of recommendations is crucial; quantity is not everything.
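The final distillation step can be sketched as follows. This is a minimal illustration, not Indeed's actual system: labels from a fine-tuned LLM "teacher" train a lightweight classifier cheap enough to filter recommendations in real time. The two-feature logistic model and the feature names are assumptions for the sketch.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# (match_score, skill_overlap) feature pairs with teacher labels: 1 = good rec.
# In practice these labels come from the fine-tuned GPT-3.5 teacher.
teacher_labeled = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.2, 0.1), 0), ((0.1, 0.3), 0)]

# Train a tiny logistic classifier with plain gradient descent.
w, b = [0.0, 0.0], 0.0
for _ in range(500):
    for (x1, x2), y in teacher_labeled:
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        g = p - y  # gradient of log loss w.r.t. the logit
        w[0] -= 0.5 * g * x1
        w[1] -= 0.5 * g * x2
        b -= 0.5 * g

def is_bad_recommendation(x1, x2, threshold=0.5):
    """Filter step: drop any recommendation the classifier scores below threshold."""
    return sigmoid(w[0] * x1 + w[1] * x2 + b) < threshold

print(is_bad_recommendation(0.85, 0.90))  # strong match -> keep (False)
print(is_bad_recommendation(0.15, 0.20))  # weak match -> filter (True)
```

The point of the distillation is the latency column in the progression above: a model this small scores a pair in microseconds, versus seconds for the fine-tuned LLM that produced its training labels.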
  • Example 2: Spotify (Podcast and Audiobook Discovery):
    • Problem: Users primarily search for songs and artists. Introducing podcasts and audiobooks creates a cold-start problem at the category level. Exploratory search is essential for expanding beyond music.
    • Solution: Query recommendation system.
    • Approach:
      1. Generating New Queries:
        • Extract from catalog titles, playlist titles.
        • Mine from search logs.
        • Add "cover" to artist names.
      2. LM Augmentation: Use LLMs to generate natural language queries.
      3. Ranking: Rank new queries alongside immediate search results.
    • Outcome: +9% exploratory queries; roughly one in ten users explored the new products.
    • Key Insight: Augment existing techniques with LLMs where needed, rather than relying solely on LLMs from the start.
  • Benefits of LM-Augmented Data: Richer, high-quality data at scale, especially for tail queries and items. Lower cost and effort than human annotation.
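The Spotify query-generation steps can be sketched as below. The catalog, logs, and ranking heuristic here are invented for illustration: candidate exploratory queries come from catalog and playlist titles, mined search logs, and templated variants (e.g. appending "cover" to artist names), which an LLM can then rewrite into more natural phrasings before ranking.

```python
# Illustrative toy data, not Spotify's actual catalog or logs.
catalog_titles = ["sleep sounds", "true crime daily"]
playlist_titles = ["morning acoustic"]
search_log_queries = ["best horror podcasts", "taylor swift"]
artists = ["radiohead"]

def generate_candidate_queries():
    """Collect candidate queries from catalog, playlists, logs, and templates."""
    candidates = set(catalog_titles) | set(playlist_titles) | set(search_log_queries)
    # Templated variant: "<artist> cover" surfaces cover-song exploration.
    candidates |= {f"{artist} cover" for artist in artists}
    return candidates

def rank_queries(candidates, user_interest_terms):
    """Toy ranker: score each query by overlap with the user's interest terms."""
    def score(q):
        return sum(term in q for term in user_interest_terms)
    return sorted(candidates, key=score, reverse=True)

ranked = rank_queries(generate_candidate_queries(),
                      user_interest_terms=["crime", "daily"])
print(ranked[0])  # "true crime daily" matches both interest terms
```

These ranked query suggestions are then shown alongside the immediate search results, which is how the exploratory queries reach the user.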

3. Unified Models: Consolidating Systems for Efficiency and Knowledge Transfer

  • Challenge: Separate systems for ads, recommendations, and search. Multiple models for different recommendation scenarios (homepage, item-to-item, cart-to-cart, thank you page). Duplicative engineering pipelines, high maintenance costs, and lack of knowledge transfer.
  • Solution: Unified models.
  • Example 1: Netflix (Unified Ranker for Search and Recommendations):
    • Problem: Bespoke models for search, similar item recommendations, and pre-query recommendations. High operational cost and missed learning opportunities.
    • Approach: A unified contextual ranker (UniCoRN).
    • Model:
      • User Foundation Model: User watch history.
      • Context and Relevance Model: Context of videos watched.
      • Unified Input: User ID, item ID (video/drama/series), search query (if exists), country, task (search, pre-query, more like this).
      • Imputation of Missing Items: Use title of current item to find similar items when no search query exists.
    • Outcome: Matched or exceeded the metrics of specialized models on multiple tasks.
    • Benefits: Simplifies system, removes tech debt, builds a better foundation for future iterations, and accelerates iteration speed.
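The unified-input idea can be sketched as a single feature schema shared across tasks. The field names below are assumptions for illustration, not Netflix's actual schema: one ranker consumes the same row format for search, pre-query, and more-like-this, with missing fields imputed (e.g. using the current item's title as a pseudo-query).

```python
def build_unified_input(user_id, item_id, task, country, query=None, current_title=None):
    """Build one input row usable by a single ranker across all tasks."""
    # Imputation: for query-less tasks like "more like this", fall back to the
    # current item's title so the ranker still receives a relevance signal.
    if query is None and current_title is not None:
        query = current_title
    return {
        "user_id": user_id,
        "item_id": item_id,
        "query": query or "",
        "country": country,
        "task": task,  # one of: "search", "pre_query", "more_like_this"
    }

search_row = build_unified_input("u1", "v42", "search", "US",
                                 query="space documentaries")
mlt_row = build_unified_input("u1", "v42", "more_like_this", "US",
                              current_title="Our Planet")
print(mlt_row["query"])  # imputed from the current title: "Our Planet"
```

Training one model on all such rows, with the task field as an input, is what lets learning from search transfer to recommendations and vice versa.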
  • Example 2: Etsy (Unified Embeddings):
    • Problem: Helping users get better results from specific or broad queries. Constantly changing inventory. Lexical embeddings don't account for user preferences.
    • Solution: Unified embedding and retrieval.
    • Model:
      • Product Tower (Product Encoder):
        • T5 models for text embeddings (item descriptions).
        • Query-product logs (queries paired with clicked/purchased products) for query embeddings.
      • Query Tower (Search Query Encoder):
        • Shared encoders for text tokens, product category tokens, and user location.
        • User preferences encoded via query-user scalar features (search history, purchase history).
      • Quality Vector: Concatenated to product embedding vector to ensure good quality (ratings, freshness, conversion rate). Constant vector added to query embedding to match dimensions.
    • Outcome:
      • 2.6% increase in conversion across the entire site.
      • More than 5% increase in search purchases.
    • Benefits: Simplifies the system. Improvements to one side of the tower benefit other use cases.
    • Alignment Tax: May need to split into multiple unified models if alignment tax is too high (improving one task makes another worse).
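The quality-vector trick above can be shown with toy numbers (the embeddings and quality values here are invented): quality signals are concatenated to the product embedding, and a constant vector pads the query embedding to the same dimension, so the dot-product score decomposes into an embedding match plus a fixed weighting of quality.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

product_embedding = [0.2, 0.5, 0.1, 0.7]
quality_vector = [0.9, 0.6, 0.3]        # e.g. rating, freshness, conversion rate
query_embedding = [0.3, 0.4, 0.2, 0.6]

# Product side: append quality signals. Query side: append a constant vector
# of matching length so both sides have the same dimension.
product_full = product_embedding + quality_vector
query_full = query_embedding + [1.0] * len(quality_vector)

# score = <product_embedding, query_embedding> + sum(quality_vector)
#       = 0.70 + 1.80 = 2.50
score = dot(product_full, query_full)
print(round(score, 2))  # 2.5
```

Because the padding on the query side is constant, the quality contribution is identical for every query, acting as a fixed boost for well-rated, fresh, high-converting items rather than a query-dependent signal.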

Conclusion:

The talk highlights three key areas for improving recommendation systems: semantic IDs for addressing cold start and sparsity, data augmentation with LLMs for generating high-quality data at scale, and unified models for consolidating systems and improving efficiency. The examples from Kuaishou, Indeed, Spotify, Netflix, and Etsy demonstrate the practical applications and benefits of these approaches. The speaker emphasizes the importance of quality over quantity in recommendations and the potential for significant gains through strategic use of language models and unified architectures.
