Personalization in the Era of LLMs - Shivam Verma, Spotify

By AI Engineer


Key Concepts

  • Foundational User Modeling: Creating comprehensive, vector-based representations of user tastes across all historical interactions.
  • Semantic IDs: A technique to tokenize content (tracks/episodes) into hierarchical, compressed numerical representations, allowing LLMs to process them like language tokens.
  • Soft Tokenization: Projecting user embedding vectors into the latent space of an LLM to provide personalized context during generation.
  • Generative Recommendation Systems: Moving from traditional multi-stage "candidate generation and ranking" pipelines to unified, transformer-based models.
  • Steerability: The ability for users to influence model outputs through natural language prompts or direct edits to their "Taste Profile."
  • Cross-Content Modeling: Embedding users, music tracks, and podcast episodes in a single shared embedding space (a hypersphere), so that recommendations can span content types.

1. Evolution of Recommendation Architecture

Spotify is transitioning from a "siloed" multi-stage pipeline, in which candidate generation reduces millions of items to hundreds and multiple ranking stages then order those candidates, to a unified, transformer-based generative model. This shift allows for a more holistic understanding of user intent and content relationships.
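For reference, the multi-stage pipeline being replaced can be sketched in a few lines. This is a toy sketch, not Spotify's system: the catalog size, the 32-dimensional embeddings, the shortlist size of 200, and the popularity term in the hypothetical `rank` function are all illustrative assumptions. It shows what "siloed" means in practice: each stage optimizes its own score, and the ranker never sees items the first stage dropped.

```python
import numpy as np

def candidate_generation(user_vec, item_vecs, k):
    """Stage 1: cheap dot-product retrieval that narrows the catalog to k items."""
    scores = item_vecs @ user_vec
    # argpartition on the negated scores puts the k highest-scoring indices first (unordered).
    return np.argpartition(-scores, k - 1)[:k]

def rank(candidates, user_vec, item_vecs, popularity):
    """Stage 2: hypothetical ranker that re-scores only the shortlisted items."""
    # Illustrative ranking score (an assumption): affinity plus a small popularity prior.
    scores = item_vecs[candidates] @ user_vec + 0.1 * popularity[candidates]
    return candidates[np.argsort(-scores)]

rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(10_000, 32))  # stand-in item embeddings
popularity = rng.random(10_000)            # stand-in popularity feature
user_vec = rng.normal(size=32)             # stand-in user embedding

shortlist = candidate_generation(user_vec, item_vecs, k=200)
ranked = rank(shortlist, user_vec, item_vecs, popularity)
print(ranked[:10])  # indices of the ten best-ranked items
```

A unified generative model folds these separate scoring steps into a single model that produces the recommendations directly.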

2. Foundational User Modeling

The AI Foundation team at Spotify focuses on generating user embeddings for over 750 million users.

  • Methodology: Previously, the team used autoencoder models to compress user interaction features into compact, low-dimensional vectors.
  • Current Shift: They are moving toward sequential transformer models that treat user history as part of the prompt context. This allows the system to understand a user's position in a shared embedding space alongside tracks and episodes, enabling the model to identify "neighborhoods" of content relevant to specific user tastes.
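The "neighborhood" idea can be illustrated with cosine similarity on the unit hypersphere. This is a minimal sketch under loud assumptions: the embeddings are random stand-ins, the `track`/`episode` labels are hypothetical, and the user vector is a simple mean of the history embeddings standing in for the sequential transformer the talk describes. Because users and items live in one space, the neighbors can be tracks or episodes alike.

```python
import numpy as np

def to_hypersphere(x):
    """L2-normalize along the last axis so every embedding lies on the unit hypersphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def neighborhood(user_vec, item_vecs, item_kinds, k=5):
    """Return the k items closest to the user by cosine similarity, with their kind."""
    u = to_hypersphere(user_vec)
    items = to_hypersphere(item_vecs)
    sims = items @ u  # equals cosine similarity because all vectors are unit length
    top = np.argsort(-sims)[:k]
    return [(int(i), item_kinds[i], float(sims[i])) for i in top]

rng = np.random.default_rng(1)
item_vecs = rng.normal(size=(1_000, 16))  # stand-in track and episode embeddings
item_kinds = ["track" if i % 2 == 0 else "episode" for i in range(1_000)]
history = [3, 10, 42, 7]  # hypothetical listening history (item indices)

# Placeholder user encoder: mean of the normalized history embeddings.
# The talk describes a sequential transformer at this step; mean-pooling is only a stand-in.
user_vec = to_hypersphere(item_vecs[history]).mean(axis=0)

for idx, kind, sim in neighborhood(user_vec, item_vecs, item_kinds, k=5):
    print(idx, kind, round(sim, 3))
```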

3. Catalog Understanding via Semantic IDs

To bridge the gap between Spotify’s massive catalog (100M+ tracks, 400k+ audiobooks) and LLMs, the team employs Semantic IDs.

  • Process: Instead of using raw URIs, content is compressed into a sequence of 4–6 tokens.
  • Hierarchical Structure: These tokens function hierarchically. For example, two different pop artists might share the first two tokens (representing the genre "pop") while the remaining tokens differentiate their specific niches.
  • Application: This allows the LLM to perform "next-item prediction" auto-regressively, treating a song or episode as the next "word" in a sequence.
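The summary does not say how Spotify derives its Semantic IDs. One technique used in published generative-retrieval work is residual quantization, sketched below as an assumption rather than as Spotify's method: each level picks the nearest codeword to whatever the previous levels left unexplained, so the first token is a coarse cluster (the "pop" prefix in the example above) and later tokens refine it. The 4 levels, 16 codewords per level, and plain k-means codebooks are illustrative choices; production systems typically learn codebooks jointly with an encoder.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def fit_residual_codebooks(x, levels, k):
    """Fit one codebook per level on the residual left over by the previous levels."""
    codebooks, residual = [], x.copy()
    for level in range(levels):
        cb = kmeans(residual, k, seed=level)
        idx = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        residual = residual - cb[idx]
        codebooks.append(cb)
    return codebooks

def semantic_id(vec, codebooks):
    """Greedy residual quantization: one token per level, coarse to fine."""
    tokens, residual = [], vec.copy()
    for cb in codebooks:
        j = int(((cb - residual) ** 2).sum(axis=1).argmin())
        tokens.append(j)
        residual = residual - cb[j]
    return tokens

def decode(tokens, codebooks):
    """Approximate reconstruction: the sum of the chosen codewords."""
    return sum(cb[t] for cb, t in zip(codebooks, tokens))

rng = np.random.default_rng(3)
emb = rng.normal(size=(2_000, 16))  # stand-in content embeddings
codebooks = fit_residual_codebooks(emb, levels=4, k=16)
print(semantic_id(emb[0], codebooks))  # four tokens, coarse to fine
```

Items whose embeddings fall in the same first-level cluster share the first token, which is what lets an LLM predict the next item one token at a time, from broad region to specific item.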

4. Achieving Personalization: The "Soft Token" Approach

Because an LLM cannot be retrained for each individual user, Spotify uses soft tokenization to achieve personalization:

  • Mechanism: The user’s embedding vector is projected into the LLM’s latent space.
  • Integration: This projected vector acts as a "soft token" inserted into the prompt. This provides the model with the necessary user-specific context to generate personalized recommendations without requiring a full model update for every user.
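A minimal sketch of the soft-token mechanism, assuming the projection is a single linear layer whose weights are shared across all users and the soft token is prepended to the embedded prompt. The sizes, the random "learned" weights, the toy embedding table, and the helper names are hypothetical. The point it illustrates is that only the user vector changes from user to user; the projection and the LLM stay the same.

```python
import numpy as np

D_USER, D_MODEL, VOCAB = 64, 512, 1_000  # hypothetical sizes
rng = np.random.default_rng(2)

# Hypothetical learned projection, shared by all users (one linear layer here).
W_proj = rng.normal(scale=0.02, size=(D_USER, D_MODEL))
b_proj = np.zeros(D_MODEL)

# Hypothetical frozen LLM input-embedding table.
token_embedding = rng.normal(scale=0.02, size=(VOCAB, D_MODEL))

def soft_token(user_vec):
    """Project a user embedding into the LLM's hidden size, giving one 'soft token'."""
    return user_vec @ W_proj + b_proj

def build_inputs(user_vec, prompt_token_ids):
    """Prepend the soft token to the embedded prompt tokens, giving shape (1 + T, D_MODEL)."""
    prompt = token_embedding[prompt_token_ids]
    return np.vstack([soft_token(user_vec)[None, :], prompt])

user_vec = rng.normal(size=D_USER)  # stand-in user embedding
inputs = build_inputs(user_vec, np.array([5, 17, 42]))
print(inputs.shape)  # (4, 512)
```

These input vectors would be fed to the model in place of ordinary token embeddings; swapping in a different user's vector changes only the first row, so personalization needs no per-user weight update.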

5. Real-World Applications and Steerability

Spotify is actively deploying these technologies to increase user control:

  • AI DJ & Prompted Playlists: Users can describe a mood or content type in natural language, and the model generates a matching selection.
  • Taste Profile: A new feature (currently in select markets) that exposes the model's understanding of the user. Users can view, edit, or remove specific data points, allowing them to "steer" the model’s future recommendations.
  • Cross-Modality: The system successfully models both music and podcasts in the same space, allowing for seamless transitions between different media types based on user context.

6. Synthesis and Conclusion

The transition to LLM-based recommendation systems represents a fundamental change in how Spotify handles user data. By combining Semantic IDs for catalog compression and soft tokenization for user personalization, Spotify is moving toward a future where recommendation systems are not just black-box rankers, but steerable, generative interfaces. The primary takeaway is that the future of personalization lies in the ability to project user history into the same latent space as content, allowing for a more fluid, conversational, and user-controlled experience.
