
DeepSeek Engram: We’ve Been Building LLMs Wrong

By Prompt Engineering

Large Language Models AI Architecture Machine Learning Optimization


Key Concepts

Large Language Models (LLMs): Powerful AI models trained on massive datasets to generate human-like text.
Transformer Layers: The core building blocks of LLMs, responsible for processing and understanding language.
Retrieval-Augmented Generation (RAG): A technique that improves LLM output by retrieving relevant information before generating a response. The video frames DeepSeek’s engrams as a new take on this retrieval idea.
Engrams: DeepSeek’s system, described as an “address book” for LLMs, that enables fast factual recall.
Computation vs. Memory: The distinction between the processing power used for reasoning and the storage capacity used for recalling information.

Inefficient Computation in Current LLM Architecture

According to the video, current Large Language Models (LLMs) waste computation on every query, even when the answer is a fact the model already knows. Its example is the question “Who is Princess Diana?”: the model still runs the query through all 30+ transformer layers, consuming substantial computational resources. The video likens this to using a calculator solely to store a small set of numbers; it is technically possible, but wasteful. The core issue is that LLMs spend processing power recalling information that could instead be retrieved directly.

DeepSeek’s Solution: Engrams for Efficient Recall

DeepSeek addresses this inefficiency with a newly developed system called “engrams.” Engrams function as an “address book” for the LLM, giving it a mechanism for rapid factual recall. Instead of pushing a known fact through the transformer layers again, the LLM first performs a retrieval step: a quick lookup in the engrams that returns the needed information. The video presents universality as the key advantage, claiming that engrams can be integrated with any existing LLM architecture.
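The lookup-first flow described above can be illustrated with a toy sketch. This is not DeepSeek’s implementation, which the video does not detail; it only shows the general pattern of checking a fast key-value memory before falling back to expensive computation. The names `ToyMemory`, `answer`, and `expensive_model` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyMemory:
    """Hypothetical stand-in for an "address book": a plain key-value table."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    @staticmethod
    def _normalize(key: str) -> str:
        # Lowercase and collapse whitespace so trivially different
        # phrasings of the same question map to the same entry.
        return " ".join(key.lower().split())

    def store(self, key: str, value: str) -> None:
        self._table[self._normalize(key)] = value

    def lookup(self, key: str) -> Optional[str]:
        # Returns None on a miss.
        return self._table.get(self._normalize(key))


def answer(
    query: str,
    memory: ToyMemory,
    expensive_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the memory first; only call the expensive model on a miss."""
    hit = memory.lookup(query)
    if hit is not None:
        return hit, "memory"
    return expensive_model(query), "compute"


if __name__ == "__main__":
    memory = ToyMemory()
    memory.store("Who is Princess Diana?", "Diana, Princess of Wales")

    def expensive_model(query: str) -> str:
        # Placeholder for a full pass through every transformer layer.
        return f"(computed answer for: {query})"

    print(answer("who is princess diana?", memory, expensive_model))
    print(answer("Why is the sky blue?", memory, expensive_model))
```

In this sketch a hit never touches `expensive_model`, which is the saving the video describes; how a real system would decide what counts as a known fact is left open here.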

Shifting the Computational Burden: Reasoning vs. Recall

Engrams shift where the computational burden falls. Previously, LLMs used computation for both reasoning and recall. With engrams, computation is reserved for complex reasoning tasks, while memory (the engrams) handles retrieval of factual information. The video presents this separation of concerns as a significant optimization.
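A back-of-the-envelope sketch can make the shift concrete. It assumes a deliberately simplified cost model in which a recall query answered from memory skips every transformer layer and the lookup itself costs nothing; the video gives no such figures, and the function name `total_layer_passes` and all numbers below are hypothetical.

```python
def total_layer_passes(
    num_queries: int,
    num_layers: int,
    recall_fraction: float,
    use_memory: bool,
) -> int:
    """Count transformer-layer passes for a batch of queries.

    Simplifying assumption (not from the video): with memory enabled,
    recall queries use zero layer passes and the lookup is free.
    """
    if not 0.0 <= recall_fraction <= 1.0:
        raise ValueError("recall_fraction must be between 0 and 1")

    recall_queries = round(num_queries * recall_fraction)
    reasoning_queries = num_queries - recall_queries

    if use_memory:
        # Only reasoning queries go through the layers.
        return reasoning_queries * num_layers
    # Without memory, every query goes through every layer.
    return num_queries * num_layers


if __name__ == "__main__":
    # Hypothetical workload: 1000 queries, 30 layers, 40% pure recall.
    baseline = total_layer_passes(1000, 30, 0.4, use_memory=False)
    with_memory = total_layer_passes(1000, 30, 0.4, use_memory=True)
    print(baseline)     # 30000
    print(with_memory)  # 18000
```

Under these assumptions the saving scales directly with the share of queries that are pure recall; a real system would also pay some cost for the lookup, which this sketch ignores.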

Unexpected Improvement in Reasoning Capabilities

A surprising outcome of implementing engrams is a reported improvement in the LLM’s reasoning abilities. While the primary goal was to reduce computational waste, the video says the system also improved the quality of the LLM’s reasoning. It does not detail how this improvement occurs, but implies that freeing computational resources from recall allows more focused processing during reasoning.

Practical Implications and Further Exploration

The video positions engrams as a potentially transformative development in LLM efficiency and performance, implying that the approach could lead to significant cost savings and performance gains in LLM applications. For a deeper dive into the technical details, the presenter points viewers to a linked video (link provided in the description).
