Deepseek AI with React, Tanstack Start and Ollama
By Jack Herrington
Key Concepts
- DeepSeek R1: A large language model (LLM) with "thinking" capabilities.
- Ollama: A tool for running LLMs locally.
- TanStack Start: A full-stack framework used for building the chat application.
- Server Functions: Functions in TanStack Start that run on the server.
- Streaming: Sending data in chunks rather than all at once.
- Async Iterators: A way to process data streams asynchronously.
- Custom Hooks: Reusable functions in React that manage state and side effects.
Running DeepSeek R1
- Two Options:
- Hosted version via API key.
- Local execution using tools like Ollama.
- Ollama Installation:
- Install Ollama on your machine.
- Pull the desired DeepSeek model with a command such as `ollama pull deepseek-r1:32b`. The specific model size (e.g., 32b, 7b) depends on your hardware.
- Model Interaction:
- DeepSeek models exhibit a "thinking" process, indicated by `<think>` and `</think>` tags in the output.
- The output is in Markdown format, allowing for structured presentation of the thinking process and the final solution.
API Endpoint and Streaming
- API Endpoint: The `/api/chat` endpoint is used to interact with the DeepSeek model.
- CURL Command Example:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:32b",
  "stream": true,
  "messages": [{ "role": "user", "content": "Hello" }]
}'
```

- Streaming Response: Setting `"stream": true` enables streaming, where the response is sent in chunks.
- JSON Messages: The stream consists of JSON messages containing the model name, creation timestamp, and the content (`message.content`).
- Think Tags: The `<think>` and `</think>` tags are embedded in the content to delineate the model's reasoning process.
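A rough sketch of how those streamed JSON messages can be parsed, assuming Ollama's newline-delimited JSON format (one object per line, with the token text under `message.content`):

```typescript
// Minimal shape of one streamed chunk from Ollama's /api/chat endpoint.
interface OllamaChatChunk {
  model: string;
  created_at: string;
  message: { role: string; content: string };
  done: boolean;
}

// Parse one raw chunk of the stream (possibly several NDJSON lines)
// and return the concatenated token text it contains.
function extractContent(rawChunk: string): string {
  return rawChunk
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => (JSON.parse(line) as OllamaChatChunk).message.content)
    .join("");
}
```
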
Building the Chat Application with TanStack Start
- Initial Setup: A TanStack Start application is set up with a text input for user messages.
- Server Function:
- A server function is created using `createServerFn` from TanStack Start.
- The function is defined as a `POST` request to avoid caching.
- The handler function takes the chat messages as input.
- It uses `fetch` to call the DeepSeek API endpoint with the model, streaming option, and messages.
- TanStack Start automatically handles returning the stream to the client.
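The core of that handler might look like the following sketch: a plain `fetch` to Ollama whose body stream is returned directly. In the video this is wrapped with TanStack Start's `createServerFn`; the endpoint URL and model tag here are assumptions.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Forward the chat history to the local Ollama server and hand the
// streaming response body straight back; TanStack Start takes care of
// piping a returned ReadableStream down to the client.
async function chat(
  messages: ChatMessage[]
): Promise<ReadableStream<Uint8Array> | null> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    body: JSON.stringify({
      model: "deepseek-r1:32b", // pick the size your hardware can run
      stream: true,
      messages,
    }),
  });
  return res.body;
}
```
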
- Client-Side Implementation:
- A `handleSubmit` function is created to handle form submissions.
- The function adds the user's message to the list of messages.
- The `chat` server function is called, returning a readable stream.
- Parsing the Stream:
- `getReader()` is used to get a reader from the readable stream.
- A `streamingAsyncIterator` helper function is used to convert the reader into an async iterator compatible with `for await...of`.
- The helper function also decodes the `Uint8Array` data into text.
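A sketch of such a helper, assuming only the name `streamingAsyncIterator` from the summary: it wraps the stream's reader in an async generator and decodes each `Uint8Array` chunk to text.

```typescript
// Wrap a ReadableStream of bytes so it can be consumed with
// `for await...of`, yielding decoded text chunks.
async function* streamingAsyncIterator(
  stream: ReadableStream<Uint8Array>
): AsyncGenerator<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      // Decode the raw Uint8Array chunk into text; stream: true keeps
      // multi-byte characters split across chunks intact.
      yield decoder.decode(value, { stream: true });
    }
  } finally {
    // Release the lock even if the consumer breaks out early.
    reader.releaseLock();
  }
}
```
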
- Building the Response:
- An `assistantResponse` variable is used to accumulate the agent's response.
- Each JSON blob from the stream is parsed, and its content is appended to the `assistantResponse`.
- `setMessages` is used to update the message list with the new assistant response.
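One way to express that update as a pure helper (everything beyond the names `assistantResponse` and `setMessages` is an assumption): the accumulated text replaces the trailing assistant message, or appends one on the first chunk.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Return a new messages array whose trailing assistant message holds the
// accumulated response; append one if the last message is from the user.
function withAssistantResponse(
  messages: Message[],
  assistantResponse: string
): Message[] {
  const last = messages[messages.length - 1];
  if (last && last.role === "assistant") {
    return [
      ...messages.slice(0, -1),
      { role: "assistant", content: assistantResponse },
    ];
  }
  return [...messages, { role: "assistant", content: assistantResponse }];
}
```

Inside the `for await...of` loop this would be called as `setMessages((current) => withAssistantResponse(current, assistantResponse))`.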
Separating Thinking and Content
- Message Structure: The message object is extended to include `thinking`, `content`, and `finishedThinking` properties.
- Custom Hook (`useMessagesWithThinking`):
- A custom hook is created to transform the messages array into an array of messages with thinking.
- The hook iterates through each message, checking if it's from the assistant.
- If the message is from the assistant, it checks whether the `<think>` and `</think>` tags are present.
- The content is split into `thinking` and `content` sections based on these tags.
- The `finishedThinking` flag is set to `true` when the `</think>` tag is encountered.
- UI Integration: The custom hook is used to provide the UI with the transformed messages, allowing the thinking section to be displayed separately.
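The splitting step can be sketched as a pure function that the hook would apply to each message (typically inside `useMemo`). The `<think>` delimiters and the property names follow the summary; the function name itself is made up.

```typescript
interface MessageWithThinking {
  role: string;
  thinking: string;
  content: string;
  finishedThinking: boolean;
}

// Split raw assistant output into its reasoning and answer parts.
// While the closing tag has not yet streamed in, everything after
// <think> counts as thinking and finishedThinking stays false.
function splitThinking(role: string, raw: string): MessageWithThinking {
  if (role !== "assistant" || !raw.includes("<think>")) {
    return { role, thinking: "", content: raw, finishedThinking: true };
  }
  const closed = raw.includes("</think>");
  const afterOpen = raw.split("<think>")[1] ?? "";
  const [thinking, content = ""] = afterOpen.split("</think>");
  return {
    role,
    thinking: thinking.trim(),
    content: content.trim(),
    finishedThinking: closed,
  };
}
```
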
Notable Quotes
- (Implied) "All this code is available to you for free in GitHub and a link in the description right down below no strings attached" - Encourages viewers to explore and use the provided code.
Technical Terms
- LLM (Large Language Model): A type of AI model trained on a massive amount of text data.
- API (Application Programming Interface): A set of rules and specifications that software programs can follow to communicate with each other.
- JSON (JavaScript Object Notation): A lightweight data-interchange format.
- Markdown: A lightweight markup language with plain text formatting syntax.
- JSX (JavaScript XML): A syntax extension to JavaScript that allows writing HTML-like structures in React components.
- Uint8Array: An array of 8-bit unsigned integers, often used to represent binary data.
Logical Connections
The video progresses logically from setting up the environment (Ollama, DeepSeek) to building the core functionality of the chat application (server function, client-side stream processing) and finally enhancing the UI to display the model's "thinking" process separately. Each step builds upon the previous one, creating a complete and functional application.
Synthesis/Conclusion
The video provides a practical guide to building a chat application powered by the DeepSeek R1 model, emphasizing the model's unique "thinking" capabilities. By leveraging Ollama for local model execution and TanStack Start for full-stack development, the tutorial demonstrates how to create an application that not only generates responses but also reveals the reasoning behind them. The use of streaming and custom hooks allows for a responsive and informative user experience, showcasing the potential of local-first AI development.