Your Agent Can Now Train Models — Merve Noyan, Hugging Face

Hugging Face, Agentic Ecosystems, VLMs, Quantization

Key Concepts

Open Source AI: Models with accessible weights and code, often under permissive licenses (MIT, Apache 2.0).
Agentic Ecosystem: AI systems capable of autonomous reasoning, tool use, and task execution.
Hugging Face Hub: A central repository hosting millions of models, datasets, and spaces, serving as the infrastructure layer for open-source workflows.
VLMs (Vision Language Models): Models that process both text and visual inputs, often used for "computer use" tasks.
Quantization: Reducing the numerical precision of model weights (e.g., to 4-bit) so that large models fit on smaller, cheaper hardware (e.g., an NVIDIA L4 GPU).
GGUF: A file format optimized for fast inference and compatibility with local serving tools like llama.cpp.
MCP (Model Context Protocol): A standard for connecting AI agents to external data sources and tools.
Traces: A repository type for logging and analyzing agent execution steps, useful for debugging and fine-tuning.
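
To make the quantization entry concrete, here is a minimal sketch of the underlying arithmetic: weight memory is roughly parameter count times bits per weight, divided by eight. The 27-billion-parameter figure is a hypothetical example, not a number from the talk, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


# Hypothetical 27-billion-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(27e9, bits):.1f} GiB")
```

Going from 16-bit to 4-bit cuts the weight footprint by a factor of four, which is what moves a model from multi-GPU territory onto a single 24 GB card.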

1. The Value of Open Source AI

The speaker argues that open source is a key differentiator for machine learning. Key benefits include:

Transparency: Open weights are fixed and inspectable, so they cannot change behind an API without notice. This avoids the "silent" performance degradation that closed models can suffer.
Customization: Users can quantize, fine-tune, and shrink models to fit specific hardware.
Privacy: Models can be deployed to edge devices or browsers, ensuring data never leaves the user's environment.
Performance: Open models (e.g., GLM 5.1) are now competitive with or superior to closed-source counterparts, as evidenced by the Artificial Intelligence Index.

2. Navigating and Selecting Models

With over 3 million models on the Hugging Face Hub, selection can be challenging. The speaker suggests:

Benchmark Datasets: Use the "Benchmark" button in the Datasets tab to filter models by performance on tasks like SWE-bench Pro (coding) or AIME (math).
Inference Providers: Use Hugging Face’s routing service to compare providers (e.g., Groq, Cerebras) based on speed, cost, and tool-use capabilities.
Hardware Compatibility: Check the GGUF section of a model repository to see if it fits your VRAM constraints (e.g., a quantized Gemma 4 model fitting on 24GB VRAM).
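
The hardware compatibility check above can be sketched as a small selection routine: given the file sizes listed for each GGUF quantization, pick the largest one that still leaves headroom for the KV cache and runtime overhead. The file sizes and the 4 GB headroom below are illustrative assumptions, not values from the talk or from any real repository.

```python
from typing import Dict, Optional


def pick_quant(file_sizes_gb: Dict[str, float], vram_gb: float,
               headroom_gb: float = 4.0) -> Optional[str]:
    """Return the largest quantization whose file fits in VRAM minus headroom."""
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in file_sizes_gb.items() if size <= budget}
    if not fitting:
        return None  # nothing fits; a smaller model or more VRAM is needed
    return max(fitting, key=fitting.get)


# Hypothetical GGUF file sizes (GB) for one model repository.
candidates = {"Q4_K_M": 16.5, "Q5_K_M": 19.2, "Q8_0": 28.7}
choice = pick_quant(candidates, vram_gb=24.0)
print(choice)
```

Picking the largest file that fits is a simple proxy for "highest quality that fits", since larger quantizations of the same model generally preserve more precision.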

3. Building and Running Agents

The talk highlights several frameworks for deploying agents:

Local Coding Agents: Tools like Pi or llama.cpp (with its built-in llama-agent binary) allow users to run agents locally by simply providing a Hugging Face Hub ID.
Hermes Agents: Recommended for advanced memory management and ease of integration with communication platforms like Slack or WhatsApp.
Setup Process:
1. Select a model (e.g., GLM 5.1).
2. Use a setup wizard to input API keys.
3. Integrate with desired platforms (Slack/WhatsApp).
4. Use the agent to debug its own integration issues via natural language prompts.

4. Supercharging Agents with "Skills"

Hugging Face "Skills" allow agents to interact with the Hub's infrastructure directly:

LLM Trainer Skill: Automates fine-tuning. The agent calculates required VRAM, handles batch sizes, and launches jobs on remote infrastructure.
Hugging Face CLI Skill: Enables agents to manage repositories, run jobs, and launch demos.
Dataset/Space Skills: Allow agents to query the "App Store of AI" (Spaces) or explore datasets via the API.
Example Application: The speaker demonstrated an agent generating images by calling the Qwen-Image model via MCP, showing how agents can dynamically fetch and use external tools.
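
As a rough illustration of the kind of VRAM calculation the LLM Trainer Skill is described as doing, the sketch below uses a common rule of thumb for mixed-precision training with Adam: about 16 bytes per trainable parameter (2 for weights, 2 for gradients, 12 for the fp32 master copy and the two optimizer moments). This is an assumption for illustration, not the Skill's actual method, and it ignores activation memory, which grows with batch size and sequence length.

```python
GIB = 1024**3


def full_finetune_gib(num_params: float) -> float:
    """Weights + gradients + Adam state for full fine-tuning (no activations)."""
    bytes_per_param = 2 + 2 + 12  # 16-bit weights, 16-bit grads, fp32 copy + two moments
    return num_params * bytes_per_param / GIB


def lora_finetune_gib(num_params: float, trainable_params: float,
                      base_bits: int = 4) -> float:
    """Frozen quantized base model plus fully tracked adapter parameters."""
    base_bytes = num_params * base_bits / 8
    adapter_bytes = trainable_params * 16
    return (base_bytes + adapter_bytes) / GIB


# Hypothetical 8-billion-parameter model with 40 million adapter parameters.
print(f"full fine-tune: {full_finetune_gib(8e9):.1f} GiB")
print(f"4-bit LoRA:     {lora_finetune_gib(8e9, 40e6):.1f} GiB")
```

The gap between the two numbers is why an agent that picks the training method and batch size from the available VRAM can save a lot of trial and error.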

5. Real-World Case Study: OCR Processing

The speaker’s colleague, Niels, automated the OCR processing of 30,000 research papers:

Selection: Used OCRBench to identify a high-performance, cost-effective model (Chandra OCR).
Automation: Asked an LLM to write a script to process the papers.
Infrastructure: The agent performed "napkin math" to calculate the cost and VRAM requirements for the job.
Execution: The job was launched on Hugging Face infrastructure using "Buckets" (a high-performance, low-cost storage solution).
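
The "napkin math" step can be sketched as a throughput and price calculation. Only the 30,000-paper figure comes from the talk. The pages per paper, pages per second, and hourly GPU price below are placeholder assumptions chosen to show the arithmetic.

```python
def job_estimate(num_docs: int, pages_per_doc: float,
                 pages_per_second: float, usd_per_gpu_hour: float):
    """Return (gpu_hours, cost_usd) for a batch OCR job on a single GPU."""
    total_pages = num_docs * pages_per_doc
    gpu_hours = total_pages / pages_per_second / 3600
    return gpu_hours, gpu_hours * usd_per_gpu_hour


# 30,000 papers (from the talk); every other value is assumed.
hours, cost = job_estimate(30_000, pages_per_doc=12,
                           pages_per_second=2.0, usd_per_gpu_hour=1.0)
print(f"{hours:.0f} GPU-hours, about ${cost:.0f}")
```

Swapping in measured throughput from a small trial run and the actual hourly rate turns this into a usable budget before launching the full job.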

Notable Quotes

"Having an AI engineer at your fingertips."
"I absolutely recommend using [Hermes agent] with the open models... I asked GLM 5.1 to fix [a Slack integration error] and it fixed on its own."
"[The agent] calculates the amount of VRAM required... which to me is absolute sci-fi still to this day."

Conclusion

The ecosystem for open-source agents has matured significantly, moving from high-friction manual setups to automated, agent-driven workflows. With the Hugging Face Hub’s infrastructure, specifically Skills, MCP, and Traces, developers can now build, train, and deploy agents that estimate their own compute requirements and manage their own tool integrations, making high-level AI engineering accessible to far more people.
