Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face

By AI Engineer

AI Systems Engineering Machine Learning Engineering GPU Optimization

Key Concepts

Coding Agents: Autonomous AI systems capable of writing, testing, and optimizing code.
AI Systems Engineering: The practice of building and optimizing the underlying infrastructure (kernels, training pipelines) for AI models.
CUDA Kernels: Specialized code written to execute mathematical operations directly on NVIDIA GPUs.
Arithmetic Intensity: The ratio of floating-point operations performed to bytes moved to and from memory; raising it is key to keeping a memory-bound GPU busy.
Skills: File-based context provided to agents to enable few-shot learning and task execution.
AutoLab: A multi-agent framework for automated research and model training optimization.
Trackio: An open-source, data-centric dashboard for monitoring agentic workflows.

1. The Shift to AI Systems Engineering

Ben Burtenshaw of Hugging Face argues that as coding agents become mainstream, engineers must move "closer to the silicon" to remain relevant. The goal is to point agents at hard systems engineering problems, such as kernel optimization and automated research, rather than only at high-level application development.

2. Boss 1: Writing and Optimizing CUDA Kernels

Writing custom kernels is traditionally difficult due to hardware-specific requirements and complex installation matrices.

The Bottleneck: Contrary to popular belief, compute (FLOPs) is rarely the bottleneck; memory bandwidth is. On modern GPUs (e.g., H100), the compute units often sit idle while waiting for data to arrive from memory.
The Solution: Custom kernels like Flash Attention increase "arithmetic intensity," so the GPU performs more calculations for each byte it reads from or writes to memory.
Methodology:
- Use Skills (file-based context) to provide agents with examples of kernel writing and benchmarking.
- Utilize the Hugging Face kernels library, which acts as a repository for hardware-specific kernels.
- Result: A 94% speedup was achieved for Qwen 3 8B on H100 hardware by using agent-generated kernels.
Tooling: The upskill library allows developers to evaluate and compare different LLMs for generating these kernels, optimizing for both accuracy and token cost.
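The memory-bandwidth bottleneck described above can be made concrete with a roofline-style estimate. The sketch below is illustrative only: the peak figures are approximate published H100 SXM numbers chosen as assumptions (they do not come from the talk), and the helper names are hypothetical.

```python
"""Roofline-style estimate of whether an operation is memory- or compute-bound."""

# Assumed, approximate H100 SXM peak figures (not taken from the talk).
PEAK_FLOPS = 989e12       # ~989 TFLOP/s dense BF16
PEAK_BANDWIDTH = 3.35e12  # ~3.35 TB/s HBM3

# Intensity (FLOPs per byte) above which the GPU stops being memory-bound.
RIDGE_POINT = PEAK_FLOPS / PEAK_BANDWIDTH


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved


def attainable_flops(intensity):
    """Roofline model: throughput is capped by compute or by bandwidth."""
    return min(PEAK_FLOPS, intensity * PEAK_BANDWIDTH)


def elementwise_add(n, dtype_bytes=2):
    """c = a + b over n elements: 1 FLOP each, two reads and one write."""
    return n, 3 * n * dtype_bytes


def matmul(m, n, k, dtype_bytes=2):
    """C = A @ B: 2*m*n*k FLOPs, each matrix moved once in the ideal case."""
    return 2 * m * n * k, (m * k + k * n + m * n) * dtype_bytes


if __name__ == "__main__":
    workloads = {
        "elementwise add": elementwise_add(1 << 20),
        "matmul 4096^3": matmul(4096, 4096, 4096),
    }
    for name, (flops, nbytes) in workloads.items():
        ai = arithmetic_intensity(flops, nbytes)
        bound = "memory" if ai < RIDGE_POINT else "compute"
        share = attainable_flops(ai) / PEAK_FLOPS
        print(f"{name}: {ai:.2f} FLOPs/byte, {bound}-bound, "
              f"{share:.1%} of peak attainable")
```

Under these assumptions a BF16 elementwise add sits near 0.17 FLOPs per byte, far below the ridge point of roughly 295, so it is limited by bandwidth no matter how fast the compute units are. Fused kernels such as Flash Attention raise intensity by keeping intermediate results on-chip instead of writing them back to memory between steps.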

3. Boss 2: Autonomous Fine-Tuning

Agents can now handle the end-to-end fine-tuning process. By using the Hugging Face CLI and integrated GPU compute on the Hub, agents can take a prompt (e.g., "fine-tune Qwen 3 6B on this dataset") and execute the training run autonomously. Optimized libraries like Unsloth make this increasingly cost-effective.

4. Boss 3: AutoLab (Multi-Agent Research)

Inspired by Andrej Karpathy's Auto Research, this framework distributes the research process across specialized agents to improve training efficiency:

Researcher: Uses the HF papers CLI to scout literature and formulate hypotheses.
Planner: Maintains a queue of experiments and manages the job lifecycle.
Workers: Implement changes to training scripts (e.g., parameter tuning, architectural tweaks) and submit patches.
Reporter: Monitors metrics via Trackio, an open-source dashboard that uses a Parquet-based data layer, allowing agents to query and visualize progress without proprietary API constraints.
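The Planner role above, a queue of experiments plus job lifecycle management, can be sketched in a few lines. This is a minimal illustration rather than AutoLab's actual implementation: the `Experiment`, `Status`, and `Planner` names and their methods are hypothetical, and the metric is assumed to be a validation loss where lower is better.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Experiment:
    hypothesis: str                 # what the Researcher proposed
    patch: str                      # the change a Worker will apply
    status: Status = Status.QUEUED
    metric: Optional[float] = None  # assumed: validation loss, lower is better


class Planner:
    """Hypothetical planner: FIFO queue plus lifecycle bookkeeping."""

    def __init__(self):
        self._queue = deque()
        self.history = []

    def propose(self, hypothesis, patch):
        """Add a new experiment to the back of the queue."""
        exp = Experiment(hypothesis, patch)
        self._queue.append(exp)
        return exp

    def next_experiment(self):
        """Hand the oldest queued experiment to a worker, or None if empty."""
        if not self._queue:
            return None
        exp = self._queue.popleft()
        exp.status = Status.RUNNING
        return exp

    def report(self, exp, metric):
        """Record a result; a missing metric marks the run as failed."""
        exp.metric = metric
        exp.status = Status.DONE if metric is not None else Status.FAILED
        self.history.append(exp)

    def best(self):
        """Return the completed experiment with the lowest metric, if any."""
        done = [e for e in self.history if e.status is Status.DONE]
        return min(done, key=lambda e: e.metric) if done else None
```

Keeping the lifecycle state on the experiment record itself means a Reporter-style agent can summarize progress by reading `history` alone, without asking the workers.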

Workflow:

Agents operate within a Git repository.
Experiments are run on Hugging Face compute.
Results are logged to Trackio, where the data layer allows for custom visualizations (e.g., Gantt charts) to track agent performance over time.
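As an illustration of the kind of custom view an open data layer allows, the sketch below renders a text Gantt chart from run records. It assumes the rows have already been read out of the logged data into plain Python objects; the source does not describe Trackio's schema, so the `RunSpan` fields and the `gantt_rows` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunSpan:
    agent: str    # hypothetical field: which agent ran the job
    start: float  # seconds since the session began
    end: float


def gantt_rows(spans, width=40):
    """Render one text row per span, scaled to `width` columns."""
    if not spans:
        return []
    t0 = min(s.start for s in spans)
    t1 = max(s.end for s in spans)
    scale = width / (t1 - t0) if t1 > t0 else 0.0
    rows = []
    for s in sorted(spans, key=lambda s: s.start):
        left = int((s.start - t0) * scale)
        # Always draw at least one cell so short runs stay visible.
        length = max(1, int((s.end - s.start) * scale))
        rows.append(f"{s.agent:<10}|{' ' * left}{'#' * length}")
    return rows


if __name__ == "__main__":
    demo = [
        RunSpan("worker-1", 0, 120),
        RunSpan("worker-2", 30, 200),
        RunSpan("worker-1", 130, 240),
    ]
    print("\n".join(gantt_rows(demo)))
```

Because the function only needs start and end times per agent, the same rows could come from any tabular store an agent is allowed to query directly.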

5. Key Arguments and Perspectives

Expose, Don't Just Abstract: Ben argues that while APIs are useful, they can become "ceilings." For agents to be truly effective, we must expose the underlying primitives (like raw data stores and hardware-specific configurations) rather than hiding them behind opaque abstractions.
The Hub as an Agentic Platform: The Hugging Face Hub is positioned as the primary infrastructure for these workloads, providing the necessary storage, tracking, and compute to scale agentic engineering.
Standardization: To enable these workflows, the industry needs standard repository structures that agents can easily navigate and manipulate.

6. Synthesis and Conclusion

The future of AI engineering lies in agentic systems that manage their own infrastructure. By moving from zero-shot tasks to few-shot "Skills," and by utilizing open-source, data-centric tools like Trackio and the Hugging Face Hub, engineers can automate the most grueling parts of AI development: kernel optimization and iterative research. The main takeaway is that agents are ready to handle "hard" engineering, provided they are given access to open, transparent, and well-structured primitives.
