Introducing GPT-5.5 with Databricks

By OpenAI

Agent Harness, Office QA, Custom Parsing, and Databricks' Agent Supervisor API.

Key Concepts

GPT-5.5: The latest iteration of the model, noted for significant improvements in parsing and error reduction.
Agent Harness: A testing environment used to evaluate how AI agents perform in complex, multi-step workflows.
Office QA: A benchmark proxy used by Databricks to simulate real-world customer document processing workflows.
Custom Parsing: The process of extracting structured data from unstructured, "messy" documents.
Agent Supervisor API: A Databricks product designed to orchestrate and manage custom agent workflows.

Performance Metrics and Benchmarking

GPT-5.5 shows a substantial gain over its predecessor, GPT-5.4. In the "agent harness" setting, GPT-5.5 makes 46% fewer errors than GPT-5.4. It is also the only model that currently exceeds 50% accuracy on the benchmark.
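A relative error reduction is easy to misread as an accuracy gain in percentage points, so a small worked calculation may help. The sketch below assumes the 46% figure is relative to the baseline error rate; the baseline accuracy of 0.20 is purely hypothetical and chosen for illustration, since the summary does not report GPT-5.4's score. The function names are likewise illustrative.

```python
def accuracy_after_error_reduction(baseline_accuracy: float,
                                   relative_reduction: float) -> float:
    """Return the accuracy implied by shrinking the error rate by a relative factor."""
    if not 0.0 <= baseline_accuracy <= 1.0:
        raise ValueError("baseline_accuracy must be in [0, 1]")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be in [0, 1]")
    baseline_error = 1.0 - baseline_accuracy
    # A 46% reduction leaves 54% of the original errors.
    new_error = baseline_error * (1.0 - relative_reduction)
    return 1.0 - new_error


def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Return the fraction of the old error rate that was eliminated."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    if old_error == 0.0:
        raise ValueError("old model has no errors to reduce")
    return (old_error - new_error) / old_error


# HYPOTHETICAL baseline, not a figure from the source.
hypothetical_baseline = 0.20
print(accuracy_after_error_reduction(hypothetical_baseline, 0.46))
```

Under this reading, the same 46% reduction moves accuracy by very different amounts depending on the starting point, which is why the error reduction and the 50% threshold are reported as separate facts.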

Furthermore, Codex with GPT-5.5 has established itself as the current state-of-the-art (SOTA) solution among all available agents and models. This performance is attributed to a "step-wise function lift" in capability, particularly in the model's ability to handle complex data extraction.

The Role of Parsing Quality

The primary technical differentiator between GPT-5.4 and GPT-5.5 is parsing quality. Earlier versions (GPT-5.4 and before) struggled to extract numerical data from documents accurately. GPT-5.5 has overcome these limitations, allowing for precise digit recognition and data interpretation. This improvement is critical for Databricks, whose customers frequently present "messy" or unstructured documents that require high-fidelity parsing to be useful for downstream tasks.
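One way to picture what "precise digit recognition" means in practice is a field-level check of extracted numbers against known values. The following is a minimal sketch under stated assumptions, not the Office QA scoring method: the helper names (`normalize_number`, `field_accuracy`), the field names, and the sample values are all hypothetical. It treats formatting differences such as currency symbols, thousands separators, and accounting-style parentheses as equivalent, while a single wrong digit counts as a miss.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional


def normalize_number(text: str) -> Optional[Decimal]:
    """Parse a messy numeric string into a Decimal, or None if it is not a number."""
    cleaned = text.strip()
    # Accounting notation: "(87.0)" means -87.0.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    # Drop currency symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[^\d.\-]", "", cleaned)
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return -value if negative else value


def field_accuracy(extracted: Dict[str, str], expected: Dict[str, str]) -> float:
    """Fraction of expected fields whose extracted value matches numerically."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = 0
    for name, truth in expected.items():
        want = normalize_number(truth)
        got = normalize_number(extracted.get(name, ""))
        if want is not None and got == want:
            correct += 1
    return correct / len(expected)


# Hypothetical sample data for illustration only.
expected = {"revenue": "1,234.50", "net_loss": "(87.0)", "units": "4100"}
extracted = {"revenue": "$1234.5", "net_loss": "-87", "units": "4700"}
print(field_accuracy(extracted, expected))  # two of three fields match
```

Using `Decimal` rather than `float` keeps the comparison exact, so a formatting variant like `1234.5` versus `1,234.50` matches while a misread digit (`4700` versus `4100`) does not.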

Real-World Applications: Databricks Workflows

Databricks uses the Office QA benchmark as a proxy for how AI will perform in actual customer environments. Integrating GPT-5.5 into the Databricks ecosystem is expected to transform knowledge-intensive tasks through:

Multi-Agent Setups: Utilizing specialized agents within a harness to perform granular parsing tasks.
Agent Bricks and Agent Supervisor API: These tools allow customers to build and deploy custom agent workflows. By positioning GPT-5.5 as the "supervisor" for these workflows, Databricks enables a higher level of automation and reliability for complex business processes.
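The supervisor arrangement described above can be sketched in plain Python. This is an illustration of the general pattern only and does not use or represent the Databricks Agent Supervisor API: the `Task` and `Supervisor` classes, the agent functions, and the task kinds are all hypothetical. In a real deployment the routing decision would be made by the supervising model; here a simple lookup by task kind stands in for it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A specialized agent is modeled as a function from input text to output text.
Agent = Callable[[str], str]


@dataclass
class Task:
    kind: str      # which specialist should handle this task
    payload: str   # the document fragment to process


@dataclass
class Supervisor:
    """Hypothetical supervisor that routes tasks to registered specialist agents."""
    agents: Dict[str, Agent] = field(default_factory=dict)

    def register(self, kind: str, agent: Agent) -> None:
        self.agents[kind] = agent

    def run(self, tasks: List[Task]) -> List[str]:
        results = []
        for task in tasks:
            agent = self.agents.get(task.kind)
            if agent is None:
                # Surface unroutable work instead of silently dropping it.
                results.append(f"unhandled:{task.kind}")
                continue
            results.append(agent(task.payload))
        return results


# Stand-in specialists; real ones would call a model or a parser.
def table_agent(text: str) -> str:
    return "table:" + text.strip()


def figure_agent(text: str) -> str:
    return "figure:" + text.strip()


supervisor = Supervisor()
supervisor.register("table", table_agent)
supervisor.register("figure", figure_agent)
results = supervisor.run([
    Task("table", " Q3 revenue by region "),
    Task("figure", "chart 2"),
    Task("audio", "call recording"),
])
print(results)
```

Keeping each specialist behind the same narrow interface means a granular parsing agent can be swapped or added without changing the supervisor itself.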

Synthesis and Conclusion

The transition from GPT-5.4 to GPT-5.5 represents a critical advancement in AI-driven document processing. By achieving a 46% error reduction and setting a new standard in the agent harness benchmark, GPT-5.5 provides the necessary reliability for enterprise-grade applications. The combination of superior parsing capabilities and the orchestration power of the Agent Supervisor API allows Databricks to offer a robust framework for automating complex, knowledge-intensive customer workflows. The "step-wise" improvement in model intelligence is the key catalyst for unlocking these advanced automation capabilities.
