First impressions of GPT-5.5 from Will Koh

Key Concepts

GPT-5.5: The latest iteration of OpenAI’s language model, characterized by enhanced reasoning and autonomous task execution.
Context Window Compaction: A process where the model manages long-running tasks by condensing information; GPT-5.5 demonstrates superior retention of goals and findings across these cycles.
Autonomous Tool Usage: The model’s ability to independently select and utilize external tools (databases, telemetry) without explicit step-by-step prompting.
Perfect Extraction Rate: A benchmark metric used by Ramp to measure the accuracy of automated data extraction from complex financial documents.
Inspect: A proprietary testing harness developed by Ramp to evaluate AI model performance in real-world engineering workflows.

1. Evolution of AI in Software Engineering

The conversation highlights a significant shift in AI-assisted coding over the past two years: from simple "tab completion" (predictive text) to models that can handle ambiguous, high-level tasks. Unlike previous models, which required granular, instruction-heavy prompts, GPT-5.5 exhibits stronger "intent understanding," allowing engineers to assign vague objectives that the model then decomposes and executes autonomously.

2. Autonomous Problem Solving and Tool Integration

A core advancement in GPT-5.5 is its ability to navigate complex codebases and utilize integrated tools without constant human intervention.

Methodology: When integrated into Ramp’s "Inspect" harness, the model was granted access to internal databases and telemetry tools.
Key Observation: Previous models often required the engineer to specify which tool to use or frequently selected the incorrect tool. GPT-5.5, conversely, discovers novel ways to leverage these tools to solve problems, effectively "researching" the codebase to identify the most efficient path to a solution.
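To make the tool-selection setup concrete, here is a minimal sketch of how a harness might expose tools to a model and run whichever one the model picks. It is not Inspect's actual design, which the conversation does not describe. The ToolRegistry class and the query_database and read_telemetry stubs are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict, NamedTuple


class Tool(NamedTuple):
    description: str
    fn: Callable[..., Any]


class ToolRegistry:
    """Holds the tools a harness exposes to a model (hypothetical design)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = Tool(description, fn)

    def describe(self) -> str:
        """Render the tool list for the prompt; the model chooses from it."""
        return "\n".join(
            f"{name}: {tool.description}"
            for name, tool in sorted(self._tools.items())
        )

    def dispatch(self, name: str, **kwargs: Any) -> Any:
        """Run the tool the model asked for, rejecting unknown names."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].fn(**kwargs)


# Stub tools standing in for a database and a telemetry source (assumptions).
def query_database(table: str) -> str:
    return f"rows from {table}"


def read_telemetry(service: str) -> str:
    return f"metrics for {service}"


registry = ToolRegistry()
registry.register("query_database", "Read rows from an internal table.", query_database)
registry.register("read_telemetry", "Fetch recent metrics for a service.", read_telemetry)

# The harness sends only the descriptions. A tool call chosen by the model
# comes back as a name plus arguments and is dispatched here.
result = registry.dispatch("read_telemetry", service="billing")
```

In this arrangement the engineer never names a tool in the prompt. The quality of the outcome depends on whether the model picks the right entry from the described list, which is the behavior the observation above credits to GPT-5.5.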

3. Context Management and Compaction

A notable technical improvement discussed is how the model handles long-running tasks that exceed its context window.

The Challenge: Large tasks often require "compaction," where the model must summarize or condense its history to continue processing.
The Breakthrough: In previous models, information loss during compaction was common. With GPT-5.5, the model maintains continuity, successfully passing relevant details, findings, and the overarching goal from one compaction cycle to the next. This allows the model to operate as if the context window limit had not been reached.
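The compaction cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how GPT-5.5 or any harness actually implements it: the AgentState class, the compact function, and the entry-count limit are all hypothetical, and a real system would measure tokens and generate the summary with the model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentState:
    goal: str                                            # overarching objective
    findings: List[str] = field(default_factory=list)    # facts learned so far
    history: List[str] = field(default_factory=list)     # raw step-by-step log


def compact(state: AgentState, max_history: int, keep_recent: int = 2) -> AgentState:
    """Condense history once it exceeds max_history entries.

    The goal and findings are carried over unchanged; only older raw
    history entries are replaced by a single summary line.
    """
    if len(state.history) <= max_history:
        return state  # still within budget, nothing to do

    # Guard keep_recent == 0: history[-0:] would return the whole list.
    recent = state.history[-keep_recent:] if keep_recent > 0 else []
    dropped = len(state.history) - len(recent)
    findings_text = "; ".join(state.findings) or "none"
    summary = (
        f"[compacted {dropped} earlier steps | goal: {state.goal} "
        f"| findings: {findings_text}]"
    )
    return AgentState(
        goal=state.goal,
        findings=list(state.findings),
        history=[summary] + recent,
    )


state = AgentState(
    goal="find the cause of the failing export job",
    findings=["failure started after the schema change"],
    history=[f"step {i}" for i in range(1, 6)],
)
compacted = compact(state, max_history=3)
```

The point of the sketch is the invariant: whatever is dropped from the raw log, the goal and the findings must survive each cycle, since losing them is the failure mode attributed to earlier models.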

4. Performance Benchmarking: The "Perfect Extraction" Case Study

Ramp uses benchmarks to measure the model's efficacy in financial services, in particular the extraction of data from large customer financial documents.

Metric: "Perfect Extraction Rate" (the frequency at which the model extracts all required information correctly with zero human touch).
Result: GPT-5.5 has achieved the highest Perfect Extraction Rate on this benchmark to date. This is described as a "magical experience" for end-users, as it significantly reduces the manual labor required to process complex financial documentation.
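As a rough illustration of the metric, the sketch below counts a document as perfect only when every required field matches its reference value exactly. Ramp's actual field set and matching rules are not given in the conversation, so the exact-match rule, the function name, and the sample field names are assumptions.

```python
from typing import Any, Mapping, Sequence


def perfect_extraction_rate(
    predictions: Sequence[Mapping[str, Any]],
    references: Sequence[Mapping[str, Any]],
) -> float:
    """Fraction of documents where all required fields were extracted correctly.

    A document counts only if every field in its reference matches the
    prediction exactly (an assumed matching rule); a single wrong or
    missing field makes the whole document a miss.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    perfect = sum(
        1
        for pred, ref in zip(predictions, references)
        if all(key in pred and pred[key] == value for key, value in ref.items())
    )
    return perfect / len(references)


# Hypothetical sample data: three documents, two required fields each.
references = [
    {"vendor": "Acme", "total": "120.00"},
    {"vendor": "Globex", "total": "75.50"},
    {"vendor": "Initech", "total": "310.25"},
]
predictions = [
    {"vendor": "Acme", "total": "120.00"},      # perfect
    {"vendor": "Globex", "total": "75.00"},     # one wrong field
    {"vendor": "Initech", "total": "310.25"},   # perfect
]
rate = perfect_extraction_rate(predictions, references)
```

Because the metric is all-or-nothing per document, it is stricter than per-field accuracy: in the sample data five of six fields are correct, yet only two of three documents count as perfect.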

5. Synthesis and Conclusion

The discussion underscores a paradigm shift in AI engineering: the move from "instruction-following" to "goal-oriented autonomy." GPT-5.5 represents a leap forward in three specific areas:

Reasoning: The ability to interpret ambiguous instructions and self-direct research within a codebase.
Tool Proficiency: The capacity to independently identify and utilize external resources to solve novel problems.
Memory Persistence: Improved handling of long-context tasks through seamless information retention during compaction.

The primary takeaway is that GPT-5.5 reduces the "intervention tax" on engineers, allowing them to delegate complex, multi-step workflows to the model with higher confidence in the final output.