OpenAI just destroyed AI coding… Codex 2.0
By David Ondrej
Key Concepts
- Codex: OpenAI's AI coding agent built on the GPT-5 High model.
- GPT-5 High: GPT-5 run at high reasoning effort, allowing for extended thinking time on tasks.
- Claude Code: Anthropic's AI coding tool, a competitor to Codex.
- Synchronous vs. Asynchronous Coding: Synchronous coding involves the AI agent working alongside the user, while asynchronous coding involves the AI agent working separately in the cloud.
- Reasoning Effort: A parameter in GPT-5 High that determines the amount of thinking time allocated to a task (Minimal, Low, Medium, High).
- Test Time Compute: The ability of a model to reason for an extended period during task execution.
- Prompt Compression: Reducing the length of a prompt without losing essential context.
- Tool Budget: Limiting the number of tool calls an AI agent can make to prevent overthinking.
Codex: The Best AI Coding Agent
The video argues that OpenAI's Codex, particularly with its quiet 2.0 update, is currently the best AI coding agent available, surpassing tools like Claude Code, Cursor, and others. The speaker emphasizes that Codex isn't a single product but rather a suite of tools:
- ChatGPT Codex: An asynchronous AI agent that runs in the cloud.
- Codex CLI: OpenAI's answer to Claude Code, significantly improved recently.
- Codex Extension: A hidden gem that can be used from any IDE, directly competing with Cursor.
The key to Codex's power lies in its foundation: GPT-5 High. This model family dedicates a significant portion of its tokens (80%) to hidden reasoning, allowing it to think for extended periods (5+ minutes) on a single task. This "test time compute" enables Codex to perform better on complex programming tasks compared to models like Opus 4.1.
Understanding GPT-5 High and Reasoning Effort
The video delves into the inner workings of GPT-5 High, highlighting the importance of the "reasoning effort" parameter. This parameter has four levels:
- Minimal: No reasoning, considered worse than GPT-4.1.
- Low: 0.2 ratio of tokens dedicated to reasoning.
- Medium: 0.5 ratio (default).
- High: 0.8 ratio, the most effective for coding tasks.
The speaker strongly recommends using only the "High" reasoning effort for coding, as lower levels produce inferior results.
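Selecting the reasoning effort programmatically can be sketched as follows. This is a minimal illustration using the shape of the OpenAI Responses API, which accepts a `reasoning` effort setting for reasoning-capable models; the payload is only constructed here (not sent), so it runs without an API key, and the model name and prompt are illustrative.

```python
# Sketch: building a Responses-API-style request with high reasoning effort.
# The request is only constructed, not sent; sending it would require an
# openai.OpenAI() client and a call like client.responses.create(**request).

def build_codex_request(prompt: str, effort: str = "high") -> dict:
    """Build a request payload with the given reasoning effort level."""
    if effort not in {"minimal", "low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "gpt-5",                 # illustrative model name
        "reasoning": {"effort": effort},  # "high" ~ 0.8 of tokens spent on hidden reasoning
        "input": prompt,
    }

request = build_codex_request("Refactor this function to remove the N+1 query.")
print(request["reasoning"]["effort"])
```

Lower effort levels would be passed the same way, but, per the video's advice, "high" is the only setting worth using for coding tasks.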
Synchronous vs. Asynchronous Coding with Codex
The video explains the two paradigms of AI-assisted coding: synchronous and asynchronous. Codex excels in both:
- Asynchronous: The ChatGPT Codex agent works independently on tasks in the cloud.
- Synchronous: The Codex CLI and extension work alongside the user in real-time.
This versatility makes Codex a comprehensive solution for various AI programming needs.
Codex for Pull Request Reviews
A recent addition to Codex's capabilities is its ability to review pull requests. This feature acts as a "Cursor Bugbot on steroids," identifying bugs that human reviewers might miss. The speaker emphasizes that LLMs and humans think differently, allowing Codex to catch errors that are obvious to one but not the other.
Setting up Codex for pull request reviews involves enabling the feature in the ChatGPT Codex settings and granting the necessary permissions to access GitHub repositories.
Building a Prompt Compression Tool with Codex: A Step-by-Step Example
The video provides a practical example of using Codex to build a prompt compression tool. The goal is to reduce the length of long prompts (e.g., 70,000 tokens) by 30-40% without losing essential context. The process involves the following steps:
- Chunking: Splitting the long prompt into smaller chunks using a markdown-aware chunker.
- Rating: Rating each chunk's relevance using GPT-4.1.
- Target Calculation: Computing the global token target based on the desired reduction.
- Compression: Compressing the least relevant chunks.
- Stitching: Stitching the compressed chunks back together.
- Output: Writing the resulting compressed prompt and relevant statistics.
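The pipeline above can be sketched end to end. This is a minimal illustration rather than the repository's actual code: tokens are approximated by whitespace word counts, the chunker is a simple heading-based splitter honoring the video's 2,000-4,000 token chunk bounds, and `rate_chunk` is a stub standing in for the GPT-4.1 relevance call.

```python
# Minimal sketch of the prompt-compression pipeline described above.
# Assumptions: tokens ~ whitespace words, heading-based chunking, and a
# stub rate_chunk() in place of the GPT-4.1 relevance call from the video.
import re

MAX_TOKENS = 4_000  # upper chunk bound from the video (2,000-4,000 range)

def n_tokens(text: str) -> int:
    return len(text.split())  # crude whitespace proxy for real tokens

def chunk_markdown(prompt: str) -> list[str]:
    """Split on markdown headings, merging pieces up to MAX_TOKENS."""
    pieces = re.split(r"(?m)^(?=#{1,6} )", prompt)
    chunks, current = [], ""
    for piece in pieces:
        if current and n_tokens(current) + n_tokens(piece) > MAX_TOKENS:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks

def rate_chunk(chunk: str, intent: str) -> float:
    """Stub relevance score in [0, 1]; the real tool asks GPT-4.1."""
    intent_words = set(intent.lower().split())
    hits = sum(1 for w in chunk.lower().split() if w in intent_words)
    return hits / max(n_tokens(chunk), 1)

def compress(prompt: str, intent: str, reduction: float = 0.35) -> str:
    chunks = chunk_markdown(prompt)
    scores = [rate_chunk(c, intent) for c in chunks]
    target = int(n_tokens(prompt) * (1 - reduction))  # global token target
    # Trim the least relevant chunks first until the target is met.
    total = n_tokens(prompt)
    for i in sorted(range(len(chunks)), key=lambda i: scores[i]):
        if total <= target:
            break
        kept = chunks[i].split()[: n_tokens(chunks[i]) // 2]  # naive 50% trim
        total -= n_tokens(chunks[i]) - len(kept)
        chunks[i] = " ".join(kept)
    return "\n".join(chunks)  # stitch the chunks back together
```

In the real tool, `rate_chunk` and the per-chunk compression step would each be an LLM call; the surrounding chunk/rate/target/stitch scaffolding stays the same.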
The speaker demonstrates the use of both the Codex extension and the CLI throughout the development process.
Installation and Setup
To install Codex, the following command is used:

```shell
npm install -g @openai/codex
```
The Codex extension can then be installed from within an IDE like Cursor. The reasoning effort should be set to "High" for optimal performance.
Prompt Engineering Tips from OpenAI
The video highlights six prompt engineering tips from OpenAI for using GPT-5 for coding:
- Be Precise: Avoid vague or conflicting instructions.
- Use the Right Reasoning Effort: Stick to "High" for coding tasks.
- Use XML Tags: Delineate different sections of prompts to improve understanding.
- Avoid Overly Firm Language: Overly strict prompting can lead to overthinking and excessive tool calls.
- Give Room for Planning and Self-Reflection: Split tasks into smaller steps and use tags like "self-reflection" to improve reasoning.
- Control the Eagerness: Set boundaries on scope and execution, potentially using a "tool budget."
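The XML-tag, self-reflection, and tool-budget tips can be combined in one prompt template. A minimal sketch; the tag names and budget wording here are illustrative, not an official OpenAI format.

```python
# Sketch: structuring a coding prompt with XML-style tags, a self-reflection
# section, and an explicit tool budget, per the tips above. The tag names
# are illustrative, not a prescribed format.

def build_prompt(context: str, task: str, tool_budget: int = 10) -> str:
    return (
        f"<context>\n{context}\n</context>\n"
        f"<task>\n{task}\n</task>\n"
        "<self_reflection>\n"
        "Before editing, outline a short plan and check it against the task.\n"
        "</self_reflection>\n"
        f"<constraints>\nUse at most {tool_budget} tool calls.\n</constraints>"
    )

print(build_prompt("A Flask app with one route.", "Add a /health endpoint."))
```

The explicit tag boundaries keep instructions, context, and constraints from bleeding into each other, and the budget line caps how eagerly the agent explores.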
Challenges and Solutions
The speaker encounters several challenges during the development process, including:
- Overly Complex Initial Implementation: The initial implementation of the chunking step was unnecessarily complex (300 lines of code). This was addressed by simplifying the step-by-step plan and emphasizing the need for concise code.
- Lack of Observability: The initial script lacked print statements, making it difficult to track progress and debug. This was resolved by adding print statements to provide more information about the chunking and rating processes.
- Incorrect Chunking Logic: The initial chunking logic resulted in either too many or too few chunks. This was corrected by adjusting the chunking thresholds to be between 2,000 and 4,000 tokens.
- Missing User Input: The initial design did not ask the user for the intent or desired token reduction. This was addressed by adding prompts to gather this information from the user.
Comparison with Claude Code
The video compares Codex with Claude Code, highlighting the strengths of each tool:
- Codex: More powerful for complex programming tasks, error fixing, and feature implementation.
- Claude Code: Better at explaining concepts, teaching, providing second opinions, and quick UI changes.
The speaker's preferred workflow involves using Codex for the heavy lifting and Claude Code for consultation and quick iterations.
Vectal: An AI-Powered Task Manager
The video briefly mentions Vectal, an AI-powered task manager developed by the speaker. Vectal allows users to:
- Create and pin tasks.
- Use chat agents to get help with tasks.
- Generate steps for completing tasks.
- Conduct automated AI research on specific topics.
- Identify the highest leverage tasks.
Vectal leverages user context to provide personalized assistance and automate task management.
Conclusion
The video concludes that Codex, powered by GPT-5 High, is currently the best AI coding agent available. Its ability to reason for extended periods, combined with its versatility in both synchronous and asynchronous coding, makes it a powerful tool for developers. The speaker emphasizes the importance of prompt engineering, continuous testing, and a human-in-the-loop approach to effectively leverage AI for coding, and highlights the value of pairing Codex with other AI tools like Claude Code to create a comprehensive AI-assisted development workflow. He shares his personal experience using these tools to build his AI startup, Vectal, and encourages viewers to explore the New Society community for further learning and upskilling in AI. The GitHub repository for the prompt compressor tool is provided for viewers to use and contribute to.