Getting More from Every Copilot Interaction
By GitHub
Key Concepts
- UBB (Usage-Based Billing): A shift from unlimited models to a metered system where costs are determined by token consumption (input, output, context, and tool overhead).
- AI Credits: The new unit of currency for GitHub Copilot usage, replacing the previous "premium request" model.
- Agentic Workflows: Complex tasks where the AI acts as an agent, often consuming significantly more tokens than simple code completions.
- Context Window: The amount of information (tokens) the model can process at once; excessive context leads to higher costs and potential "context poisoning."
- MCP (Model Context Protocol) Servers: Tools that provide external data to the AI; while useful, they can be "bloated" and consume excessive tokens if not managed correctly.
- Caveman Speak: A prompting technique designed to minimize output tokens by removing pleasantries and unnecessary words.
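To make the UBB definition concrete, here is a minimal sketch of how a metered cost could be summed from the four token categories listed above. The rate values and the estimate_credits helper are assumptions for illustration only; they do not reflect GitHub's actual pricing or any official API.

```python
# Hypothetical rates in AI credits per 1,000 tokens.
# These numbers are placeholders for illustration, not GitHub's pricing.
HYPOTHETICAL_RATES = {
    "input": 1.0,
    "output": 4.0,
    "context": 1.0,
    "tool_overhead": 1.0,
}


def estimate_credits(tokens, rates=HYPOTHETICAL_RATES):
    """Sum the per-category cost of one interaction.

    `tokens` maps a category name to a token count; categories that are
    missing from `tokens` count as zero.
    """
    return sum(
        tokens.get(category, 0) * rate / 1000
        for category, rate in rates.items()
    )


interaction = {"input": 1000, "output": 500, "context": 2000, "tool_overhead": 500}
print(estimate_credits(interaction))  # 5.5
```

Even with made-up rates, the breakdown shows why trimming context and tool overhead matters: in this example they account for almost half of the total.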
1. The Shift to Usage-Based Billing (UBB)
Starting June 1st, GitHub Copilot for Teams and Enterprise will transition to a usage-based billing model.
- The Goal: Not to discourage the use of Copilot, but to encourage efficient habits and reduce waste.
- What Stays Free: Basic features like inline completions and "next edit" suggestions remain outside the new credit system.
- What Consumes Credits: Premium models, large context windows, advanced completions, and agentic workflows.
- Budgeting: Administrators can set budgets at the organization or user level. When a limit is reached, chat functionality may be disabled, though basic autocomplete features will remain active.
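The budgeting behavior described above can be sketched as a small gate. This is an illustrative model only: the feature names, the effective_budget rule (a user-level budget taking precedence over the organization budget), and both function names are assumptions, not GitHub's implementation.

```python
# Feature names are hypothetical labels for the categories in the text.
FREE_FEATURES = {"inline_completion", "next_edit_suggestion"}
METERED_FEATURES = {"chat", "premium_model", "agentic_workflow"}


def effective_budget(org_budget, user_budget=None):
    """Assumption: a user-level budget, when set, overrides the org budget."""
    return org_budget if user_budget is None else user_budget


def available_features(credits_used, budget):
    """Return the features still usable for the given credit usage.

    Free features are never gated; metered ones are switched off once
    usage reaches the budget.
    """
    if credits_used >= budget:
        return set(FREE_FEATURES)
    return FREE_FEATURES | METERED_FEATURES


budget = effective_budget(org_budget=500, user_budget=200)
print(sorted(available_features(credits_used=200, budget=budget)))
# ['inline_completion', 'next_edit_suggestion']
```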
2. Optimization Strategies & Methodologies
The speakers emphasized that "throwing the most powerful model at every task" is an inefficient habit.
- Model Selection: Use frontier models (like Opus) only for complex reasoning tasks. For standard coding, documentation, or simple tasks, use lighter, cheaper models (e.g., GPT-3.5/4o-mini). The cost difference can be up to 24x.
- Auto-Select Mode: GitHub’s "Auto-select" feature has been improved to intelligently match the model to the task complexity, often selecting the most cost-effective model that still meets performance requirements.
- Prompt Engineering:
  - Caveman Speak: Use concise, direct language to reduce output tokens.
  - Language: Stick to English, as non-English languages can consume up to 30% more tokens.
- Fresh Starts: Start new chat sessions when a conversation becomes too long to avoid "context poisoning" and unnecessary token consumption from accumulated history.
- Tool Management: Disable unused MCP servers. If a tool is needed, scope it to specific repositories rather than enabling it globally.
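The model-selection advice above can be illustrated with a rough comparison. The sketch below assumes two hypothetical tiers, "light" and "frontier", with the 24x cost ratio quoted in the talk as the worst case. The task categories and the pick_tier rule are invented for the example and are not how Auto-select works.

```python
# Relative cost per request; 24.0 reflects the "up to 24x" figure from the talk.
RELATIVE_COST = {"light": 1.0, "frontier": 24.0}

# Hypothetical task labels that justify a frontier model.
COMPLEX_TASKS = {"complex_reasoning", "architecture", "multi_file_refactor"}


def pick_tier(task_type):
    """Route complex tasks to the frontier tier and everything else to light."""
    return "frontier" if task_type in COMPLEX_TASKS else "light"


def relative_spend(tasks, always_frontier=False):
    """Total relative cost of a list of tasks under the chosen policy."""
    return sum(
        RELATIVE_COST["frontier" if always_frontier else pick_tier(task)]
        for task in tasks
    )


tasks = ["docs", "boilerplate", "complex_reasoning", "docs"]
print(relative_spend(tasks))                        # 27.0
print(relative_spend(tasks, always_frontier=True))  # 96.0
```

Under these assumptions, routing only the one complex task to the frontier tier costs roughly a quarter of sending everything there.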
3. Technical Frameworks for Efficiency
- Custom Instructions: Use the .github/instructions folder to provide specific, condensed context. Use the applyTo feature to scope instructions to specific files or folders, preventing the model from processing irrelevant data.
- Code Act Plugin: A tool for the CLI that collapses multiple tool calls into a single sequence, reducing the back-and-forth overhead.
- Token Killers: Use proxy hooks that compress tool calls without losing semantic meaning.
- Budgeting Tools: Administrators should use the "Billing & Licensing" dashboard to monitor usage trends, identify "power users," and set granular budget overrides.
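As one way to act on usage data, the sketch below flags "power users" from a per-user credit report. The input format, the find_power_users name, and the threshold (three times the median) are assumptions for illustration. The talk does not define how the dashboard identifies power users.

```python
from statistics import median


def find_power_users(usage, factor=3.0):
    """Return users whose credit usage exceeds `factor` times the median.

    `usage` maps a user name to credits consumed in the period.
    """
    if not usage:
        return []
    typical = median(usage.values())
    return sorted(user for user, credits in usage.items() if credits > factor * typical)


# Hypothetical monthly usage figures.
usage = {"ana": 120, "ben": 100, "cho": 90, "dev": 900}
print(find_power_users(usage))  # ['dev']
```

A median-based threshold is used here so that a single heavy user does not shift the baseline the way a mean would.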
4. Key Arguments and Perspectives
- Efficiency vs. Quality: Marco argued that using the most expensive model for simple tasks often adds unnecessary complexity and does not improve the quality of the output.
- The "Shiny Object" Syndrome: Developers often default to the newest, most powerful model (like Opus) simply because it is the "latest," even when it is overkill for the task.
- Strategic Resource Allocation: Enterprises can segment users into different organizations to restrict access to expensive models for specific groups, ensuring budget control.
5. Notable Quotes
- "The idea here is not to panic... the idea here is that we need to optimize for right. We're going to stop waiting for interactions and we're going to work in ways that help efficiently make a real difference." — Andrea (Senior Developer Advocate)
- "If you spend too many thoughts on a simple problem, you might lose a lot of time and you might not even get a good result." — Marco (on tuning reasoning effort)
- "The genie is not going back in the bottle... we just got to be smarter about what we use it for and how we're using it." — Andrea
6. Synthesis and Conclusion
The transition to usage-based billing is a necessary evolution for AI-assisted development. By moving away from "unlimited" habits and adopting a more strategic approach—selecting the right model for the task, pruning unnecessary context, and managing tool overhead—developers can maintain high productivity without incurring excessive costs. The primary takeaway is to treat AI credits as a finite resource, prioritize efficiency in prompt design, and utilize the provided dashboard tools to monitor and control spending proactively.