GitHub Copilot Adds Context Caching and Auto Model Selection to Cut Costs

GitHub Copilot is rolling out two new features designed to reduce the number of tokens developers consume when using the AI coding assistant. Context caching and auto model selection aim to lower costs without sacrificing code quality. The rollout has begun and is reaching users gradually.

How context caching works

Context caching lets Copilot reuse information from earlier interactions. When a developer works on a file, the system stores the surrounding code context. Subsequent requests within the same session can reference that cached data instead of re-encoding it from scratch. That cuts down on redundant token usage, which directly affects billing for developers on usage-based plans. The feature is particularly helpful when working with large files or long conversations with the assistant.
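
As a rough illustration of the idea, the Python sketch below models a session cache that counts context tokens only the first time a given context is seen. It is not GitHub's implementation: the ContextCache class, its hash-based key, and the whitespace-based token count are all assumptions made for this example.

```python
import hashlib


class ContextCache:
    """Hypothetical session-scoped cache keyed by a hash of the context text."""

    def __init__(self):
        self._seen = set()
        self.tokens_encoded = 0  # tokens processed from scratch
        self.tokens_reused = 0   # tokens served from the cache

    @staticmethod
    def _count_tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def submit(self, context: str) -> bool:
        """Record a request's context and return True on a cache hit."""
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        n = self._count_tokens(context)
        if key in self._seen:
            self.tokens_reused += n
            return True
        self._seen.add(key)
        self.tokens_encoded += n
        return False


cache = ContextCache()
file_context = "def add(a, b):\n    return a + b"
first = cache.submit(file_context)   # miss: context is encoded
second = cache.submit(file_context)  # hit: context is reused
print(first, second, cache.tokens_encoded, cache.tokens_reused)
```

In this toy model the second request adds nothing to tokens_encoded, which is the kind of saving the feature is described as targeting.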

Auto model selection in practice

Auto model selection chooses an AI model suited to each task. Simple autocomplete suggestions might run on a smaller, faster model, while complex refactoring queries could call on a larger one. The system decides based on the request's structure and complexity. Developers don't have to choose a model manually or worry about overpaying for a heavyweight response when a lightweight one will do.
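
One way to picture this kind of routing is a simple scoring heuristic, sketched below in Python. GitHub has not published how its selector works, so the keyword list, the threshold, and the model names "small-model" and "large-model" are hypothetical placeholders.

```python
def estimate_complexity(request: str) -> int:
    """Hypothetical score: word count plus a bonus for heavyweight keywords."""
    keywords = ("refactor", "architecture", "migrate")
    score = len(request.split())
    score += 50 * sum(1 for k in keywords if k in request.lower())
    return score


def select_model(request: str, threshold: int = 40) -> str:
    """Route to a placeholder model name based on the estimated complexity."""
    if estimate_complexity(request) >= threshold:
        return "large-model"
    return "small-model"


print(select_model("complete this line"))
print(select_model("Refactor this module to use async IO"))
```

Under these assumed rules, the short completion request stays on the smaller model and the refactoring request is sent to the larger one.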

Why token efficiency matters

Token costs have been a growing concern for teams using large language models. Each API call consumes tokens, the units of text the model processes, and inefficient use can inflate monthly bills quickly. GitHub, which is owned by Microsoft, has been adding efficiency features to Copilot since its launch. The company hasn't disclosed specific savings from these updates, but even small per-request reductions add up for developers making thousands of calls daily. The goal is to make the tool more affordable for individual developers and teams.
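
To see how small per-request reductions accumulate, the short Python calculation below multiplies them out over a month. The inputs are purely illustrative and were not reported by GitHub; the function name and its parameters are assumptions for this example.

```python
def monthly_tokens_saved(calls_per_day: int, tokens_saved_per_call: int, days: int = 30) -> int:
    """Total tokens saved over a period, given a fixed saving on every call."""
    return calls_per_day * tokens_saved_per_call * days


# Hypothetical developer: 2,000 calls a day, 150 tokens saved on each call.
saved = monthly_tokens_saved(calls_per_day=2000, tokens_saved_per_call=150)
print(saved)  # 9000000
```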

What developers can expect

Both features are rolling out gradually to Copilot users. No manual configuration is required because the changes happen server-side. They apply to Copilot's chat and code completion functions. For teams on usage-based pricing, the reduction in token consumption should show up in their billing dashboards. GitHub has not said whether similar efficiency measures will come to its other AI products, or whether context caching and auto model selection will reach enterprise tiers. For now, the features will appear automatically, with the impact expected to be visible in the next billing cycle.
