GitHub Copilot: Attack of the Context Window
Where We Left Off
In Part 1, we looked at the GitHub Copilot billing changes and the shift toward GitHub AI Credits.
The key point was simple:
The token meter is becoming visible.
That does not mean AI-assisted development is over.
It does mean developers and organisations need to understand what drives usage.
So now we need to talk about tokens, cached tokens, context windows, and how to avoid accidentally feeding the model half your repository when a sandwich-sized prompt would have done the job.
This is not a sermon.
I am not here to tell you there is One True Way to use AI tools. Different teams, codebases, constraints, and workflows will need different approaches.
But there are patterns that can help. Think of this as guidance for making your AI-assisted work easier to reason about, easier to review, and less likely to set fire to your credit allowance in the background.
Not “this is the way.”
More “this is a way, and it is probably worth keeping in your utility belt.”
Not All Tokens Are Equal
GitHub’s usage-based model accounts for different kinds of tokens:
| Token type | What it means |
|---|---|
| Input tokens | The content sent to the model: your prompt, selected files, chat history, tool context, repository snippets, instructions, and so on. |
| Output tokens | The content generated by the model: explanations, code, diffs, summaries, plans, comments, and reports. |
| Cached tokens | Context that can potentially be reused instead of being treated as entirely fresh input every time. |
This matters because many AI-assisted development workflows are repetitive.
The model may need the same project goal, architecture notes, coding conventions, file map, acceptance criteria, and task breakdown across multiple interactions.
If every prompt re-sends all of that as fresh context, usage can grow quickly.
A useful way to think about it is:
Fresh context = usually more expensive
Reusable context = potentially cheaper
Unbounded chat history = harder to reason about
Structured working context = easier to control
Cached tokens are not magic. You should not assume every repeated token will be cached, or that every interface will expose exactly how caching is being applied. Structuring your context does not guarantee cached-token treatment, but it does make repeated context easier to control, reason about, and potentially reuse.
From a cost-modelling perspective, it is useful to think about how much repeated context you are asking the model to process.
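One structuring habit follows directly from this: put the stable, rarely-changing context at the front of what you send, and the fast-moving task at the end. Many providers apply caching to an unchanged prefix, though exactly how any given interface applies it varies, so treat this as hygiene rather than a guaranteed saving. A minimal sketch (the file paths are just the example layout used later in this article):

```typescript
import { readFileSync } from "node:fs";

// Stable, rarely-changing context goes first; the fast-moving task goes last.
// Many providers apply caching to an unchanged prefix, but how any given
// interface applies it varies, so treat this as a structuring habit rather
// than a guaranteed saving.
const STABLE_FILES = ["docs/ai/spec.md", "docs/ai/context.md"];

function buildPrompt(taskDescription: string): string {
  const stablePrefix = STABLE_FILES
    .map((path) => `--- ${path} ---\n${readFileSync(path, "utf8")}`)
    .join("\n\n");

  // Keep the stable prefix byte-identical between calls; only the tail changes.
  return `${stablePrefix}\n\n--- Current task ---\n${taskDescription}`;
}

console.log(buildPrompt("Implement Phase 1 only. Update tasks.md when complete."));
```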
The goal is not to obsess over every token like a nervous protocol droid.
The goal is to understand the shape of your usage.
How Many Tokens Do You Actually Get?
This is where the change becomes more concrete.
For the table below, I am using a rough coding-assistant blend of:
80% input tokens
20% output tokens
These estimates are based on the listed GitHub model pricing at the time of writing and a simplified 80/20 input/output blend. Real usage will vary with model choice, cached-token behaviour, tool calls, retries, generated output, and how much context each interaction includes.
Agentic workflows in particular can stack repeated context, generated diffs, test output, and summarisation steps on top of that. Treat these numbers as directional estimates, not guarantees.
Approximate Total Blended Tokens Before Included Credits Run Out
| Model | Pro: 1,000 credits / $10 | Business: 1,900 credits / $19 | Pro+ / Enterprise: 3,900 credits / $39 | Business promo: 3,000 credits / $30 | Enterprise promo: 7,000 credits / $70 |
|---|---|---|---|---|---|
| GPT-5 mini | ~16.7M tokens | ~31.7M | ~65.0M | ~50.0M | ~116.7M |
| GPT-5.4 nano | ~24.4M | ~46.3M | ~95.1M | ~73.2M | ~170.7M |
| GPT-5.4 mini | ~6.7M | ~12.7M | ~26.0M | ~20.0M | ~46.7M |
| GPT-5.4 | ~2.0M | ~3.8M | ~7.8M | ~6.0M | ~14.0M |
| GPT-5.5 | ~1.0M | ~1.9M | ~3.9M | ~3.0M | ~7.0M |
| Claude Sonnet 4.6 | ~1.85M | ~3.52M | ~7.22M | ~5.56M | ~12.96M |
| Claude Opus 4.7 | ~1.11M | ~2.11M | ~4.33M | ~3.33M | ~7.78M |
| Gemini 2.5 Pro | ~3.33M | ~6.33M | ~13.0M | ~10.0M | ~23.3M |
| Grok Code Fast 1 | ~21.7M | ~41.3M | ~84.8M | ~65.2M | ~152.2M |
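If you want to sanity-check these figures, the arithmetic is straightforward. A minimal sketch (the per-token prices below are illustrative values chosen to line up with the GPT-5.4 mini row, not quoted GitHub rates; check the model pricing page for the numbers that actually apply to you):

```typescript
// Illustrative prices chosen to line up with the GPT-5.4 mini row above;
// they are not quoted GitHub rates.
const includedDollars = 10;         // Pro: 1,000 credits at roughly $10
const inputPricePerMTok = 0.75;     // illustrative $ per 1M input tokens
const outputPricePerMTok = 4.5;     // illustrative $ per 1M output tokens

// The 80/20 input/output blend used throughout this article.
const blendedPricePerMTok = 0.8 * inputPricePerMTok + 0.2 * outputPricePerMTok;

const blendedMillions = includedDollars / blendedPricePerMTok;
console.log(`~${blendedMillions.toFixed(1)}M blended tokens before included credits run out`);
```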
The point of this table is not to make everyone panic-count tokens.
Normal chat is probably fine for many developers.
The bigger risk is high-context usage: large repository analysis, repeated code review automation, big agent loops, and repeated “go analyse the whole thing” workflows.
That is where token usage stops feeling abstract and starts behaving like an actual meter.
What Does That Mean in Real Work?
Two prompts can have wildly different costs.
A tiny syntax question and a whole-repository migration task are not the same kind of interaction, even if both technically start with a prompt.
Example Usage Estimates
| Scenario | Token assumption | GPT-5.4 mini cost | Claude Sonnet 4.6 cost | GPT-5.5 cost |
|---|---|---|---|---|
| Small chat | 8k input / 1k output | ~1.05 credits | ~3.9 credits | ~7 credits |
| Medium coding chat | 25k input / 3k output | ~3.23 credits | ~12 credits | ~21.5 credits |
| Large agent loop | 100k input / 10k output | ~12 credits | ~45 credits | ~80 credits |
| Heavy repo task | 1M input / 100k output | ~120 credits | ~450 credits | ~800 credits |
How Far 1,000 Credits Goes
| Scenario | Pro 1,000 credits using GPT-5.4 mini | Pro 1,000 credits using Claude Sonnet 4.6 | Pro 1,000 credits using GPT-5.5 |
|---|---|---|---|
| Small chat | ~950 runs | ~256 runs | ~143 runs |
| Medium coding chat | ~310 runs | ~83 runs | ~47 runs |
| Large agent loop | ~83 runs | ~22 runs | ~12 runs |
| Heavy repo task | ~8 runs | ~2 runs | ~1 run |
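The per-scenario figures follow the same shape: estimate the token mix, price it, and convert dollars to credits. A minimal sketch (same illustrative GPT-5.4 mini prices as above; roughly $0.01 per credit is implied by 1,000 credits for $10):

```typescript
// Same illustrative GPT-5.4 mini prices as the sketch above; not quoted rates.
const dollarsPerCredit = 0.01;      // implied by 1,000 credits for $10
const inputPricePerMTok = 0.75;     // illustrative $ per 1M input tokens
const outputPricePerMTok = 4.5;     // illustrative $ per 1M output tokens

function creditsForRun(inputTokens: number, outputTokens: number): number {
  const dollars =
    (inputTokens / 1_000_000) * inputPricePerMTok +
    (outputTokens / 1_000_000) * outputPricePerMTok;
  return dollars / dollarsPerCredit;
}

// "Medium coding chat" scenario: 25k input / 3k output tokens.
const perRun = creditsForRun(25_000, 3_000);
console.log(`~${perRun.toFixed(2)} credits per run, ~${Math.floor(1000 / perRun)} runs on 1,000 credits`);
```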
This is why the subscription price alone does not tell the whole story.
The effective amount of high-end model usage included in that price depends heavily on model choice and workflow shape.
That does not make heavy usage wrong. It just makes it something worth understanding.
Patterns for Controlled Context Usage
The mental shift from “unlimited tokens” to “tokens cost money” does not mean less AI-assisted development. It means being more intentional about how you use context.
These patterns help engineers shape their workflows to be deliberate about what context gets sent to the model, when it gets sent, and how it gets reused. Different teams will find different patterns that work for them. The goal is not to use AI less, but to use it better.
Keep Context Bounded
One common trap with coding agents is treating them like a giant magical bucket.
We throw in the whole repository, a long rambling prompt, several vague requirements, and then ask the model to “just figure it out.”
Sometimes that works. Sometimes it is wildly inefficient.
A more controlled pattern is to give the model a stable, reusable working structure.
For example:
/spec.md
/tasks.md
/decisions.md
Or, for a slightly more formal project:
/docs/ai/spec.md
/docs/ai/tasks.md
/docs/ai/context.md
/docs/ai/decisions.md
This is not about adding ceremony for the sake of it.
The goal is to stop every interaction from becoming a brand-new excavation of the entire codebase.
Example: spec.md
# Feature Spec: Billing Usage Dashboard
## Goal
Build a dashboard that allows engineering managers to understand GitHub Copilot usage across teams.
## Users
- Engineering managers
- Platform engineers
- Finance stakeholders
## Functional Requirements
- Show monthly AI Credit usage by team
- Show usage by model
- Show usage by feature type where available
- Highlight teams trending toward budget limits
- Export monthly usage summary as CSV
## Non-Functional Requirements
- Must not expose individual developer prompts
- Must support future billing model changes
- Must be deployable into the existing internal platform
## Constraints
- Frontend: Next.js
- Backend: .NET
- Database: PostgreSQL
- Authentication: Entra ID
This gives the model stable context to refer back to. It also gives humans something to review, which is annoyingly useful in software engineering.
Example: tasks.md
# Tasks
## Phase 1: Data Model
- [ ] Define usage record schema
- [ ] Define team aggregation schema
- [ ] Define monthly credit allocation schema
## Phase 2: API
- [ ] Create endpoint for monthly usage summary
- [ ] Create endpoint for usage by model
- [ ] Create endpoint for usage trend warnings
## Phase 3: UI
- [ ] Build summary cards
- [ ] Build model usage table
- [ ] Build team trend chart
## Phase 4: Validation
- [ ] Add unit tests
- [ ] Add integration tests
- [ ] Validate with sample billing export
With this in place, your prompts can become smaller and more focused:
Using docs/ai/spec.md and docs/ai/tasks.md, implement Phase 1 only.
Do not start Phase 2 yet.
Update tasks.md when complete.
That is often better than:
Here is my entire idea again from scratch. Please build the first part of it.
Tiny bit less heroic. Much more economical.
Separate Stable Context from Active Work
Another useful pattern is to separate stable context from active task context.
| Context type | Example | Change frequency |
|---|---|---|
| Stable context | Architecture overview, coding standards, system boundaries, business goal | Low |
| Active task context | Current implementation step, failing test, target file, immediate change | High |
| Decision history | Architecture decisions, trade-offs, rejected options | Medium |
| Generated output | Code, summaries, diffs, test results | High |
This helps because not all context has the same lifespan.
Your system architecture probably should not change every five minutes. Your current failing test might.
A cleaner structure might look like this:
Stable:
- spec.md
- architecture.md
- coding-standards.md
Active:
- tasks.md
- current-issue.md
- failing-test.md
Historical:
- decisions.md
- changelog.md
Then you can prompt the model in a way that makes the boundaries clear:
Use spec.md, architecture.md, and coding-standards.md as stable context.
Use tasks.md to determine the next task.
Focus only on the next incomplete task.
After making changes, update tasks.md and decisions.md if needed.
That prompt is boring.
Boring is good.
Boring is cheap.
Boring is how we avoid flinging premium-model tokens into the Sarlacc pit.
Do Not Use Chat History as Your Only Project Memory
This is not to say chat history is useless.
It is useful. It is convenient. It is often where the thinking happens.
But long chat history is a poor substitute for project memory.
If every new interaction depends on the model remembering a huge winding conversation, you may be relying on accumulated context that includes old assumptions, abandoned decisions, stale requirements, and irrelevant side quests.
A more durable approach is:
Important context goes into files.
Temporary thinking stays in chat.
Decisions get recorded.
Tasks get updated.
The agent works from the repo, not from vibes.
This is especially useful for longer-running agentic work.
If the model is going to inspect files, make edits, run tests, and revise its plan, then the source of truth should ideally live inside the project workspace where it can be read consistently.
Not buried 37 messages back in a chat thread next to a half-finished rant about YAML.
Although, to be fair, the YAML rant was probably justified.
Reset Contexts Deliberately
One of the easiest ways to accidentally increase token usage is to let a chat or agent session grow forever.
Every time you continue working in the same conversation, the model may need to carry forward more prior context: previous prompts, previous answers, old assumptions, partial plans, abandoned ideas, generated code, explanations, and side discussions.
Some of that context is useful. Some of it is just space debris.
Long-running context can become expensive, noisy, and occasionally misleading. The model may keep paying attention to something that was true three hours ago but is no longer relevant.
A deliberate reset can help.
That does not mean “start from zero every time.” It means moving the important context into stable project files, then starting a fresh session from those files.
A reset prompt might look like this:
We are starting a fresh implementation session.
Use the following files as the source of truth:
- /.ai/spec.md
- /.ai/tasks.md
- /.ai/context.md
- /.ai/decisions.md
Ignore previous chat history.
Work only on the next incomplete task in tasks.md.
Keep the change small and update tasks.md when complete.
The point is to make the repo, not the chat thread, the durable memory.
Chat is for interaction. Files are for continuity.
Manage the Context Window Like a Budget
The context window is not just a technical limit.
Under token-based billing, it is also part of your cost profile.
A huge context window can be useful when the task genuinely requires it. But larger context is not automatically better. In practice, it can make the model slower, more expensive, and more distracted.
The risky pattern looks like this:
Here is the whole repository, the whole chat history, the entire architecture, every requirement, every error, and every idea I have had so far.
Please fix the problem.
That may feel efficient because it is one prompt.
It is often not efficient.
A more focused pattern is:
Here is the goal.
Here are the relevant files.
Here is the current task.
Here is the failing test or error.
Please solve this bounded step.
This is the shift:
Big one-shot prompt:
Large initial context + uncertain scope + high token spike + harder validation
Iterative workflow:
Bounded context + smaller task + easier review + lower waste
For real engineering work, smaller loops are often better:
Plan.
Implement one piece.
Validate.
Adjust.
Move to the next piece.
That gives you more control, makes errors easier to catch, and avoids paying for a giant pile of irrelevant context on every turn.
Again, this is not a universal rule.
Sometimes a large-context prompt is exactly the right tool. Architecture discovery, migration assessment, and cross-cutting analysis may need a wide lens.
The point is to make that choice deliberately, not by accident.
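Before including a pile of files, it can also help to sanity-check how much context they would actually add. A minimal sketch, using the rough rule of thumb of about four characters per token (actual counts vary by model and tokenizer; the file paths here are hypothetical):

```typescript
import { readFileSync } from "node:fs";

// Rough pre-flight check: estimate how many tokens a set of files would add
// to a prompt, using the common ~4 characters per token rule of thumb.
// Actual counts vary by model and tokenizer; treat this as a ballpark only.
function estimateTokens(paths: string[]): number {
  const totalChars = paths
    .map((path) => readFileSync(path, "utf8").length)
    .reduce((sum, chars) => sum + chars, 0);
  return Math.ceil(totalChars / 4);
}

// Hypothetical paths; substitute the files you are about to include.
console.log(`~${estimateTokens(["src/usage/summary.ts", "src/usage/teams.ts"])} tokens of file context`);
```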
Avoid the Initial Token Spike
A common mistake is front-loading everything into the first prompt.
For example:
Read the whole codebase, understand the architecture, identify all problems, design the new solution, implement the changes, write tests, update docs, and explain everything.
That prompt is not just ambitious.
It can also be expensive.
It asks the model or agent to gather a huge amount of context before the task has a clear execution boundary. You are paying for exploration before you have shaped the work.
A staged version may be easier to control:
Step 1:
Read the project structure and summarise the relevant areas for the billing dashboard feature.
Do not make code changes.
Step 2:
Based on that summary, update /.ai/context.md with the relevant files and constraints.
Step 3:
Create an implementation plan in /.ai/tasks.md.
Step 4:
Implement only the first task.
This approach still uses tokens.
Of course it does. There is no Jedi mind trick for free inference.
But it uses them with intent.
You are letting the model build a smaller working map before asking it to act. You are also creating reusable context files that can carry forward into later sessions without relying on a sprawling chat transcript.
Compact Before You Continue
Before continuing a long-running session, it can be useful to ask the model to compact the important context into a file or short summary.
For example:
Before we continue, summarise the current state into /.ai/current-state.md.
Include:
- What has been completed
- What files were changed
- What decisions were made
- What still needs to be done
- Any known issues or failing tests
Keep it concise.
Then the next session can start from that compacted state:
Use /.ai/spec.md, /.ai/tasks.md, /.ai/decisions.md, and /.ai/current-state.md.
Continue from the next incomplete task.
Do not rely on previous chat history.
This gives you the benefits of continuity without dragging every previous token forward.
It also makes handover easier. Another developer, or another agent session, can pick up the work without needing to read the entire conversation.
Keep Iteration Loops Small
A good AI-assisted development loop should feel more like normal engineering than like a wish-granting spell.
A useful loop might be:
1. Define the goal.
2. Identify the relevant files.
3. Make a small change.
4. Run or request tests.
5. Review the result.
6. Update the task list.
7. Continue.
A less useful loop might be:
1. Ask for everything.
2. Wait.
3. Hope.
4. Receive 900 lines of changes.
5. Panic-review.
6. Ask the model to fix the fixes.
The second loop is how you can end up with high token usage and low confidence.
The first loop is more controlled, more reviewable, and usually easier to trust.
And boring, controlled, and trustworthy is exactly what we want when AI is touching production code.
Create an AI Working Folder
For teams using Copilot heavily, it may be worth adding a small AI working folder to repositories.
Something like:
/.ai/
spec.md
tasks.md
context.md
decisions.md
prompts.md
Where:
| File | Purpose |
|---|---|
| spec.md | What we are building and why |
| tasks.md | The current execution plan |
| context.md | System overview, constraints, links to relevant files |
| decisions.md | Important choices made during implementation |
| prompts.md | Reusable prompt patterns for common workflows |
Example reusable prompt:
# Implement Next Task
Use the files in /.ai/ as project context.
Instructions:
- Read spec.md and tasks.md.
- Identify the next incomplete task.
- Implement only that task.
- Keep the change as small as practical.
- Run or suggest relevant tests.
- Update tasks.md when complete.
- Add to decisions.md only if an important implementation choice was made.
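If a team wants this layout to be consistent across repositories, a tiny scaffolding script can create the folder with starter files. A minimal sketch (the file names are just the ones suggested above; adjust to whatever your team actually agrees on):

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Creates the /.ai/ working folder with starter files. The names and headings
// are just the layout suggested in this article. Existing files are never
// overwritten.
const aiDir = join(process.cwd(), ".ai");
const starterFiles: Record<string, string> = {
  "spec.md": "# Feature Spec\n",
  "tasks.md": "# Tasks\n",
  "context.md": "# Context\n",
  "decisions.md": "# Decisions\n",
  "prompts.md": "# Reusable Prompts\n",
};

mkdirSync(aiDir, { recursive: true });
for (const [name, contents] of Object.entries(starterFiles)) {
  const filePath = join(aiDir, name);
  if (!existsSync(filePath)) {
    writeFileSync(filePath, contents);
  }
}
```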
This is not something every repository needs.
A tiny utility library probably does not need an AI command centre. Please do not turn a three-file repo into the Jedi Archives.
But for larger work, this kind of structure can reduce repeated prompting and make the work easier to review.
It also helps move AI-assisted development from an individual productivity hack toward a team-level engineering practice.
Part 2 Takeaway
The practical lesson is simple:
Context is no longer invisible.
It has a cost profile.
That does not mean developers should be afraid of using context. Good context is what makes these tools useful.
But it does mean we should be more deliberate with it.
Use stable files where they help. Keep prompts bounded when you can. Reset deliberately. Compact long-running work. Prefer iterative implementation when the task allows it.
None of this is a universal law.
Different teams will find different patterns that work for them.
The goal is not to use AI less.
The goal is to use it better.
In Part 3, we will look at the agent in the room: large agentic workflows, why they need to be treated differently, and why some workloads may belong outside the normal Copilot billing context.
Sources
- GitHub Blog — GitHub Copilot is moving to usage-based billing: https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
- GitHub Docs — Usage-based billing for individuals: https://docs.github.com/en/copilot/concepts/billing/usage-based-billing-for-individuals
- GitHub Docs — Usage-based billing for organizations and enterprises: https://docs.github.com/en/copilot/concepts/billing/usage-based-billing-for-organizations-and-enterprises
- GitHub Docs — Models and pricing for GitHub Copilot: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing
- GitHub Docs — Preparing for your move to usage-based billing: https://docs.github.com/en/copilot/how-tos/manage-and-track-spending/prepare-for-your-move-to-usage-based-billing
- GitHub Docs — Preparing your organization for usage-based billing: https://docs.github.com/en/copilot/how-tos/manage-and-track-spending/prepare-for-usage-based-billing