GitHub Copilot: Return of the Agentic Workflow


Where We Left Off

In Part 1, we looked at the GitHub Copilot billing changes.

In Part 2, we looked at tokens, cached tokens, context windows, and practical ways to avoid unnecessary usage.

Now we need to talk about the bigger issue.

The agent in the room.

Because there is a big difference between:

Help me understand this method.

And:

Analyse this entire legacy application, create a migration plan, modify the code, run tests, fix the failures, document the changes, and generate a client-facing report.

From a certain point of view, both are AI-assisted development.

From another point of view, one is a developer asking for help, and the other is an automated engineering workload wearing a very convincing robe.

That distinction matters.


The Agent in the Room

Most of the billing discussion around Copilot will naturally focus on normal developer interactions: chat, code explanations, refactoring help, pull request summaries, CLI commands, and coding assistance inside the IDE.

That is important.

But it is not the whole story.

The real token-burn risk comes from large agentic workflows.

By that, I mean workflows where an AI system performs multi-step work across a codebase:

Analyse this repository.
Build a migration plan.
Inspect the dependency graph.
Identify risky areas.
Modify the code.
Run tests.
Read the failures.
Fix the failures.
Update documentation.
Generate a report.
Repeat until complete.
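Stripped to its shape, that loop might look something like the sketch below. This is not any real Copilot API; every name is hypothetical, and the only point is where the tokens go:

```python
# A minimal sketch of the loop above, to show where the tokens go.
# Nothing here is a real API; ask_model, read_context, apply_change,
# and run_tests stand in for whatever model client and repository
# tooling you actually have.

def agentic_migration(ask_model, read_context, apply_change, run_tests,
                      max_iterations=25):
    plan = ask_model("Build a migration plan for this repository.")
    calls = 1
    for _ in range(max_iterations):
        context = read_context(plan)   # input tokens, paid every iteration
        change = ask_model(context)    # output tokens, paid every iteration
        calls += 1
        apply_change(change)
        failures = run_tests()
        if not failures:
            break
        plan = ask_model(f"Fix these failures:\n{failures}")  # more tokens
        calls += 1
    return calls
```

Even in this toy form, the cost scales with the iteration count, which is why "Repeat until complete" is the expensive line in that list.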

That type of workflow can be incredibly valuable.

It can also be incredibly expensive.

More importantly, it may not belong inside the same billing and governance model as day-to-day developer assistance.

This is not because Copilot is the wrong tool.

It is because the question changes once the workload changes.


A Certain Point of View

Copilot is a developer productivity tool.

From one point of view, it is exactly where many AI-assisted coding tasks should happen. It sits in the editor, the terminal, pull requests, and the developer flow. It understands repository context, helps with small changes, and keeps assistance close to where the work is happening.

That is powerful.

But from another point of view, a large agentic workflow is not just “a bigger Copilot chat.”

It may be a structured compute workload with a project budget, a defined input set, a workflow plan, model routing, audit requirements, retry behaviour, approval gates, output artefacts, and measurable delivery outcomes.

At that point, the problem is less about developer assistance and more about workload orchestration.

Same galaxy.

Different mission.
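If you wrote that mission down, it would look less like a chat prompt and more like a declared workload spec. A minimal sketch, with every field name invented for illustration rather than taken from any vendor's schema:

```python
from dataclasses import dataclass, field

# Hypothetical spec for an agentic workload. Field names are
# illustrative, not any real product's schema.

@dataclass
class AgenticWorkloadSpec:
    project: str                   # who pays for this run
    budget_usd: float              # a hard ceiling, not a monthly allowance
    input_paths: list[str]         # a defined input set, not "the whole repo"
    workflow_plan: list[str]       # ordered stages
    model_routing: dict[str, str]  # stage -> model tier
    approval_gates: list[str] = field(default_factory=list)
    max_retries: int = 3
    output_artefacts: list[str] = field(default_factory=list)

spec = AgenticWorkloadSpec(
    project="legacy-modernisation-q3",
    budget_usd=500.0,
    input_paths=["src/", "tests/"],
    workflow_plan=["analyse", "plan", "modify", "validate", "report"],
    model_routing={"analyse": "cheap", "plan": "frontier", "modify": "cheap",
                   "validate": "cheap", "report": "frontier"},
    approval_gates=["plan", "modify"],
    output_artefacts=["migration-plan.md", "assessment-report.md"],
)
```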


Not All AI Usage Belongs in the Same Channel

This is the part that can get uncomfortable.

When people say “we use AI for development,” they can mean very different things.

They might mean:

I use Copilot Chat to explain unfamiliar code.

They might also mean:

We run a multi-agent workflow that analyses a legacy system, produces a modernisation assessment, generates tickets, proposes code changes, and creates an executive report.

Both are valid.

They are not the same kind of work.

A developer assistance workflow can reasonably live in Copilot.

An automated engineering workload may need its own architecture, budget, observability, and provider integration.

That does not mean avoiding Copilot.

It means being intentional about where each tool fits.

Or, to put it another way:

Use the right source of AI power for the job.

The Force is useful.

A thermal exhaust port is also useful.

Please do not confuse them.


Treat Large Agentic Workflows as Exceptions

For most developers, the goal should not be to bypass Copilot billing.

The goal should be to use Copilot well.

But large, repeatable, agentic workflows should be treated as exceptions that need explicit design.

Examples include:

| Workflow | Why it may need exception handling |
| --- | --- |
| Whole-repo modernisation assessment | Large context, repeated analysis, high token volume |
| Automated dependency migration | Iterative code changes, test loops, repeated failures |
| Large-scale test generation | High output tokens and repeated file inspection |
| Legacy application analysis | Broad codebase exploration and summarisation |
| Multi-agent architecture review | Parallel agents can multiply token usage quickly |
| AI-generated documentation refresh | May touch large parts of the repository |
| Security remediation campaigns | Potentially high-value, but high-volume |

These workloads can absolutely be worth doing.

But they should be planned as project workloads, not hidden inside someone’s monthly Copilot allowance.

Otherwise, a team can accidentally burn through shared AI Credits doing work that should have had its own project budget.

That is not good governance.

It is also not fair to the developers using Copilot for normal day-to-day work.


Build Dedicated Harnesses for Heavy Agentic Work

For serious agentic engineering workflows, it may make sense to build or use a dedicated harness that connects directly to approved model providers. That could mean Azure OpenAI or Azure AI Foundry, the OpenAI API, Anthropic, AWS Bedrock, Google Vertex AI, or another approved provider already accepted inside your organisation.

The point is not “Copilot bad, direct API good.”

That would be a lazy take, and frankly the galaxy already has enough of those.

The point is also not to bypass governance or pretend the cost disappears somewhere else. This is about moving high-volume automated work into a place where cost, data flow, approvals, and observability are explicit.

The point is control.

A dedicated harness gives you more control over the things that matter once the workflow becomes large, repeatable, and expensive. You can attach the cost to a project instead of an individual developer pool. You can track usage at the provider level. You can route simpler tasks to cheaper models and reserve stronger models for work that genuinely needs them. You can decide exactly what files and metadata are sent, reuse stable context deliberately, keep audit logs, record run history, introduce approval gates, control parallelism, and make deliberate choices about data boundaries, regions, retention, and network paths.
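To make "control" concrete, here is a hypothetical harness wrapper. It is a sketch, not a real SDK: it assumes some approved provider client exposing a complete(model, prompt) call that returns text plus token usage, and it shows the shape that matters, with every call attributed to a project, logged, priced, and stopped at a budget ceiling:

```python
import json, time

# Hypothetical harness wrapper, not a real SDK. "provider" is assumed
# to be whatever approved client you have, exposing
# complete(model, prompt) -> (text, usage) with token counts in usage.

class HarnessBudgetExceeded(RuntimeError):
    pass

class Harness:
    def __init__(self, provider, project, budget_usd, prices, log_path):
        self.provider = provider      # approved model provider client
        self.project = project        # cost attribution target
        self.budget_usd = budget_usd  # project budget, not a personal pool
        self.prices = prices          # model -> (usd_per_1k_in, usd_per_1k_out)
        self.spent_usd = 0.0
        self.log_path = log_path

    def complete(self, model, prompt):
        text, usage = self.provider.complete(model, prompt)
        in_rate, out_rate = self.prices[model]
        cost = (usage["input_tokens"] / 1000) * in_rate \
             + (usage["output_tokens"] / 1000) * out_rate
        self.spent_usd += cost
        with open(self.log_path, "a") as log:  # audit trail, one line per call
            log.write(json.dumps({
                "ts": time.time(), "project": self.project, "model": model,
                "input_tokens": usage["input_tokens"],
                "output_tokens": usage["output_tokens"],
                "cost_usd": round(cost, 4),
            }) + "\n")
        if self.spent_usd > self.budget_usd:
            raise HarnessBudgetExceeded(
                f"{self.project} exceeded its {self.budget_usd} USD budget")
        return text
```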

That matters a lot for consulting, platform engineering, and modernisation work. A single assessment might involve a large codebase, multiple passes of analysis, generated findings, proposed remediation steps, test runs, documentation, and report generation. At that point, you do not want the cost model to be mysterious.

You want to know what each run cost, which model was used, how much context was sent, how much output was generated, how many retries happened, which steps were valuable, and which were wasteful.
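Concretely, a per-run record along these lines answers most of those questions. All fields and numbers are invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical per-run summary; every field and value is illustrative.

@dataclass
class RunTelemetry:
    run_id: str
    model: str
    input_tokens: int       # how much context was sent
    output_tokens: int      # how much output was generated
    retries: int
    cost_usd: float
    steps_kept: int         # outputs that made it into the deliverable
    steps_discarded: int    # wasted work worth investigating

run = RunTelemetry(run_id="assessment-042", model="frontier-large",
                   input_tokens=1_840_000, output_tokens=220_000,
                   retries=7, cost_usd=61.20, steps_kept=34,
                   steps_discarded=12)

waste = run.steps_discarded / (run.steps_kept + run.steps_discarded)
print(f"{run.run_id}: ${run.cost_usd:.2f}, waste ratio {waste:.0%}")
```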

That is engineering telemetry.

And if we are going to treat AI as part of the software delivery system, it needs telemetry like the rest of the software delivery system.


Copilot for Developers, Harnesses for Workloads

A useful mental model is that Copilot is for individual and team developer assistance, while dedicated agent harnesses are for structured, repeatable, high-volume engineering workloads.

There will be overlap, of course. A developer might use Copilot to help design the workflow, write the harness, review the output, or manually perform parts of the implementation. Copilot can still be part of the story.

But the big expensive loop should not necessarily happen inside an interactive chat session.

Good Copilot use might be asking for help understanding a failing test, explaining a service registration, generating a small refactor, summarising a pull request, or writing the next task from a spec.

A better fit for a dedicated harness might be analysing a 500k-line legacy app, running a full modernisation assessment, generating a dependency migration plan across 40 projects, running five agents in parallel to compare remediation strategies, or producing a client-facing technical assessment report from repository analysis.

This distinction matters because the second category is not casual usage. It is compute work, and compute work needs a budget.


Avoid Creating Shadow AI Infrastructure

There is a governance trap here too.

When teams realise large agentic workflows may be expensive through Copilot, some people will be tempted to build unofficial scripts, use personal API keys, or run experiments against whatever model endpoint they can access.

That is how you end up with shadow AI infrastructure.

The answer should not be to have everyone build their own agent harness with their own keys. The better answer is to provide an approved path for heavy AI engineering workloads.

That approved path should include clear guidance on approved providers, budget ownership, logging, observability, security review, data handling, prompt and output retention, human approval gates, and when to use Copilot versus a dedicated harness.

This keeps the organisation from accidentally turning AI adoption into a mess of personal tokens, mystery endpoints, unmanaged data flow, and surprise invoices.

Because nothing says “enterprise transformation” like discovering your modernisation strategy is running through someone’s forgotten test key on a Friday afternoon.


Practical Guidance: How to Avoid Burning the Token Farm

The practical guidance is not especially dramatic. Use cheaper models where appropriate for quick questions, syntax help, small refactors, and explanations. Reserve frontier models for genuinely complex reasoning, architecture trade-offs, gnarly debugging, and high-value agentic work. Avoid repeatedly attaching huge context when a smaller targeted prompt would work.

For larger work, break tasks into stages: understand, plan, modify, validate. Reset long-running contexts deliberately. Keep stable project information in files rather than relying only on chat history. Watch for agent loops that retry, re-read, or regenerate too much.
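A minimal sketch of that staged approach, assuming a generic ask_model call and a hypothetical project notes file, with each stage getting a fresh, deliberately small context instead of an ever-growing chat history:

```python
from pathlib import Path

# Sketch only. ask_model stands in for whatever approved model call you
# have (prompt string in, text out); PROJECT_NOTES.md is a hypothetical
# file holding stable project facts.

def staged_run(ask_model, task):
    project_facts = Path("PROJECT_NOTES.md").read_text()
    carry = ""  # only the previous stage's output, never the full history
    results = {}
    for stage in ("understand", "plan", "modify", "validate"):
        # Fresh, deliberately small context for every stage.
        prompt = (f"{project_facts}\n\nTask: {task}\nStage: {stage}\n"
                  f"Previous stage output:\n{carry}")
        carry = ask_model(prompt)
        results[stage] = carry
    return results
```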

For organisations, the next step is visibility. Monitor usage by team, repository, model, and workflow type. Set budgets carefully, because a hard cap can protect spend but can also interrupt useful work. Most importantly, treat large agentic workflows as project workloads with their own cost model.
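Even a simple aggregation gives most of that visibility. A minimal sketch, assuming per-call JSONL audit records like the ones written by the harness sketch earlier; the same idea extends to keys for team, repository, and workflow type:

```python
import json
from collections import defaultdict

# Minimal sketch: roll a JSONL audit log (the format assumed in the
# earlier harness sketch) up into spend by project and model.

def spend_report(log_path):
    totals = defaultdict(float)
    with open(log_path) as log:
        for line in log:
            record = json.loads(line)
            totals[(record["project"], record["model"])] += record["cost_usd"]
    for (project, model), usd in sorted(totals.items()):
        print(f"{project:30s} {model:18s} ${usd:9.2f}")
```

A soft alert at, say, 80% of a budget is often a better first control than a hard cap, for exactly the reason above: a hard stop can kill a run that was one iteration from finishing.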

This is not about making developers scared of using AI tools. That would be the wrong outcome.

The goal is to stop treating AI usage like an invisible free resource.


Final Takeaway: The Token Meter Has Arrived

As always, we will adapt.

This is not the end of GitHub Copilot. It is not the end of AI development tools. It is not the moment where we all dramatically throw our keyboards into the sea and return to hand-carving Java enterprise applications from stone tablets.

It is a reset.

For the last few years, a lot of AI usage has felt abstract. We asked questions, generated code, reviewed pull requests, summarised logs, explored repositories, and kicked off increasingly ambitious agentic workflows without always needing to think too hard about the underlying cost.

That era is changing.

The tokens were always being consumed. The GPUs were always burning. The cost was always real. It was just hidden behind a simpler pricing model.

Now that usage is becoming more visible, we have a chance to ask better questions. Do I need the most expensive model for this task? Am I sending too much context? Could this be done in smaller, more focused iterations? Should this workflow live inside Copilot, or should it be a dedicated agentic workload with its own budget and controls?

Most importantly: am I using AI deliberately, or am I just throwing tokens at ambiguity?

The goal should not be to use AI less. The goal should be to use it better.

That means better context, better task boundaries, better model selection, better observability, and better understanding of when a cheap model is enough, when a frontier model is justified, and when a large agentic process needs to be treated as a real engineering workload.

This is not a collapse. It is a correction. And honestly, probably a necessary one.

The next stage of AI-assisted development will not just be about who can generate the most code the fastest. It will be about who can use these tools effectively, sustainably, and economically without turning every repository into a token bonfire with syntax highlighting.

So yes, take a breath.

It is going to be ok.

The token meter has arrived.

Now we learn how to build with it.

