The standardisation work for a new service pipeline used to take one of our senior engineers about three hours of focused time. GitHub Actions workflow, build matrix, secret wiring, the linting and test gates, the deploy stage that matches every other service we run, the bits of YAML that always get copy-pasted from a sibling repo and then patched up after the first failed run. Not difficult work, but the kind where a tired engineer makes one small typo at four o’clock and spends the next forty minutes tracing why nothing is firing.
A few weeks ago we put a new internal supplier-adapter-v2 service through the same process. We pointed Claude Code at our pipeline conventions document, the existing supplier-adapter-v1 repo, the deploy targets in our infrastructure-as-code, and the security-scanning policy we apply on every public-facing service. Then we wrote the brief, which went out at half past nine. Twenty-five minutes later we had a merged PR, all checks green, supplier-adapter-v2 building and deploying through the same pipeline shape as the rest of the estate. The engineer who would normally have done the three hours of YAML-wrangling reviewed the diff, asked for two changes, watched them go in, and approved.
That is the kind of swing the past twelve months of agentic AI tooling has produced for our delivery, but only after we changed how we work with the tool. We run three or four of those new-service pipelines a month across our client estate. The quarterly arithmetic comes out at roughly thirty hours of senior engineering time recovered from one workflow alone. The interesting part is what the rest of the team had to do, and stop doing, to make a saving like that repeatable.
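As a rough check on that arithmetic, here is a minimal sketch. It uses only the figures quoted above (three hours manually, twenty-five minutes with the agent, three or four pipelines a month). The names are illustrative, and it assumes the twenty-five minutes stands in for the senior's total involvement, which leaves out any separate review time.

```python
# Back-of-envelope estimate of senior hours recovered per quarter.
# The figures are the ones quoted in the article; the names are illustrative.

MANUAL_HOURS = 3.0        # senior time per pipeline, done by hand
AGENT_HOURS = 25 / 60     # brief-to-merge time in the supplier-adapter-v2 example
MONTHS_PER_QUARTER = 3


def hours_recovered_per_quarter(pipelines_per_month: int) -> float:
    """Hours saved per quarter, assuming every pipeline saves the same amount."""
    saved_per_pipeline = MANUAL_HOURS - AGENT_HOURS
    return pipelines_per_month * MONTHS_PER_QUARTER * saved_per_pipeline


low = hours_recovered_per_quarter(3)
high = hours_recovered_per_quarter(4)
print(f"about {low:.0f} to {high:.0f} hours per quarter")
# prints: about 23 to 31 hours per quarter
```

On these assumptions, "roughly thirty hours" sits at the upper end of the range, at four pipelines a month.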
Why prompt engineering is the wrong frame for agentic AI
The phrase “prompt engineering” was useful when the only thing we could do with a model was send it a question and read the answer. As AI-assisted development matured, the role evolved well beyond basic prompting into workflow orchestration, context management, and AI collaboration. With an agent that can read your repo, write code into a branch, and open a pull request, the frame stops fitting. The work shifts from engineering a question to commissioning a piece of work.
We have argued in print that managing an AI agent requires many of the same skills as managing a human being. That phrasing reflects the actual practice we run on a Tuesday morning. The mental shift is from “this is a chatbot I query” to “this is a junior engineer I delegate to.” With a chatbot, you ask a question and grade the answer. With a junior engineer, you write a brief, point them at the relevant prior art, set them loose, review their output, and give feedback. The work the senior puts in is upstream and downstream of the typing.
Once that frame is in your head, the team-level practice falls out of it. You stop trying to write the perfect prompt as a one-shot exercise and start thinking about what the agent actually needs to know to do the job. You stop being surprised when an under-briefed task comes back wrong.
Context first, prompt second
The first thing we changed in our daily practice was the order of operations. We stopped opening a fresh terminal and typing a request, and started by gathering context.
Before we ask Claude Code to add a feature, refactor a service, or build a pipeline, we point it at the relevant directories, the conventions document, the existing equivalent service, the data contract it has to honour. Sometimes that is a single file. More often it is a half-dozen, plus a paragraph of plain English about why the work matters and what the boundaries are. Treat the prompt as a brief. The same care a senior engineer would put into a new contractor’s first task on a Monday morning is roughly the right amount of care.
We use a perspective test internally. Before sending the brief, the engineer reads it back and asks themselves: “If I were asked to perform this task given only the context I’ve just provided, could I succeed?” If the honest answer is no, the brief is not finished. That single check has cut more failed runs than any change to the model, the IDE plugin, or the tool chain. The cost of the check is about ninety seconds. The cost of skipping it is a wasted hour of generated code that nobody can use because the agent did not know which version of an internal library to import.
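One way to picture a brief is as a small structured record, with a mechanical pre-check that runs before the human perspective test. The sketch below is hypothetical: the `Brief` fields, the `perspective_test_gaps` helper, and the example paths are all invented for illustration and are not part of any tool mentioned here. A check like this can only catch missing sections. Whether the brief is actually sufficient remains the engineer's call.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical shape for a brief; field names are assumptions, not a standard."""
    task: str
    why_it_matters: str = ""
    boundaries: str = ""
    context_paths: list[str] = field(default_factory=list)
    pinned_versions: dict[str, str] = field(default_factory=dict)


def perspective_test_gaps(brief: Brief) -> list[str]:
    """Return the obvious gaps a reader would hit with only this brief in hand."""
    gaps = []
    if not brief.context_paths:
        gaps.append("no context files or directories listed")
    if not brief.why_it_matters.strip():
        gaps.append("no explanation of why the work matters")
    if not brief.boundaries.strip():
        gaps.append("no boundaries stated")
    if not brief.pinned_versions:
        gaps.append("no internal library versions pinned")
    return gaps


# Example with made-up paths and a made-up library name.
thin_brief = Brief(task="Build the pipeline for supplier-adapter-v2")
full_brief = Brief(
    task="Build the pipeline for supplier-adapter-v2",
    why_it_matters="New service must deploy through the same pipeline shape as v1.",
    boundaries="Do not change shared deploy targets.",
    context_paths=["docs/pipeline-conventions.md", "../supplier-adapter-v1/"],
    pinned_versions={"internal-http-client": "2.x"},
)

for gap in perspective_test_gaps(thin_brief):
    print("missing:", gap)
```

The version-pinning field is there because of the failure described above, where the agent did not know which version of an internal library to import.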
Junior engineers complain when their seniors hand them a vague task. They are right to. The agent does not complain; it produces convincingly shaped output that does not work, and you find out at review time. The fix is the same one a good engineering manager learns on the human side: write the brief properly the first time.
Iterate with the agent rather than around it
The second change cuts against most engineers’ instinct. When the first attempt comes back wrong, the default reflex is to take over. Open the file, fix the bug, move on. With another human in pair-programming this works, because both of you watch the change land and learn from it. With an agent, taking over breaks the productivity loop entirely, and you spend more time fixing the agent’s output than you would have spent writing it from scratch.
The skill we have had to train into the team is staying in the conversation. When the first attempt is wrong, describe what is wrong, specifically. “The function name does not match our naming convention; we use camelCase for internal helpers, see helpers.ts in the same folder.” “The error path swallows the exception; we log and rethrow, see the pattern in supplier-adapter-v1.” “The test covers the happy path only; we expect coverage on the timeout case and the malformed-payload case, both documented in the brief.”
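To make two of those review comments concrete, here is a minimal sketch of a log-and-rethrow error path, with checks for the timeout and malformed-payload cases alongside the happy path. It is written in Python purely for illustration. The function name, logger name, and payload are hypothetical, and the real pattern in supplier-adapter-v1 is not shown in this article and may look different.

```python
import json
import logging

logger = logging.getLogger("supplier_adapter")  # hypothetical logger name


def parse_supplier_response(fetch):
    """Call fetch() and parse its JSON body; log and re-raise on failure."""
    try:
        return json.loads(fetch())
    except (TimeoutError, json.JSONDecodeError):
        # Log with traceback, then re-raise so the caller still sees the error.
        # Swallowing it here is the mistake the review comment describes.
        logger.exception("supplier call failed")
        raise


def raises(exc_type, fn) -> bool:
    """Tiny test helper: True if fn() raises exc_type."""
    try:
        fn()
    except exc_type:
        return True
    return False


def timing_out():
    raise TimeoutError("supplier did not respond")


# Happy path.
assert parse_supplier_response(lambda: '{"sku": "A1"}') == {"sku": "A1"}
# Timeout case: the error is logged and still reaches the caller.
assert raises(TimeoutError, lambda: parse_supplier_response(timing_out))
# Malformed-payload case: invalid JSON is logged and re-raised.
assert raises(json.JSONDecodeError, lambda: parse_supplier_response(lambda: "{not json"))
```

Feedback this specific gives the agent something it can act on in the next loop: a named pattern, a place to find it, and the cases the tests must cover.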
Then let the agent fix it. The second attempt is usually closer. When it is not, you go round again, more specifically. Each loop takes seconds. Manual fixing breaks the loop and you are back to writing the code yourself, which is the thing the agent was supposed to remove from your day.
This was the hardest behavioural change to embed. Our seniors have twenty years of muscle memory for “I’ll just fix it myself, it’ll be quicker.” Sometimes that instinct is right. More often, by the third or fourth manual fix, you realise that staying in the conversation would have got you home in half the time.
Human review stays in the loop
Code from Claude Code goes through the same review process as code from any human team member. There is no shortcut here, and there should not be one. The senior engineer who briefed the agent reviews the resulting branch. The PR gets a second pair of eyes from someone who was not in the original conversation. The CI pipeline runs. The deploy gates fire as normal.
Treat that review step as the only thing keeping the rest of the practice honest. Without rigorous review, the agent’s output becomes a black box. Bugs accumulate. Subtle architectural drift creeps in: a slightly different way of structuring an error class here, a slightly different naming convention there, all individually defensible, all collectively corrosive. Six months in you have a codebase that nobody can read because it was written by an agent who was given different briefs every week and told to use its judgement.
For us this dovetails with how we run commercials. We quote fixed-price on every project. Predictable output quality is the only way that pricing model stays viable. If the agent’s code goes into production without proper review and the bug surfaces three weeks later in a client environment, it is our cost to fix, not the client’s. That focuses the mind on review discipline in a way that hourly billing simply does not. Treat the review step as the part that makes AI-assisted delivery commercially safe, rather than as a tax on it.
What this changes for code review and CI/CD
When the typing is no longer the bottleneck, the shape of an engineer’s day shifts. The work that used to fill the middle of the afternoon, the actual writing of well-understood code, compresses into something closer to overhead. The work on either side of it expands.
Code review changes character. A reviewer who used to scan for typos, off-by-one errors, and obvious style violations now spends most of their time on architecture. Does this change belong in this service or somewhere else? Does the abstraction match the rest of the system? Is the test coverage on the right cases, or is it covering the easy cases because they were easy to generate? The agent will produce code that compiles, passes the linters, and clears the unit tests it wrote for itself. The architectural questions remain human work, and our reviewers have to be more senior, not less, to do that job well.
CI/CD pipelines change ownership. When standardising a new pipeline shape takes twenty-five minutes instead of three hours, the pipeline stops being a once-a-quarter project and becomes routine work. We update them more often, retire patterns more aggressively, and the conventions document gets read more often because more engineers need to know what is in it. A vague conventions document becomes a measurable productivity drag, because the agent reads it every time it touches a new service.
Pair-programming dynamics shift too. We still pair, but the second human in the pair is increasingly the reviewer rather than the co-author. The agent is the co-author. The two engineers write the brief together, then one reviews and the other integrates. It took us a few weeks of awkwardness to settle into.
The team-side change
Rolling this practice out across a small senior engineering team did not happen by sending round a memo. It happened through deliberate, repeated, slightly tedious practice.
We hold a short weekly session where engineers bring one example of where the agent worked well and one example of where it did not. The good examples become reference briefs that the rest of the team can study. The bad examples get dissected: was the brief too thin, was the context wrong, did the engineer take over too early, was the review step rushed. Neither set of examples is treated as a triumph or a failure. They are training material.
We have written internally about the three-step framework that emerged from this. Context first, prompt second. Iterate with the agent. Human review stays in the loop. Three sentences, but each one represents a behavioural change that does not stick on the first attempt. Engineers have to be coached into briefing properly, into staying in the conversation, into treating review as architectural rather than cosmetic. We run a Claude AI training programme around exactly this shift in working practice for client teams who want to roll the same change out across their own engineering function. The technical material is the easier half. The harder half is unlearning the habits that made you a good engineer in 2020 but get in your way in 2026.
The teams that adopt this practice well are the ones whose seniors are willing to be visibly clumsy with the new tool in front of their juniors. The teams that struggle are the ones where seniors try the tool in private, conclude it does not work for their style, and go back to typing the code themselves. The behavioural shift is downstream of permission, and permission has to come from the top of the engineering function.
Where this kind of practice lives
This way of working is most natural inside small senior engineering teams that own their delivery end to end. We see it most often in small, specialist bespoke software firms in London, where the work runs on a fortnightly demo cadence with a senior pair leading delivery and the wider engineering team behind them for review, deployment and launch. These firms build full custom platforms commissioned from the ground up alongside smaller, focused internal applications, and both kinds of work come out of the same client conversations depending on what the client actually needs. The smaller applications retire overlapping subscriptions or replace a stitched-together off-the-shelf workflow. The full custom platforms run major parts of a client’s business and replace the stack that they have outgrown.
The shape of the team is what makes the agentic practice work. A senior pair fronting delivery means the brief-writing skill sits where it needs to sit, with the people who understand the architecture. A wider engineering team behind them for review means the review step is rigorous regardless of who pressed enter on the prompt. A fixed-price quote on every project means there is a real commercial reason to keep output quality consistent rather than letting the agent’s failure cases leak into client time. None of those structural choices were made because of AI. All of them turn out to make AI-assisted delivery more workable than it would otherwise be.
Larger system integrators with offshore delivery teams and time-and-materials billing have a different optimisation surface. The agent is still useful to them, but the team-side change is harder, because briefing standards, review discipline, and architectural ownership are spread across more people in more places. Smaller specialist firms have the advantage that the same handful of engineers brief, build, review and ship, and the feedback loop on what works closes within a sprint.
What we got right, what we are still working on
Looking back at the year, the thing we got right early was treating the agent as a team member rather than a feature of the IDE. Once you frame it that way, the right practices follow without too much argument. The thing we got right late was the brief-writing discipline. For the first three or four months, we underestimated how much of an engineer’s productivity gain came from learning to write a good brief, and how little came from getting better at the prompt. We were measuring the wrong thing.
What we are still working on is the long-running engagement. Most of our work runs eight to sixteen weeks, fixed-price, with a senior pair fronting delivery and the wider engineering team behind them. The agentic practice slots cleanly into that shape. What is harder is the longer-running internal work, where the brief shifts every couple of weeks and the conventions document drifts under the pressure of new requirements. We have not yet found a stable answer for keeping the agent’s context fresh over a six-month engagement without one engineer’s day disappearing into context-curation. That is the open problem we are sitting with going into the second half of the year.
The advice we would give a dev lead trying to make the same shift in 2026 is unfashionable. Forget the prompt library. Spend the first month writing a single excellent conventions document and rebuilding the team’s habit of pointing at it before they ask for code. Watch your seniors brief the agent. Critique the briefs the way you would critique a junior’s first task. Make the perspective test mandatory for two weeks even when it slows people down, because that is the bit that builds the muscle. The productivity gain shows up in week six, not week one, and only if the discipline shows up in week one.