Pulse: building autonomous teams with less process and more impact
Ninety percent of the code we ship at Twistag is spec-driven and AI-generated by design. The other ten percent is where our engineers spend their judgment. At that ratio we ship faster than we ever have, and it is what autonomous engineering looks like at this volume, but only if the system around the engineer keeps pace. Pulse is that system: our AI delivery methodology, with five delivery agents that move work from brief to production, an eight-check agentic review layer that gates every pull request, and a unified dashboard that turned five disconnected tools into one screen. The result is smaller teams shipping more, with less process and more impact.
The diagnosis: review broke at AI volume
Shipping is fast now. Things that used to take a sprint take a couple of days. The part nobody wants to talk about is what that volume does to the people reviewing it.
When most of what an engineer is reading was not written by another human, the reading itself changes. Eyes skim. Attention drifts toward formatting because the formatting always looks tidy. AI does not write bad code most of the time — it writes plausible code. Code that passes the tests, looks fine, and quietly reintroduces the auth pattern the team killed six months ago.
"If your AI usage went up 10x and your review process didn't, you don't have a faster team. You have a slower incident waiting to happen."
— Fred Sarmento, founder, Twistag
We saw that incident coming early and decided not to wait for it to cost a client. For years our stack was Snyk, Sonar, Codacy and LinearB. Each one solved a slice. None of them were built for a world where most of the code is not written by a person. And on top of that, none of them talked to each other.
This is where João came in.
"Five tools. None of them talked to each other. One morning we said enough."
— João Belo, VP of Engineering, Twistag
What we built has three pillars. The first delivers the work. The second reviews the work. The third tells us whether either of the first two is doing its job. Together they are Pulse.
What Pulse is, and what it isn't
Pulse is a proprietary AI delivery methodology built around senior engineers. It is not an autopilot, a code generator, or a wrapper around someone else's tooling. It is the operating layer Twistag uses to deliver client work, designed for the reality that the volume of generated code now outpaces the human capacity to read it line by line.
Three pillars carry the methodology:
| Pillar | What it does | Who owns the final call |
|---|---|---|
| Five delivery agents | Convert intent into shipped software across spec, architecture, code, quality and deployment | Senior engineer |
| Eight-check agentic review layer | Gate every pull request against the patterns tired reviewers miss | Senior engineer (after the layer clears) |
| Pulse Dashboard | Surface quality, security, cost, activity, people, adoption and ops in one screen | Engineering leadership |
The agents handle volume. The review layer enforces consistency at that volume. The dashboard tells leadership where judgment is most needed. The engineer still owns every call — approves, overrides, pushes back. What Pulse changes is what review looks like at this scale.
Pillar one: the five delivery agents
The five agents map to the phases of the delivery lifecycle. Each one removes a specific kind of friction that slows engineers down.
| Agent | Phase | What it removes |
|---|---|---|
| Spec | Discovery → Specification | Ambiguous requirements, missing acceptance criteria, undocumented edge cases |
| Arch | Architecture | Late-binding architecture debates, undocumented trade-offs, scale assumptions made in PR comments |
| Code | Implementation | Boilerplate, scaffolding, repetitive convention enforcement |
| Guard | Quality assurance | Drift between what was specified and what was built, regressions, performance surprises |
| Ship | Deployment | Manual rollback planning, environment drift, post-deploy monitoring set up after the incident |
Spec turns briefs, calls and stakeholder conversations into structured specifications with acceptance criteria and edge cases written down before a single line of code is written. Engineers start sprints with clarity, not questions.
Arch evaluates that specification against the project's existing architecture, technical constraints, and scale requirements. It proposes implementation paths, flags trade-offs, and documents decisions. The architecture conversation happens before the pull request, not during code review.
Code pairs with engineers during development. It generates implementation scaffolding, writes tests alongside features, and enforces project conventions across the codebase so senior engineers move at higher velocity on the parts that actually need them.
Guard runs continuous quality analysis across code, tests, and deployment configuration. It catches regressions, security issues, and performance degradation before they reach staging. Every merge request meets the same standard regardless of who wrote it.
Ship orchestrates deployment pipelines, environment configuration, and release sequencing. It handles rollback planning, feature flags, and post-deployment monitoring setup. What used to take a dedicated DevOps cycle now runs as part of every sprint.
The agents do not replace engineers. They remove the work that slows engineers down so the senior people Twistag hires spend their time on architecture decisions, complex problem-solving, and client collaboration — not on boilerplate, documentation, or deployment checklists.
That is the delivery side. Now the part that broke at AI volume: review.
Pillar two: the eight-check agentic review layer
The agentic review layer sits between the Code agent and a human approver. It is not a linter. It is not a security scanner with a pretty UI. It is eight specialised checks, each one tuned to the failure modes that show up when most of the code in a pull request was not typed by a person. Every PR that touches a client environment passes through this layer before a human signs off.
The eight checks are sequential. A failure halts the pipeline; the engineer either fixes the issue or files an explicit override that lands in the audit trail. No silent passes.
| # | Check | What it catches | Why it matters at AI volume |
|---|---|---|---|
| 1 | Spec conformance | Drift between the diff and the agreed spec or acceptance criteria | AI tends to over-deliver: it adds a flag here, a helper there, none of it asked for. Conformance keeps scope honest. |
| 2 | Security regression | Known vulnerable patterns, secrets in code, deprecated auth or authorization patterns reintroduced | Plausible code can pass tests and quietly bring back a pattern the team killed months ago. This is the check that keeps that from shipping. |
| 3 | Convention drift | House style violations: naming, error handling, logging, file structure, module boundaries | AI writes code that looks tidy in isolation but fights the project's conventions in aggregate. Drift compounds across files. |
| 4 | Anti-pattern repetition | The same poor pattern echoed across multiple files in the same PR | One bad pattern is a code smell. The same bad pattern in eight files is a future refactor on the engineering budget. |
| 5 | Performance pitfalls | N+1 queries, hot-loop allocations, missing indexes, unbounded recursion, accidental quadratic work | Tests pass at test scale. Performance fails at production scale. The check models the difference. |
| 6 | Test integrity | Tautological assertions, tests that verify the implementation rather than the contract, missing edge cases | AI is excellent at writing tests that confirm whatever it just wrote. The check forces tests back onto the spec. |
| 7 | Dependency & supply-chain hygiene | New packages, version pinning, license risk, transitive CVEs, unused additions | A new dependency is a permanent decision made in a five-second autocomplete. The check makes that decision explicit. |
| 8 | Data & API safety | Breaking API contract changes, unsafe migrations, PII handling, schema compatibility | The change that breaks a client integration is rarely the one a human reviewer notices at 5pm on Friday. The check does. |
Each check produces a structured verdict — pass, fail with required fix, or pass-with-warning — and writes the result back to the PR with the offending lines highlighted. A senior engineer then reads the verdicts, not the diff cold.
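The gate described above can be sketched in a few lines. This is a minimal illustration and not Pulse's implementation: the names `Verdict`, `CheckResult` and `run_review`, the three stand-in checks, and the `legacy_auth(` marker are all assumptions made for the example. It shows the two behaviours the text describes: checks run in order and a failure halts the run, unless an explicit, attributed override is on file.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Verdict(Enum):
    PASS = "pass"
    PASS_WITH_WARNING = "pass-with-warning"
    FAIL = "fail with required fix"


@dataclass
class CheckResult:
    check: str
    verdict: Verdict
    note: str = ""
    overridden_by: Optional[str] = None  # set only when a failure was explicitly overridden


# A check takes the PR diff (simplified to a string here) and returns a result.
Check = Callable[[str], CheckResult]


def run_review(
    diff: str, checks: List[Check], overrides: Dict[str, str]
) -> Tuple[bool, List[CheckResult]]:
    """Run checks in order and halt on the first failure without an override.

    `overrides` maps a check name to the engineer who filed the override.
    Returning True means "cleared for human review", never "approved".
    """
    results: List[CheckResult] = []
    for check in checks:
        result = check(diff)
        if result.verdict is Verdict.FAIL:
            engineer = overrides.get(result.check)
            if engineer is None:
                results.append(result)
                return False, results  # halt: later checks do not run
            result.overridden_by = engineer  # recorded for the audit trail
        results.append(result)
    return True, results


# Hypothetical stand-in checks; real checks would analyse the diff properly.
def spec_conformance(diff: str) -> CheckResult:
    return CheckResult("spec_conformance", Verdict.PASS)


def security_regression(diff: str) -> CheckResult:
    if "legacy_auth(" in diff:
        return CheckResult(
            "security_regression", Verdict.FAIL, "deprecated auth pattern reintroduced"
        )
    return CheckResult("security_regression", Verdict.PASS)


def convention_drift(diff: str) -> CheckResult:
    return CheckResult(
        "convention_drift", Verdict.PASS_WITH_WARNING, "inconsistent logger name"
    )


CHECKS = [spec_conformance, security_regression, convention_drift]
diff = "+ token = legacy_auth(user)"

ok_without_override, results_without_override = run_review(diff, CHECKS, overrides={})
ok_with_override, results_with_override = run_review(
    diff, CHECKS, overrides={"security_regression": "engineer-a"}
)
```

Without an override the run stops at the second check and the third never executes. With one, the run continues and the failing result carries the engineer's name, which is what makes the override visible later.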
"AI doesn't write bad code, mostly. It writes plausible code. Passes the tests, looks fine, and quietly reintroduces an auth pattern the team killed six months ago."
— Fred Sarmento, founder, Twistag
Two principles guard the layer itself. First, the human still owns every call. The review layer surfaces what to look at; it does not approve or merge. Second, every override is logged, attributed, and visible on the dashboard. If a team is overriding the same check repeatedly, that is a signal: either the check is wrong, or the team is taking on debt with its eyes open. Either way, leadership sees it.
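As a rough sketch of how repeated overrides could be surfaced, the snippet below counts overrides per team and check and flags any pair at or above a threshold. The `Override` record, its fields, the team and check names, and the threshold of three are illustrative assumptions, not the dashboard's actual schema or rule.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Override:
    check: str
    team: str
    engineer: str        # every override is attributed
    filed_at: datetime   # and timestamped
    reason: str


def repeated_overrides(
    log: List[Override], threshold: int = 3
) -> Dict[Tuple[str, str], int]:
    """Return (team, check) pairs overridden at least `threshold` times."""
    counts = Counter((o.team, o.check) for o in log)
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical audit-trail entries.
log = [
    Override("convention_drift", "payments", "engineer-a", datetime(2024, 3, 4, 10), "legacy module"),
    Override("convention_drift", "payments", "engineer-b", datetime(2024, 3, 6, 15), "legacy module"),
    Override("convention_drift", "payments", "engineer-a", datetime(2024, 3, 8, 11), "legacy module"),
    Override("test_integrity", "payments", "engineer-c", datetime(2024, 3, 8, 16), "spike branch"),
]

flagged = repeated_overrides(log)
```

Here only the payments team's convention-drift overrides are flagged. Whether that means the check is miscalibrated or the team is knowingly taking on debt is still a human judgment.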
Pillar three: the Pulse Dashboard
The eight checks fixed half of the review problem. The other half was visibility. Five tools, five tabs, five different opinions about what a healthy project looks like, and code review that dragged into Friday because the picture was always one tab away.
So we built one dashboard. Seven categories. Every angle, one screen.
"Judgment goes where it matters. For the first time, we can see where that is."
— João Belo, VP of Engineering, Twistag
| Category | What it surfaces | The decision it informs |
|---|---|---|
| Quality | Escaped defects per release, P0/P1 bug count, regression rate, agentic-review pass rate per check, code-review escape rate | Where reviewer time pays back, which of the eight checks needs tightening |
| Security | Open CVEs by severity, time-to-patch, secrets-exposure incidents, license risk, dependency drift, authz regressions caught at the agentic layer | When to escalate, what to ship, what to hold |
| Cost & ROI | Cost per shipped feature, AI inference spend per squad, infra cost trend, build and test minutes consumed, hours saved by Pulse against baseline | Where Pulse pays back, where it doesn't, where the budget is being spent on motion rather than progress |
| Activity | PRs opened and merged, deploy frequency, lead time for changes, review turnaround, work-in-progress, cycle time | Bottlenecks, flow health, whether the team is actually shipping or just busy |
| People | Workload distribution, on-call rotation load, focus-time vs. meeting-time, after-hours commit activity, knowledge concentration | Where burnout is brewing, where to redistribute, where one person knows too much that nobody else does |
| Adoption | Feature usage by released capability, activation rate of new releases, customer engagement on Pulse-shipped work, retention by cohort | Whether the things we ship are the things that get used |
| Ops | Uptime, error rate, P95 and P99 latency, alert volume, MTTR, MTTD, on-call incident count | Production health, where to invest reliability work next |
The categories are not independent. A spike in Activity (PRs flying through) without a corresponding spike in Adoption (customers actually using what shipped) is a warning. A green Quality board with a red People board means the team is paying for that quality with sleep. The dashboard is wired so leadership can read those relationships in seconds, not in three Slack threads.
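The two relationships named above can be written as simple rules over the board. This is a sketch under stated assumptions: the `CategoryReading` shape, the status strings, and the 25 percent spike threshold are invented for illustration and say nothing about how the real dashboard is wired.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CategoryReading:
    status: str        # "green", "amber" or "red"
    change_pct: float  # period-over-period change in the category's headline metric


def cross_category_warnings(
    board: Dict[str, CategoryReading], spike_pct: float = 25.0
) -> List[str]:
    """Flag combinations of categories that look healthy alone but not together."""
    warnings: List[str] = []
    activity, adoption = board["Activity"], board["Adoption"]
    if activity.change_pct >= spike_pct and adoption.change_pct < spike_pct:
        warnings.append("Activity spiked without a matching rise in Adoption")
    if board["Quality"].status == "green" and board["People"].status == "red":
        warnings.append("Quality is green while People is red")
    return warnings


# Hypothetical readings for one reporting period.
board = {
    "Activity": CategoryReading("green", 40.0),
    "Adoption": CategoryReading("amber", 2.0),
    "Quality": CategoryReading("green", 0.0),
    "People": CategoryReading("red", -10.0),
}

warnings = cross_category_warnings(board)
```

With these readings both rules fire, which is the situation the paragraph above warns about: plenty of motion, little uptake, and quality held up at the team's expense.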
The dashboard is also the audit trail. Every override on the agentic review layer, every escaped defect, every infra cost spike is attributed and timestamped. If a client asks why we shipped a particular release on a particular day, the answer is on the dashboard.
What changed for engineers, and for clients
Two shifts came out of all of this, and they reinforce each other.
For engineers, judgment moved up the stack. They are no longer reading every diff cold at 5pm, because the agentic layer has already pre-read it for them. They are reading the verdicts, the overrides, the high-signal sections the layer flagged. The work is still demanding (arguably more so, because every interaction is non-trivial), but the volume of low-signal review is gone. The engineers Twistag hires have shipped enough systems before joining to know when "looks fine" isn't fine. AI just made that instinct more valuable, not less.
For clients, two things change. Smaller teams deliver more: a five-person Twistag squad with Pulse produces what traditionally requires eight to ten engineers. And consistency stops depending on individual heroics — the agentic review layer enforces the same eight checks on sprint one and sprint twenty, regardless of which engineer is on duty.
The numbers we watch closely are the unglamorous ones. Override rate per check, by team, over time. Escape rate of issues that cleared the agentic layer and reached production. Time from PR opened to merge, separated by whether the PR touched a client environment. None of these numbers are headline metrics — they are the metrics that tell us whether Pulse is actually working or whether we are fooling ourselves with motion.
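Two of these numbers are simple enough to sketch. The definitions below are assumptions for illustration: escape rate is taken as the share of PRs that cleared the agentic layer and were later tied to a production issue, and time to merge is summarised as a median in hours, split by whether the PR touched a client environment. The `PullRequest` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class PullRequest:
    opened_at: datetime
    merged_at: datetime
    touches_client_env: bool
    cleared_review: bool          # passed the agentic review layer
    escaped_to_production: bool   # an issue traced to this PR reached production


def escape_rate(prs: List[PullRequest]) -> float:
    """Share of PRs that cleared review and still leaked an issue to production."""
    cleared = [p for p in prs if p.cleared_review]
    if not cleared:
        return 0.0
    return sum(p.escaped_to_production for p in cleared) / len(cleared)


def median_hours_to_merge(prs: List[PullRequest], client_env: bool) -> Optional[float]:
    """Median open-to-merge time in hours for one side of the client-environment split."""
    hours = [
        (p.merged_at - p.opened_at).total_seconds() / 3600
        for p in prs
        if p.touches_client_env == client_env
    ]
    return median(hours) if hours else None


# Hypothetical sample data.
t0 = datetime(2024, 1, 1, 9, 0)
prs = [
    PullRequest(t0, t0 + timedelta(hours=4), True, True, False),
    PullRequest(t0, t0 + timedelta(hours=8), True, True, True),
    PullRequest(t0, t0 + timedelta(hours=2), False, True, False),
    PullRequest(t0, t0 + timedelta(hours=6), False, True, False),
]

rate = escape_rate(prs)
client_hours = median_hours_to_merge(prs, client_env=True)
internal_hours = median_hours_to_merge(prs, client_env=False)
```

Keeping the client-environment split in the function signature makes it hard to report a single blended number that hides slow client-facing reviews behind fast internal ones.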
Three principles behind Pulse
Three principles hold the system together. They are also the parts a buyer in another industry can take and apply tomorrow.
One: assume the code was not written by a human, and review accordingly. When you stop expecting human authorship, you stop relying on the heuristics that depend on it (intent, comments, naming choices) and start building checks that work on the artefact regardless of who or what produced it. Most review processes still assume human authorship. Most code is no longer written by a human.
Two: every override is data, not friction. A check that fails and gets overridden is the most useful event in the system, because it tells you whether the check is calibrated, whether the team is taking on debt knowingly, and whether a particular pattern is becoming load-bearing. Overrides are routed to the dashboard, not buried in a CI log.
Three: one screen for engineering health beats five tabs. The cost of context-switching between tools is paid by the people whose judgment you most want focused. Pulling the most decision-relevant signal from each tool into one view is not about prettiness — it is about putting judgment in the place where the trade-offs are visible.
None of this works without experienced product engineers in the loop. The tooling is sharp, but the final call still belongs to someone who has shipped enough systems to know when plausible code is hiding a real problem. Twistag's hiring bar exists because of this, not in spite of it. AI made that bar more important, not less.
What this means for buyers
For organisations evaluating engineering partners, the question is no longer whether a team uses AI. Everyone uses AI. The question is whether AI is embedded in the delivery system — with review, visibility, and accountability around it — or bolted on top of a process that was designed for ten percent of today's code volume.
Pulse is the system. The five delivery agents move work forward. The eight-check review layer keeps it production-grade. The dashboard tells us where judgment is being spent and whether it is paying back. Twistag built it because the alternative was waiting for the slow incident to land. We would rather not.