A Chrome extension scoring every draft against a UK utility's own guidelines — used by 6,000 employees

A Chrome extension that lives inside the tools 6,000 employees already use, scoring every draft against the company's own communication guidelines in real time

UK Water Utility Company
  • 18 hrs saved per week across the comms team
  • 3,000+ communications scored every month
  • 5 scoring dimensions tuned to the utility's standards

The challenge

A major UK water utility had a problem that's easier to ignore than fix. Six thousand employees, thousands of communications going out every month, and no reliable way to keep any of it consistent. Internal guidelines existed. Nobody followed them under deadline pressure.

Water utilities communicate under scrutiny in a way most industries don't. A service disruption notice that sounds defensive, a regulatory filing with ambiguous phrasing, a public announcement that sounds nothing like the one that went out last month — all of it erodes public trust, slowly and visibly. And some of that inconsistency is more than a brand problem: the wrong sentence in a regulatory bulletin can be treated as an admission of liability.

The communications team was spending a meaningful chunk of every week correcting drafts that shouldn't have needed correcting. The obvious response was to hire more reviewers, which doesn't fix anything; it just adds a person to catch what the previous person missed. The actual issue was that the standards for "good" communication lived inside the heads of a handful of senior editors. Whenever those editors were out of the loop, quality visibly dropped.

What we learned
  • The cost wasn't reputational: the wrong phrasing in a regulated bulletin gets treated as an admission of fault, not a brand stumble.
  • More reviewers compound the problem: every new reviewer needs the same tribal knowledge senior editors carry; adding people doesn't add standards.
  • The bottleneck was knowledge, not capacity: when standards live in editors' heads, they don't survive deadlines, no matter how many people are on the team.

The solution

We started with two weeks of discovery. Not workshops with a polished output deck — actual working sessions with comms leads, IT, and the people writing these documents every day. We came out knowing we were building a compliance system, not a productivity tool. Those are different products with different scoring requirements, different prompt engineering, and different definitions of success.

What we built is a Chrome extension that runs inside any browser-based tool the team already uses — email clients, document editors, internal content systems. It captures text as people write, sends it to our analysis backend on Google Cloud Platform, and returns scoring and suggestions within seconds. It's invisible until it activates. Employees keep working as they always have. Suggestions appear as an overlay when content needs attention. This is one of those architectural choices that decides whether a tool gets adopted or quietly uninstalled. The research on AI tool adoption is consistent: if you make someone leave their current application to get a suggestion, most won't bother. Twistag has seen this pattern across enough integrations to treat it as a hard constraint.
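
To make that shape concrete, here is a minimal sketch of the capture-and-score loop a content script like this might run. The endpoint URL, payload shape, debounce window, and overlay styling are all illustrative assumptions, not the production implementation:

```typescript
// content-script.ts: hypothetical sketch of the capture-and-score loop.
const SCORE_ENDPOINT = "https://scoring.example.com/score"; // placeholder URL
const DEBOUNCE_MS = 1500; // score a pause in typing, not every keystroke

let timer: number | undefined;

document.addEventListener("input", (event) => {
  const target = event.target as HTMLElement;
  const text = extractText(target);
  if (!text || text.length < 40) return; // skip fragments too short to score

  window.clearTimeout(timer);
  timer = window.setTimeout(() => void scoreDraft(text, target), DEBOUNCE_MS);
});

// Pull draft text from whatever editable surface the page exposes.
function extractText(el: HTMLElement): string | null {
  if (el instanceof HTMLTextAreaElement || el instanceof HTMLInputElement) {
    return el.value;
  }
  return el.isContentEditable ? el.innerText : null;
}

async function scoreDraft(text: string, anchor: HTMLElement): Promise<void> {
  const res = await fetch(SCORE_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) return; // fail silently: the tool stays invisible on errors

  const result: { flagged: boolean; suggestions: string[] } = await res.json();
  if (result.flagged) showOverlay(anchor, result.suggestions);
}

// Suggestions render as an overlay next to the field; nothing is auto-applied.
function showOverlay(anchor: HTMLElement, suggestions: string[]): void {
  const box = document.createElement("div");
  box.textContent = suggestions.join("\n");
  box.style.cssText =
    "position:absolute;z-index:9999;background:#fff;border:1px solid #ccc;padding:8px;";
  const rect = anchor.getBoundingClientRect();
  box.style.top = `${rect.bottom + window.scrollY}px`;
  box.style.left = `${rect.left + window.scrollX}px`;
  document.body.appendChild(box);
}
```

The design choice the sketch encodes is the one above: on error or on clean content, the extension does nothing visible at all.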

The scoring engine is the heart of the system. We worked with the utility's communications team to build a labelled training dataset from hundreds of real communications, calibrating an OpenAI-based pipeline to the difference between a regulatory filing and a community update, and between plain language that's acceptable and phrasing that sounds like an admission of fault. Content is scored across five dimensions: clarity, empathy, actionability, tone, and accuracy. Each dimension produces a score; below threshold, the system generates specific rewrite suggestions. The employee decides what to do with them. The AI doesn't publish anything.
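
That scoring contract can be expressed compactly. Here is a minimal sketch using OpenAI's Node SDK with JSON output; the model name, threshold, and prompt wording are illustrative, not the calibrated production values:

```typescript
// score.ts: hypothetical sketch of the five-dimension scoring call.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const DIMENSIONS = ["clarity", "empathy", "actionability", "tone", "accuracy"] as const;
type Dimension = (typeof DIMENSIONS)[number];

const THRESHOLD = 0.7; // illustrative pass mark per dimension

interface ScoreResult {
  scores: Record<Dimension, number>;               // 0-1 per dimension
  suggestions: Partial<Record<Dimension, string>>; // rewrites for failing dimensions
}

export async function scoreCommunication(text: string): Promise<ScoreResult> {
  const response = await client.chat.completions.create({
    model: "gpt-4o", // placeholder; the production pipeline is calibrated separately
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          `Score the draft from 0 to 1 on: ${DIMENSIONS.join(", ")}. ` +
          `For any dimension below ${THRESHOLD}, add a concrete rewrite suggestion. ` +
          `Reply as JSON: {"scores": {...}, "suggestions": {...}}.`,
      },
      { role: "user", content: text },
    ],
  });

  return JSON.parse(response.choices[0].message.content ?? "{}") as ScoreResult;
}
```

Only the suggestions for below-threshold dimensions ever surface in the overlay; the draft itself is never changed without the employee's say-so.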

The harder engineering work was the prompt layer. Generic AI writing tools fail in regulated environments because they optimise for fluency. A sentence that reads smoothly might still contain phrasing a water industry regulator reads as an admission of liability. We built a custom prompt engineering layer that encodes the utility's compliance requirements, terminology preferences, and context-dependent tone rules into the model's instruction set. An improvement to clarity can't introduce compliance risk. The two are evaluated together. When the utility's guidelines evolve or regulations change, we update the prompt configuration. No model retraining. No redeployment.
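
A sketch of what that configuration-driven layer might look like, with hypothetical rule names and shapes; the point is that a guideline change is an edit to data, not to model weights or deployed code:

```typescript
// prompt-config.ts: hypothetical shape of the compliance prompt layer.
interface PromptConfig {
  bannedPhrases: string[];             // phrasing a regulator could read as admitting fault
  terminology: Record<string, string>; // discouraged term -> preferred term
  toneRules: Record<string, string>;   // tone guidance keyed by communication type
}

export function buildInstructions(config: PromptConfig, docType: string): string {
  return [
    "You are reviewing a draft for a UK water utility.",
    "Improve clarity only in ways that preserve compliance; never trade one for the other.",
    `Never suggest phrasing containing: ${config.bannedPhrases.join("; ")}.`,
    ...Object.entries(config.terminology).map(
      ([avoid, prefer]) => `Replace "${avoid}" with "${prefer}".`,
    ),
    `Tone for a ${docType}: ${config.toneRules[docType] ?? "neutral and factual"}.`,
  ].join("\n");
}

// Usage: buildInstructions(loadedConfig, "regulatory_bulletin") becomes the
// system message for the scoring call; when guidelines change, updating the
// config is the whole deployment story.
```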

What this shaped
  • Compliance system, not productivity tool: the framing decision shaped every architecture choice that came after — scoring, prompts, what success looked like.
  • Meet people where they already work: a Chrome extension inside email and document editors got adopted; a standalone tool wouldn't have.
  • Calibration beats generalisation: a custom prompt layer calibrated on the utility's own communications outperforms a generic tone model, and it can be updated at a fraction of the cost of retraining; fine-tuning would make that kind of change nearly impossible.

The impact

Six weeks into the pilot, the assistant was processing more than 3,000 communications a month. The communications team got back around 18 hours a week — hours that went into proactive stakeholder outreach and strategy work senior editors had been hired to do but couldn't, because they were busy reviewing first drafts.

The number that tends to get attention is the compliance one: zero non-compliant language flags in regulatory bulletins during the pilot. Not "significantly reduced." Zero. For a utility under regulatory scrutiny, that's a different category of result than a productivity improvement. It's risk removed.

What this proved
  • Risk removed, not productivity gained: for a regulated utility, eliminating compliance flags is a different category of result than saving hours.
  • Adoption follows invisibility: six thousand employees absorbed the tool without training rollouts because it lived inside what they were already using.
  • Calibrated trust scales: the system scaled because the standard it enforced was the utility's own — a framework agreed upon during discovery, not invented after.

Technologies used

  • OpenAI
  • Python
  • GCP
  • Next.js

next step

Have a similar challenge?

Tell us where you're stuck. We'll come back with a one-page outline of how we'd approach it.