Building a browser- and IDE-based AI code generation tool used by 20,000 engineers, pre-Cursor and Claude Code
An AI-powered code generation platform that grew to 20,000+ engineers across 56 languages — built years before the market consolidated, then deliberately shut down.

The challenge
Every software engineering team has the same category of work: necessary, important, mechanical. Writing unit tests for existing functions. Adding documentation to undocumented code. Refactoring legacy patterns into modern idioms. Converting modules between languages when projects require it. These tasks aren't intellectually stimulating. They consume time that should go toward architecture decisions, system design, hard problems.
Inside Twistag's own engineering team, the cost was visible. Senior engineers writing boilerplate test cases. Code reviews flagging the same documentation gaps every sprint. Language conversions between client projects done manually, one file at a time. Skilled developers were doing work an AI system could handle faster and more consistently.
In early 2022, the tools to address this barely existed. GitHub Copilot was in limited preview. Cursor hadn't launched. The category was nascent: a few autocomplete tools offering inline suggestions, nothing comprehensive. Large language models had demonstrated genuine capability with code in research settings, but nobody had built a product that translated that capability into the full range of tasks developers actually needed day-to-day.
The quality bar for developer tools is higher than most software. Engineers examine AI-generated code with more scrutiny than almost any other user group examines AI output. A tool that produced plausible-looking but subtly incorrect code would lose trust immediately and wouldn't get a second chance. The output had to be idiomatic — Python that reads like Python, TypeScript that respects strict mode, Rust that handles ownership correctly.
What we learned
| Lesson | Detail |
| --- | --- |
| Mechanical work eats senior hours | Test boilerplate, doc gaps, language conversions — necessary work that no engineer should be doing manually. |
| Engineers scrutinise hardest | Of any user group, developers test AI output most adversarially — confidence lost on day one rarely returns. |
| Idiom matters as much as correctness | Code that runs but reads wrong erodes trust faster than occasional buggy output that's easy to spot. |
The solution
We started with our own problem. The initial version was a web-based tool for generating unit tests and documentation from existing code blocks, used internally by Twistag engineers for several months before we thought about external distribution. Honest feedback from developers who used the tool daily, and who had no reason to be polite about its shortcomings, shaped the quality bar that eventually made the product viable externally.
The most consequential technical decision was a multi-provider LLM abstraction layer rather than coupling to a single AI provider. Different LLMs showed different strengths across languages and task types. One produced better Python test generation. Another handled TypeScript documentation more naturally. A third performed well on language conversion. The abstraction layer routed each request to the optimal provider for that combination of language, framework, and task. It also provided resilience as the LLM market moved fast through 2023 and 2024 — provider outages, pricing changes, and capability shifts were all real events that a single-provider architecture would have turned into platform-level problems.
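A minimal sketch of what such a routing layer can look like, in TypeScript; the provider names, task types, and routing table below are illustrative placeholders rather than the production configuration.

```typescript
// Sketch of a multi-provider routing layer. Provider names, task types,
// and the routing table are illustrative, not the real configuration.

type TaskType = "unit-tests" | "documentation" | "conversion";

interface LLMProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Stub providers. A real implementation would wrap each vendor SDK
// behind this one interface so callers never touch vendor specifics.
const providers: Record<string, LLMProvider> = {
  alpha: { name: "alpha", complete: async (p) => `[alpha] ${p}` },
  beta: { name: "beta", complete: async (p) => `[beta] ${p}` },
};

// Routing table: best-known provider per (language, task) pair,
// updated as evaluation results, pricing, and availability change.
const routes: Record<string, string> = {
  "python:unit-tests": "alpha",
  "typescript:documentation": "beta",
};

async function generate(
  language: string,
  task: TaskType,
  prompt: string
): Promise<string> {
  const primary = providers[routes[`${language}:${task}`] ?? "alpha"];
  try {
    return await primary.complete(prompt);
  } catch {
    // Resilience: on a provider outage, fail over to another provider
    // instead of surfacing a platform-level error.
    for (const fallback of Object.values(providers)) {
      if (fallback.name !== primary.name) return fallback.complete(prompt);
    }
    throw new Error("all providers unavailable");
  }
}
```

Keeping every vendor SDK behind one narrow interface is what turns failover and re-routing into a configuration change rather than a code change.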
The platform expanded through deliberate iteration driven by usage data, not by what seemed impressive to build. Phase one covered code refactoring, unit test generation, and documentation creation, the three most common pain points. Phase two added bug detection, language conversion, CSS transformation, CI/CD pipeline generation, regex generation, SQL query writing, and diagram generation from code structure. Each feature had a measurable use case before we built it. Phase three brought the product into editors with VS Code and JetBrains extensions: developers selected a code block, right-clicked, and invoked any platform action without leaving the editor. Phase four addressed team use with shared code generation history, team management, and unified billing, the features enterprise adoption required.
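As a sketch of the editor integration, a VS Code command handling the select-and-right-click flow might be wired up roughly like this; the command id and messages are hypothetical, and the context-menu entry itself would be declared in the extension's package.json under contributes.menus.

```typescript
import * as vscode from "vscode";

// Hypothetical command id; the right-click menu entry referencing it
// lives in package.json under contributes.menus["editor/context"].
export function activate(context: vscode.ExtensionContext) {
  const command = vscode.commands.registerCommand(
    "codegen.generateTests",
    async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor || editor.selection.isEmpty) {
        vscode.window.showWarningMessage("Select a code block first.");
        return;
      }
      // The selected block and its language id are what the platform
      // API would receive; this stub only reports what would be sent.
      const selected = editor.document.getText(editor.selection);
      const language = editor.document.languageId;
      vscode.window.showInformationMessage(
        `Generating tests for ${language} (${selected.length} characters selected)`
      );
    }
  );
  context.subscriptions.push(command);
}
```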
Supporting 56 programming languages wasn't a matter of setting a parameter and letting the LLM handle it. We invested heavily in language-specific prompt engineering so generated output followed each language's conventions. Python with list comprehensions and type hints where the idiom called for them. TypeScript respecting strict mode. Rust handling ownership and borrowing correctly. The supported list covered mainstream and specialised ecosystems — Python, JavaScript, TypeScript, Java, C#, C++, Ruby, Go, Rust, Swift, Kotlin, PHP, Scala, R, MATLAB, Haskell, Elixir, Dart, and dozens more.
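As a simplified illustration of that per-language prompt engineering, guidance fragments can be keyed by language and injected into each request; the fragments below are invented for the example and far terser than anything production-grade.

```typescript
// Invented guidance fragments, far terser than production prompts.
// The principle: idiom constraints are spelled out per language rather
// than relying on one generic instruction for all 56.
const languageGuidance: Record<string, string> = {
  python: [
    "Add type hints to all public functions.",
    "Prefer list comprehensions where they aid readability.",
    "Follow PEP 8 naming conventions.",
  ].join("\n"),
  typescript: [
    "Output must compile under strict mode; no implicit any.",
    "Prefer interfaces for object shapes.",
  ].join("\n"),
  rust: [
    "Respect ownership and borrowing; avoid gratuitous clone().",
    "Return Result for fallible operations rather than panicking.",
  ].join("\n"),
};

function buildPrompt(language: string, task: string, code: string): string {
  const guidance = languageGuidance[language] ?? "";
  return [`Task: ${task}`, `Language: ${language}`, guidance, "Code:", code]
    .filter((part) => part.length > 0)
    .join("\n\n");
}
```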
What this shaped
| Principle | Detail |
| --- | --- |
| Provider diversity beats provider perfection | Route each request to the best model for the task — single-provider bets always lose. |
| Per-language prompts, not translation | Python should sound like Python and Rust should respect ownership. Generic prompts produce generic output. |
| Ship what users need first | Build from real usage — features that sound impressive on a roadmap rarely earn their cost. |
The impact
By 2024, the platform was serving 20,000+ engineers across teams at Accenture, Amazon, Google, Red Bull, Uber, Rakuten, and many smaller organisations. More than 2.9 million lines of code had been generated through it. Then we made the strategic decision to shut it down as the market consolidated around competitors with fundamentally different capital structures.
This case study isn't about a product that failed. It's about a deliberate strategic decision and what it created: deep, first-hand expertise in LLM integration, prompt engineering at scale, developer experience design, and the operational reality of running AI in production. That expertise is now the foundation of every AI product Twistag builds for clients.
What this proved
| Takeaway | Detail |
| --- | --- |
| Production teaches faster than research | Twenty thousand engineers using the tool taught us more about LLM integration than months of internal work. |
| Knowing when to stop | A clean shutdown decision teaches your team to read market signals — knowledge that compounds beyond one product. |
| Platform bets are temporary | Industry consolidation makes any LLM-tooling moat short-lived — extract the engineering knowledge and apply it elsewhere. |
Technologies used
- OpenAI
- Anthropic Claude
- Next.js
- React

