Building a browser- and IDE-based AI code generation tool used by 20,000 engineers, pre-Cursor and Claude Code
An AI-powered code generation platform that grew to 20,000+ engineers across 56 languages — built years before the market consolidated, then deliberately shut down.

The challenge
Every software engineering team has the same category of work: necessary, important, mechanical. Writing unit tests for existing functions. Adding documentation to undocumented code. Refactoring legacy patterns into modern idioms. Converting modules between languages when projects require it. These tasks aren't intellectually stimulating. They consume time that should go toward architecture decisions, system design, hard problems.
Inside Twistag's own engineering team, the cost was visible. Senior engineers writing boilerplate test cases. Code reviews flagging the same documentation gaps every sprint. Language conversions between client projects done manually, one file at a time. Skilled developers were doing work an AI system could handle faster and more consistently.
In early 2022, the tools to address this barely existed. GitHub Copilot was in limited preview. Cursor hadn't launched. The category was nascent: a few autocomplete tools offering inline suggestions, nothing comprehensive. Large language models had demonstrated genuine capability with code in research settings, but nobody had built a product that translated that capability into the full range of tasks developers actually needed day-to-day.
The quality bar for developer tools is higher than most software. Engineers examine AI-generated code with more scrutiny than almost any other user group examines AI output. A tool that produced plausible-looking but subtly incorrect code would lose trust immediately and wouldn't get a second chance. The output had to be idiomatic — Python that reads like Python, TypeScript that respects strict mode, Rust that handles ownership correctly.
What we learned
| Lesson | Detail |
| --- | --- |
| Mechanical work eats senior hours | Test boilerplate, doc gaps, language conversions — necessary work that no engineer should be doing manually. |
| Engineers scrutinise hardest | Of any user group, developers test AI output most adversarially — confidence lost on day one rarely returns. |
| Idiom matters as much as correctness | Code that runs but reads wrong erodes trust faster than occasional buggy output that's easy to spot. |
The solution
We started with our own problem. The initial version was a web-based tool for generating unit tests and documentation from existing code blocks, used internally by Twistag engineers for several months before we thought about external distribution. Honest feedback from developers who used the tool daily, and who had no reason to be polite about its shortcomings, shaped the quality bar that eventually made the product viable externally.
The most consequential technical decision was a multi-provider LLM abstraction layer rather than coupling to a single AI provider. Different LLMs showed different strengths across languages and task types. One produced better Python test generation. Another handled TypeScript documentation more naturally. A third performed well on language conversion. The abstraction layer routed each request to the optimal provider for that combination of language, framework, and task. It also provided resilience as the LLM market moved fast through 2023 and 2024 — provider outages, pricing changes, and capability shifts were all real events that a single-provider architecture would have turned into platform-level problems.
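A minimal sketch of what such a routing layer can look like, in TypeScript; the provider names, task types, and routing table below are illustrative placeholders rather than the production configuration.

```typescript
// Sketch of a multi-provider routing layer. Provider names, task types,
// and the routing table are illustrative, not the real configuration.

type TaskType = "unit-tests" | "documentation" | "conversion";

interface LLMProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Stub providers. A real implementation would wrap each vendor SDK
// behind this one interface so callers never touch vendor specifics.
const providers: Record<string, LLMProvider> = {
  alpha: { name: "alpha", complete: async (p) => `[alpha] ${p}` },
  beta: { name: "beta", complete: async (p) => `[beta] ${p}` },
};

// Routing table: best-known provider per (language, task) pair,
// updated as evaluation results, pricing, and availability change.
const routes: Record<string, string> = {
  "python:unit-tests": "alpha",
  "typescript:documentation": "beta",
};

async function generate(
  language: string,
  task: TaskType,
  prompt: string
): Promise<string> {
  const primary = providers[routes[`${language}:${task}`] ?? "alpha"];
  try {
    return await primary.complete(prompt);
  } catch {
    // Resilience: on a provider outage, fail over to another provider
    // instead of surfacing a platform-level error.
    for (const fallback of Object.values(providers)) {
      if (fallback.name !== primary.name) return fallback.complete(prompt);
    }
    throw new Error("all providers unavailable");
  }
}
```

Keeping every vendor SDK behind one narrow interface is what turns failover and re-routing into a configuration change rather than a code change.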
The platform expanded through deliberate iteration driven by usage data, not by what seemed impressive to build. Phase one covered code refactoring, unit test generation, and documentation creation, the three most common pain points. Phase two added bug detection, language conversion, CSS transformation, CI/CD pipeline generation, regex generation, SQL query writing, and diagram generation from code structure. Each feature had a measurable use case before we built it. Phase three brought the product into editors with VS Code and JetBrains extensions: developers selected a code block, right-clicked, and invoked any platform action without leaving the editor. Phase four addressed team use with shared code generation history, team management, and unified billing, the features enterprise adoption required.
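As a sketch of the editor integration, a VS Code command handling the select-and-right-click flow might be wired up roughly like this; the command id and messages are hypothetical, and the context-menu entry itself would be declared in the extension's package.json under contributes.menus.

```typescript
import * as vscode from "vscode";

// Hypothetical command id; the right-click menu entry referencing it
// lives in package.json under contributes.menus["editor/context"].
export function activate(context: vscode.ExtensionContext) {
  const command = vscode.commands.registerCommand(
    "codegen.generateTests",
    async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor || editor.selection.isEmpty) {
        vscode.window.showWarningMessage("Select a code block first.");
        return;
      }
      // The selected block and its language id are what the platform
      // API would receive; this stub only reports what would be sent.
      const selected = editor.document.getText(editor.selection);
      const language = editor.document.languageId;
      vscode.window.showInformationMessage(
        `Generating tests for ${language} (${selected.length} characters selected)`
      );
    }
  );
  context.subscriptions.push(command);
}
```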
Supporting 56 programming languages wasn't a matter of setting a parameter and letting the LLM handle it. We invested heavily in language-specific prompt engineering so generated output followed each language's conventions. Python with list comprehensions and type hints where the idiom called for them. TypeScript respecting strict mode. Rust handling ownership and borrowing correctly. The supported list covered mainstream and specialised ecosystems — Python, JavaScript, TypeScript, Java, C#, C++, Ruby, Go, Rust, Swift, Kotlin, PHP, Scala, R, MATLAB, Haskell, Elixir, Dart, and dozens more.
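As a simplified illustration of that per-language prompt engineering, guidance fragments can be keyed by language and injected into each request; the fragments below are invented for the example and far terser than anything production-grade.

```typescript
// Invented guidance fragments, far terser than production prompts.
// The principle: idiom constraints are spelled out per language rather
// than relying on one generic instruction for all 56.
const languageGuidance: Record<string, string> = {
  python: [
    "Add type hints to all public functions.",
    "Prefer list comprehensions where they aid readability.",
    "Follow PEP 8 naming conventions.",
  ].join("\n"),
  typescript: [
    "Output must compile under strict mode; no implicit any.",
    "Prefer interfaces for object shapes.",
  ].join("\n"),
  rust: [
    "Respect ownership and borrowing; avoid gratuitous clone().",
    "Return Result for fallible operations rather than panicking.",
  ].join("\n"),
};

function buildPrompt(language: string, task: string, code: string): string {
  const guidance = languageGuidance[language] ?? "";
  return [`Task: ${task}`, `Language: ${language}`, guidance, "Code:", code]
    .filter((part) => part.length > 0)
    .join("\n\n");
}
```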
What this shaped
| Principle | Detail |
| --- | --- |
| Provider diversity beats provider perfection | Route each request to the best model for the task — single-provider bets always lose. |
| Per-language prompts, not translation | Python should sound like Python and Rust should respect ownership. Generic prompts produce generic output. |
| Ship what users need first | Build from real usage — features that sound impressive on a roadmap rarely earn their cost. |
The impact
By 2024, the platform was serving 20,000+ engineers across teams at Accenture, Amazon, Google, Red Bull, Uber, Rakuten, and many smaller organisations. More than 2.9 million lines of code had been generated through it. Then we made the strategic decision to shut it down as the market consolidated around competitors with fundamentally different capital structures.
This case study isn't about a product that failed. It's about a deliberate strategic decision and what it created: deep, first-hand expertise in LLM integration, prompt engineering at scale, developer experience design, and the operational reality of running AI in production. That expertise is now the foundation of every AI product Twistag builds for clients.
What this proved
| Takeaway | Detail |
| --- | --- |
| Production teaches faster than research | Twenty thousand engineers using the tool taught us more about LLM integration than months of internal work. |
| Knowing when to stop | A clean shutdown decision teaches your team to read market signals — knowledge that compounds beyond one product. |
| Platform bets are temporary | Industry consolidation makes any LLM-tooling moat short-lived — extract the engineering knowledge and apply it elsewhere. |
Technologies used
- OpenAI
- Anthropic Claude
- Next.js
- React

