A multi-modal GenAI pipeline behind a 24/7 AI radio station — 700K original songs, 2.4B plays

A six-month global campaign that turned every personal dedication into an original song — lyrics, music, album art, all generated in real time.

Global Consumer Electronics Brand
700K+
unique AI-generated songs
2.4B
content views worldwide
30M
social engagements

The challenge

The brief came from the brand's creative agency: a 24/7 AI-powered song dedication station where anyone could write a personal message and receive a fully original song in return. Not a remix. Not a template with a name swapped in. A real song with custom lyrics, a complete audio track in the user's chosen genre, and album art that matched the mood.

The brand had the campaign concept and the social research backing it — a study showing 68% of people find it harder to make real friends than they used to. They needed a technical platform that could deliver on the emotional promise at global scale.

That meant solving three problems at once. The AI pipeline itself: chaining lyrics, audio, and image generation into something that feels like a single product despite three services, three latency profiles, three failure modes. Scale: a global campaign can multiply concurrent users by twenty in minutes when a celebrity posts their dedication. Quality: every song was a direct reflection of the brand. A bad output sent to someone's grandmother for her birthday isn't a UX problem. It's a PR problem.

What we learned
Three models, one product: lyrics, audio, and imagery have different latency and failure modes that users experience as one bad output.
Brand risk scales with volume: a bad output sent to someone's grandmother for her birthday isn't a UX problem, it's a PR problem.
Campaign traffic is bimodal: a celebrity post can multiply concurrent users by twenty in minutes, then collapse to baseline overnight.

The solution

We built the pipeline as a sequence of independent stages connected by an orchestration layer running on AWS Lambda. This is the architecture pattern Twistag uses across multi-modal AI builds: each AI service sits behind its own abstraction, so the system can swap providers without touching the rest of the pipeline. That mattered during the campaign — some services needed tuning, and we replaced one provider mid-flight without users noticing.
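In Python-flavored pseudocode, that stage-behind-an-interface pattern might look like the sketch below. The class and function names are illustrative, not the production code; the point is that the rest of the pipeline only sees the interface, so a provider swap is a config change.

```python
from dataclasses import dataclass
from typing import Protocol


class AudioProvider(Protocol):
    """Interface the pipeline depends on; no provider details leak past it."""

    def compose(self, lyrics: str, genre: str) -> bytes: ...


@dataclass
class SunoProvider:
    api_key: str

    def compose(self, lyrics: str, genre: str) -> bytes:
        # Call the Suno API here; request details elided in this sketch.
        return b"audio"


@dataclass
class FallbackProvider:
    api_key: str

    def compose(self, lyrics: str, genre: str) -> bytes:
        # A second vendor with the same interface makes a mid-flight swap possible.
        return b"audio"


def build_audio_provider(name: str, api_key: str) -> AudioProvider:
    # One config value decides which vendor backs the stage, so replacing
    # a provider mid-campaign is a deploy-time change, not a rewrite.
    providers = {"suno": SunoProvider, "fallback": FallbackProvider}
    return providers[name](api_key)
```

The same shape applies to the lyrics and image stages: one `Protocol` per modality, one concrete adapter per vendor.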

OpenAI handles lyrics. We chose it for instruction-following reliability and language coverage across six languages. Suno.AI handles full audio composition, picked for its consistency floor rather than its quality ceiling. For a system running thousands of generations per day, predictable output beats occasional brilliance. Adobe Firefly handles album art — for a global brand campaign, content provenance matters as much as the visual itself, and Firefly's commercial licensing removed a category of risk.

The harder work was the orchestration. The user submits a dedication, the intake service validates it and detects the language, OpenAI generates lyrics with verse/chorus/bridge structure, Suno composes the audio asynchronously while Firefly generates album art in parallel, and the assembly service packages everything into a shareable artifact and publishes it to the 24/7 streaming playlist. Quality scoring runs at every stage. Lyrics are checked for structural completeness and emotional alignment with the user's message. Audio is checked for completeness and genre accuracy. Anything below threshold triggers an automatic retry. Users never see a bad generation. They wait slightly longer while the system fixes it.
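A minimal sketch of that orchestration flow, assuming hypothetical stage and scoring functions: lyrics generate first, audio and art run in parallel, and a retry loop acts as the quality gate at each stage.

```python
import asyncio

MAX_RETRIES = 3
QUALITY_THRESHOLD = 0.8  # illustrative cutoff, not the production value


async def generate_with_retry(stage, score, *args):
    # Re-run a stage until its output clears the quality gate; the user
    # waits slightly longer instead of ever seeing a bad generation.
    for _ in range(MAX_RETRIES):
        output = await stage(*args)
        if score(output) >= QUALITY_THRESHOLD:
            return output
    raise RuntimeError(f"{stage.__name__} failed quality gate")


async def run_pipeline(dedication, services):
    language = services.detect_language(dedication)
    lyrics = await generate_with_retry(
        services.generate_lyrics, services.score_lyrics, dedication, language
    )
    # Audio and album art only depend on the lyrics, not on each other,
    # so they run concurrently.
    audio, art = await asyncio.gather(
        generate_with_retry(services.compose_audio, services.score_audio, lyrics),
        generate_with_retry(services.generate_art, services.score_art, lyrics),
    )
    return services.assemble(lyrics, audio, art)
```

Everything behind `services` is an assumption for the sketch; the real system runs these stages as Lambda invocations behind the provider abstractions described above.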

Multi-language support was not a translation task. Each language has different lyric conventions, rhyme schemes, and cultural resonance. We built language-specific prompt configurations rather than generating in English and translating. A Spanish dedication sounds like a Spanish song. The difference is immediately audible.
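The shape of that approach, sketched with invented config fields and two example languages (the actual conventions and prompt text are the campaign's, not shown here):

```python
# Hypothetical per-language prompt configuration: each language carries its
# own lyric conventions instead of routing through English and translating.
LANGUAGE_CONFIGS = {
    "es": {
        "rhyme_scheme": "ABAB with assonant rhyme",
        "instructions": "Escribe la letra directamente en español.",
    },
    "en": {
        "rhyme_scheme": "AABB couplets",
        "instructions": "Write the lyrics directly in English.",
    },
}


def build_lyrics_prompt(dedication: str, language: str, genre: str) -> str:
    # The dedication never passes through a translation step; the prompt
    # itself is written in (and about) the target language.
    cfg = LANGUAGE_CONFIGS[language]
    return (
        f"{cfg['instructions']}\n"
        f"Genre: {genre}. Rhyme scheme: {cfg['rhyme_scheme']}.\n"
        f"Dedication: {dedication}"
    )
```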

The whole thing runs serverless on Lambda. Pay-per-execution, zero cost at low traffic, automatic absorption of media-driven traffic spikes, clean wind-down at campaign end.

What this shaped
Each provider behind abstraction: swap services mid-campaign without users noticing — the architecture decision that protects you against any single vendor.
Quality gates before users: score every generation, retry anything below threshold, and prevent bad outputs from publishing rather than fixing them after.
Languages aren't translations: a Spanish dedication sounds Spanish only with language-specific prompts, not English-then-translate pipelines.

The impact

The campaign ran from June to December 2025. By the end, 700,000 unique songs had been created, shared 1.5 million times, and seen by 2.4 billion people, driving 30 million social engagements worldwide.

The number we keep coming back to isn't the views. It's the share-to-creation ratio: roughly 2:1. Over a million and a half shares against 700K creations. People don't share AI-generated content because it's technically impressive. They share it because it means something to them. That ratio is the clearest evidence we have that the songs felt personal enough to pass on — and that the cascade effect was real, with recipients turning into creators without any explicit prompt to do so.

What this proved
Share-to-create ratio is the signal: two shares per song proves the output meant something; views alone don't tell you that.
Recipients become creators: when AI content feels personal enough to pass on, the cascade replaces the marketing funnel.
Patterns travel across products: the orchestration approach that worked here now anchors every multi-modal AI build at Twistag.

Technologies used

  • OpenAI
  • Suno.AI
  • Adobe Firefly
  • AWS Lambda


next step

Have a similar challenge?

Tell us where you're stuck. We'll come back with a one-page outline of how we'd approach it.