HappyHorse: Alibaba's Mystery AI Video Model That Topped Every Leaderboard Overnight
A mysterious model called HappyHorse appeared out of nowhere on AI video leaderboards, crushing Seedance 2.0 and every other competitor. Days later, Alibaba claimed ownership. Here's the full story — and why solo founders should pay attention.
OPC Community
Community Team
On April 8, 2026, something unusual happened on the Artificial Analysis Video Arena — the most trusted blind-test leaderboard for AI video generation models. A model nobody had heard of, submitted under the name 'HappyHorse-1.0', appeared at the very top of both the text-to-video and image-to-video rankings. No launch event. No technical blog post. No company name attached. Just raw performance that left the AI community scrambling to figure out who built it.
Within 48 hours, HappyHorse had accumulated over 31,000 evaluation samples and achieved an ELO score of 1,389 in text-to-video and 1,403 in image-to-video — crushing the previous champion, ByteDance's Seedance 2.0, by a significant margin. The internet went wild. Stock prices moved. And then, on April 10, the mystery was solved: Alibaba stepped forward and claimed ownership.
“No launch event. No press release. Just a model that quietly appeared at #1 and refused to move.”
The mystery: Who built the horse?
For two days, the AI community played detective. The anonymous submission sparked intense speculation across Chinese tech circles, Zhihu threads, and investor group chats. The leading theories pointed to two suspects: Kuaishou (the company behind Kling AI) and Alibaba.
The Kuaishou theory made surface-level sense — Kling had been the previous generation's darling, and HappyHorse's capabilities felt like a natural evolution. But the Alibaba theory had stronger evidence. Insiders pointed to a newly formed team inside Alibaba's Taotian Group (the e-commerce arm that includes Taobao and Tmall), operating under a secretive R&D unit called the 'Future Life Lab'.
On April 10, an official Weibo account named 'HappyHorse_AI' went live, confirming what many had suspected: HappyHorse was built by Alibaba's ATH Innovation Business Unit. The team is led by Zhang Di — notably, the former Vice President of Kuaishou and the original technical lead behind Kling AI. He brought key talent from the Kling team to Alibaba, and HappyHorse is the first fruit of that move.
What makes HappyHorse technically impressive
HappyHorse isn't just another incremental improvement in AI video generation. It introduces several architectural innovations that set it apart from the competition:
Unified audio-video generation
Most AI video models generate video first and then add audio as a separate post-processing step. HappyHorse takes a fundamentally different approach: it generates synchronized video and audio simultaneously from a single text or image prompt. One Transformer handles both video and audio tokens in a unified self-attention mechanism. The result is video with perfectly synced sound effects, ambient audio, dialogue, and even Foley — no post-production dubbing needed.
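To make the idea concrete, here is a toy numpy sketch (an illustration of the concept, not HappyHorse's actual code): video tokens and audio tokens are concatenated into one sequence and pass through a single self-attention step, so every audio token can attend to every video token and vice versa — the property that keeps sound and picture in sync.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(video_tokens, audio_tokens, d_model=64, seed=0):
    """Toy single-head self-attention over a fused video+audio sequence.

    Both modalities share one attention matrix, so cross-modal
    relationships (a door slam and its sound) are modeled directly,
    instead of audio being bolted on in post-processing.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([video_tokens, audio_tokens], axis=0)  # (Tv+Ta, d)
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_model))  # one matrix, both modalities
    return attn @ v  # fused representation, same length as input

video = np.random.default_rng(1).standard_normal((16, 64))  # 16 video tokens
audio = np.random.default_rng(2).standard_normal((8, 64))   # 8 audio tokens
out = joint_self_attention(video, audio)
print(out.shape)  # (24, 64): one sequence covering both modalities
```

A real model would of course use many layers, causal or block-wise attention, and learned tokenizers for each modality; the point here is just that one attention operation spans both streams.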
Sandwich Transformer architecture
The model uses a 40-layer Transformer in what the team calls a 'sandwich layout.' The first and last 4 layers use modality-specific projections (handling the unique characteristics of text, image, video, and audio), while the middle 32 layers share parameters across all modalities. This elegant design allows the model to deeply understand cross-modal relationships while keeping the parameter count efficient at just 15 billion — remarkably small for a model that tops every leaderboard.
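The layer plan described above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration of the "sandwich" idea as reported — modality-specific shells around a shared core — not the released architecture:

```python
def sandwich_layout(total_layers=40, shell=4,
                    modalities=("text", "image", "video", "audio")):
    """Sketch of the reported 'sandwich' layer plan.

    The outer `shell` layers at each end are modality-specific (one copy
    per modality); the core in the middle shares parameters across all
    modalities, which is where the parameter savings come from.
    """
    core = total_layers - 2 * shell
    plan = (["modality-specific"] * shell
            + ["shared"] * core
            + ["modality-specific"] * shell)
    # Only the shell layers are replicated per modality.
    specific_instances = 2 * shell * len(modalities)
    return plan, core, specific_instances

plan, core, specific = sandwich_layout()
print(len(plan), core, specific)  # 40 32 32
```

With 32 of 40 layers shared across four modalities, most of the depth does double (quadruple) duty — a plausible reason the model stays at 15B parameters.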
Blazing-fast inference
Thanks to a custom inference runtime called MagiCompiler and a reduced 8-step denoising process (without classifier-free guidance), HappyHorse achieves extraordinary speed: approximately 2 seconds for a 5-second video at 256p, and about 38 seconds for full 1080p output on an H100 GPU. For comparison, many competing models take minutes for the same output quality.
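Why does dropping classifier-free guidance matter for speed? With CFG, each denoising step runs the network twice (conditional plus unconditional); without it, once. A toy sampling loop makes the call count concrete — this is a generic few-step Euler sketch, not MagiCompiler:

```python
import numpy as np

def denoise(x, predict_noise, steps=8):
    """Toy few-step denoising loop without classifier-free guidance.

    One predictor call per step: 8 steps means 8 network forward passes,
    versus 40-100 for a 20-50-step sampler that doubles each step for CFG.
    """
    sigmas = np.linspace(1.0, 0.0, steps + 1)  # noise schedule, 1.0 -> 0.0
    calls = 0
    for i in range(steps):
        eps = predict_noise(x, sigmas[i])  # single forward pass, no CFG pair
        calls += 1
        x = x + (sigmas[i + 1] - sigmas[i]) * eps  # Euler step
    return x, calls

rng = np.random.default_rng(0)
sample, n_calls = denoise(rng.standard_normal((4, 4)), lambda x, s: x * s)
print(n_calls)  # 8 model calls total
```

The 2-second 256p figure follows directly from this arithmetic: a handful of forward passes at low resolution is cheap, and the compiled runtime squeezes each pass further.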
Multilingual lip-sync
HappyHorse supports 7 languages out of the box — English, Mandarin, Cantonese, Japanese, Korean, German, and French — with industry-leading word error rates for lip-synced dialogue. This is a massive deal for content creators who serve multilingual audiences.
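Word error rate, the metric behind that dialogue claim, is simply word-level edit distance divided by reference length. Here is the standard computation — the example sentences are invented, not HappyHorse test data:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word out of four: WER = 0.25.
print(word_error_rate("the horse runs fast", "the horse ran fast"))  # 0.25
```

Lower is better: a WER near zero means the transcribed lip-synced dialogue matches the intended script almost word for word.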
The numbers that matter
- ELO 1,389 in text-to-video (Artificial Analysis), #1 globally
- ELO 1,403 in image-to-video, leading #2 by 50+ points
- #1 on all four Artificial Analysis video leaderboards (text-to-video, image-to-video, with audio, without audio)
- 15B parameters — smaller than many competitors that rank far lower
- 1080p output at 16:9 or 9:16, generating 5–8 second clips
- 8-step denoising — most models require 20–50 steps
- 7 languages with native lip-sync support
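To put the ELO gap in perspective: under the Elo model, rating difference maps directly to expected head-to-head win rate. A quick computation (the #2 rating of ~1,353 below is illustrative, back-derived from the "50+ points" lead):

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# A ~50-point lead (e.g. 1,403 vs. ~1,353 in image-to-video) implies
# model A wins roughly 57% of blind pairwise comparisons.
p = elo_expected_score(1403, 1353)
print(round(p, 3))  # 0.571
```

A 57/43 split may sound modest, but on an arena with tens of thousands of votes it is a statistically unambiguous lead.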
Open source and API access
In a move that shocked many in the industry, the team announced that HappyHorse 1.0 will be fully open-sourced — model weights, code, and all. The GitHub repository and Hugging Face model page are already live, and the team has confirmed that a cloud API will be available starting April 30, with usage-based pricing.
This is a significant strategic play by Alibaba. By open-sourcing a model that's currently #1 in the world, they're making a bet that ecosystem adoption and developer goodwill will be more valuable than keeping the technology proprietary. It's the same playbook that made Meta's LLaMA models so influential in the LLM space — and it could reshape the AI video generation landscape in the same way.
The bigger picture: China's AI video generation war
HappyHorse's emergence turns the AI video generation space into a three-way race among China's tech giants. ByteDance has Seedance (built on top of their Doubao / Jimeng ecosystem). Kuaishou has Kling, which was the breakout hit of 2025. And now Alibaba has HappyHorse — built by the very people who created Kling in the first place.
The talent migration story is particularly fascinating. Zhang Di didn't just leave Kuaishou — he brought core members of the Kling technical team with him to Alibaba. It's a reminder that in the AI race, the most valuable asset isn't data or compute — it's the people who know how to push the frontier.
Meanwhile, OpenAI's Sora, which generated enormous hype in early 2024, has been left behind. HappyHorse outperforms Sora 2 on every benchmark, and it's open source. The speed at which Chinese labs are iterating on video generation is staggering.
Why solo founders should care
If you're a solo founder or indie builder, HappyHorse matters for several practical reasons:
- Product demos and marketing videos: Generate professional-quality product videos from text descriptions in seconds. No need to hire a videographer or learn After Effects.
- Social media content at scale: Create short-form video content for TikTok, Instagram Reels, and YouTube Shorts with synchronized audio — in 7 languages.
- Prototype visual experiences: If you're building anything that involves video (education, entertainment, marketing tools), you can now prototype with state-of-the-art generation for free.
- Cost advantage: Open-source means you can run it on your own GPU or use the API at usage-based pricing — no enterprise contracts or minimum commitments.
- Multilingual content: One prompt, multiple languages with native lip-sync. Perfect for solo founders targeting global markets.
The AI video generation space is moving at a breathtaking pace. What required a production studio 18 months ago can now be done by a single person with a text prompt. HappyHorse isn't just a benchmark winner — it's another brick in the wall of tools that make the one-person company more powerful than ever.
Key takeaways
- HappyHorse 1.0, built by Alibaba's ATH unit (led by ex-Kuaishou/Kling team lead Zhang Di), is now the #1 AI video generation model globally
- It generates video and audio simultaneously using a unified 15B-parameter Transformer — a first in the industry
- The model is fully open-source with an API launching April 30
- It outperforms ByteDance's Seedance 2.0, Kuaishou's Kling 3.0, and OpenAI's Sora 2 on all major benchmarks
- For solo founders, this means professional video content creation is now essentially free and instant
The AI video generation war is heating up, and the biggest winner is the independent creator. Build solo, not alone — and now, with a horse.