Ismat Samadov

Qwen 3.5 Is Quietly Beating Every Western Open-Source Model — And Nobody Noticed

Alibaba's Qwen hit 1B+ downloads, beats GPT-5.2 on instruction following, and costs 13x less than Claude. The open-source AI race is over.

AI · LLM · Open Source · Machine Learning




In December 2025, Alibaba's Qwen family of AI models had more monthly downloads on HuggingFace than the next eight model families combined -- Meta Llama, DeepSeek, OpenAI, Mistral, Nvidia, Zhipu.AI, Moonshot, and MiniMax. All of them. Together. And the Western AI press barely mentioned it.

By January 2026, Qwen had crossed 700 million cumulative downloads. By April, it passed one billion. The model family has spawned over 180,000 derivative models -- more than Google and Meta combined. And Qwen 3.5, the February 2026 flagship, beats GPT-5.2 on instruction following while running at 13x lower cost than Claude Sonnet 4.6.

This isn't a "rising contender" story. Qwen already won the open-source model race. Most people just haven't realized it yet.


The Numbers That Should Have Made Headlines

Here's the headline nobody wrote: Chinese open-source AI models went from 1.2% of global usage in late 2024 to roughly 30% by end of 2025. Qwen drove most of that growth. And the trajectory hasn't slowed.

Let me put the adoption numbers in context:

| Metric | Qwen | Meta Llama | Mistral |
|---|---|---|---|
| Cumulative HuggingFace downloads (Q1 2026) | 1 billion+ | ~600 million | ~150 million |
| Derivative models on HuggingFace | 180,000+ | ~90,000 | ~30,000 |
| Enterprise users | 90,000+ | Not disclosed | Not disclosed |
| License | Apache 2.0 | Meta Community (700M MAU cap) | Apache 2.0 |
| Languages supported | 201 | 12+ | ~20 |

Qwen overtook Llama in cumulative downloads by October 2025. That's not a typo. Meta -- the company that staked its open-source AI reputation on Llama -- got passed six months ago.

And it wasn't close.


What Qwen 3.5 Actually Is

Qwen 3.5 dropped on February 16, 2026. The flagship model -- Qwen3.5-397B-A17B -- has 397 billion total parameters but only activates 17 billion per forward pass. That's the magic of its sparse Mixture-of-Experts architecture: 512 total experts, 10 routed plus 1 shared activated per token.
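To make the sparse-activation idea concrete, here's a toy sketch of top-k expert routing -- not Qwen's actual implementation, just the general MoE pattern: a router scores all 512 experts for each token, the 10 highest-scoring routed experts plus 1 always-on shared expert run, and everything else stays idle.

```python
import numpy as np

def route_token(router_logits: np.ndarray, k: int = 10) -> list[int]:
    """Pick the top-k routed experts for one token (toy MoE router)."""
    topk = np.argsort(router_logits)[-k:]  # indices of the k highest scores
    return sorted(topk.tolist())

NUM_EXPERTS, K = 512, 10
rng = np.random.default_rng(0)

logits = rng.normal(size=NUM_EXPERTS)   # router scores for one token
active = route_token(logits, K)         # 10 routed experts...
active_count = len(active) + 1          # ...plus 1 shared expert

print(f"experts active per token: {active_count} of {NUM_EXPERTS}")
# Only a small slice of total parameters runs per forward pass:
print(f"active parameter fraction: {17 / 397:.1%}")  # 17B active of 397B total
```

Run it and the arithmetic behind "397B total, 17B active" falls out: roughly 4% of the network fires per token, which is why a model this large can be served cheaply.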

The full family spans eight models, from 0.8B to 397B:

| Model | Total Params | Active Params | Architecture | Context Window |
|---|---|---|---|---|
| Qwen3.5-397B-A17B | 397B | 17B | MoE + Gated DeltaNet | 262K native / 1M extended |
| Qwen3.5-122B-A10B | 122B | 10B | MoE + Gated DeltaNet | 262K / 1M |
| Qwen3.5-35B-A3B | 35B | 3B | MoE + Gated DeltaNet | 262K / 1M |
| Qwen3.5-27B | 27B | 27B | Dense | 262K / 1M |
| Qwen3.5-9B | 9B | 9B | Dense | 262K |
| Qwen3.5-4B | 4B | 4B | Dense | 262K |
| Qwen3.5-2B | 2B | 2B | Dense | 262K |
| Qwen3.5-0.8B | 0.8B | 0.8B | Dense | 262K |

The key innovation is Gated Delta Networks -- a hybrid attention mechanism that mixes linear attention (Gated DeltaNet) with traditional attention across a repeating layer pattern. This delivers 8.6x to 19x faster decoding than the previous Qwen3-Max architecture. Faster inference at lower cost with competitive accuracy. That's the trifecta every model maker promises and almost nobody delivers.
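The published layer details are sparse, but the motivation for mixing linear and softmax attention is easy to see in big-O terms: softmax attention costs roughly O(n²·d) per layer while linear attention costs roughly O(n·d²). A back-of-envelope sketch with toy numbers (these are generic approximations, not Qwen's real dimensions):

```python
def softmax_attn_flops(seq_len: int, d: int) -> int:
    """Rough FLOPs for one softmax attention layer: O(n^2 * d)."""
    return 2 * seq_len * seq_len * d

def linear_attn_flops(seq_len: int, d: int) -> int:
    """Rough FLOPs for one linear-attention layer: O(n * d^2)."""
    return 2 * seq_len * d * d

d_head = 128
for n in (4_096, 262_144):  # short context vs. Qwen 3.5's native 262K window
    ratio = softmax_attn_flops(n, d_head) / linear_attn_flops(n, d_head)
    print(f"n={n:>7}: softmax/linear FLOP ratio ~ {ratio:,.0f}x")
```

At short context the two are within an order of magnitude; at 262K tokens the quadratic term dominates completely. That's the regime where interleaving linear-attention layers pays off.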

All models are released under Apache 2.0 -- the most permissive open-source license in the AI space. No usage caps. No branding requirements. No restrictions on commercial deployment. Use it however you want.


The Benchmark Smackdown

I'm going to show you the numbers that most comparison articles either cherry-pick or ignore. Here's Qwen 3.5's flagship (397B-A17B) against the best closed models:

| Category | Benchmark | Qwen 3.5 (397B) | GPT-5.2 | Claude Opus 4.6 | Gemini 3 Pro |
|---|---|---|---|---|---|
| Instruction Following | IFBench | 76.5 | 75.4 | 58.0 | -- |
| Instruction Following | MultiChallenge | 67.6 | 57.9 | 54.2 | 64.2 |
| Math | AIME 2026 | 91.3 | 96.7 | 93.3 | -- |
| Math | HMMT Feb 2025 | 94.8 | 99.4 | -- | -- |
| Coding | SWE-bench Verified | 76.4 | 80.0 | 80.9 | 76.2 |
| Coding | LiveCodeBench v6 | 83.6 | -- | -- | -- |
| Knowledge | MMLU-Pro | 87.8 | -- | -- | 89.8 |
| Vision | MathVision | 88.6 | 83.0 | -- | 86.6 |
| Vision | MathVista (mini) | 90.3 | -- | -- | 87.9 |
| Vision | OCRBench | 93.1 | -- | -- | 90.4 |
| Multilingual | MMMLU | 88.5 | -- | -- | 90.6 |
| Tool Use | BFCL-V4 | 72.2 | -- | -- | 55.5 (GPT-5 mini) |

Source: Qwen 3.5 Complete Guide, DataCamp, NVIDIA NIM

Read that table carefully. An open-source model -- free, Apache 2.0, run-it-on-your-own-hardware -- is beating GPT-5.2 on instruction following, beating it on vision benchmarks, and trading blows on coding. Yes, GPT-5.2 still wins on pure math reasoning. Claude Opus 4.6 still leads on agentic tasks and SWE-bench. But an open-weight model competing at this level with $200/month proprietary APIs? That's the story.

The gap between open and closed just collapsed. And it was a Chinese lab that closed it.


The 9B Model That Embarrassed OpenAI

Here's the stat that should have been on every tech blog's front page: Qwen3.5-9B outperforms OpenAI's GPT-OSS-120B -- a model thirteen times its size -- on multiple benchmarks:

| Benchmark | Qwen3.5-9B | GPT-OSS-120B (13x larger) |
|---|---|---|
| MMLU-Pro | 82.5 | 80.8 |
| GPQA Diamond | 81.7 | 80.1 |
| IFEval | 91.5 | 88.9 |
| MMMLU (multilingual) | 81.2 | 78.2 |
| HMMT Feb 2025 | 83.2 | 76.7 |
| C-Eval | 88.2 | 76.2 |

This model runs on a single consumer GPU. You can load it in 10-16 GB of RAM with Q4 quantization and get roughly 30 tokens per second on an AMD Ryzen AI Max+395. On your laptop. For free. Beating a model that costs money to call via API and needs a server farm to run.
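The RAM figure checks out with simple arithmetic: Q4 quantization stores roughly 4-5 bits per weight depending on the format, and KV cache plus runtime overhead sits on top of the raw weights. A quick sizing sketch (bits-per-weight is an assumption, not a spec from any Qwen release):

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate in-RAM size of quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, b in [("9B model", 9), ("27B model", 27)]:
    # KV cache, activations, and runtime overhead come on top of raw weights,
    # which is why the practical RAM range lands well above this floor.
    print(f"{name}: ~{q4_weight_gb(b):.1f} GB of weights at Q4")
```

The 9B weights alone take about 5 GB; add context and overhead and you land squarely in the 10-16 GB range quoted above.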

VentureBeat called it "a major milestone for compact language models." Simon Willison described the Qwen family as "exceptionally good" and highlighted the 2B model as a breakthrough in efficiency.

The implications are uncomfortable for every company charging per-token API fees. If a 9B model running locally can beat a 120B model running in the cloud, the pricing model of commercial AI providers starts looking fragile.


Llama 4: The Fall From Grace

To understand why Qwen won, you need to understand how Llama lost.

Meta's Llama 4 launch was a disaster. The company submitted a specially crafted, non-public variant of Llama 4 Maverick to the LMArena leaderboard -- a version optimized for "conversationality" that produced verbose, emoji-filled responses. The public release was nothing like it. When LMArena tested the actual release version, it ranked #32 -- not #1 as initially claimed.

A departing Meta AI Chief confirmed: "Results were fudged."

That wasn't the only problem:

| Issue | Detail |
|---|---|
| Benchmark manipulation | Non-public "conversational" variant submitted to LMArena |
| Coding ability | 16% on Aider Polyglot benchmark |
| Logical reasoning | Lackluster compared to GPT-4o and DeepSeek R1 |
| License restrictions | Meta Community License requires branding above 700M MAU |
| Trust damage | LMArena officially reprimanded Meta |

Meanwhile, Meta is now developing two proprietary models codenamed "Mango" and "Avocado" -- a potential retreat from the open-source strategy that made Llama famous in the first place.

The community noticed. Nathan Lambert wrote that Qwen 3 "dethroned Llama" on the LocalLlama subreddit -- the first time that had ever happened. He called it "the new open standard."


Mistral: The European Middle Child

Mistral isn't doing badly. It's just disappearing.

Mistral Small 4 (March 2026) is genuinely impressive -- it outperforms GPT-OSS-120B on LiveCodeBench while producing 20% less output, which means lower token costs. The engineering is solid. But market share tells a different story.

According to OpenRouter data, Mistral's API market share is roughly 2%, down from a peak of 10% the prior year. Usage tripled in absolute terms, but relative share collapsed because everyone else grew faster.

Mistral's problem isn't quality. It's scope. The company supports about 20 languages vs. Qwen's 201. Its model family is smaller. Its derivative ecosystem is a fraction of Qwen's. And with $3.05 billion in total funding compared to Alibaba's $53 billion AI infrastructure commitment, the resource gap is unbridgeable.

Mistral will survive as a solid European alternative. But it's not competing for the crown anymore.


The Apache 2.0 Advantage

Licensing sounds boring until it determines whether you can actually use a model in production.

| License | Qwen 3.5 | Llama 4 | DeepSeek V3 | Mistral Small 4 |
|---|---|---|---|---|
| Type | Apache 2.0 | Meta Community | MIT | Apache 2.0 |
| Commercial use | Unrestricted | Restricted above 700M MAU | Unrestricted | Unrestricted |
| Modification | Unrestricted | Requires attribution | Unrestricted | Unrestricted |
| Branding requirement | None | Required above threshold | None | None |
| Sublicensing | Allowed | Limited | Allowed | Allowed |

Source: AI Magicx comparison, ComputingForGeeks

Llama's license looks open until you read the fine print. If your product crosses 700 million monthly active users, you need a separate commercial agreement with Meta. Most startups will never hit that cap. But the restriction creates legal uncertainty that enterprise legal teams hate. Apache 2.0 doesn't have that problem. You ship it. You're done.

For a Fortune 500 deploying AI across multiple products, the difference between "unrestricted" and "unrestricted with conditions" matters. This is one reason 90,000+ enterprises are already on Qwen. Singapore chose Qwen to power its national AI program. That's a sovereign government betting on a Chinese open-source model over every Western alternative.


Why Nobody in the West Noticed

Three reasons.

First: the China discount. Western tech media reflexively discounts Chinese AI achievements. When DeepSeek R1 launched in January 2026 and sent Nvidia's stock tumbling, the initial reaction from many commentators was "it must be distilled from GPT-4" rather than "China just built something competitive for a fraction of the cost." That same skepticism applies to Qwen, except Qwen doesn't have a single viral moment -- it's a steady drumbeat of releases that individually don't make headlines but collectively represent a tectonic shift.

Second: the team crisis. Right as Qwen 3.5 was peaking, the team imploded. Junyang Lin, the technical lead who drove Qwen's open-source strategy, resigned on March 4, 2026. Multiple core team members followed -- the head of post-training, the head of Qwen Code (who'd already left for Meta in January). Bloomberg reported internal conflicts over organizational restructuring and resource allocation. Alibaba's CEO held an emergency all-hands meeting.

The departures dominated the Qwen coverage. Instead of "Qwen 3.5 beats GPT-5.2 on instruction following," the headline became "Alibaba's AI team is falling apart." The benchmarks got buried.

Third: the frontier fixation. Western AI discourse is obsessed with frontier models -- the absolute best capability, regardless of cost. GPT-5. Claude Opus. Gemini Ultra. The question is always "which model is smartest?" not "which model delivers the best value?" Qwen 3.5 isn't the smartest model on Earth. It's the smartest model you can run for free. That distinction doesn't generate clicks, but it matters far more for the 99% of developers who can't afford $200/month API bills.


The $53 Billion Behind the Models

This isn't a scrappy startup punching above its weight. Alibaba announced $53 billion in AI and cloud infrastructure investment over three years -- described as China's largest-ever computing project financed by a single private business. For context, that's half the initial Stargate AI plan promoted by the U.S. government.

They're building custom silicon too. Alibaba's chip design arm T-Head shipped the Zhenwu 810E AI chip, an inference and training processor designed to compete with Nvidia's China-specific offerings. Over 470,000 AI chips shipped, with 60%+ deployed externally. The CEO admitted the chips "still lag behind foreign counterparts" -- but when US export restrictions block Nvidia's best GPUs from reaching China, "good enough and guaranteed supply" beats "best but maybe banned."

Morgan Stanley raised their Alibaba price target, projecting Alibaba Cloud revenue would double by 2028. AI-related product revenue has grown at triple-digit rates for six consecutive quarters.

Qwen isn't Alibaba's side project. It's the spearhead of a corporate transformation.


How to Actually Use Qwen 3.5 Today

Enough analysis. Here's the practical guide.

Running Locally

The 9B model is the sweet spot for most developers. Install via Ollama:

ollama pull qwen3.5:9b
ollama run qwen3.5:9b

Hardware requirements: 10-16 GB RAM with Q4 quantization. Any modern GPU with 12+ GB VRAM, or run it on CPU (slower but works).

For the 27B dense model (stronger, needs more hardware):

ollama pull qwen3.5:27b
ollama run qwen3.5:27b

This needs 20-32 GB RAM. An RTX 4090 or M2 Ultra handles it comfortably.
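Once Ollama is serving a model, any HTTP client can talk to its local REST endpoint (by default `http://localhost:11434/api/generate`). A minimal sketch -- the model tag matches the pull commands above, and the prompt is just a placeholder:

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (streaming disabled)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a running Ollama server and return the completion."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_request("qwen3.5:9b", "Explain MoE routing in one sentence.")
# print(generate(payload))  # uncomment with a local Ollama server running
```

Because the endpoint is plain HTTP, the same three lines of payload drop into curl, a shell script, or any language's standard library -- no SDK required.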

Via API

Alibaba Cloud offers the full lineup at aggressive pricing:

| Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Flash (small models) | $0.10 | $0.40 |
| Plus (flagship) | $1.20 | -- |

For comparison, Claude Sonnet 4.6 runs $3 input / $15 output per 1M tokens. That's a 13x cost difference on the flagship alone.
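To turn per-million-token prices into bill-sized numbers, here's a quick calculator. The flagship's output price wasn't listed above, so the comparison uses input pricing only; the monthly volume is an illustrative assumption:

```python
def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of a token count at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

MONTHLY_INPUT = 500_000_000  # hypothetical workload: 500M input tokens/month

claude_in = cost_usd(MONTHLY_INPUT, 3.00)  # Claude Sonnet 4.6 input rate
qwen_in = cost_usd(MONTHLY_INPUT, 1.20)    # Qwen Plus input rate
print(f"Claude input: ${claude_in:,.0f}/mo  vs  Qwen input: ${qwen_in:,.0f}/mo")
```

At this volume the input-side gap alone is $900/month; the full 13x figure depends on output pricing and the heavier output weighting of real workloads.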

For Coding

Qwen3-Coder-480B-A35B is the dedicated coding model -- 480 billion total parameters, 35 billion active, trained on 7.5 trillion tokens (70%+ code). It scores 66.5% on SWE-bench Verified and is compatible with OpenClaw, Claude Code, and Cline as a backend.

Fine-Tuning

Apache 2.0 means you can fine-tune without restrictions. The community has built robust tooling:

pip install unsloth

Unsloth provides optimized fine-tuning for all Qwen3.5 variants. vLLM and SGLang handle inference serving. The ecosystem is mature enough for production -- though expect occasional compatibility quirks with newer quantization formats.
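Whatever framework you pick, fine-tuning data has to land in the chat template the base model was trained on, and Qwen models use the ChatML-style format with `<|im_start|>`/`<|im_end|>` markers. A sketch of rendering one training example (the example content itself is made up):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render chat messages in the ChatML-style template Qwen models use."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

sample = [
    {"role": "user", "content": "Summarize Apache 2.0 in one line."},
    {"role": "assistant",
     "content": "Permissive license: use, modify, and redistribute freely, "
                "with an explicit patent grant."},
]
print(to_chatml(sample))
```

In practice you'd let the tokenizer's own `apply_chat_template` do this rendering, but it's worth knowing what the template looks like -- mismatched formatting is one of the most common silent fine-tuning failures.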

Decision Framework: When to Use Qwen 3.5

Use Qwen 3.5 when:

  • You need strong multilingual support (201 languages, nothing else comes close)
  • You're deploying on-premises or in air-gapped environments
  • License terms matter (Apache 2.0 clears any legal review)
  • Budget is a constraint (13x cheaper than Claude on API, free locally)
  • Instruction following is critical (best-in-class on IFBench)
  • You need vision + text in a single model

Use something else when:

  • You need absolute best math reasoning (GPT-5.2 still leads on AIME/HMMT)
  • You need best-in-class agentic coding (Claude Opus 4.6 leads SWE-bench and Tau2-Bench)
  • Latency matters more than cost (Qwen's thinking mode inflates time-to-first-token)
  • You need long-context performance (Gemini 3 Pro leads LongBench v2)
  • Data sovereignty prohibits Chinese-origin models (a real constraint for some enterprises)

The Risks Nobody Talks About

I'd be dishonest if I didn't flag the concerns.

The team crisis is real. Losing your tech lead, your head of post-training, and your head of code development within months isn't a speed bump. It's an organizational wound. Alibaba responded by hiring Zhou Hao from DeepMind and putting the CTO in direct control. That might work. It might also signal a shift away from the open-source philosophy that made Qwen successful. Bloomberg reported that Lin had warned of a gap with OpenAI before stepping down.

Data privacy is a concern. Alibaba Cloud's API collects prompt and completion data that can be used to improve models. If you're running via API (not locally), your data flows through Chinese infrastructure. For many enterprises, especially in regulated industries, that's a non-starter regardless of model quality.

The overthinking problem. The biggest production complaint about Qwen 3.5 is excessive reasoning tokens. The model generates verbose internal reasoning before answering, which inflates costs and latency. This is tunable, but it's a real friction point for production deployments where response time matters.

Geopolitical risk. US-China tensions could affect Qwen's availability. Export controls, sanctions, or policy changes could restrict access to Alibaba Cloud APIs for Western companies. Running locally mitigates this -- Apache 2.0 means you own the weights -- but it's a consideration for teams building long-term dependencies.


What I Actually Think

The open-source AI race is over and Qwen won. Not temporarily. Not on a technicality. Won.

Look at the facts: most downloaded model family on Earth. Most derivative models. Most permissive license among the top three. Competitive with GPT-5.2 on instruction following. A 9B model that embarrasses models thirteen times its size. Available in 201 languages when Llama covers 12. Enterprise adoption at 90,000+ companies. A sovereign nation choosing it over every Western alternative.

And the Western AI industry's response has been... crickets. Or, worse, "yeah but it's Chinese."

Here's what I think happened. The Western AI narrative is built around a simple story: American companies build the best AI. OpenAI, Anthropic, Google, Meta -- they spend the most, attract the best talent, and produce the best models. Chinese companies copy or steal (see: the distillation wars). Open-source is a nice-to-have that Llama provides as a marketing strategy.

Qwen 3.5 breaks that narrative completely. It's not distilled from Western models. It's built on a novel architecture (Gated DeltaNet + MoE) that doesn't exist in any Western model. It's trained on 36 trillion tokens. It's backed by $53 billion in infrastructure investment. And it's beating Western models on benchmarks that matter -- instruction following, vision, multilingual, tool use -- while being free and unrestricted.

The uncomfortable truth? The Western open-source AI story was always about control, not openness. Meta released Llama with a license that restricts large-scale use. OpenAI released GPT-OSS with 120 billion parameters -- big enough to benchmark well, small enough to not threaten GPT-5. Google open-sourced Gemma while keeping Gemini proprietary. Every Western "open" model comes with an asterisk.

Qwen ships Apache 2.0. No asterisk. No 700M user cap. No "community license" that a lawyer has to interpret. Just... use it.

I don't think Qwen 3.5 is the best model ever made. Claude Opus 4.6 is better at agentic coding. GPT-5.2 is better at hard math. Gemini 3 Pro has better long-context handling. But "best at everything" was never the right metric for open-source. The right metric is: what's the best model I can deploy without asking anyone's permission?

The answer, right now, is Qwen. And the gap between Qwen and its open-source competitors isn't closing. It's widening.

Whether the team crisis derails this trajectory, whether Alibaba's pivot toward enterprise monetization compromises the open-source ethos, whether geopolitics makes Qwen toxic for Western companies -- those are real uncertainties. But as of April 2026, the facts are clear: the most important open-source AI project on the planet is being built in Hangzhou, not Menlo Park.

And the Western AI industry needs to stop pretending otherwise.


Sources

  1. Xinhua -- Qwen leads global open-source AI with 700 million downloads
  2. ElectroIQ -- 40+ Qwen AI Statistics
  3. HuggingFace -- State of Open Source, Spring 2026
  4. Qwen 3.5: The Complete Guide -- Benchmarks, Local Setup
  5. DataCamp -- Qwen3.5: Features, Access, and Benchmarks
  6. NVIDIA NIM -- Qwen3.5-397B Model Card
  7. VentureBeat -- Alibaba's Qwen3.5-9B beats OpenAI's GPT-OSS-120B
  8. Simon Willison -- Something is afoot in the land of Qwen
  9. SCMP -- Chinese open-source models account for 30% of global AI usage
  10. Open Source Forum -- Alibaba's Qwen Overtakes Western Rivals
  11. Slashdot -- Departing Meta AI Chief Confirms Llama 4 Benchmark Manipulation
  12. Neowin -- Unmodified Llama 4 Maverick Ranks #32
  13. CoderSera -- Why Llama 4 is a Disaster
  14. WinBuzzer -- Meta Solidifies Open Source Retreat with Proprietary Models
  15. Nathan Lambert / Interconnects -- Qwen 3: The New Open Standard
  16. Gend.co -- Mistral AI targets 1B euro revenue in 2026
  17. AI Magicx -- Qwen 3.5 vs Llama vs Mistral
  18. ComputingForGeeks -- Open Source LLM Comparison Table 2026
  19. SCMP -- Alibaba commits $53 billion for AI infrastructure
  20. Yahoo Finance -- Alibaba Plans to Spend $53 Billion on AI
  21. EE Times -- Alibaba Unveils Own AI Chip
  22. The Register -- Alibaba has made 470,000 AI chips
  23. TechCrunch -- Alibaba's Qwen tech lead steps down
  24. VentureBeat -- Did Alibaba just kneecap its Qwen AI team?
  25. Pandaily -- Alibaba Approves Qwen Lead Resignation
  26. Reuters -- DeepSeek Sends Shockwave Through AI Industry
  27. Fortune -- China could be the big winner in the AI race
  28. Alibaba Cloud Blog -- Qwen3.6-Plus
  29. Digital Commerce 360 -- Alibaba ties AI push to cloud growth
  30. Medium -- Qwen 3.5 Explained: Architecture, Upgrades
  31. HuggingFace -- Qwen3.5-397B-A17B Model Card
  32. Artificial Analysis -- Qwen3-Coder-480B
  33. Cloud Summit EU -- Mistral AI $14 billion valuation
  34. MIT Technology Review -- What's Next for Chinese Open-Source AI