
Qwen3-VL-235B A22B

By Alibaba
Released: September 23, 2025

Overview

Qwen3-VL-235B A22B is Alibaba’s flagship MoE vision-language model. It takes images (docs, charts, screenshots, photos) plus text and returns grounded answers with strong OCR, layout understanding, and multi-image reasoning. The MoE routing activates ~22B parameters per token for frontier quality at practical latency, with long context, function/tool calling, and reliable JSON outputs.
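As a rough sketch of how a vision-language model like this is typically called, the snippet below sends one image plus a text question through an OpenAI-compatible chat endpoint and asks for structured output. The base URL, environment variable, and model identifier (qwen3-vl-235b-a22b-instruct) are assumptions here; substitute the values documented by your provider.

```python
# Minimal sketch: one image + one question via an OpenAI-compatible endpoint.
# The endpoint URL, env var, and model ID are assumptions, not official values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # hypothetical env var name
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/invoice.png"}},
                {"type": "text",
                 "text": "Extract the invoice number, date, and total as JSON."},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```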

Description

Qwen3-VL-235B A22B pairs a high-capacity language backbone with a state-of-the-art vision encoder so it can “look, read, and reason” in one pass. It handles dense documents, tables, diagrams, dashboards, and natural images, keeping small text legible and layouts intact while following detailed instructions. Multi-image prompts remain coherent, with characters, UI states, or pages staying consistent across inputs, and answers can point back to specific regions for grounded explanations.

The A22B MoE design routes each token through a small subset of experts, delivering the accuracy expected of a very large model while controlling latency and cost in production. The model is instruction-tuned for clean, controllable behavior, streams tokens for interactive use, and returns schema-true JSON, so it drops neatly into RAG pipelines, agent frameworks, and automation.

With long-context support and solid Chinese–English coverage, teams use it for enterprise document automation, analytics over charts and dashboards, screenshot and UI understanding, multimodal search, and developer assistants that reason directly from images, getting flagship VLM quality with deployment efficiency that scales.
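To illustrate the streaming and multi-image behavior described above, here is a hedged sketch: two screenshots go into a single request and tokens are printed as they arrive. The endpoint, environment variable, and model identifier are the same assumptions as in the earlier example, and the image URLs are placeholders.

```python
# Sketch of a streamed, multi-image comparison; endpoint and model ID are
# assumed values, and the screenshot URLs are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

stream = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",
    stream=True,  # stream tokens for interactive use
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/ui_before.png"}},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/ui_after.png"}},
                {"type": "text",
                 "text": "Compare the two screenshots and list every UI element that changed."},
            ],
        }
    ],
)

# Print each delta as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```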

About Alibaba

Chinese e-commerce and cloud technology company behind Taobao, Tmall, and Alibaba Cloud, and developer of the Qwen model family.

Website: alibaba.com


Last updated: October 3, 2025