Overview
Qianfan-VL 70B is Baidu’s large vision-language model on the Qianfan platform. It ingests images (documents, charts, screenshots, photos) alongside text and produces grounded answers, with strong OCR and layout understanding, long context, tool/function calling, streaming, and reliable JSON output for multimodal RAG and enterprise applications.
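The sketch below shows one way a client might send an image-plus-text prompt and stream the answer, assuming an OpenAI-compatible interface. The base URL, environment variable name, and model identifier ("qianfan-vl-70b") are assumptions; substitute the values from your Qianfan console.

```python
# Minimal sketch of a multimodal, streaming request to Qianfan-VL 70B.
# The endpoint, credential variable, and model id below are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],       # assumed env var name
    base_url="https://qianfan.baidubce.com/v2",  # assumed OpenAI-compatible endpoint
)

# One text instruction plus one image URL; multi-image prompts follow the same shape.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the line items and totals from this invoice."},
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice-page-1.png"}},
        ],
    }
]

# Stream tokens as they arrive for a responsive UX.
stream = client.chat.completions.create(
    model="qianfan-vl-70b",  # assumed model identifier
    messages=messages,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```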
Description
Qianfan-VL 70B pairs a 70-billion-parameter language core with a high-quality vision encoder so it can “look, read, and reason” in one pass. It handles dense documents and tables, diagrams, dashboards, and natural imagery, keeping small text legible and layouts intact while following precise instructions. Multi-image prompts stay coherent across pages or UI states, and responses can be formatted as schema-conforming JSON for downstream automations. The model supports long contexts for multi-page PDFs and image sequences, streams tokens for responsive UX, and uses native function calling so agents can crop regions, fetch metadata, or query retrieval backends mid-answer. Running on Baidu’s Qianfan stack, it slots cleanly into production with consistent APIs, guardrails, observability, and private networking options. Teams use Qianfan-VL 70B for document automation, chart and dashboard analysis, screenshot and UI understanding, multimodal search and RAG, and developer assistants that reason directly from images, gaining flagship-level visual intelligence with enterprise-grade deployment.
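The description mentions native function calling and JSON-formatted responses; the sketch below is one way a client might wire the two together, continuing the assumptions above (OpenAI-compatible Qianfan endpoint, model id "qianfan-vl-70b"). The crop_region tool is hypothetical and would be implemented by the host application, and the JSON-mode response_format flag is an assumption rather than confirmed Qianfan behavior.

```python
# Hedged sketch of function calling plus JSON output against an assumed
# OpenAI-compatible Qianfan endpoint; tool name and JSON mode are assumptions.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],       # assumed env var name
    base_url="https://qianfan.baidubce.com/v2",  # assumed endpoint
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Report Q3 revenue from this dashboard as JSON."},
            {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png"}},
        ],
    }
]

# A hypothetical tool the model may call mid-answer, e.g. to zoom into a chart region.
tools = [
    {
        "type": "function",
        "function": {
            "name": "crop_region",
            "description": "Crop a rectangular region from a prompt image and return it as a new image URL.",
            "parameters": {
                "type": "object",
                "properties": {
                    "image_index": {"type": "integer", "description": "Index of the image in the prompt."},
                    "bbox": {
                        "type": "array",
                        "items": {"type": "number"},
                        "description": "Normalized [x0, y0, x1, y1] crop box.",
                    },
                },
                "required": ["image_index", "bbox"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="qianfan-vl-70b",                    # assumed model identifier
    messages=messages,
    tools=tools,
    response_format={"type": "json_object"},   # assumed JSON-mode support
)

msg = response.choices[0].message
if msg.tool_calls:
    # The model asked to crop a region: run the tool, then send its result back
    # in a follow-up turn with role="tool" before requesting the final answer.
    call = msg.tool_calls[0]
    print("Tool requested:", call.function.name, json.loads(call.function.arguments))
else:
    # No tool needed: the content itself should already be a JSON object.
    print(json.loads(msg.content))
```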
About Baidu
Baidu is a Chinese multinational technology company specializing in internet-related services, products, and artificial intelligence.
Industry: Internet
Company Size: 10,001+
Location: Beijing, CN