MiniMax-VL-01

By MiniMax
Released: January 14, 2025

Overview

MiniMax-VL-01 is a vision-language model that reads images and text together. It handles OCR, charts, screenshots, and real-world photos, then answers in natural-language text or structured JSON. It supports long context, function calling, and streaming, which suits it to multimodal RAG and assistants.

Description

MiniMax-VL-01 pairs a compact vision encoder with a language-model backbone so it can read and reason over images and text in a single pass. You can supply scans, tables, diagrams, UI screenshots, or product photos alongside a prompt; the model extracts details, follows instructions, and returns grounded explanations or JSON that conforms to a requested schema. It keeps multi-image threads coherent, points to relevant image regions when needed, and maintains context across long prompts. For production use it supports function calling and token streaming, and it can be combined with retrieval so that outputs can be checked against source documents. Typical applications include document automation, dashboard and chart interpretation, screenshot and UI understanding, multimodal search, and developer copilots that reason directly from images while keeping latency and cost practical.
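
When a model is asked to return structured JSON, as described above, it is usually worth checking the reply before passing it downstream. The following is a minimal sketch in Python using only the standard library. It does not call any MiniMax API; the field names in EXPECTED_FIELDS, the parse_model_json helper, and the sample reply are all hypothetical and stand in for whatever schema your own task uses.

```python
import json

# Hypothetical schema for an invoice-extraction task:
# field name -> expected Python type after JSON decoding.
EXPECTED_FIELDS = {"invoice_number": str, "total": float, "line_items": list}


def parse_model_json(raw: str, expected: dict) -> dict:
    """Parse a model reply and check it against the expected fields and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    missing = [key for key in expected if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    for key, typ in expected.items():
        value = data[key]
        # JSON does not distinguish 42 from 42.0, so accept ints for floats
        # (but not booleans, which are a subclass of int in Python).
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            continue
        if not isinstance(value, typ):
            raise ValueError(f"field {key!r} should be {typ.__name__}")
    return data


# Hypothetical reply text, standing in for the model's structured output.
sample_reply = '{"invoice_number": "INV-001", "total": 42, "line_items": []}'
record = parse_model_json(sample_reply, EXPECTED_FIELDS)
```

A reply that fails the check can be retried or routed to manual review instead of being written into a downstream system.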

About MiniMax

MiniMax is a Shanghai-based Chinese AI company that develops multimodal foundation models spanning text, image, audio, video, and music.

Website: minimax.io

Last updated: October 15, 2025