HyperCLOVA X SEED Vision

Released: April 22, 2025

Overview

HyperCLOVA X SEED Vision (3B) is NAVER’s lightweight multimodal model. It takes images and text (plus video frames) as input and answers in text. It supports visual question answering, chart and diagram interpretation, and basic OCR, and it handles long text contexts. The design trades some peak capability for efficiency.

Description

HyperCLOVA X SEED Vision Instruct-3B builds on the SEED line by adding vision understanding alongside textual instruction following. It combines a 3.2B-parameter language model with a ~0.43B-parameter vision encoder: a SigLIP-based module encodes images, and a connector (“C-Abstractor”) merges vision and language features across a grid of image tiles. It accepts image or video input (video is processed into frames), with a visual resolution of about 378×378 pixels per grid and support for up to roughly 1.29 million total pixels across grids, which works out to about nine grids at that resolution, for richer visual content.
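The grid figures above can be checked with a little arithmetic. The Python sketch below is only an illustration: the two constants come from the description, while the function names and the simple tiling rule (cover the image with 378×378 tiles, no resizing) are assumptions made for this example and are not the model's actual preprocessing, which may resize or crop images before tiling.

```python
GRID_SIDE = 378               # pixels per grid side, from the description
MAX_TOTAL_PIXELS = 1_290_000  # "roughly 1.29 million" total pixels, from the description


def pixels_per_grid(side: int = GRID_SIDE) -> int:
    """Pixels in one square grid."""
    return side * side


def max_grids(budget: int = MAX_TOTAL_PIXELS, side: int = GRID_SIDE) -> int:
    """How many whole grids fit inside the total pixel budget."""
    return budget // pixels_per_grid(side)


def grids_for_image(width: int, height: int, side: int = GRID_SIDE) -> int:
    """Hypothetical tiling rule: tiles needed to cover an image without resizing."""
    cols = -(-width // side)   # integer ceiling division
    rows = -(-height // side)
    return cols * rows


def fits_budget(width: int, height: int) -> bool:
    """True if the image can be tiled at native size within the grid budget."""
    return grids_for_image(width, height) <= max_grids()


if __name__ == "__main__":
    print("pixels per grid:", pixels_per_grid())
    print("grids within budget:", max_grids())
    print("1134x1134 fits:", fits_budget(1134, 1134))
    print("1920x1080 fits:", fits_budget(1920, 1080))
```

Under this assumed rule, a 1134×1134 image needs a 3×3 layout and fits the budget, while a 1920×1080 screenshot would need downscaling first.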

Inputs can mix images or videos with text and questions, with a context of about 16K tokens, so it can handle extended prompts or combinations of documents, images, and text. Its instruction tuning involved supervised fine-tuning and some vision-specific RLHF to improve alignment and responsiveness. The model is optimized for efficiency: it uses fewer visual tokens per frame in video mode to reduce compute, relies on OCR-free processing where possible, and focuses on visual reasoning and image understanding rather than maximal generative richness.

In benchmarks it performs well on Korean culture and language tasks and on multimodal vision benchmarks (VQA, diagram, chart, and image tests), though not always at the level of much larger vision-language models. It is a strong option for apps that need solid visual understanding in a lightweight model, such as image-based Q&A, dashboard and chart summarization, screenshot or document assistance, or mixed-media chat agents, particularly in Korean contexts.

About Naver Corporation

Naver is a South Korean online platform operator, known for its search engine, e-commerce platform, and various internet services.

Industry: Internet
Company Size: 5001-10000
Location: Seongnam, Gyeonggi-do, KR

Last updated: October 3, 2025