PaliGemma


Model family: Gemma

PaliGemma is a lightweight open vision-language model inspired by PaLI-3 and built from open components such as SigLIP and Gemma. It takes images and text prompts as input and outputs text, supporting tasks such as image captioning, image and short-video understanding, object detection, visual question answering, and reading text embedded in images.
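Because a single PaliGemma checkpoint handles several tasks, the task is usually chosen with a short text prefix in the prompt. The helper below is a minimal sketch that assumes the prefix convention described for the PaliGemma "mix" checkpoints (`caption en`, `answer en <question>`, `ocr`, `detect <object> ; <object>`). The function name `build_prompt` and its task keys are hypothetical and for illustration only. Confirm which prefixes your checkpoint accepts against its model card.

```python
# Hypothetical helper (not part of any official API). It builds task-prefix
# prompts in the style assumed for PaliGemma "mix" checkpoints.
def build_prompt(task: str, lang: str = "en", **kwargs) -> str:
    """Return a task-prefixed text prompt for a PaliGemma-style model."""
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        # Visual question answering: the question follows the language code.
        return f"answer {lang} {kwargs['question']}"
    if task == "ocr":
        # Reading text embedded in the image.
        return "ocr"
    if task == "detect":
        # Multiple object classes are separated by " ; ".
        return "detect " + " ; ".join(kwargs["objects"])
    raise ValueError(f"unsupported task: {task!r}")


print(build_prompt("caption"))                         # caption en
print(build_prompt("vqa", question="What color is the car?"))
print(build_prompt("detect", objects=["cat", "dog"]))  # detect cat ; dog
```

The image itself is passed separately to the model's processor. The prompt string only selects the task and supplies any task arguments.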

Overview

PaliGemma is Google’s open vision-language model that accepts images plus text and outputs text for captioning, visual question answering, OCR-style tasks, and detection.
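Because the model outputs only text, detection results are returned as text too. PaliGemma encodes each box as four `<locNNNN>` tokens (y_min, x_min, y_max, x_max), each an integer bin from 0 to 1023, followed by the label. Multiple detections are separated by ` ; `. The parser below is a minimal sketch under those assumptions. It also assumes that bins are scaled by 1/1024 when converted to fractions of the image size, so check your checkpoint's documentation before relying on that detail. The name `parse_detections` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed output format: four <locNNNN> tokens (y_min, x_min, y_max, x_max),
# then a label; detections are separated by " ; ".
_DET = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)")


def parse_detections(
    text: str, width: int, height: int
) -> List[Tuple[str, Tuple[float, float, float, float]]]:
    """Convert PaliGemma-style detection text to (label, (x0, y0, x1, y1)) pixel boxes."""
    results = []
    for m in _DET.finditer(text):
        # Bins are converted to fractions of the image size (assumed scale 1/1024).
        y0, x0, y1, x1 = (int(v) / 1024 for v in m.groups()[:4])
        label = m.group(5).strip()
        results.append((label, (x0 * width, y0 * height, x1 * width, y1 * height)))
    return results


out = "<loc0256><loc0000><loc0768><loc0512> cat ; <loc0000><loc0512><loc1023><loc1023> dog"
print(parse_detections(out, width=1024, height=512))
```

Boxes are returned in (x0, y0, x1, y1) order because most drawing libraries expect that order. This differs from the y-first order of the model's tokens.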

📜OCR

About Google DeepMind

An AI research lab within Google that develops frontier models, scientific AI systems, and reinforcement learning methods, and builds products that support Google's services and research.

Industry: Artificial Intelligence

Company Size: 6000

Location: London, England, GB

Website: deepmind.com


Last updated: July 21, 2026
