TAAFT

OmniVoice

By Xiaomi
OmniVoice is a state-of-the-art multilingual text-to-speech (TTS) model from the Xiaomi Next-gen Kaldi team, built for zero-shot speech generation across more than 600 languages. Public project materials describe it as using a novel diffusion language model-style architecture over discrete audio tokens, with three main modes: voice cloning from reference audio, voice design from speaker attributes, and automatic voice generation without a reference clip. The project also reports very fast inference, with real-time factors as low as 0.025 (roughly 40 seconds of audio generated per second of compute), and an Apache 2.0 open-source release.
Released: April 2, 2026

Overview

OmniVoice is a multilingual zero-shot text-to-speech model built for voice cloning, voice design, and general speech synthesis at massive language scale. It supports more than 600 languages, uses a diffusion language model-style architecture, and is positioned for high-quality speech generation with fast inference.
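The real-time factor (RTF) cited for OmniVoice is the ratio of synthesis time to the duration of the audio produced, so lower is faster. The short Python sketch below shows the arithmetic behind a figure like 0.025. It does not call OmniVoice; the function names are hypothetical helpers chosen for illustration, and the 60-second clip is an assumed example, not a published benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Compute time implied by a given RTF for a clip of the given length."""
    return audio_seconds * rtf


if __name__ == "__main__":
    reported_rtf = 0.025   # lowest RTF reported in the project materials
    clip_seconds = 60.0    # assumed clip length, for illustration only

    print(f"speedup: {speedup_over_real_time(reported_rtf):.0f}x real time")
    print(f"compute for a {clip_seconds:.0f} s clip: "
          f"{estimated_processing_seconds(clip_seconds, reported_rtf):.1f} s")
```

Under these assumptions, an RTF of 0.025 corresponds to about 40 times faster than real time. Actual throughput depends on hardware, batch size, and text length, none of which are specified here.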

About Xiaomi

Consumer electronics and smart device company making smartphones, wearables, IoT products, home appliances, smart TVs, scooters, and connected lifestyle hardware.

Industry: Consumer Electronics
Company Size: 43690
Location: Beijing, CN
Website: mi.com

Tools using OmniVoice

No tools found for this model yet.

Last updated: April 7, 2026