OmniVoice


OmniVoice is a state-of-the-art multilingual TTS model from the Xiaomi Next-gen Kaldi team, built for zero-shot speech generation across more than 600 languages. Public project materials describe it as using a novel diffusion language model-style architecture over discrete audio tokens, with support for three main modes: voice cloning from reference audio, voice design from speaker attributes, and automatic voice generation without a reference clip. The project also highlights very fast inference, with reported real-time factors as low as 0.025, and an Apache 2.0 open-source release.
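The real-time factor (RTF) quoted above is conventionally the wall-clock time spent synthesizing audio divided by the duration of the audio produced. Values below 1.0 mean faster than real time. The short sketch below works through that arithmetic for the reported figure of 0.025. The function names are hypothetical helpers for illustration, not part of any OmniVoice API. Actual RTF will depend on hardware, batch size, and settings.

```python
import math


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio (hypothetical helper)."""
    if audio_seconds <= 0 or not math.isfinite(audio_seconds):
        raise ValueError("audio_seconds must be a positive, finite duration")
    return synthesis_seconds / audio_seconds


def synthesis_time_for(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: expected compute time for a clip of a given length."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # Best-case RTF reported in the project materials (assumed, hardware-dependent).
    reported_rtf = 0.025
    # A 10-second clip at this RTF needs roughly a quarter of a second of compute,
    # i.e. about 40 times faster than real time.
    print(synthesis_time_for(10.0, reported_rtf))
    print(1 / reported_rtf)
```

Read the other way, an RTF of 0.025 means one second of compute produces about 40 seconds of speech.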

Overview

OmniVoice is a multilingual zero-shot text-to-speech model built for voice cloning, voice design, and general speech synthesis at massive language scale. It supports more than 600 languages and uses a diffusion language model-style architecture aimed at high-quality speech generation with fast inference.

🔊Text to speech 🎤Voice changing 🗣️Voice cloning 🌐Multilingual communication

About Xiaomi

Consumer electronics and smart device company making smartphones, wearables, IoT products, home appliances, smart TVs, scooters, and connected lifestyle hardware.

Industry: Consumer Electronics

Company Size: 43690

Location: Beijing, CN

Website: mi.com



Last updated: April 7, 2026
