
VoxCPM2

By OpenBMB
VoxCPM2 is OpenBMB’s latest tokenizer-free TTS model. Rather than predicting discrete speech tokens, it generates continuous speech representations directly through an end-to-end diffusion-autoregressive architecture. The repository describes it as a 2B-parameter model built on MiniCPM-4 and trained on more than 2 million hours of multilingual speech data. It supports 30 languages, natural-language voice design, controllable voice cloning from short reference clips, and “ultimate cloning” with transcript-guided continuation, and it outputs 48 kHz audio. The repo also reports real-time streaming with a real-time factor (RTF) as low as about 0.3 on an RTX 4090, or about 0.13 with Nano-VLLM.
Released: April 6, 2026

Overview

VoxCPM2 is OpenBMB’s open-source tokenizer-free multilingual text-to-speech model for natural speech generation, voice design, and controllable voice cloning. It is a 2B-parameter model trained on over 2 million hours of speech, supports 30 languages, and produces 48 kHz studio-quality audio with real-time streaming capability.
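The listing does not define RTF. The sketch below assumes the conventional definition (wall-clock synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time) and turns the reported figures into concrete timings and 48 kHz sample counts. The helper names are hypothetical illustrations, not part of VoxCPM2's API.

```python
# Hypothetical helpers; assumes RTF = synthesis time / generated audio duration.

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF; a value below 1.0 means audio is generated faster than it plays."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time_for(audio_seconds: float, rtf: float) -> float:
    """Estimate the wall-clock time needed to generate audio_seconds of speech."""
    return audio_seconds * rtf


def num_samples(audio_seconds: float, sample_rate: int = 48_000) -> int:
    """Number of samples per channel for a clip at the given sample rate (48 kHz default)."""
    return round(audio_seconds * sample_rate)


# RTF figures as reported for an RTX 4090: ~0.3 standard, ~0.13 with Nano-VLLM.
for label, rtf in [("standard", 0.3), ("Nano-VLLM", 0.13)]:
    secs = synthesis_time_for(10.0, rtf)
    print(f"{label}: ~{secs:.1f} s to generate 10 s of audio ({num_samples(10.0)} samples)")
```

Under this definition, a 10-second clip (480,000 samples at 48 kHz) would take roughly 3 s at RTF 0.3 and roughly 1.3 s at RTF 0.13; actual timings will depend on hardware, batch size, and streaming settings.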

About OpenBMB

OpenBMB, short for Open Lab for Big Model Base, aims to build foundation models and toolkits for large-scale pre-trained language models.

Company Size: 100
Location: Beijing, CN
Website: openbmb.cn

Tools using VoxCPM2

No tools found for this model yet.

Last updated: April 7, 2026