Overview
Parakeet TDT is NVIDIA’s low-latency, streaming speech-to-text model in the Parakeet family—a transformer-transducer architecture tuned for high accuracy, robust noise handling, and real-time partial transcripts, deployable on GPUs from edge to data center.
Description
Parakeet TDT is a modern ASR model built around a transformer-transducer design, which combines an acoustic encoder with a lightweight prediction/decoding network to produce word-accurate transcripts with minimal delay. It’s trained for robustness across accents, microphones, and noisy environments, and it preserves long-form coherence while still responding interactively in streaming mode. The model emits stable partials and finalized text with punctuation, casing, and inverse-text normalization, and it can return timestamps needed for alignment or downstream analytics. Domain adaptation is straightforward: you can fine-tune on industry jargon or custom corpora so recognition improves for contact centers, medical dictation, meetings, or embedded commands. For production it integrates with NVIDIA’s inference stack, running efficiently with TensorRT and scaling as a NIM microservice; quantization options help fit tighter latency or memory budgets, making it practical on embedded Jetson devices as well as multi-GPU servers. If you need fast, reliable ASR that maintains quality under real-world conditions and slots cleanly into agent pipelines, Parakeet TDT is the workhorse in the Parakeet lineup.
About NVIDIA
No company description available.
Industry:
Computer Hardware Manufacturing
Company Size:
10001+
Location:
Santa Clara, California, US
Website:
nvidia.com
Related Models
Last updated: September 22, 2025