Overview
Megatron-Turing NLG is a 530B-parameter Transformer language model co-developed by NVIDIA and Microsoft. Trained on a large curated text corpus, it delivers strong performance in open-ended generation, long-form question answering, summarization, and few-shot learning, and it helped pioneer large-scale training with the Megatron-LM and DeepSpeed frameworks.
Description
Megatron-Turing NLG, often called MT-NLG 530B, is a dense decoder-only language model built to explore the limits of scale. It was trained on a broad mix of high-quality web text, books, and other public sources, with careful filtering to improve fluency and coverage. Training combined Megatron-LM and DeepSpeed to layer tensor, pipeline, and data parallelism, letting the model train across thousands of GPUs while keeping throughput practical. In use, it handles few-shot and zero-shot prompts with consistent quality, produces coherent long-form text, and supports tasks such as summarization, question answering, reading comprehension, and commonsense reasoning. MT-NLG also served as a reference system for large-model engineering, influencing later work on efficient parallelism, inference optimization, and fine-tuning methods, and it is typically served through optimized runtimes in the NVIDIA and Azure ecosystems.
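The combination of tensor, pipeline, and data parallelism described above is often called 3D parallelism. A minimal sketch of the arithmetic behind it: tensor and pipeline parallelism shard the weights across GPUs, while data parallelism replicates the resulting shard to scale batch throughput. The parallelism degrees below are illustrative assumptions for a 530B-parameter model, not official MT-NLG settings.

```python
# Illustrative sketch of per-GPU weight footprint under 3D parallelism.
# Degrees chosen here are assumptions for illustration, not the
# published MT-NLG training configuration.

def params_per_gpu(total_params: float, tensor_parallel: int, pipeline_parallel: int) -> float:
    """Tensor parallelism splits each layer's weight matrices across GPUs;
    pipeline parallelism assigns contiguous groups of layers to stages.
    Data parallelism replicates the resulting shard, so it does not
    reduce the per-GPU parameter count."""
    return total_params / (tensor_parallel * pipeline_parallel)

total = 530e9  # 530B parameters
shard = params_per_gpu(total, tensor_parallel=8, pipeline_parallel=35)
print(f"~{shard / 1e9:.2f}B parameters per GPU")          # ~1.89B
print(f"~{shard * 2 / 2**30:.1f} GiB of fp16 weights")    # 2 bytes/param
```

With these assumed degrees, one model replica spans 8 x 35 = 280 GPUs, and each GPU holds only a few GiB of weights; optimizer state and activations add substantially more memory on top of this, which is why frameworks like DeepSpeed also shard those.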
About Microsoft
Microsoft is a technology company that offers a wide range of software, cloud computing services, hardware, and artificial intelligence solutions.
Industry:
Software Development
Company Size:
10,001+ employees
Location:
Redmond, Washington, US