Papers
-
ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting Using Render-and-Compare
NVIDIA / Hong Kong University of Science and Technology, Shanghai Jiao Tong University, Swiss Federal Institute of Technology in Zurich, University of California, Merced
-
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
-
Scalable Training of Mixture-of-Experts Models with Megatron Core
-
AI+HW 2035: Shaping the Next Decade
-
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
-
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline
-
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
-
ManipulationNet: An Infrastructure for Benchmarking Real-World Robot Manipulation with Physical Skill Challenges and Embodied Multimodal Reasoning
-
V1: Unifying Generation and Self-Verification for Parallel Reasoners
-
ROBOMETER: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
-
CuTe Layout Representation and Algebra
-
Mode Seeking meets Mean Seeking for Fast Long Video Generation
-
VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale
-
Test-Time Training with KV Binding Is Secretly Linear Attention
-
Toward the Thermodynamic Limit: Neural Operators for Non-equilibrium Dynamics of Mott Insulators
-
El Agente Gráfico: Structured Execution Graphs for Scientific Agents
-
EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
-
World Action Models are Zero-shot Policies
-
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
-
iGRPO: Self-Feedback–Driven LLM Reasoning
-
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
-
Learning to Discover at Test Time
-
Pretraining Large Language Models with NVFP4
-
3D-GENERALIST: Vision-Language-Action Models for Crafting 3D Worlds
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
-
Describe Anything: Detailed Localized Image and Video Captioning
-
One-Minute Video Generation with Test-Time Training
-
Data Scaling Laws for End-to-End Autonomous Driving
-
Cosmos World Foundation Model Platform for Physical AI
-
NVLM: Open Frontier-Class Multimodal LLMs
-
Nemotron-4-340B-Instruct
-
ChipNeMo: Domain-Adapted LLMs for Chip Design
-
Eureka: Human-Level Reward Design via Coding Large Language Models
-
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
-
Neuralangelo: High-Fidelity Neural Surface Reconstruction
-
Voyager: An Open-Ended Embodied Agent with Large Language Models
-
Reflexion: Language Agents with Verbal Reinforcement Learning
-
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators
-
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
