Papers
-
Where can AI be used? Insights from a deep ontology of work activities (Featured)
-
Developments in Artificial Intelligence markets: New indicators based on model characteristics, prices and providers (Featured)
-
DanceOPD: On-Policy Generative Field Distillation
-
Autodata: An agentic data scientist to create high quality synthetic data
-
MJEPA: A Simple and Scalable Joint-Embedding Predictive Architecture for Audio-Visual Learning
-
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
-
Qwen-AgentWorld: Language World Models for General Agents
-
Sesame: Structure-Aware Molecular Generation via Spatial Density-Map Conditioning
-
You Don't Need to Run Every Eval
-
Tapered Language Models
-
Agent-as-a-Router: Agentic Model Routing for Coding Tasks
-
Unlimited OCR Works
-
Tmax: A simple recipe for terminal agents
-
Cloak: Zero-Shot Cross-Embodiment Manipulation by Masking the End-Effector from the VLA
-
DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams
-
Inverting the Bellman Equation: From $Q$-Values to World Models
-
Sakana Fugu Technical Report
-
MemoryWAM: Efficient World Action Modeling with Persistent Memory
-
Reinforcement Learning Towards Broadly and Persistently Beneficial Models
-
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
-
Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors
-
Agentic Robot Policy Self-Improvement in the Real World
-
What Must Generalist Agents Remember?
-
Do as I Do: Dexterous Manipulation Data from Everyday Human Videos
-
MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
-
Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation
-
EgoInfinity: A Web-Scale 4D Hand-Object Interaction Data Engine for Any-View Robot Retargeting and Video-to-Action Robot Learning
-
Looped World Models
-
Variable-Width Transformers
-
VisualClaw: A Real-Time, Personalized Agent for the Physical World
-
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision
-
Greed Is Learned: Visible Incentives as Reward-Hacking Triggers
-
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models
-
Latent Thought Flow: Efficient Latent Reasoning in Large Language Models
-
TopoRetarget: Interaction-Preserving Retargeting for Dexterous Manipulation
-
DreamX-World 1.0: A General-Purpose Interactive World Model
-
Human Universal Grasping
-
ART-Glove: Articulated Tactile Glove for Contact-Grounded Dexterous Interaction Capture
-
Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation
-
You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences
-
SimWeaver: Zero-Shot RGB Sim-to-Real for Deformable Manipulation
-
Steering Autoregressive Vision-Language-Action Policies via Action Token Intervention
-
Universal Manipulation Exoskeleton: Learning Compliant Whole-body Policies with Real-time Torque Feedback
-
Efficient On-Device Diffusion LLM Inference with Mobile NPU
-
VHDLSuite: Unified Pipeline for LLM VHDL Generation with Data Synthesis and Evaluation
-
$μ_0$: A Scalable 3D Interaction-Trace World Model
-
Surflo: Consistent 3D Surface Flow Model with Global State
-
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
-
MiniMax Sparse Attention
-
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
