Papers
-
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
-
How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
-
Event-Triggered Gossip for Distributed Learning
-
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
-
Discovering Multiagent Learning Algorithms with Large Language Models
-
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
-
ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
-
Wink: Recovering from Misbehaviors in Coding Agents
-
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
-
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
-
The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
-
SARAH: Spatially Aware Real-time Agentic Humans
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
-
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
-
El Agente Gráfico: Structured Execution Graphs for Scientific Agents
-
Unified Latents (UL): How to train your latents
-
Multi-agent cooperation through in-context co-player inference
-
Factored Latent Action World Models
-
Tuning-free Visual Effect Transfer across Videos
-
EVMbench: Evaluating AI Agents on Smart Contract Security
-
EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
-
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
-
jina-embeddings-v5-text: Task-Targeted Embedding Distillation
-
World Action Models are Zero-shot Policies
-
On the Surprising Effectiveness of Masking Updates in Adaptive Optimizers
-
GLM-5: from Vibe Coding to Agentic Engineering
-
Image Generation with a Sphere Encoder
-
GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
-
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
-
BitDance: Scaling Autoregressive Generative Models with Binary Tokens
-
Experiential Reinforcement Learning
-
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
-
Hippocampus: An Efficient and Scalable Memory Module for Agentic AI
-
Joint Time Series Chain: Detecting Unusual Evolving Trend across Time Series
-
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
-
Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
-
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
-
WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions
-
Florence: A New Foundation Model for Computer Vision
-
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
-
GISA: A Benchmark for General Information-Seeking Assistant
-
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
-
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
-
Think like a Scientist: Physics-guided LLM Agent for Equation Discovery
-
Intelligent AI Delegation
-
AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
-
HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
-
LUVE: Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts
