Papers
- MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing
- Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments
- EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
- MOSIV: Multi-Object System Identification from Videos
- ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
- Sensitivity-Aware Retrieval-Augmented Intent Clarification
- Agnostic learning in (almost) optimal time via Gaussian surface area
- Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
- ResearchEnvBench: Benchmarking Agents on Environment Synthesis for Research Code Execution
- StruVis: Enhancing Reasoning-based Text-to-Image Generation via Thinking with Structured Vision
- ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins
- Occlusion-Aware SORT: Observing Occlusion for Robust Multi-Object Tracking
- Ensemble Learning with Sparse Hypercolumns
- Heterogeneous Decentralized Diffusion Models
- FontUse: A Data-Centric Approach to Style- and Use-Case-Conditioned In-Image Typography
- Stabilizing Reinforcement Learning for Diffusion Language Models
- Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models
- GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection
- Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models
- Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving
- TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation
- Transforming Omnidirectional RGB-LiDAR data into 3D Gaussian Splatting
- Text-Driven Emotionally Continuous Talking Face Generation
- Lifelong Embodied Navigation Learning
- StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation
- Lyapunov Probes for Hallucination Detection in Large Foundation Models
- Offline Materials Optimization with CliqueFlowmer
- Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality
- DeepSight: Bridging Depth Maps and Language with a Depth-Driven Multimodal Model
- Enhancing Neural Video Compression of Static Scenes with Positive-Incentive Noise
- Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
- ButterflyViT: 354$\times$ Expert Compression for Edge Vision Transformers
- Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra
- Making Implicit Premises Explicit in Logical Understanding of Enthymemes
- Dynamic Momentum Recalibration in Online Gradient Learning
- FedARKS: Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration for Person Re-identification
- Diffusion Language Models Are Natively Length-Aware
- A Hazard-Informed Data Pipeline for Robotics Physical Safety
- Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
- Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models
- Predictive Coding Graphs are a Superset of Feedforward Neural Networks
- Longitudinal NSCLC Treatment Progression via Multimodal Generative Models
- Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment
- VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
- Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations
- Efficient Vector Search in the Wild: One Model for Multi-K Queries
- Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
- Reflective Flow Sampling Enhancement
- FreeOcc: Training-free Panoptic Occupancy Prediction via Foundation Models
- A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement
