Papers
-
TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size
-
AutoTraces: Autoregressive Trajectory Forecasting via Multimodal Large Language Models (Southeast University)
-
MJ1: Multimodal Judgment via Grounded Verification (Haize Labs)
-
CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval (Beihang University, Northeastern University)
-
Aero-Promptness: Drag-Aware Aerodynamic Manipulability for Propeller-driven Vehicles (Sapienza University of Rome, University of Twente)
-
SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning (Beihang University, Shanghai Jiao Tong University)
-
Amortizing Maximum Inner Product Search with Learned Support Functions
-
ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation
-
It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models (Incheon National University, McGill University)
-
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents (Huawei Research, Multimedia Laboratory at The Chinese University of Hong Kong, Nankai University)
-
FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning (Purdue University, Queen’s University Belfast, Rice University, Shanghai Jiao Tong University, Stevens Institute of Technology)
-
Alignment-Process-Outcome: Rethinking How AIs and Humans Collaborate (George Mason University, Hong Kong University of Science and Technology, Simon Fraser University)
-
Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared (Hefei University of Technology, Kunming University of Science and Technology)
-
VSDiffusion: Taming Ill-Posed Shadow Generation via Visibility-Constrained Diffusion (East China University of Science and Technology)
-
AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis (Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University, University of Science and Technology of China)
-
Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization (Shanghai Qizhi Institute, Tsinghua University)
-
Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model (Daegu Gyeongbuk Institute of Science and Technology, Ulsan National Institute of Science and Technology)
-
ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments
-
DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention (Seoul National University)
-
Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades (Munich Center for Machine Learning, Murdoch University, Technical University of Munich, The University of Western Australia)
-
QualiTeacher: Quality-Conditioned Pseudo-Labeling for Real-World Image Restoration (Duke University, École Polytechnique Fédérale de Lausanne, National University of Singapore, Sun Yat-sen University, Tsinghua University)
-
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables (East China Normal University)
-
Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout
-
CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling
-
S2S-FDD: Bridging Industrial Time Series and Natural Language for Explainable Zero-shot Fault Diagnosis (Zhejiang University)
-
Examining the Role of YouTube Production and Consumption Dynamics on the Formation of Extreme Ideologies (University of Iowa)
-
Speed3R: Sparse Feed-forward 3D Reconstruction Models (Baidu AMU, The University of Hong Kong)
-
See and Switch: Vision-Based Branching for Interactive Robot-Skill Programming (Czech Institute of Informatics, Robotics, and Cybernetics, Czech Technical University)
-
Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor (Agency for Science, Technology and Research, Singapore, Beijing University of Posts and Telecommunications)
-
ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning
-
Adversarial Domain Adaptation Enables Knowledge Transfer Across Heterogeneous RNA-Seq Datasets (IBISC Laboratory, University Evry, University Paris-Saclay)
-
Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling (City University of Hong Kong, Zhejiang University of Technology)
-
Evaluating Generative Models via One-Dimensional Code Distributions
-
Deterministic Differentiable Structured Pruning for Large Language Models (Ant Group, Tsinghua University)
-
In-Context Reinforcement Learning for Tool Use in Large Language Models (National University of Singapore, Salesforce AI Research, University of California, Berkeley, University of California, Santa Cruz)
-
Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models (Wayne State University)
-
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem (Arizona State University, Clemson University, Duke University, University of Kansas)
-
PlayWorld: Learning Robot World Models from Autonomous Play (Princeton University)
-
AtomVLA: Scalable Post-Training for Robotic Manipulation via Predictive Latent World Models (INFIFORCE Intelligent Technology / Huazhong University of Science and Technology, The University of Hong Kong, Tsinghua University)
-
Scale Space Diffusion (University of Maryland)
-
Agentic Critical Training (University of Maryland)
-
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback (National University of Singapore, Shanghai AI Lab)
-
PostTrainBench: Can LLM Agents Automate LLM Post-Training? (ELLIS Institute Tübingen, Max Planck Institute for Intelligent Systems, Tübingen AI Center, University of Tübingen)
-
\$OneMillion-Bench: How Far are Language Agents from Human Experts? (Beijing Institute for General Artificial Intelligence)
-
How Far Can Unsupervised RLVR Scale LLM Training? (Peking University, Shanghai AI Lab, Shanghai Jiao Tong University, Tsinghua University, University of Illinois Urbana-Champaign, Xi'an Jiaotong University)
-
AI Agent Traps
-
Context-Enriched Natural Language Descriptions of Vessel Trajectories
-
From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness
-
Sparsity and Out-of-Distribution Generalization (University of Texas)
-
Feed m Birds with One Scone: Accelerating Multi-task Gradient Balancing via Bi-level Optimization
