Papers
- DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models (Beijing Normal University, Peking University, South China University of Technology)
- Long-Short Term Agents for Pure-Vision Bronchoscopy Robotic Autonomy (Shanghai Chest Hospital, Shanghai Jiao Tong University)
- Beyond Heuristic Prompting: A Concept-Guided Bayesian Framework for Zero-Shot Image Recognition (City University of Hong Kong, Harbin Institute of Technology, Nanyang Technological University)
- Geometric Transformation-Embedded Mamba for Learned Video Compression (Xi’an Jiaotong University)
- Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents (Center for Advanced AI, Accenture, University of California, Santa Barbara)
- Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases (Central South University, National Super Computing Center, School of Information and Communication Technology, The Hong Kong Polytechnic University)
- Enhancing Unregistered Hyperspectral Image Super-Resolution via Unmixing-based Abundance Fusion Learning (Beijing Institute of Technology, Hangzhou Dianzi University)
- RLPR: Radar-to-LiDAR Place Recognition via Two-Stage Asymmetric Cross-Modal Alignment for Autonomous Driving (Beijing Institute of Technology, Shanghai Jiao Tong University)
- Robust Transfer Learning with Side Information (University of Central Florida)
- Semantic Risk Scoring of Aggregated Metrics: An AI-Driven Approach for Healthcare Data Governance (University of Texas)
- SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning (Queensland University of Technology, Robotics and Autonomous Systems Group)
- IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
- SWE-Fuse: Empowering Software Agents via Issue-free Trajectory Learning and Entropy-aware RLVR Training (Ant Group, The Chinese University of Hong Kong)
- A Hybrid Vision Transformer Approach for Mathematical Expression Recognition (Vietnam National University Ho Chi Minh City, Viettel Cyberspace Center)
- BRIDGE: Benchmark for multi-hop Reasoning In long multimodal Documents with Grounded Evidence (The University of Melbourne, The University of Western Australia)
- Text to Automata Diagrams: Comparing TikZ Code Generation with Direct Image Synthesis (Marshall University, West Virginia State University)
- $L^3$: Scene-agnostic Visual Localization in the Wild (Hunan University)
- AI Agents, Language, Deep Learning and the Next Revolution in Science (University of Chinese Academy of Sciences, University of the Witwatersrand)
- ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework (Institute of Science Tokyo, The University of Osaka, The University of Tokyo)
- VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer (National University of Defense Technology, School of Artificial Intelligence, Anhui University, State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology, Anhui University)
- RL unknotter, hard unknots and unknotting number (Australian Research Council, Neapolis University Pafos, The University of Sydney)
- PSTNet: Physically-Structured Turbulence Network (Hong Kong University of Science and Technology, University of Technology Sydney)
- SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation (Beijing University of Posts and Telecommunications, Peking University)
- Local Constrained Bayesian Optimization (Chinese Academy of Sciences, National University of Singapore)
- Listening with the Eyes: Benchmarking Egocentric Co-Speech Grounding across Space and Time (Beijing Jiaotong University, Foundation Model Research Center, Institute of Automation, Chinese Academy of Science, Harbin Institute of Technology, Tencent Robotics X, The Hong Kong University of Science and Technology)
- Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs (Huawei Noah’s Ark Lab, Nanjing University)
- Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning (University of Southern California)
- VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments (Karlsruhe Institute of Technology, Nanjing University, The Chinese University of Hong Kong, The University of Western Australia)
- Scaling Machine Learning Interatomic Potentials with Mixtures of Experts (DP Technology / AI for Science Institute, National Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Peking University)
- OSExpert: Computer-Use Agents Learning Professional Skills via Exploration (Stevens Institute of Technology, University of Illinois Urbana-Champaign)
- Emergence is Overrated: AGI as an Archipelago of Experts (Australian National University, Machine Intelligence and Normative Theory Lab, Research School of Social Sciences)
- Extend Your Horizon: A Device-Agnostic Surgical Tool Tracking Framework with Multi-View Optimization for Augmented Reality (Johns Hopkins University, University of Arkansas)
- On the Feasibility and Opportunity of Autoregressive 3D Object Detection (Boston University, Cornell University, Stanford University, The Ohio State University)
- TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size
- AutoTraces: Autoregressive Trajectory Forecasting via Multimodal Large Language Models (Southeast University)
- MJ1: Multimodal Judgment via Grounded Verification (Haize Labs)
- CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval (Beihang University, Northeastern University)
- Aero-Promptness: Drag-Aware Aerodynamic Manipulability for Propeller-driven Vehicles (Sapienza University of Rome, University of Twente)
- SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning (Beihang University, Shanghai Jiao Tong University)
- Amortizing Maximum Inner Product Search with Learned Support Functions
- ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation
- It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models (Incheon National University, McGill University)
- PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents (Huawei Research, Multimedia Laboratory at The Chinese University of Hong Kong, Nankai University)
- FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning (Purdue University, Queen’s University Belfast, Rice University, Shanghai Jiao Tong University, Stevens Institute of Technology)
- Alignment-Process-Outcome: Rethinking How AIs and Humans Collaborate (George Mason University, Simon Fraser University, The Hong Kong University of Science and Technology)
- Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared (Hefei University of Technology, Kunming University of Science and Technology)
- VSDiffusion: Taming Ill-Posed Shadow Generation via Visibility-Constrained Diffusion (East China University of Science and Technology)
- AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis (Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University, University of Science and Technology of China)
- Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization (Shanghai Qizhi Institute, Tsinghua University)
- Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model (Daegu Gyeongbuk Institute of Science and Technology, Ulsan National Institute of Science and Technology)
