Papers
-
Vessel-Aware Deep Learning for OCTA-Based Detection of AMD (Stony Brook University)
-
LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution (Hong Kong University of Science and Technology)
-
Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models (Hong Kong University of Science and Technology)
-
Unify the Views: View-Consistent Prototype Learning for Few-Shot Segmentation (Tongji University)
-
Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models (Oslo Metropolitan University, Stony Brook University, University of Texas)
-
Domain-Adaptive Model Merging across Disconnected Modes (Nanchang University, Peking University, Southeast University, Tongji University)
-
OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer (National Taiwan University, National Taiwan University of Science and Technology)
-
Omni-Masked Gradient Descent: Memory-Efficient Optimization via Mask Traversal with Improved Convergence (Peking University)
-
Exploring Open-Vocabulary Object Recognition in Images using CLIP (Iwate Prefectural University)
-
Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models (Hebei University of Technology, KTH Royal Institute of Technology, Lancaster University, Nanyang Technological University, Shenzhen MSU-BIT University, VinUniversity)
-
CR-QAT: Curriculum Relational Quantization-Aware Training for Open-Vocabulary Object Detection (Incheon National University, Korea Advanced Institute of Science & Technology, University of Seoul)
-
PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition
-
Imagine How To Change: Explicit Procedure Modeling for Change Captioning (Aalto University, Chinese Academy of Sciences, Sichuan University, University of Chinese Academy of Sciences)
-
Breaking Smooth-Motion Assumptions: A UAV Benchmark for Multi-Object Tracking in Complex and Adverse Conditions (Xidian University)
-
Towards High-resolution and Disentangled Reference-based Sketch Colorization (The University of Tokyo, Waseda University)
-
An Interactive Multi-Agent System for Evaluation of New Product Concepts
-
HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild (Beijing Academy of Agriculture and Forestry Sciences, ShanghaiTech University)
-
Agent Hunt: Bounty Based Collaborative Autoformalization With LLM Agents (AI4REASON Institute, Chalmers University of Technology, The University of Melbourne, University of Gothenburg)
-
Technical Report: Automated Optical Inspection of Surgical Instruments (National University of Computer and Emerging Sciences Islamabad)
-
Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention (University of Seoul)
-
TADPO: Reinforcement Learning Goes Off-road (Carnegie Mellon University)
-
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL (Guangdong Laboratory of Artificial Intelligence and Digital Economy, Guangdong University of Technology, Peng Cheng Laboratory, Shantou University)
-
MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs (Academy of Sciences Hong Kong, East China Normal University, The Hong Kong Polytechnic University)
-
RePer-360: Releasing Perspective Priors for 360° Depth Estimation via Self-Modulation
-
Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration (Fudan University, Singapore Management University, Tsinghua University)
-
Demystifying KAN for Vision Tasks: The RepKAN Approach (Sejong University)
-
EvoESAP: Non-Uniform Expert Pruning for Sparse MoE (Mohamed bin Zayed University of Artificial Intelligence, Westlake University, Zhejiang University)
-
MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing (Beijing University of Posts and Telecommunications, Shanghai Jiao Tong University)
-
Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments
-
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
-
MOSIV: Multi-Object System Identification from Videos (Insta360 / Carnegie Mellon University, ETH Zurich, Georgia Tech, Harvard University, University of California, University of Illinois Urbana-Champaign)
-
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning (Hong Kong University of Science and Technology, University of California, University of Queensland)
-
Sensitivity-Aware Retrieval-Augmented Intent Clarification (University of Amsterdam)
-
Agnostic learning in (almost) optimal time via Gaussian surface area (ETH Zurich, University of Amsterdam)
-
Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging (Harvard University, Princeton University, University of California)
-
ResearchEnvBench: Benchmarking Agents on Environment Synthesis for Research Code Execution (Fudan University, Jilin University, Nanjing University, OpenMOSS, Shanghai Innovation Institution, Shanghai Key Laboratory of Multimodal Embodied AI, Wuhan University)
-
StruVis: Enhancing Reasoning-based Text-to-Image Generation via Thinking with Structured Vision (AntGroup / East China Normal University, Hong Kong University of Science and Technology, Shanghai Jiao Tong University)
-
ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins
-
Occlusion-Aware SORT: Observing Occlusion for Robust Multi-Object Tracking (Chinese Academy of Sciences, Sichuan University)
-
Ensemble Learning with Sparse Hypercolumns (Dublin City University)
-
Heterogeneous Decentralized Diffusion Models (Bagel Lab)
-
Improved Constrained Generation by Bridging Pretrained Generative Models
-
FontUse: A Data-Centric Approach to Style- and Use-Case-Conditioned In-Image Typography (University of Tsukuba)
-
Stabilizing Reinforcement Learning for Diffusion Language Models
-
Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models (Baidu / Chinese Academy of Sciences, Peking University, Sun Yat-sen University, University of Chinese Academy of Sciences)
-
GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection
-
Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models
-
Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving (University of Limerick)
-
TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation (Gargi Memorial Institute of Technology, Variable Energy Cyclotron Centre)
-
Transforming Omnidirectional RGB-LiDAR data into 3D Gaussian Splatting (State University of New York)
