Papers
-
Beyond Geometry: Artistic Disparity Synthesis for Immersive 2D-to-3D
-
Pano3DComposer: Feed-Forward Compositional 3D Scene Generation from Single Panoramic Image
-
InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning
-
The World Won't Stay Still: Programmable Evolution for Agent Benchmarks
-
CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning
-
DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality
-
Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis
-
Design Experiments to Compare Multi-armed Bandit Algorithms
-
BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation
-
Learning Next Action Predictors from Human-Computer Interaction
-
Weak-SIGReg: Covariance Regularization for Stable Deep Learning
-
RAC: Rectified Flow Auto Coder
-
Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes
-
Addressing the Ecological Fallacy in Larger LMs with Human Context (Stony Brook University, Vanderbilt University)
-
Beyond Static Frames: Temporal Aggregate-and-Restore Vision Transformer for Human Pose Estimation (Zhejiang Gongshang University)
-
A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA (University of Southern California)
-
FTSplat: Feed-forward Triangle Splatting Network (Nankai University)
-
Implicit Style Conditioning: A Structured Style-Rewrite Framework for Low-Resource Character Modeling (Guangdong University of Finance)
-
OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving (Chubu University)
-
Facial Expression Recognition Using Residual Masking Network (Ho Chi Minh City University of Technology)
-
SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration (Sichuan University, University of California, San Diego)
-
XAI for Coding Agent Failures: Transforming Raw Execution Traces into Actionable Insights (Islington College)
-
Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation (Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City)
-
Vessel-Aware Deep Learning for OCTA-Based Detection of AMD (Stony Brook University)
-
LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution (The Hong Kong University of Science and Technology)
-
Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models (The Hong Kong University of Science and Technology)
-
Unify the Views: View-Consistent Prototype Learning for Few-Shot Segmentation (Tongji University)
-
Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models (Oslo Metropolitan University, Stony Brook University, University of Texas)
-
Domain-Adaptive Model Merging across Disconnected Modes (Nanchang University, Peking University, Southeast University, Tongji University)
-
OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer (National Taiwan University, National Taiwan University of Science and Technology)
-
Omni-Masked Gradient Descent: Memory-Efficient Optimization via Mask Traversal with Improved Convergence (Peking University)
-
Exploring Open-Vocabulary Object Recognition in Images using CLIP (Iwate Prefectural University)
-
Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models (Hebei University of Technology, KTH Royal Institute of Technology, Lancaster University, Nanyang Technological University, Shenzhen MSU-BIT University, VinUniversity)
-
CR-QAT: Curriculum Relational Quantization-Aware Training for Open-Vocabulary Object Detection (Incheon National University, Korea Advanced Institute of Science & Technology, University of Seoul)
-
PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition
-
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
-
Breaking Smooth-Motion Assumptions: A UAV Benchmark for Multi-Object Tracking in Complex and Adverse Conditions
-
Towards High-resolution and Disentangled Reference-based Sketch Colorization
-
An Interactive Multi-Agent System for Evaluation of New Product Concepts
-
HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild
-
Agent Hunt: Bounty Based Collaborative Autoformalization With LLM Agents
-
Technical Report: Automated Optical Inspection of Surgical Instruments
-
Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention
-
TADPO: Reinforcement Learning Goes Off-road
-
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
-
MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
-
RePer-360: Releasing Perspective Priors for 360° Depth Estimation via Self-Modulation
-
Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration
-
Demystifying KAN for Vision Tasks: The RepKAN Approach
-
EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
