Papers
- Well Log-Guided Synthesis of Subsurface Images from Sparse Petrography Data Using cGANs
- MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants
- OTPL-VIO: Robust Visual-Inertial Odometry with Optimal Transport Line Association and Adaptive Uncertainty
- Understanding the Interplay between LLMs' Utilisation of Parametric and Contextual Knowledge: A keynote at ECIR 2025
- When to Lock Attention: Training-Free KV Control in Video Diffusion
- FreqCycle: A Multi-Scale Time-Frequency Analysis Method for Time Series Forecasting
- No evaluation without fair representation : Impact of label and selection bias on the evaluation, performance and mitigation of classification models
- DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
- VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM
- KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
- GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation
- Logics-Parsing-Omni Technical Report
- EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages
- Improving 3D Foot Motion Reconstruction in Markerless Monocular Human Motion Capture
- On Catastrophic Forgetting in Low-Rank Decomposition-Based Parameter-Efficient Fine-Tuning
- Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records
- Fusing Semantic, Lexical, and Domain Perspectives for Recipe Similarity Estimation
- AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering
- ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling
- ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning
- Physics-informed neural operator for predictive parametric phase-field modelling
- DRIFT: Dual-Representation Inter-Fusion Transformer for Automated Driving Perception with 4D Radar Point Clouds
- TemporalDoRA: Temporal PEFT for Robust Surgical Video Question Answering
- Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning
- TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR
- ProGS: Towards Progressive Coding for 3D Gaussian Splatting
- Evaluation of LLMs in retrieving food and nutritional context for RAG systems
- OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences
- From Phase Prediction to Phase Design: A ReAct Agent Framework for High-Entropy Alloy Discovery
- MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
- Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT
- AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents
- GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System
- FrameDiT: Diffusion Transformer with Frame-Level Matrix Attention for Efficient Video Generation
- RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
- ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
- A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System
- EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning
- FetalAgents: A Multi-Agent System for Fetal Ultrasound Image and Video Analysis
- $M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs
- Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments
- ENIGMA-360: An Ego-Exo Dataset for Human Behavior Understanding in Industrial Scenarios
- Upper Generalization Bounds for Neural Oscillators
- LAP: A Language-Aware Planning Model For Procedure Planning In Instructional Videos
- Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG
- LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control
- PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments
- Ego: Embedding-Guided Personalization of Vision-Language Models
- VCR: Variance-Driven Channel Recalibration for Robust Low-Light Enhancement
- Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors
