Papers
-
F2HDR: Two-Stage HDR Video Reconstruction via Flow Adapter and Physical Motion Modeling
-
UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation for Assessing Hepatic Steatosis
-
Directional Routing in Transformers
-
Workflow-Aware Structured Layer Decomposition for Illustration Production
-
Masked BRep Autoencoder via Hierarchical Graph Transformer
-
Video-CoE: Reinforcing Video Event Prediction via Chain of Events
-
Relevance Feedback in Text-to-Image Diffusion: A Training-Free And Model-Agnostic Interactive Framework
-
LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs
-
FAR-Drive: Frame-AutoRegressive Video Generation in Closed-Loop Autonomous Driving
-
Intelligent Control of Differential Drive Robots Subject to Unmodeled Dynamics with EKF-based State Estimation
-
RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting
-
KGS-GCN: Enhancing Sparse Skeleton Sensing via Kinematics-Driven Gaussian Splatting and Probabilistic Topology for Action Recognition
-
Ultra-Early Prediction of Tipping Points: Integrating Dynamical Measures with Reservoir Computing
-
Spiking Layer-Adaptive Magnitude-based Pruning
-
FairMed-XGB: A Bayesian-Optimised Multi-Metric Framework with Explainability for Demographic Equity in Critical Healthcare Data
-
Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation
-
GT-PCQA: Geometry-Texture Decoupled Point Cloud Quality Assessment with MLLM
-
Pansharpening for Thin-Cloud Contaminated Remote Sensing Images: A Unified Framework and Benchmark Dataset
-
Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models
-
Learning Question-Aware Keyframe Selection with Synthetic Supervision for Video Question Answering
-
SFedHIFI: Fire Rate-Based Heterogeneous Information Fusion for Spiking Federated Learning
-
CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models
-
Lightweight User-Personalization Method for Closed Split Computing
-
GeoNVS: Geometry Grounded Video Diffusion for Novel View Synthesis
-
This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs
-
Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework
-
Voronoi-based Second-order Descriptor with Whitened Metric in LiDAR Place Recognition
-
Why Agents Compromise Safety Under Pressure
-
Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation
-
Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI
-
MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
-
Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos
-
OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora
-
Thermal Image Refinement with Depth Estimation using Recurrent Networks for Monocular ORB-SLAM3
-
How Log-Barrier Helps Exploration in Policy Optimization
-
MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers
-
Edit2Interp: Adapting Image Foundation Models from Spatial Editing to Video Frame Interpolation with Few-Shot Learning
-
Pretraining and Benchmarking Modern Encoders for Latvian
-
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening
-
Clue Matters: Leveraging Latent Visual Clues to Empower Video Reasoning
-
TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
-
Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing
-
Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching
-
Consequentialist Objectives and Catastrophe
-
Reference-Free Omnidirectional Stereo Matching via Multi-View Consistency Maximization
-
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal
-
Describing Agentic AI Systems with C4: Lessons from Industry Projects
-
One CT Unified Model Training Framework to Rule All Scanning Protocols
-
Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods
-
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
