Papers
-
Inevitable Encounters: Backdoor Attacks Involving Lossy Compression
-
TransDex: Pre-training Visuo-Tactile Policy with Point Cloud Reconstruction for Dexterous Manipulation of Transparent Objects
-
On Interpolation Formulas Describing Neural Network Generalization
-
Zero-Forgetting CISS via Dual-Phase Cognitive Cascades
-
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs
-
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent
-
Scribe Verification in Chinese manuscripts using Siamese, Triplet, and Vision Transformer Neural Networks
-
Step-CoT: Stepwise Visual Chain-of-Thought for Medical Visual Question Answering
-
Dual-Strategy Improvement of YOLOv11n for Multi-Scale Object Detection in Remote Sensing Images
-
Benchmarking the Energy Cost of Assurance in Neuromorphic Edge Robotics
-
SCoCCA: Multi-modal Sparse Concept Decomposition via Canonical Correlation Analysis
-
Multi-Modal Character Localization and Extraction for Chinese Text Recognition
-
Large Language Models Reproduce Racial Stereotypes When Used for Text Annotation
-
UVLM: A Universal Vision-Language Model Loader for Reproducible Multimodal Benchmarking
-
Robust Self-Training with Closed-loop Label Correction for Learning from Noisy Labels
-
MO-SAE: Multi-Objective Stacked Autoencoders Optimization for Edge Anomaly Detection
-
CT-Conditioned Diffusion Prior with Physics-Constrained Sampling for PET Super-Resolution
-
Distributed Acoustic Sensing for Urban Traffic Monitoring: Spatio-Temporal Attention in Recurrent Neural Networks
-
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition
-
LineMaster Pro: A Low-Cost Intelligent Line Following Robot with PID Control and Ultrasonic Obstacle Avoidance for Educational Robotics
-
FedPBS: Proximal-Balanced Scaling Federated Learning Model for Robust Personalized Training for Non-IID Data
-
Scene Generation at Absolute Scale: Utilizing Semantic and Geometric Guidance From Text for Accurate and Interpretable 3D Indoor Scene Generation
-
AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding
-
The Phenomenology of Hallucinations
-
Towards Stable Self-Supervised Object Representations in Unconstrained Egocentric Video
-
Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics
-
OpenCOOD-Air: Prompting Heterogeneous Ground-Air Collaborative Perception with Spatial Conversion and Offset Prediction
-
Generative Inverse Design of Cold Metals for Low-Power Electronics
-
SmoothVLA: Aligning Vision-Language-Action Models with Physical Constraints via Intrinsic Smoothness Optimization
-
Close to Reality: Interpretable and Feasible Data Augmentation for Imbalanced Learning
-
Discriminative Flow Matching Via Local Generative Predictors
-
True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity
-
OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset
-
Iterative Semantic Reasoning from Individual to Group Interests for Generative Recommendation with LLMs
-
GroupGuard: A Framework for Modeling and Defending Collusive Attacks in Multi-Agent Systems
-
Bidirectional Cross-Attention Fusion of High-Res RGB and Low-Res HSI for Multimodal Automated Waste Sorting
-
Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing
-
ToolFlood: Beyond Selection -- Hiding Valid Tools from LLM Agents via Semantic Covering
-
DCP-CLIP: A Coarse-to-Fine Framework for Open-Vocabulary Semantic Segmentation with Dual Interaction
-
LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement
-
EviAgent: Evidence-Driven Agent for Radiology Report Generation
-
GenLie: A Global-Enhanced Lie Detection Network under Sparsity and Semantic Interference
-
IMS3: Breaking Distributional Aggregation in Diffusion-Based Dataset Distillation
-
USIS-PGM: Photometric Gaussian Mixtures for Underwater Salient Instance Segmentation
-
sebis at ArchEHR-QA 2026: How Much Can You Do Locally? Evaluating Grounded EHR QA on a Single Notebook
-
VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction
-
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
-
EchoLVFM: One-Step Video Generation via Latent Flow Matching for Echocardiogram Synthesis
-
Leveraging a Statistical Shape Model for Efficient Generation of Annotated Training Data: A Case Study on Liver Landmarks Segmentation
-
Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications
