Papers
- Are a Thousand Words Better Than a Single Picture? Beyond Images -- A Framework for Multi-Modal Knowledge Graph Dataset Enrichment
- GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models
- Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function
- DST-Net: A Dual-Stream Transformer with Illumination-Independent Feature Guidance and Multi-Scale Spatial Convolution for Low-Light Image Enhancement
- On the Emotion Understanding of Synthesized Speech
- Implementation of tangent linear and adjoint models for neural networks based on a compiler library tool
- Unlearning for One-Step Generative Models via Unbalanced Optimal Transport
- ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation
- AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents
- Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models
- From the Inside Out: Progressive Distribution Refinement for Confidence Calibration
- Rewarding DINO: Predicting Dense Rewards with Vision Foundation Models
- VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations
- FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
- An approximate graph elicits detonation lattice
- Exploring different approaches to customize language models for domain-specific text-to-code generation
- SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds
- Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots
- Rethinking Pose Refinement in 3D Gaussian Splatting under Pose Prior and Geometric Uncertainty
- How often do Answers Change? Estimating Recency Requirements in Question Answering
- DanceHA: A Multi-Agent Framework for Document-Level Aspect-Based Sentiment Analysis
- SAMSEM -- A Generic and Scalable Approach for IC Metal Line Segmentation
- Bridging the Simulation-to-Reality Gap in Electron Microscope Calibration via VAE-EM Estimation
- CompDiff: Hierarchical Compositional Diffusion for Fair and Zero-Shot Intersectional Medical Image Generation
- EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models
- BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
- Segmentation-Based Attention Entropy: Detecting and Mitigating Object Hallucinations in Large Vision-Language Models
- Understanding Cell Fate Decisions with Temporal Attention
- Deep Learning-Driven Black-Box Doherty Power Amplifier with Pixelated Output Combiner and Extended Efficiency Range
- VideoMatGen: PBR Materials through Joint Generative Modeling
- Characterizing Delusional Spirals through Human-LLM Chat Logs
- Manifold-Matching Autoencoders
- Deep Tabular Representation Corrector
- Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration
- Malicious Or Not: Adding Repository Context to Agent Skill Classification
- Diverging Transformer Predictions for Human Sentence Processing: A Comprehensive Analysis of Agreement Attraction Effects
- REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models
- When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective
- V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge in Vision Language Models
- Trajectory-Optimized Time Reparameterization for Learning-Compatible Reduced-Order Modeling of Stiff Dynamical Systems
- Runtime Governance for AI Agents: Policies on Paths
- HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes
- BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization
- On the Transfer of Collinearity to Computer Vision
- FSMC-Pose: Frequency and Spatial Fusion with Multiscale Self-calibration for Cattle Mounting Pose Estimation
- Data-driven generalized perimeter control: Zürich case study
- Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models
- Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry
- Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech
- ACPV-Net: All-Class Polygonal Vectorization for Seamless Vector Map Generation from Aerial Imagery
