Papers
-
OAHuman: Occlusion-Aware 3D Human Reconstruction from Monocular Images
-
Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
-
MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos
-
ZOTTA: Test-Time Adaptation with Gradient-Free Zeroth-Order Optimization
-
ITKIT: Feasible CT Image Analysis based on SimpleITK and MMEngine
-
Automatic Inter-document Multi-hop Scientific QA Generation
-
Sampling Boltzmann distributions via normalizing flow approximation of transport maps
-
Bringing Model Editing to Generative Recommendation in Cold-Start Scenarios
-
MedPriv-Bench: Benchmarking the Privacy-Utility Trade-off of Large Language Models in Medical Open-End Question Answering
-
DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
-
Beyond Distance: Quantifying Point Cloud Dynamics with Persistent Homology and Dynamic Optimal Transport
-
Toward Clinically Ready Foundation Models in Medical Image Analysis: Adaptation Mechanisms and Deployment Trade-offs
-
Learning in Function Spaces: An Unified Functional Analytic View of Supervised and Unsupervised Learning
-
Controllable Accent Normalization via Discrete Diffusion
-
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
-
DC-ViT: Modulating Spatial and Channel Interactions for Multi-Channel Images
-
Multi-Period Texture Contrast Enhancement for Low-Contrast Wafer Defect Detection and Segmentation
-
AEX: Non-Intrusive Multi-Hop Attestation and Provenance for LLM APIs
-
High-Fidelity Compression of Seismic Velocity Models via SIREN Auto-Decoders
-
MorphSNN: Adaptive Graph Diffusion and Structural Plasticity for Spiking Neural Networks
-
Windowed Fourier Propagator: A Frequency-Local Neural Operator for Wave Equations in Inhomogeneous Media
-
RegFormer++: An Efficient Large-Scale 3D LiDAR Point Registration Network with Projection-Aware 2D Transformer
-
Seeking Physics in Diffusion Noise
-
RL-ScanIQA: Reinforcement-Learned Scanpaths for Blind 360° Image Quality Assessment
-
Show Me When and Where: Towards Referring Video Object Segmentation in the Wild
-
4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
-
SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
-
A Physically-Grounded Attack and Adaptive Defense Framework for Real-World Low-Light Image Enhancement
-
In-Field 3D Wheat Head Instance Segmentation From TLS Point Clouds Using Deep Learning Without Manual Labels
-
Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange
-
Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models
-
Enhancing LLM Training via Spectral Clipping
-
Direct Object-Level Reconstruction via Probabilistic Gaussian Splatting
-
Structure-Dependent Regret and Constraint Violation Bounds for Online Convex Optimization with Time-Varying Constraints
-
Early Failure Detection and Intervention in Video Diffusion Models
-
Personalized Cell Segmentation: Benchmark and Framework for Reference-Guided Cell Type Segmentation
-
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
-
Learning-to-Defer with Expert-Conditioned Advice
-
ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation
-
AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising
-
Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds
-
UAVBench and UAVIT-1M: Benchmarking and Enhancing MLLMs for Low-Altitude UAV Vision-Language Understanding
-
On the Nature of Attention Sink that Shapes Decoding Strategy in MLLMs
-
State-Dependent Safety Failures in Multi-Turn Language Model Interaction
-
Generation of Human Comprehensible Access Control Policies from Audit Logs
-
AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models
-
Localizing and Editing Knowledge in Large Audio-Language Models
-
Motivation in Large Language Models
-
Refold: Refining Protein Inverse Folding with Efficient Structural Matching and Fusion
-
Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces
