Papers
- Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
- MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration
- Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon
- Training-free Motion Factorization for Compositional Video Generation
- Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations
- VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
- Progressive Representation Learning for Multimodal Sentiment Analysis with Incomplete Modalities
- PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings
- DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation
- QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model
- Chaotic Dynamics in Multi-LLM Deliberation
- ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models
- Transformer-Based Multi-Region Segmentation and Radiomic Analysis of HR-pQCT Imaging for Osteoporosis Classification
- Rotation Equivariant Mamba for Vision Tasks
- Agentic AI as a Network Control-Plane Intelligence Layer for Federated Learning over 6G
- Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning
- RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation
- Deep Tabular Research via Continual Experience-Driven Execution
- DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering
- Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
- Real-Time Trust Verification for Safe Agentic Actions using TrustBench
- RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
- Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
- POLISH'ing the Sky: Wide-Field and High-Dynamic Range Interferometric Image Reconstruction with Application to Strong Lens Discovery
- GIAT: A Geologically-Informed Attention Transformer for Lithology Identification
- Improving Search Agent with One Line of Code
- Better Bounds for the Distributed Experts Problem
- Progressive Split Mamba: Effective State Space Modelling for Image Restoration
- Point Cloud as a Foreign Language for Multi-modal Large Language Model
- Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation
- Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models
- DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
- Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning
- DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval
- The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
- Explainable Innovation Engine: Dual-Tree Agent-RAG with Methods-as-Nodes and Verifiable Write-Back
- $P^2$GNN: Two Prototype Sets to boost GNN Performance
- The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
- The Radio-Frequency Transformer for Signal Separation
- Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents
- Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing
- Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation
- Abundant Intelligence and Deficient Demand: A Macro-Financial Stress Test of Rapid AI Adoption
- Geometry-Aware Metric Learning for Cross-Lingual Few-Shot Sign Language Recognition on Static Hand Keypoints
- PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies
- Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation
- SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models
- TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy
- Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics
- Distributed Convolutional Neural Networks for Object Recognition
