Papers
- Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems
- On Using Machine Learning to Early Detect Catastrophic Failures in Marine Diesel Engines
- VecMol: Vector-Field Representations for 3D Molecule Generation
- SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks
- ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
- MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
- TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?
- Music Source Restoration with Ensemble Separation and Targeted Reconstruction
- Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
- Modality-free Graph In-context Alignment
- SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking
- Show, Don't Tell: Detecting Novel Objects by Watching Human Videos
- Taming the Long Tail: Efficient Item-wise Sharpness-Aware Minimization for LLM-based Recommender Systems
- A Method for Learning Large-Scale Computational Construction Grammars from Semantically Annotated Corpora
- AI Model Modulation with Logits Redistribution
- FC-Track: Overlap-Aware Post-Association Correction for Online Multi-Object Tracking
- SAP: Segment Any 4K Panorama
- HIFICL: High-Fidelity In-Context Learning for Multimodal Tasks
- TerraFlow: Multimodal, Multitemporal Representation Learning for Earth Observation
- CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models
- SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion
- Catalyst4D: High-Fidelity 3D-to-4D Scene Editing via Dynamic Propagation
- SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models
- PVI: Plug-in Visual Injection for Vision-Language-Action Models
- Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
- The RIGID Framework: Research-Integrated, Generative AI-Mediated Instructional Design
- Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks
- Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning
- Think and Answer ME: Benchmarking and Exploring Multi-Entity Reasoning Grounding in Remote Sensing
- Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass
- Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
- A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators
- SteerRM: Debiasing Reward Models via Sparse Autoencoders
- Spectral Defense Against Resource-Targeting Attack in 3D Gaussian Splatting
- What Makes VLMs Robust? Towards Reconciling Robustness and Accuracy in Vision-Language Models
- GLEAM: A Multimodal Imaging Dataset and HAMM for Glaucoma Classification
- A Multi-task Large Reasoning Model for Molecular Science
- OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution
- Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
- Residual SODAP: Residual Self-Organizing Domain-Adaptive Prompting with Structural Knowledge Preservation for Continual Learning
- Adaptive Vision-Language Model Routing for Computer Use Agents
- NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
- Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design
- From AI Weather Prediction to Infrastructure Resilience: A Correction-Downscaling Framework for Tropical Cyclone Impacts
- coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation
- Hierarchical Dual-Change Collaborative Learning for UAV Scene Change Captioning
- Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
- DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
- Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation
- Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers
