Papers
- SuperSkillsStack: Agency, Domain Knowledge, Imagination, and Taste in Human-AI Design Education
- Can Safety Emerge from Weak Supervision? A Systematic Analysis of Small Language Models
- TEA-Time: Transporting Effects Across Time
- AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge
- RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
- OV-DEIM: Real-time DETR-Style Open-Vocabulary Object Detection with GridSynthetic Augmentation
- Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment
- Enhancing Web Agents with a Hierarchical Memory Tree
- Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision
- Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series
- Resource-Adaptive Federated Text Generation with Differential Privacy
- Two Frames Matter: A Temporal Attack for Text-to-Video Model Jailbreaking
- Targeted Bit-Flip Attacks on LLM-Based Agents
- Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding
- Fine-Grained 3D Facial Reconstruction for Micro-Expressions
- Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation
- Hindsight Credit Assignment for Long-Horizon LLM Agents
- Animating Petascale Time-varying Data on Commodity Hardware with LLM-assisted Scripting
- Bi-directional digital twin prototype anchoring with multi-periodicity learning for few-shot fault diagnosis
- SODA: Sensitivity-Oriented Dynamic Acceleration for Diffusion Transformer
- MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering
- User Review Writing via Interview with Dialogue Systems
- VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding
- The Talking Robot: Distortion-Robust Acoustic Models for Robot-Robot Communication
- Interpretable Maximum Margin Deep Anomaly Detection
- Physics-Guided VLM Priors for All-Cloud Removal
- Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network
- Aligning What EEG Can See: Structural Representations for Brain-Vision Matching
- CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs
- Entropy-Aware On-Policy Distillation of Language Models
- VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness
- Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction
- Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR
- mAVE: A Watermark for Joint Audio-Visual Generation Models
- Statistical Contraction for Chance-Constrained Trajectory Optimization of Non-Gaussian Stochastic Systems
- Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction
- NuNext: Reframing Nucleus Detection as Next-Point Detection
- Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints
- Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation
- Deep Generative Spatiotemporal Engression for Probabilistic Forecasting of Epidemics
- Vision Language Models Cannot Reason About Physical Transformation
- Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information
- Efficient Chest X-ray Representation Learning via Semantic-Partitioned Contrastive Learning
- aCAPTCHA: Verifying That an Entity Is a Capable Agent via Asymmetric Hardness
- Turn: A Language for Agentic Computation
- TIQA: Human-Aligned Text Quality Assessment in Generated Images
- Inter-Image Pixel Shuffling for Multi-focus Image Fusion
- Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers
- Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
- The Model Knows Which Tokens Matter: Automatic Token Selection via Noise Gating
