Papers
- Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification
- Social Hippocampus Memory Learning
- PICon: A Multi-Turn Interrogation Framework for Evaluating Persona Agent Consistency
- The Geometry of Efficient Nonconvex Sampling
- Visual or Textual: Effects of Explanation Format and Personal Characteristics on the Perception of Explanations in an Educational Recommender System
- LanteRn: Latent Visual Structured Reasoning
- Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
- Anchored-Branched Steady-state WInd Flow Transformer (AB-SWIFT): a metamodel for 3D atmospheric flow in urban environments
- Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis
- Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers
- RenoBench: A Citation Parsing Benchmark
- Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos
- A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots
- Calorimeter Shower Superresolution with Conditional Normalizing Flows: Implementation and Statistical Evaluation
- arg-VU: Affordance Reasoning with Physics-Aware 3D Geometry for Visual Understanding in Robotic Surgery
- Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance
- Uncertainty-Guided Label Rebalancing for CPS Safety Monitoring
- Can Users Specify Driving Speed? Bench2Drive-Speed: Benchmark and Baselines for Desired-Speed Conditioned Autonomous Driving
- Longitudinal Digital Phenotyping for Early Cognitive-Motor Screening
- Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors
- Self-Improvement of Large Language Models: A Technical Overview and Future Outlook
- Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning
- Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming
- On Neural Scaling Laws for Weather Emulation through Continual Training
- LEMMA: Laplacian pyramids for Efficient Marine SeMAntic Segmentation
- A Unified Memory Perspective for Probabilistic Trustworthy AI
- The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase
- Neural Network Conversion of Machine Learning Pipelines
- S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
- Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training
- TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance
- Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs
- Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?
- R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
- No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models
- AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation
- Back to Basics: Revisiting ASR in the Age of Voice Agents
- PixelSmile: Toward Fine-Grained Facial Expression Editing
- PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
- BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
- SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding
- Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
- Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment
- How good was my shot? Quantifying Player Skill Level in Table Tennis
- MegaFlow: Zero-Shot Large Displacement Optical Flow
- PSDesigner: Automated Graphic Design with a Human-Like Creative Workflow
- Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving
- Vega: Learning to Drive with Natural Language Instructions
- RefAlign: Representation Alignment for Reference-to-Video Generation
- MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
