Papers
- MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
- Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning
- Adaptive Active Learning for Online Reliability Prediction of Satellite Electronics
- Dynamic Multi-period Experts for Online Time Series Forecasting
- Learning Adaptive LLM Decoding
- Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems
- Intelligent Spatial Estimation for Fire Hazards in Engineering Sites: An Enhanced YOLOv8-Powered Proximity Analysis Framework
- A Text-Native Interface for Generative Video Authoring
- Exclusive Self Attention
- GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
- PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing
- OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing
- Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting
- Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges
- Exploring Collatz Dynamics with Human-LLM Collaboration
- Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms
- HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
- Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
- Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
- MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration
- Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon
- Training-free Motion Factorization for Compositional Video Generation
- Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations
- VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
- Progressive Representation Learning for Multimodal Sentiment Analysis with Incomplete Modalities
- PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings
- DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation
- QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model
- Chaotic Dynamics in Multi-LLM Deliberation
- ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models
- Transformer-Based Multi-Region Segmentation and Radiomic Analysis of HR-pQCT Imaging for Osteoporosis Classification
- Rotation Equivariant Mamba for Vision Tasks
- Agentic AI as a Network Control-Plane Intelligence Layer for Federated Learning over 6G
- Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning
- RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation
- Deep Tabular Research via Continual Experience-Driven Execution
- DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering
- Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
- Real-Time Trust Verification for Safe Agentic Actions using TrustBench
- RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
- Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
- POLISH'ing the Sky: Wide-Field and High-Dynamic Range Interferometric Image Reconstruction with Application to Strong Lens Discovery
- GIAT: A Geologically-Informed Attention Transformer for Lithology Identification
- Improving Search Agent with One Line of Code
- Better Bounds for the Distributed Experts Problem
- Progressive Split Mamba: Effective State Space Modelling for Image Restoration
- Point Cloud as a Foreign Language for Multi-modal Large Language Model
- Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation
- Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models
- DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
