Papers

- MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
- GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
- GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
- AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
- Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
- M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization
- Zebra-Llama: Towards Extremely Efficient Hybrid Models
- Power Aware Dynamic Reallocation For Inference
- AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines
- CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
- CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
- SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
- Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
- Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
- Agent Laboratory: Using LLM Agents as Research Assistants
