Papers
- FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
- V1: Unifying Generation and Self-Verification for Parallel Reasoners
- Speculative Speculative Decoding
- Learning to Discover at Test Time
- Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
- DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
- Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
- RedPajama: an Open Dataset for Training Large Language Models
