FlashDrive

By Z Lab
FlashDrive is an algorithm-system co-design framework for accelerating autonomous-driving vision-language-action (VLA) models that combine explicit reasoning with trajectory prediction. Z Lab presents it as a way to overcome the latency bottlenecks of models such as Alpamayo 1.5 by jointly optimizing the encode, prefill, decode, and action stages. Its main techniques are streaming inference with KV-cache reuse, speculative reasoning with DFlash, adaptive-step flow matching, W4A8 quantization using ParoQuant, and system-level optimizations such as CUDA graphs and kernel fusion. The project reports a 4.5x speedup on the RTX PRO 6000 with negligible accuracy loss, and says the gains carry over to Jetson Thor and multiple RTX GPUs.
Multimodal Gen 3
Released: March 2, 2026
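
W4A8 means 4-bit weights and 8-bit activations. This listing does not say how ParoQuant picks its quantization parameters, so the following is a generic illustration only and not ParoQuant's method. It uses plain symmetric round-to-nearest quantization with one scale per weight row and one scale for the activation vector. The function names `quantize_symmetric` and `w4a8_matvec` are hypothetical. The idea it shows is that the dot product runs entirely on small integers, and the two float scales are applied only once per output.

```python
# Generic W4A8 sketch (hypothetical helpers). This is NOT ParoQuant, just
# symmetric round-to-nearest quantization used to illustrate the idea.

def quantize_symmetric(values, num_bits):
    """Map floats to signed ints in [-(2^(b-1)-1), 2^(b-1)-1] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w4a8_matvec(weight_rows, activations):
    """Approximate y = W x using 4-bit weights (per output row) and 8-bit activations."""
    x_q, x_scale = quantize_symmetric(activations, num_bits=8)
    outputs = []
    for row in weight_rows:
        w_q, w_scale = quantize_symmetric(row, num_bits=4)
        # Integer-only dot product, like an int kernel accumulating in int32.
        acc = sum(wi * xi for wi, xi in zip(w_q, x_q))
        # Dequantize once per output with the product of both scales.
        outputs.append(acc * w_scale * x_scale)
    return outputs


if __name__ == "__main__":
    W = [[0.5, -0.25, 0.75], [1.0, 0.1, -0.6]]
    x = [0.2, -0.4, 0.9]
    exact = [sum(w * v for w, v in zip(row, x)) for row in W]
    approx = w4a8_matvec(W, x)
    for e, a in zip(exact, approx):
        print(f"exact={e:+.4f}  w4a8={a:+.4f}")
```

The quantized outputs land close to the float results but not exactly on them. Keeping that rounding error small enough to preserve driving accuracy is the hard part that a dedicated method such as ParoQuant is meant to address.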

Overview

FlashDrive is Z Lab’s real-time inference framework for vision-language-action autonomous driving models. It is designed to make reasoning-based end-to-end driving practical by accelerating all major inference stages, reducing latency from 716 ms to 159 ms on Alpamayo 1.5 while keeping accuracy nearly unchanged.
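
The two latency figures can be checked against the headline speedup with a short calculation. The listing does not state which GPU produced the 716 ms and 159 ms numbers, so the ratio is only shown to be consistent with the reported 4.5x. The per-second rates assume inference calls run back to back, which is an illustrative assumption and not a claim from the source.

```python
# Latency figures for Alpamayo 1.5 as given in the overview.
baseline_ms = 716.0
flashdrive_ms = 159.0

speedup = baseline_ms / flashdrive_ms                 # ratio of old to new latency
latency_reduction = 1 - flashdrive_ms / baseline_ms   # fraction of latency removed

# Assumption: back-to-back inference, so rate = 1 / latency.
baseline_hz = 1000.0 / baseline_ms
flashdrive_hz = 1000.0 / flashdrive_ms

print(f"speedup: {speedup:.2f}x")
print(f"latency reduction: {latency_reduction:.1%}")
print(f"rate: {baseline_hz:.2f} Hz -> {flashdrive_hz:.2f} Hz")
```

The ratio works out to about 4.50x, matching the 4.5x figure in the description, and corresponds to removing roughly 78% of the per-step latency.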

Last updated: April 20, 2026