FlashDrive

FlashDrive is an algorithm-system co-design framework for accelerating autonomous-driving vision-language-action (VLA) models that combine explicit reasoning with trajectory prediction. Z Lab presents it as a way to overcome the latency bottlenecks of models such as Alpamayo 1.5 by jointly optimizing the encode, prefill, decode, and action stages. Its main techniques include streaming inference with KV-cache reuse, speculative reasoning with DFlash, adaptive-step flow matching, W4A8 quantization with ParoQuant, and system-level optimizations such as CUDA graphs and kernel fusion. The project reports a 4.5x speedup on an RTX PRO 6000 with negligible accuracy loss, and says the gains transfer to Jetson Thor and multiple RTX GPUs.
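W4A8 means the model's weights are stored as 4-bit integers while activations are quantized to 8-bit integers at run time. The sketch below illustrates only that general idea, using plain symmetric round-to-nearest quantization in NumPy (per output row for weights, per token for activations). It is not ParoQuant, whose algorithm is not described on this page, and the function names `quantize_symmetric` and `w4a8_matmul` are hypothetical. A real kernel would pack two 4-bit values per byte and run on integer tensor cores; here the integer codes are simply held in `int32` arrays.

```python
import numpy as np


def quantize_symmetric(x, bits, axis):
    """Symmetric uniform quantization of `x` to signed `bits`-bit codes.

    One scale is computed per slice along `axis` (keepdims=True), so the
    returned scale broadcasts back against `x`.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    amax = np.max(np.abs(x), axis=axis, keepdims=True)
    # Avoid a zero scale for all-zero slices.
    scale = np.where(amax == 0, 1.0, amax / qmax)
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def w4a8_matmul(x, w):
    """Approximate x @ w.T with 4-bit weights and 8-bit activations.

    x: (tokens, in_features) float activations
    w: (out_features, in_features) float weights
    Illustrative only; this is NOT the ParoQuant method.
    """
    qw, sw = quantize_symmetric(w, bits=4, axis=1)  # sw: (out, 1)
    qx, sx = quantize_symmetric(x, bits=8, axis=1)  # sx: (tokens, 1)
    acc = qx @ qw.T                                 # integer accumulation, (tokens, out)
    return acc * sx * sw.T                          # dequantize with both scales
```

For example, with `x` of shape `(4, 64)` and `w` of shape `(16, 64)`, `w4a8_matmul(x, w)` returns a `(4, 16)` float array close to `x @ w.T`. The 4-bit weight rounding dominates the error, which is why practical W4A8 schemes add extra techniques on top of this naive baseline to keep accuracy loss small.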

Overview

FlashDrive is Z Lab’s real-time inference framework for vision-language-action autonomous driving models. It is designed to make reasoning-based end-to-end driving practical by accelerating all major inference stages, reducing latency from 716 ms to 159 ms on Alpamayo 1.5 while keeping accuracy nearly unchanged.
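The two reported latencies are consistent with the headline 4.5x figure. The arithmetic below uses only the numbers quoted above:

$$
\text{speedup} = \frac{716\ \text{ms}}{159\ \text{ms}} \approx 4.50,
\qquad
\text{latency reduction} = \frac{716 - 159}{716} = \frac{557}{716} \approx 77.8\%
$$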

🧠AI inference 🗃️Model utilization 🚗Autonomous driving

Last updated: July 7, 2026