About World Labs
At World Labs, our mission is to revolutionize artificial intelligence by developing Large World Models, taking AI beyond language and 2D visuals into the realm of complex 3D environments, both virtual and real. We're the team envisioning a future where AI doesn't just process information but truly understands and interacts with the world around it.
We're looking for the overachievers, the visionaries, and the relentless innovators who aren't satisfied with the status quo. You know that person who's always dreaming up the next big breakthrough? That's us. And we want you to be part of it.
Model Optimization Engineer
World Labs
On-site
San Francisco, CA, United States
Full-time
$150,000 - $250,000
About the Role
We're seeking an experienced engineer to bridge the gap between our research team's state-of-the-art models and production-ready inference systems. You'll take PyTorch research code and transform it into highly optimized, low-latency inference solutions.
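For flavor, here is a minimal sketch of that hand-off, not World Labs code: the `ResearchModel` stand-in, its dimensions, and the `prepare_for_inference` helper are all hypothetical, but the pattern of taking an eager-mode research module and readying it for low-latency serving with FP16 weights and `torch.compile` is representative of the work.

```python
import torch
import torch.nn as nn

class ResearchModel(nn.Module):
    """Hypothetical stand-in for a model handed off by a research team."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def prepare_for_inference(model: nn.Module) -> nn.Module:
    model.eval()                          # freeze dropout/batchnorm behavior
    if torch.cuda.is_available():
        model = model.half().cuda()       # FP16 weights for faster GPU inference
    # Fuse ops and generate kernels; "reduce-overhead" targets small-batch latency.
    return torch.compile(model, mode="reduce-overhead")

model = prepare_for_inference(ResearchModel())
x = torch.randn(8, 1024)
if torch.cuda.is_available():
    x = x.half().cuda()
with torch.inference_mode():              # skip autograd bookkeeping at serve time
    y = model(x)
```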
Qualifications
3+ years optimizing deep learning models for production inference.
Expert-level PyTorch and CUDA programming experience.
Hands-on experience with model quantization (INT8/FP16) and inference frameworks (TensorRT, ONNX Runtime); a brief quantization sketch follows this list.
Proficiency in GPU profiling tools and performance analysis.
Experience with multi-GPU inference and model serving at scale.
Strong understanding of transformer architectures and modern ML model optimization techniques.
Preferred:
Custom CUDA kernel development experience.
Experience with Triton, vLLM, or similar high-performance serving frameworks.
Background in both research and production ML environments.
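To make the quantization requirement concrete, below is a hedged sketch using stock PyTorch's post-training dynamic INT8 quantization. The toy model and tensor shapes are made up, and a production deployment would more likely go through TensorRT or ONNX Runtime as noted above; the point is the accuracy-versus-latency trade-off this kind of work manages.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Dynamic quantization: weights are stored as INT8 and activations are
# quantized on the fly at runtime. CPU-oriented, but it illustrates the
# workflow: quantize, then verify accuracy against the FP32 baseline.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.inference_mode():
    baseline = model(x)
    approx = quantized(x)
# Check how far the quantized outputs drift from the FP32 reference.
print("max abs error:", (baseline - approx).abs().max().item())
```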
Responsibilities
Optimize neural network models for inference through quantization, pruning, and architectural modifications while maintaining accuracy.
Profile and benchmark model performance to identify computational bottlenecks; see the profiling sketch after this list.
Implement optimizations using torch.compile, custom CUDA kernels, and specialized inference frameworks.
Deploy multi-GPU inference solutions with efficient model parallelism and serving architectures.
Collaborate with research teams to ensure optimization techniques integrate smoothly with model development workflows.
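As a concrete illustration of the profiling work described above, here is a minimal `torch.profiler` loop (the toy model and shapes are hypothetical) that ranks operators by self time, which is typically the first step in deciding where optimization effort will pay off.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(nn.Linear(2048, 2048), nn.GELU(), nn.Linear(2048, 2048)).eval()
x = torch.randn(32, 2048)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)  # also capture GPU kernel timings

with torch.inference_mode():
    for _ in range(3):                        # warm up so lazy init isn't profiled
        model(x)
    with profile(activities=activities, record_shapes=True) as prof:
        model(x)

# Rank operators by self time to surface the computational bottlenecks.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```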