Engineering Manager, Machine Learning

Captions
On-site
Union Square, New York City, NY, United States
Full-time
$200,000 - $300,000

About Captions

Captions is the leading AI video company—our mission is to empower anyone, anywhere to tell their stories through video. Over 10 million creators and businesses have used Captions to simplify video creation with truly novel and groundbreaking AI capabilities.

We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you’ll have the opportunity to make an outsized impact on our products and our company's culture.

About the Role

Captions is seeking an ML Engineering Lead to head a small, high-impact team of ML engineers who bring large-scale multimodal video foundation models into production. ML engineering is responsible for optimizing and deploying state-of-the-art generative models (tens to hundreds of billions of parameters) to deliver low-latency, high-throughput inference at scale. This is a unique opportunity to work on cutting-edge AI, spanning audio-video generation, diffusion architectures, and temporal modeling, and to ensure these innovations reach millions of creators worldwide.

Qualifications

Proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.).

Strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving.

Proficiency with Python and at least one deep learning framework (PyTorch, TensorFlow).

Familiarity with compression techniques (quantization, pruning, distillation) for large-scale models.

Experience profiling and optimizing model inference (batching, concurrency, hardware utilization).

Hands-on experience with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML.

Strong grasp of logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) in distributed systems.

Exposure to diffusion models, multimodal video generation, or large-scale generative architectures.

Experience with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM) or HPC environments.

Responsibilities

Drive the technical vision for deploying large-scale multimodal diffusion models (tens to hundreds of billions of parameters) in production.

Oversee and contribute to core ML pipelines—from GPU-based inference to model optimization.

Collaborate with researchers to adapt state-of-the-art generative models for real-world performance and reliability.

Develop high-performance GPU-based inference pipelines for large multimodal diffusion models.

Build, optimize, and maintain serving infrastructure to deliver low-latency predictions at large scale.

Collaborate with software engineering teams to containerize models, manage autoscaling, and ensure uptime SLAs.

Leverage techniques like quantization, pruning, and distillation to reduce latency and memory footprint without compromising quality.

Implement continuous fine-tuning workflows to adapt models based on real-world data and feedback.

Design and maintain automated CI/CD pipelines for model deployment, versioning, and rollback.

Implement robust monitoring (latency, throughput, concept drift) and alerting for critical production systems.

Explore cutting-edge GPU acceleration and model-serving frameworks (e.g., TensorRT, Triton, TorchServe) to continuously improve throughput and reduce costs.

Benefits

Comprehensive medical, dental, and vision plans

401(k) with employer match

Commuter benefits

Catered lunch multiple days per week

Dinner stipend every night if you're working late and want a bite!

DoorDash DashPass subscription

Health and wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)

Multiple team offsites per year with team events every month

Generous PTO policy