Staff Technical Lead for Inference & ML Performance
Fal builds generative-media infrastructure, aiming to deliver seamless creative experiences at an unprecedented scale. The company is seeking a Staff Technical Lead for Inference & ML Performance to guide and optimize its state-of-the-art inference systems. This role is pivotal in shaping the future of Fal's inference engine and ensuring that its generative models achieve best-in-class performance.
The primary responsibilities include setting the technical direction for a team focused on kernels, applied performance, ML compilers, and distributed inference. The lead will personally contribute to critical inference performance enhancements and optimizations, collaborating closely with research and applied ML teams to influence model inference strategies and deployment techniques. Additionally, the role involves driving advanced performance optimizations, such as implementing model parallelism, kernel optimization, and compiler strategies, while mentoring and scaling a team of performance-focused engineers.
Candidates should have deep experience in ML performance optimization, particularly in optimizing inference for large-scale generative models in production environments. A comprehensive understanding of the full ML performance stack, including tools like PyTorch, TensorRT, TransformerEngine, Triton, and CUTLASS kernels, is essential. Expert-level familiarity with advanced inference techniques is required, including quantization, kernel authoring, compilation, model parallelism (tensor, context/sequence, and expert parallelism), distributed serving, and profiling. The ideal candidate will lead from the front, demonstrating hands-on expertise and collaborating effectively with applied ML teams, researchers, and stakeholders.
This position is one of the highest-impact roles at a rapidly growing company, with revenue growing 40% month over month and a revenue run rate more than 60x that of the previous year. Fal has raised Series A, B, and C funding within the last 12 months, reflecting its world-changing vision of hyperscaling human creativity.