Staff ML Performance Engineer (Inference Optimisation)
As a Staff ML Performance Engineer at Wayve, you will be instrumental in optimizing machine learning inference for edge accelerators and GPUs. This role focuses on running large transformer-based models efficiently on low-cost, low-power edge devices, contributing directly to the development of Wayve's first driving product. You will collaborate with cross-functional teams to ensure that these models operate reliably on in-vehicle compute systems.
Your primary responsibilities will include profiling the entire inference stack, from model graphs to kernel execution and memory movement, to identify bottlenecks. You will implement and validate optimizations in compilers, runtimes, and kernels, such as operator fusion, scheduling, and quantization-aware performance enhancements. You will also build robust benchmarking and regression-testing frameworks that preserve performance gains across models, devices, and software releases.
The ideal candidate will have a proven record of improving performance in production systems under stringent constraints on latency, memory, bandwidth, power, thermal headroom, or cost. Strong proficiency with relevant toolchains such as TensorRT, CUDA, Qualcomm QNN, Triton, or OpenCL is essential, along with the ability to quickly learn adjacent frameworks. A solid grounding in software engineering fundamentals, including debugging, profiling, testing, and writing maintainable code, is also required.
Wayve offers a dynamic and inclusive work environment where your contributions will have a significant impact. The company values diversity and fosters a culture of continuous learning and innovation. This full-time role is based in London, with a hybrid working policy that combines in-office collaboration, which fuels innovation and builds relationships, with the flexibility of working from home.