Member of Technical Staff - Distributed Systems
ABOUT US
Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.
The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.
Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.
We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.
ABOUT THE ROLE
Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will build the core platform that schedules, routes, and operates AI workloads reliably at production scale. You will work on systems that coordinate execution across thousands of nodes, expose stable production APIs, and ensure workloads run predictably under real-world load and failure conditions.
This role is well suited to engineers who enjoy building foundational infrastructure, understanding systems end-to-end, and operating at scale.
WHAT YOU WILL WORK ON
- Design and build distributed systems that orchestrate and operate AI workloads at large scale
- Develop scheduling, routing, and resource management components that coordinate execution across many nodes and services
- Build production-grade APIs and control planes for deploying and managing workloads
- Implement mechanisms for reliability, availability, and fault tolerance in distributed environments
- Instrument systems for observability and debugging at scale
- Work closely with compilers, runtimes, and har...