Member of Technical Staff - ML Systems & Inference

🇺🇸 San Francisco, California
$1K - $2K Annual
Posted 3 months ago
Expires August 1, 2026

ABOUT US

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

ABOUT THE ROLE

Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference systems that execute full models end-to-end under real production constraints. You will work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable.

This role is ideal for engineers who deeply understand how modern models execute in practice and who care about latency, throughput, and memory behavior across the full inference lifecycle.

WHAT YOU WILL WORK ON

- Design and optimize end-to-end inference pipelines from request ingestion through execution and response

- Build and evolve inference runtimes that balance latency, throughput, and concurrency under real-world load

- Reason about batching, queuing, and scheduling tradeoffs, including their impact on tail latency and fairness

- Manage KV cache allocation, placement, reuse, and eviction across models and requests

- Optimize prefill and decode paths, including attention mechanisms and memory usage...
