Senior Cloud Engineer (K8S)
Graphcore is seeking a Senior Cloud Engineer (K8S) to join our Cloud Platform Team in Bristol, UK. In this role, you will collaborate with Software Platform, Datacentre Operations, and Product Development teams to deploy services on our cutting-edge AI systems. As part of our Software Platform organization, you will be involved in cloud integration, validation, performance benchmarking, optimization, and development of high-performance AI solutions, including in-house AI systems and off-the-shelf high-performance servers, switches, and storage solutions. This hands-on technical role requires a solid background in cloud infrastructure, deployment using Infrastructure-as-Code, observability, high-performance networking, and storage systems. You may have experience working in an IT organization, a datacentre, a cloud provider, or as a developer of orchestration or cloud services.
Key responsibilities include developing and operating Kubernetes-managed end-user services on our private clouds and supporting the internal users who rely on them. You will translate end-user and product requirements into deployed services. Additionally, you will work with our Datacentre Operations Engineers to maintain and operate the fleet of AI systems at peak performance in our private clouds. Configuring and testing new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure-as-Code, in both internal and external datacentres, is also a critical part of this role.
The ideal candidate will possess a bachelor's degree or equivalent practical experience in a relevant subject. Experience managing production Kubernetes clusters and workloads with a continuous delivery tool such as ArgoCD is essential. A solid software engineering or IT background with a proven track record of delivering technical output as an individual contributor is required. Experience working within Agile and Scrum frameworks, including understanding priorities, risks, issues, impacts, and constraints, is important. Strong, proven Linux scripting ability (Bash, Python, awk, sed) and Linux system administration skills (Ubuntu, RHEL, and variants) are necessary. Experience with version control systems (preferably Git) and using them to manage system configuration or automation is expected. Familiarity with Continuous Integration or testing pipelines using GitLab, GitHub, or similar tools is beneficial. A solid hands-on understanding of the technologies underpinning cloud services (APIs, virtualization of CPUs, I/O, and systems), virtual networks, block storage, resource management, and monitoring is required. Experience with Infrastructure-as-Code automation tools (Terraform/OpenTofu, Ansible, Packer) is also essential. Good communication and presentation skills, along with experience dealing with end-users of IT services, are important. The ability to work independently on critical infrastructure with minimal oversight and a focus on end-user availability is crucial.
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and health cash plan, a dental plan, pension (matched up to 5%), life assurance, and income protection. We have a generous parental leave policy and an employee assistance programme, which includes health, mental wellbeing, and bereavement support. We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar. We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interviews and encourage you to chat with us if you require any reasonable adjustments.