Site Reliability Engineer (SRE/DevOps) - Engineering Productivity
Arista Networks is seeking a Site Reliability Engineer (SRE) to join our Engineering Productivity (EngProd) team. This role involves designing, building, and administering secure, scalable, and fault-tolerant tools and infrastructure in a hybrid cloud environment. As part of the software engineering team, you will collaborate with other engineers to support our rapidly expanding infrastructure and internal user base.
In this position, you will be responsible for building and deploying critical production systems with a focus on scalability, reliability, observability, performance, and security. Your day-to-day tasks will include monitoring and improving the developer experience across services, automating operations to reduce manual toil, proactively managing alerts, creating and maintaining incident response runbooks, and triaging infrastructure issues in collaboration with software engineers and third-party vendors.
The ideal candidate will have a Bachelor's or Master's degree in Computer Science or Engineering, along with at least three years of relevant experience. Proficiency in Go, Python, or shell scripting is essential, as is a strong understanding of Linux or UNIX systems from both an administration and a debugging perspective. Hands-on experience operating software systems at scale, provisioning servers, and applying infrastructure-as-code practices is also required.
Arista Networks offers a dynamic work environment where engineers have complete ownership of their projects. Our flat management structure and emphasis on sound software engineering principles provide opportunities for professional growth and innovation. Join us to be part of a culture that values invention, quality, respect, and fun.