Senior DevOps Engineer

NVIDIA

Software Engineering
Shanghai, China
Posted on Friday, September 22, 2023

We are looking for a Senior DevOps Engineer to join our Data and Application Services team and improve its growing services infrastructure. At the core of our application services platform is a multi-tenant Kubernetes platform designed to run a variety of in-house application services. You will work with a team of passionate and skilled engineers who continuously strive to provide better tools to build and manage this infrastructure. Our team spans a range of experience levels and backgrounds, from new grads to industry experts. We are looking for a motivated, hardworking, and focused individual who has a real passion for operational excellence, data systems, and automation.

What you'll be doing:

  • Own the services you build, working with cross-functional teams

  • Test and deploy code frequently

  • Continuously improve infrastructure provisioning and management using automation

  • Identify areas to improve service resiliency through industry-standard practices

  • Support a globally distributed, multi-cloud hybrid environment spanning AWS, GCP, and on-prem

  • Determine the root cause of production-level incidents and write corresponding high-quality root cause analysis (RCA) reports

  • Ensure the highest level of uptime and Quality of Service (QoS) for internal customers through operational excellence

  • Define service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality

  • Participate in the team's on-call rotation and serve as an escalation contact for service incidents

What we need to see:

  • 3+ years of experience operating services, including web servers, load balancers, relational/non-relational databases, messaging systems, and storage solutions

  • 3+ years of coding/scripting in at least two high-level programming languages (Python, Go, Ruby, Groovy, etc.)

  • Deep understanding of the Linux operating system and TCP/IP fundamentals

  • Expertise with at least one major cloud service provider (AWS, GCP, or Azure)

  • Proficiency in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)

  • Hands-on experience managing production-quality observability stacks

  • Excellent troubleshooting and problem-solving skills

  • B.S. degree in Computer Science or a related technical field

  • Detail-oriented, with great communication and documentation skills

Ways to stand out from the crowd:

  • Linux certification from a well-known vendor (Red Hat, Oracle, etc.)

  • Prior experience managing large-scale Kubernetes deployments in production

  • Strong skills in modern container networking and storage architecture