Senior Cloud and Data Center Test Developer Architect
We are seeking a highly skilled and hard-working Senior Test Developer Architect to join our multifaceted Enterprise Software QA team. This role offers an outstanding opportunity to leave your mark on the design, construction, optimization, and testing of large-scale infrastructure for various foundational NVIDIA unified cloud services and data center offerings. If you are a dedicated engineer with a deep understanding of cloud infrastructure and distributed systems, and you thrive in an exciting, innovative environment, this could be the ideal role for you.
What you'll be doing:
Engage with product engineering teams to gain a comprehensive understanding of their infrastructure use cases. Provide mentorship to SWQA teams on effectively testing at scale.
Develop end-to-end test plans that exercise all layers of the software stacks for NVIDIA cloud-based infrastructure.
Lead NVIDIA Cloud and Data Center bring-up activities from an SWQA perspective.
Develop sophisticated tooling to automate the build and deployment of microservices and infrastructure components, improving efficiency and productivity.
Reduce manual labor and increase operational efficiency through automation.
Monitor the infrastructure and alert on significant events, ensuring the highest level of system performance and reliability.
Work closely with partners to understand their infrastructure needs and to ensure our testing encompasses their use cases.
What we need to see:
A Master's or Ph.D. in Computer Science or a related field, or equivalent experience.
4+ years of hands-on experience in cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and Ansible.
8+ years of strong experience with cloud infrastructure platforms like AWS, Azure, or Google Cloud.
Hands-on experience with network, storage, cluster configuration and debugging.
High level of proficiency in Infrastructure as Code and Configuration Management tools like Terraform.
Expertise in administering, operating, and configuring Kubernetes and Envoy.
Proficiency in scripting languages such as Python.
Proven experience with Continuous Integration/Continuous Delivery (CI/CD) tools such as GitLab and Jenkins, and with the GitOps model.
Proficiency in various monitoring tools: Prometheus, Grafana, CloudWatch, and Thanos.
Strong background in cloud security, Kubernetes security, and application security.
Proficiency in debugging issues involving networks, DNS, HTTP, Linux, and containers.
Strong analytical and problem-solving skills, along with an ability to articulate what you know to others.
Ways to Stand Out from the Crowd:
A true innovator who isn't afraid to challenge the status quo and bring fresh ideas to the table. You're always looking for ways to improve existing systems and processes.
Passion and curiosity about the latest technologies and trends in cloud infrastructure and distributed systems. You're not just familiar with the tools, but you understand the underlying principles and can apply this knowledge to make strategic decisions.
Committed to personal and professional growth. You actively seek out opportunities to learn new skills and deepen your expertise. Consider joining our team!
By joining our team, you will be part of a forward-thinking company that values innovation and creativity. We offer a competitive salary and benefits package, a flexible work environment, and the opportunity to work with some of the industry's leading experts. If you're ready to take your career to the next level, we'd love to hear from you.
The base salary range is 192,000 USD - 368,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.