Senior System Software Engineer - Cloud Infrastructure

NVIDIA

Software Engineering, Other Engineering
Multiple locations
Posted on Oct 29, 2024

NVIDIA is looking for a Senior System Software Engineer - Cloud Infrastructure to join the NGN GPU Cloud Infrastructure group, working to design and deliver the platforms that enable deep learning, game streaming, content delivery, and generative AI systems in the cloud. This position will focus on design, development, and implementation of software-defined infrastructure and automation for compute, network, and storage systems. We have crafted a team of extraordinary people stretching around the globe, whose mission is to push the frontiers of what is possible today and define the platform of tomorrow.

What you will be doing:

  • Design, prototype, implement and help operate the next generation of software to automate global cloud infrastructure for NVIDIA GPU-accelerated applications such as Deep Learning, Game Streaming, Content Delivery, and Generative AI.

  • Write code and participate in systems design, code reviews, test authoring, feature development, bug triage, automation, configuration, documentation, and bug fixes, for both open source and NVIDIA internal software projects.

  • Benchmark, evaluate, and optimize the performance, reliability, and efficiency of network and storage subsystems and applications.

  • Lead and participate in PoC and development efforts for various application use cases, working with cloud tenants, application owners, and solutions architects to design optimal and performant systems.

What we need to see:

  • BS or MS in Computer Science or Computer Engineering, and 8+ years of professional experience in software engineering.

  • Outstanding knowledge of Go (Golang), Python, C, C++, or other modern programming languages, including at least one compiled language.

  • Experience designing and writing concurrent software code for large-scale and performance-optimized distributed systems.

  • Excellent problem-solving, collaboration, and interpersonal skills. Outstanding communication and soft skills, with the ability to present to senior management in a sensible and persuasive manner. Ability to influence and build relationships with other software teams and functional groups.

  • Experience integrating network, storage, and compute technologies with virtual machine and container orchestration systems.

  • A security-first approach with a desire to deliver highly reliable, high-quality products.

  • Ability to root-cause functional and performance issues in distributed systems and drive them to closure.

  • Expert-level Linux systems configuration, automation, debugging, and performance optimization (e.g., RHEL, CentOS, Ubuntu, Rocky Linux).

Ways to stand out from the crowd:

  • Production experience with GitOps and DevOps workflows and tooling such as FluxCD, ArgoCD, Helm charts, Terraform, and/or Ansible.

  • Prior experience running Kubernetes clusters at scale and in production.

  • Proven skills in modern container networking and storage architecture.

  • Experience working in distributed teams across multiple time zones.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you are creative and autonomous, we want to hear from you!

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.