Distinguished Engineer - Data Center System Software Architect

NVIDIA

Software Engineering, IT
Santa Clara, CA, USA
Posted on Friday, October 6, 2023

NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. We’re looking for a strong technical architect to own the end-to-end architecture of these products at the system software level, including firmware, kernel drivers, operating systems, and user-mode drivers. You will work with component leads internally and engage with industry-leading cloud service providers on taking these products to market.

What you’ll be doing:

  • Drive the system architecture for a complex server platform in a cross-functional environment.

  • Work directly with major customers to understand their requirements and align their roadmaps with NVIDIA’s.

  • Work with business partners and vendors to shape their products to meet NVIDIA’s needs.

  • Develop a roadmap of new technologies and protocols and drive their design and adoption.

  • Mentor architects and engineering teams to grow them into future leaders.

  • Make key technical decisions even when faced with ambiguity, and mitigate execution risks by following a shift-left strategy.

What we need to see:

  • Deep experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.

  • Previous experience working with complex system software for accelerators such as GPUs, DPUs, or FPGAs.

  • Expertise in out-of-band and in-band management architectures.

  • Knowledge of device management protocols such as MCTP, PLDM, and RDE.

  • Knowledge of system management protocols such as Redfish and IPMI.

  • Experience working with platform security experts to define tradeoffs between security and ease of use.

  • Demonstrable experience in implementing a shift-left strategy to de-risk program execution.

  • Excellent written and verbal communication skills.

  • BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.

  • 20+ years of experience in system architecture and design.

Ways to stand out from the crowd:

  • Knowledge of cloud- and cluster-level deployment and management systems.

  • Participation in and contributions to standards bodies such as OCP and DMTF.

  • Familiarity with CXL architectures.

  • Knowledge of storage and networking technologies.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate and self-motivated, we want to hear from you!

The base salary range is $304,000 - $460,000. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.