Senior DL Performance Infrastructure and MLOps Engineer



Other Engineering
Santa Clara, CA, USA
Posted on Friday, September 22, 2023

We are now looking for a Senior DL Performance Infrastructure & MLOps Engineer:

NVIDIA is seeking engineers who love building world-class infrastructure, from automated command-line scripts to full-blown CI/CD systems running on some of the world's largest clusters. This infrastructure supports our work to accelerate the training of deep neural networks such as Stable Diffusion or ChatGPT through hardware and software innovation. If you get that itch whenever the mechanical aspects of code development, performance analysis, and data processing consume more human time than necessary, we'd like to hear from you. If you are passionate about accelerating all existing workloads as part of a diverse team, while also envisioning next-generation opportunities for new forms of hardware/software analysis and development that we haven't even thought of yet, this is the place for you.

What you'll be doing:

  • Improve all tooling and automation used by the team, from simple data collection scripts to datacenter-scale ML CI/CD systems.

  • Understand and internalize workflows for GPU performance analysis and optimization so you can help us re-invent them.

  • Build Python-based machinery that hooks into common deep learning software such as PyTorch or JAX to support performance analysis work.

  • Ruthlessly discover and chase down workflow- and tool-related inefficiencies in the team's daily work, and dream up and implement ways to eliminate them.

What we need to see:

  • MS degree in CS or an adjacent field, or equivalent experience

  • 3+ years of relevant work experience

  • Background in deep learning fundamentals and common deep learning software, especially PyTorch/JAX

  • Experience in GPU computing, i.e., a fundamental understanding of heterogeneous multi-node accelerated computing systems

  • Background in analyzing and optimizing application performance

  • Familiarity with containerized CI/CD flows, e.g., GitLab + Docker

  • Programming skills in C++, Python, and CUDA

  • Deep passion for tools, scripts, and automation

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you! Come, join our DL Architecture team and help build the real-time, cost-effective AI computing platform driving our success in this exciting and quickly growing field.

The base salary range is $144,000 - $270,250. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.