Senior DL Performance Infrastructure and MLOps Engineer
We are now looking for a Senior DL Performance Infrastructure & MLOps Engineer:
NVIDIA is seeking engineers who love building world-class infrastructure, from automated command-line scripting to full-blown CI/CD systems running on some of the world's largest clusters, to support our work to accelerate training of deep neural networks like Stable Diffusion or ChatGPT via hardware and software innovations. If you have that itch whenever the mechanical aspects of code development, performance analysis, and data processing consume any more human time than necessary, we'd like to hear from you. If you are passionate about accelerating all existing workflows in a diverse team while also envisioning next-gen opportunities to enable new forms of hardware/software analysis and development we haven't even thought of, this is the place for you.
What you'll be doing:
Improve all tooling and automation in use in the team, from simple data collection scripts to datacenter-scale ML CI/CD systems.
Understand and internalize workflows for GPU performance analysis and optimization so you can help us re-invent them.
Build Python-based machinery hooking into common Deep Learning software like PyTorch or JAX to support performance analysis work.
Ruthlessly discover and chase down workflow- and tool-related inefficiencies in the team's daily work, and dream up and implement ways to eliminate them.
What we need to see:
MS degree in CS or adjacent fields or equivalent experience
3+ years of relevant work experience
Background in deep learning fundamentals and common deep learning software, especially PyTorch/JAX
Experience in GPU computing, i.e. fundamental understanding of heterogeneous multi-node accelerated computing systems
Background in analyzing and optimizing application performance
Familiarity with containerized CI/CD flows, e.g. GitLab + Docker
Programming skills in C++, Python, and CUDA
Deep passion for tools, scripts, and automation
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you! Come, join our DL Architecture team and help build the real-time, cost-effective AI computing platform driving our success in this exciting and quickly growing field.
The base salary range is $144,000 - $270,250. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.