DevOps Manager - Big Data and AI

NVIDIA

Software Engineering, Operations, Data Science
Shanghai, China
Posted on Wednesday, December 27, 2023

We are now looking for a DevOps Software Manager. You will work on open-source technologies and their enterprise adoption, including accelerating Apache Spark with GPUs (Spark-RAPIDS) to dramatically speed up data processing and machine learning; Project MONAI, a medical deep learning framework that is revolutionizing healthcare AI solutions worldwide; and NVFlare, a federated learning technology that builds generalizable AI models from diverse data sources while ensuring data security and privacy.

What you'll be doing:

  • Serve as a technical leader in defining, designing, developing, and maintaining DevOps tools, frameworks, and platforms

  • Implement and advocate for CI/CD conventions, and write tools to automate the steps involved in this process

  • Develop and maintain build, deployment, and continuous integration infrastructure

  • Enable the development team by providing automated build and test solutions using Docker and Kubernetes/YARN, on-premises and on cloud service providers (CSPs)

  • Work with open-source communities, including RAPIDS, Spark, MONAI, and NVFlare, on CI/CD

  • Work closely with Development and QA teams to help ensure end-to-end quality

  • Take on full-stack development work, depending on your capabilities

What we need to see:

  • BS or MS in Computer Science, Computer Engineering, or a closely related field, or equivalent experience

  • 8+ years of software development experience, including 2+ years of management experience

  • 2+ years of experience with CI/CD systems

  • Strong programming and debugging skills in Python, Java, or C++, with extensive Bash scripting experience

  • Excellent knowledge of GitLab/GitHub or other version control systems

  • Experience configuring, maintaining, and extending deployments of industry-standard tools (e.g., Jenkins, Kubernetes, Docker)

  • Strong experience with build tools such as Maven, setuptools, and CMake, as well as unit testing and code-coverage tools, plus strong skills in the software release process (Maven repositories, PyPI, Conda)

  • Familiarity with Linux distributions such as Ubuntu, CentOS, and Rocky Linux, and with cloud services such as AWS, Azure, and GCP

  • Good knowledge of open-source big-data technologies (Spark, Hadoop) and/or ML/DL frameworks (TensorFlow, PyTorch)

Ways to stand out from the crowd:

  • Good open-source project management skills

  • Kubernetes, YARN, Spark, or Ray experience

  • Experience with configuration management and infrastructure-as-code tools such as Ansible and Terraform

  • Knowledge of monitoring systems (Prometheus, Grafana)

  • Experience with CUDA would be a huge plus

We are an AA/EEO/Disabled employer. With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and talented people on the planet working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you.