Senior AI/ML Engineer



Software Engineering, Data Science
Pune, Maharashtra, India
Posted on Friday, September 22, 2023

NVIDIA is hiring senior software engineers for its Infrastructure, Planning and Process (IPP) team to lead the massive scale-up of key solutions for its internal cloud infrastructure. IPP is a global organization within NVIDIA. The group works with other teams within NVIDIA, such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars, to cater to their infrastructure needs. These cloud services run almost half a million automated jobs per day on five thousand servers, supporting the productivity of thousands of NVIDIA’s software developers worldwide. The cloud hosts a heterogeneous mix of machines and devices with various operating systems (Windows/Linux/Android) and a multitude of hardware platforms, including both NVIDIA GPUs and Tegra processors. Are you passionate about infrastructure and looking for complex, meaningful issues, ready to build the next generation of cloud services, design creative solutions, and mine through data to uncover real problems and fix them? We are excited to have a fun-loving person like you join our team!

As a Senior Software Engineer, you will understand the overall movement of data across the entire platform, find bottlenecks, define solutions, develop key pieces, write APIs, and own their deployment. You will work with internal and external development teams to discover these opportunities and to solve hard problems. You will also guide teammates in developing the APIs you have defined, write acceptance tests for those APIs, and review the resulting work and test results. For this role, you will need excellent technical leadership, communication, organizational, and analytical skills, as well as a passion for large and hard problems, e.g. petabytes of fast storage, a million cores, 100,000 builds and 100,000 tests.

What you’ll be doing:

  • Implement AI/ML/DL use cases to improve the efficiency and resiliency of systems and the data center.

  • Identify problems related to performance, utilization, and scale in existing systems and provide innovative solutions to them.

  • Work with the architect and other developers in internal and external development teams.

What we need to see:

  • BE (MS preferred) or equivalent experience in EE/CS with 5+ years of work experience.

  • Well versed in Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) algorithms and techniques.

  • Hands-on experience with large language models (LLMs).

  • Expert programming ability in Python, Go or Java.

  • Strong understanding of, and hands-on experience with, various anomaly detection techniques.

  • Experience in working with SQL/NoSQL database systems such as MySQL, MongoDB or Elasticsearch.

  • Good understanding of distributed systems, microservice architecture, and REST APIs.

  • Excellent knowledge of and working experience with Docker containers. Consistent track record in developing large-scale distributed applications.

  • Ability to effectively work across organizational boundaries to enhance alignment and productivity between teams.

Ways to stand out from the crowd:

  • Experience implementing AI, ML, and DL algorithms to solve various business problems.

  • Good understanding of the networking, storage, and data center infrastructure domains.

  • Exposure to various anomaly detection techniques.

  • Prior development of a large software project using a service-oriented architecture operating under real-time constraints.

We have some of the most forward-thinking and versatile people in the world working for us and, due to unprecedented growth, our best-in-class engineering teams are rapidly growing. We are building a team that will truly change the world. If you are passionate about new technologies, care about software quality, and want to be part of the future of transportation and AI, we would love for you to join us.