Senior System Software Engineer, Firmware Update Infrastructure - MGX



Software Engineering, Other Engineering
Taipei City, Taiwan
Posted on Friday, September 22, 2023

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and establish teams with the most thoughtful people in the world.

NVIDIA MGX is bringing accelerated computing into any data center with modular server design. With MGX, OEM and ODM partners can build tailored solutions for different use cases while saving development resources and reducing time to market. The modular reference architecture allows different configurations of GPUs, CPUs, and DPUs (including x86 or Arm® CPU servers and NVIDIA OVX™ servers) to accelerate diverse enterprise data center workloads. We’re looking for a highly motivated, creative engineer with strong experience in system software to join the Server Software team. You will develop tools and infrastructure to manage the firmware lifecycle for OEM and ODM partners and validate updates on servers built on NVIDIA GPUs, CPUs, and DPUs. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.

What you’ll be doing:

  • Develop and validate tools for firmware lifecycle management on NVIDIA MGX data center products being built by NVIDIA OEM and ODM partners.

  • Work with NVIDIA partners on architecture discussions to improve their use of NVIDIA products while performing firmware updates.

  • Use the firmware tools to update and validate partners’ servers during various development phases and provide continuous feedback.

  • Drive product lifecycles with partner and QA teams to productize the platform software code, and take responsibility as product owner.

  • Contribute to all phases of product development, from product definition, architecture, and design through implementation, debugging, testing, and early customer support.

What we need to see:

  • BS, MS, or PhD in EE/CS or a related field, with 5+ years of active development experience using Python and C/C++ as primary programming languages on Linux.

  • System knowledge of how platform management works, in areas such as BMC-BIOS communication, thermal management, power management, firmware update, device monitoring, and firmware security.

  • Strong programming skills in Python and C/C++ in a Linux environment, a strong understanding of Linux kernel internals, and strong code review skills.

  • Experience with SCM tools (e.g., Git, Perforce).

  • Excellent written and oral communication skills, a good work ethic, a strong sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day.

  • You are a self-starter who loves to find creative solutions to complicated problems.

Ways to stand out from the crowd:

  • Familiarity with the architecture of data center server hardware and experience with in-band and out-of-band management of firmware and hardware components.

  • Understanding of the REST architectural style, especially JSON over HTTPS with OAuth.