Solutions Architect - Cloud Infrastructure
NVIDIA
We are excited to announce an opening for a Cloud Solutions Architect at NVIDIA, and we are seeking a passionate individual with a strong interest in cloud infrastructure engineering! If you are enthusiastic about contributing to projects that push the boundaries of cloud-based AI and resilience in large-scale environments, we invite you to read on. NVIDIA is renowned as one of the most sought-after employers in the technology world and offers highly competitive benefits. We are home to some of the most innovative and forward-thinking people in the world. If you are creative, autonomous, and eager to apply your skills and knowledge in a dynamic environment, we want to hear from you!
What you'll be doing:
Serving as a key member of our cloud solutions team and the go-to technical expert on NVIDIA's GPU-accelerated cloud offerings, helping clients build resilient, telemetry-driven cloud infrastructure.
Collaborating directly with engineering teams to secure design wins, address challenges, and deploy solutions into production, with a focus on developing robust tooling for observability and failure recovery.
Acting as a trusted advisor to our clients, understanding their cloud environment, translating requirements into technical solutions, and providing guidance on optimizing NVIDIA DGX Cloud for scalable, reliable, and high-performance workloads.
What we need to see:
2+ years of experience in cloud infrastructure engineering, AI/ML systems, or large-scale distributed systems.
A BS in Computer Science, Electrical Engineering, Mathematics, or Physics, or equivalent experience.
A proven understanding of cloud computing and large-scale computing systems.
Proficiency in Linux, Windows Subsystem for Linux, and Windows.
A passion for machine learning and AI, and the drive to continually learn and apply new technologies.
Excellent interpersonal skills, including the ability to explain complex technical topics to non-experts.
Ways to stand out from the crowd:
Expertise with orchestration tools like Slurm and Kubernetes.
Familiarity with NVIDIA’s DGX Cloud, Base Command Platform, and their ecosystem.
Hands-on experience designing telemetry systems and failure recovery mechanisms for large-scale cloud infrastructures, including work with observability tools such as Grafana, Prometheus, and OpenTelemetry.
Proficiency in deploying and managing cloud-native solutions using platforms such as AWS, Azure, or Google Cloud, with a focus on GPU-accelerated workloads.
Contributions to open-source projects showcasing expertise in cloud-AI/infrastructure engineering.
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.