Overview
- We are looking for a Software Solutions Engineer to support NVIDIA AI Enterprise customers and deployments across cloud and datacenter environments.
- This is a dual role: (1) support, triage, and resolve complex customer software issues end-to-end, and (2) build software features, automation, diagnostics, reproducible test cases, and deployment tooling to improve product readiness and scale support across enterprise environments.
- You will work across compute and cloud-native technologies in CSP environments, including container platforms/orchestrators, enterprise system software, and GPU-accelerated AI frameworks and inference services used to run production AI workloads at scale.
- In this customer-facing role, you will work closely with customers and internal engineering teams to understand issues, explain root causes, drive resolution, and collaborate on fixes and improvements.
- Success in this role requires strong debugging skills, crisp communication, and ownership of technically deep escalations from inception to closure.
What you'll be doing:
- Develop and maintain product-facing features and deployment assets for AI Enterprise supportability (e.g., scripts, configuration guidance, Kubernetes manifests/Helm charts, and reproducible test cases).
- Develop and maintain Python-based tooling/automation (validators, log collectors, repro harnesses) to improve NVIDIA AI Enterprise deployment reliability across NGC and container orchestrators (e.g., Kubernetes).
- Contribute code-level fixes, patches, or pull requests (as appropriate) in collaboration with engineering to address customer-impacting issues and improve product readiness.
- Support enterprise customers deploying NVIDIA AI Enterprise in datacenter and CSP environments, including Kubernetes-based and containerized production AI platforms.
- Take ownership of customer issues from inception to resolution: reproduce in lab/cloud, collect diagnostics, provide mitigations, and partner with engineering on fixes.
- Create high-quality bug reports and RFEs with clear repro steps, environment details (CSP/Kubernetes/GPU), impact analysis, and supporting artifacts.
- Develop customer-facing and internal documentation (KBs, runbooks, deployment guidance) to improve time-to-value and reduce recurring issues.
- Be on call one weekend per month in the event a customer has a Sev1 outage and requires engineering assistance.
What we need to see:
- BS in Computer Science, Electrical Engineering, Computer Engineering, or related field (or equivalent experience).
- 5+ years of system software development and troubleshooting experience, ideally with some customer-facing experience.
- Strong computer science fundamentals and programming/scripting skills (Python required; Bash; Go/C++ a plus) to automate investigations and build diagnostics/repro tools.
- Strong troubleshooting fundamentals (networking, concurrency, OS concepts) and a structured approach to isolating issues across application, platform, and infrastructure layers.
- Deep understanding of at least two of the following: data centers/servers, distributed systems, virtualization, deep learning frameworks, containers (Docker/Kubernetes), hybrid cloud (AWS/Azure/GCP), and CI/CD for reliable deployments.
- Familiarity with GPU-accelerated AI/ML stacks and production model deployment/serving (e.g., NGC containers, CUDA/tooling concepts, inference serving such as Triton or similar).
- Deep Linux knowledge and comfort troubleshooting in production Linux environments; working knowledge of Windows is a plus.
- Professional-level communication and interpersonal skills, with a passion for solving problems.
Ways to stand out from the crowd:
- Hands-on experience deploying and operating NVIDIA AI Enterprise components in production across on-prem or CSP environments.
- Hands-on experience using AI coding assistants/tools (e.g., Cursor, Claude Code, Codex, or similar) to accelerate debugging, automation, and test creation.
- Experience operating Kubernetes-based platforms in production (cluster operations, upgrades, control-plane/data-plane failure modes).
- Strong performance debugging skills for GPU and cloud workloads (profiling, latency/throughput tuning) and familiarity with observability/tracing tools.
Sourced directly from NVIDIA’s career page
Job ID
/job/India-Pune/Software-Solutions-Engineer_JR2018442-1