Overview
- At NVIDIA, we are pioneers in innovation, transforming computer graphics, PC gaming, and accelerated computing for over 25 years.
- Our team is driven by powerful technology and outstanding people who expand the limits of what’s achievable.
- Now, we are unlocking the potential of AI to usher in the next era of computing.
- As part of our engineering organization, you will play a key hands-on role in developing and executing software-driven characterization workflows on NVIDIA rack-scale systems.
- This role is focused on running AI workloads across the full stack to analyze, characterize, and optimize power, performance, and drive behavior at the system level.
- This is an opportunity to work at the intersection of software, infrastructure, silicon, and large-scale AI platforms, with direct impact on next-generation NVIDIA systems.
What you’ll be doing:
- Develop and run software tools, automation, and workloads to characterize power, performance, and drive behavior across NVIDIA rack-scale systems.
- Execute AI and system-level workloads to stress and evaluate behavior across the stack, including GPUs, CPUs, networking, storage, firmware, drivers, and system software.
- Build automated frameworks for data collection, telemetry, validation, correlation, and analysis of characterization results.
- Investigate system behavior under different workloads and operating conditions to identify bottlenecks, anomalies, and optimization opportunities.
- Work closely with hardware, firmware, driver, system software, performance, and validation teams to define characterization methodologies and debug cross-stack issues.
- Support bring-up, validation, and readiness activities for new rack-scale platforms and AI infrastructure.
- Create clear documentation, test flows, and repeatable processes to improve coverage, efficiency, and reproducibility.
What we need to see:
- B.Sc. in Computer Science, Electrical Engineering, or a related field.
- 5+ years of software engineering experience, preferably in system software, infrastructure, validation, or performance-focused environments.
- Strong programming skills in Python and at least one system-level language such as C/C++.
- Experience developing automation and test infrastructure for complex hardware/software systems.
- Hands-on experience running, debugging, or optimizing AI, HPC, or large-scale system workloads.
- Good understanding of system-level architecture, including interactions across hardware, firmware, drivers, operating systems, and application layers.
- Experience working in Linux environments and with scripting, telemetry, logging, and data analysis tools.
- Strong debugging and problem-solving skills, with the ability to work across multiple engineering disciplines.
- Good communication skills and the ability to drive technical work in a fast-paced, cross-functional environment.
Ways to stand out from the crowd:
- Experience with NVIDIA platforms, GPU systems, or rack-scale AI infrastructure.
- Background in power, thermal, performance, or storage/drive characterization.
- Experience with workload automation, cluster orchestration, or lab infrastructure.
- Familiarity with AI benchmarks, training/inference workloads, and system stress methodologies.
- Experience in post-silicon validation, production testing, or system bring-up.
Sourced directly from NVIDIA’s career page
Job ID
/job/Israel-Yokneam/Senior-Software-Engineer--Data-Center-Workloads---Infrastructure_JR2017132