Overview
- NVIDIA is seeking Software Performance Architects to optimize GPU kernel performance for state-of-the-art data-center platforms.
- We build automated, data-driven workflows to detect, explain, and prevent performance regressions across key deep learning workloads, partnering closely with kernel developers, compiler teams, infrastructure, and architecture/performance groups.
What you'll be doing:
Performance analysis + debugging
- Validate and analyze performance of GPU-accelerated kernels and key deep learning building blocks.
- Debug performance issues end-to-end: reproduce, isolate root causes, propose fixes or mitigation paths, and drive closure with the owning teams.
- Build performance narratives using structured evidence: baselines, controlled comparisons, and regression attribution.
Automation + regression infrastructure (Python-heavy)
- Develop and maintain Python-based automation for performance testing and analysis, using modern AI-assisted developer tools (e.g., Cursor/Claude Code/Copilot) to accelerate scripting while keeping code maintainable and reviewable.
- Design and operate performance test workflows: coverage definition, test/workload generation, automated large-scale execution (CI/nightly/on-demand), rerun rules, and reproducibility standards.
- Convert raw run outputs into actionable insight: statistics, noise control, post-processing, visualization, and large-scale result mining.
Cross-team collaboration and operating model
- Work with kernel developers and compiler/rotation teams to ensure performance checks are practical, scalable, and aligned to release needs.
- Partner with SWQA and infrastructure teams for execution at scale and reliable pipelines/dashboards.
- Contribute to clear ownership/triage/routing rules so regressions close quickly and consistently.
- Follow general software engineering best practices, including support for regression testing and CI/CD flows.
What we need to see:
- Master's or PhD degree or equivalent experience in Computer Science, Computer Engineering, Applied Math, or related field.
- Strong programming ability in Python plus C/C++ (performance-oriented code reading/debugging).
- Solid fundamentals in computer architecture and performance reasoning (latency/throughput, memory hierarchy, parallelism).
- Experience with performance analysis workflows: profiling, measurement methodology, reproducibility, and regression triage.
- Comfortable working across teams and driving issues to decision/closure with clear communication.
- Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design.
- Experience with performance-oriented parallel programming, even if it’s not on GPUs (e.g., with OpenMP or pthreads).
- Solid understanding of computer architecture and some experience with assembly programming.
- Identify bottlenecks, optimize resource utilization, and improve throughput.
Ways to stand out from the crowd:
- Experience with high-performance kernels or math libraries (e.g., GEMM/attention, CUTLASS-like concepts).
- Experience building CI/nightly regression systems, dashboards, or large-scale performance analytics.
- GPU programming/perf experience (CUDA or equivalent parallel programming).
- Strong ML/DL workload understanding (training/inference shapes, precision modes, perf bottlenecks).
- Familiarity with simulators/analytical modeling or performance characterization methodology.
Sourced directly from NVIDIA’s career page
Job ID
/job/China-Shanghai/Senior-Performance-Software-Engineer--Deep-Learning-Libraries_JR2004267