Overview
- We're looking for outstanding AI systems software engineers to develop groundbreaking technologies across the inference systems software stack.
- Our team builds core AI systems software that accelerates high-impact workloads on NVIDIA GPUs, from deep learning primitives and kernel libraries to LLM inference runtimes, serving abstractions, and code generation technologies.
- As a member of the team, you will help design, build, optimize, and ship production-quality software that powers NVIDIA's AI software stack.
- This role spans both foundational library engineering and next-generation inference systems work, with opportunities to contribute across the stack from low-level kernels and performance primitives to serving runtimes and developer-facing abstractions.
- You may work on GPU-accelerated deep learning primitives, efficient attention kernel implementations, LLM serving components, just-in-time compilation systems, software abstractions, and performance-critical runtime infrastructure for large language models, agents, and other advanced AI workloads.
- You will collaborate with world-class engineers across deep learning software, compilers, GPU architecture, and open-source inference ecosystems, and your work will directly impact NVIDIA's AI platform and the performance of real-world workloads at scale.
What you'll be doing
- Develop production-quality software that ships as part of NVIDIA's AI software stack, including cuDNN, FlashInfer, and optimized support for large language model inference workloads.
- Innovate and develop new AI systems technologies for efficient inference, with a focus on performance, scalability, maintainability, and usability.
- Design, implement, and optimize kernels for high-impact AI workloads across LLM inference, generative AI, computer vision, autonomous driving, and recommender systems.
- Design and implement extensible software abstractions for deep learning libraries, LLM serving engines, and runtime systems.
- Build and improve just-in-time compilation, code generation, and runtime technologies for performance-critical GPU workloads.
- Analyze workload performance, tune current software, and propose improvements to future software and hardware-software interfaces.
- Collaborate closely with engineers across deep learning frameworks, libraries, kernels, compilers, and GPU architecture teams at NVIDIA.
- Contribute to open-source communities and ecosystem integrations where relevant, including projects such as FlashInfer, vLLM, and SGLang.
What we need to see
- Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
- 3+ years of relevant industry, research, or systems software development experience in machine learning, deep learning systems, compilers, or GPU software.
- More experience is expected for senior-level candidates.
- Strong programming skills in C/C++ and Python, with hands-on experience developing high-performance software.
- Solid experience with CUDA development and GPU programming fundamentals.
- Strong experience developing or using deep learning frameworks such as PyTorch, JAX, TensorFlow, or ONNX.
- Good understanding of linear algebra, performance analysis, profiling, and code optimization.
- Experience designing software abstractions, APIs, or higher-level system architecture for performance-sensitive systems.
- Familiarity with modern machine learning and inference system trends, especially around LLMs and generative AI.
- Senior candidates are expected to have strong experience in GPU kernel development and performance optimization, especially using CUDA C/C++, cuTile, Triton, or similar technologies.
Ways to stand out from the crowd
- Hands-on experience with inference engines and runtimes such as vLLM, SGLang, MLC, TensorRT-LLM, or similar systems.
- Background in domain-specific compiler, code generation, or library solutions for LLM inference and training.
- Expertise in machine learning compilers or IR systems such as MLIR, Apache TVM, TensorIR, or related technologies.
- Practical experience with GPU performance modeling, computer architecture, or accelerator-oriented software design.
- Open-source project ownership or meaningful contributions in deep learning systems, compilers, kernels, or inference infrastructure.
Sourced directly from NVIDIA’s career page
Job ID
/job/China-Shanghai/Software-Engineer--AI-and-DL-Kernel-Libraries_JR2019913