Overview
- Are you passionate about programming languages, compiler technology, and GPU performance? Do you want to help shape the future of high-performance kernel development for AI? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for GPU kernel development, along with the MLIR dialects and lowering passes behind it.
- In this role, you will also help accelerate kernel compilation while delivering performance comparable to CUTLASS C++, enabling efficient hardware-software co-design for NVIDIA's next generation of AI platforms.
What you'll be doing:
- Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development.
- Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack.
- Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++.
- Collaborate closely with architecture, research, software product teams, and the open-source community to bring cutting-edge optimizations into real products.
What we need to see:
- MS, PhD, or equivalent experience in Computer Science, Software Engineering, or a related field.
- 2+ years of relevant work experience.
- Excellent programming skills in Python and strong proficiency in C++.
- Hands-on experience with DSLs, compilers, or code generation systems.
- Strong command of the MLIR/LLVM stack, including IR design and pass optimization.
- Strong communication skills and the ability to thrive in a highly collaborative environment.
Ways to stand out from the crowd:
- Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis and optimization techniques.
- Familiarity with key high-performance computing abstractions such as Layout, Tile, MMA, and TMA in the CuTe ecosystem.
Sourced directly from NVIDIA’s career page
Your application goes straight to NVIDIA.
Job ID
/job/China-Shanghai/Deep-Learning-Performance-Architect--CUTLASS-DSL_JR2018773