Deep Learning Performance Architect, CUTLASS DSL

Overview

  • Are you passionate about programming languages, compiler technology, and GPU performance? Do you want to help shape the future of high-performance kernel development for AI? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for GPU kernel development, along with the MLIR dialects and lowering passes behind it.
  • In this role, you will also help accelerate kernel compilation while delivering performance comparable to CUTLASS C++, enabling efficient hardware-software co-design for NVIDIA's next generation of AI platforms.
What you'll be doing:

  • Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development
  • Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack
  • Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++
  • Collaborate closely with architecture, research, software product teams, and the open-source community to bring cutting-edge optimizations into real products

What we need to see:

  • MS, PhD, or equivalent experience in Computer Science, Software Engineering, or a related field
  • 2+ years of relevant work experience
  • Excellent programming skills in Python and strong proficiency in C++
  • Hands-on experience with DSLs, compilers, or code generation systems
  • Strong command of the MLIR/LLVM stack, including IR design and pass optimization
  • Strong communication skills and the ability to thrive in a highly collaborative environment

Ways to stand out from the crowd:

  • Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis and optimization techniques
  • Familiarity with key high-performance computing abstractions such as Layout, Tile, MMA, and TMA in the CuTe ecosystem

Sourced directly from NVIDIA’s career page

NVIDIA

2 Locations

Job ID
/job/China-Shanghai/Deep-Learning-Performance-Architect--CUTLASS-DSL_JR2018773
