Deep Learning Performance Architect, CUTLASS DSL

2 Locations|Other

Overview

Are you passionate about programming languages, compiler technology, and GPU performance? Do you want to help shape the future of high-performance kernel development for AI? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for GPU kernel development, along with the MLIR dialects and lowering passes behind it.
In this role, you will also help accelerate kernel compilation while delivering performance comparable to CUTLASS C++, enabling efficient hardware-software co-design for NVIDIA's next generation of AI platforms.
What you'll be doing:

- Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development
- Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack
- Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++
- Collaborate closely with architecture, research, and software product teams, as well as the open-source community, to bring cutting-edge optimizations into real products

What we need to see:

- MS, PhD, or equivalent experience in Computer Science, Software Engineering, or a related field
- 2+ years of relevant work experience
- Excellent programming skills in Python and strong proficiency in C++
- Hands-on experience with DSLs, compilers, or code generation systems
- Strong command of the MLIR/LLVM stack, including IR design and pass optimization
- Strong communication skills and the ability to thrive in a highly collaborative environment

Ways to stand out from the crowd:

- Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis and optimization techniques
- Familiarity with key high-performance computing abstractions such as Layout, Tile, MMA, and TMA in the CuTe ecosystem

Sourced directly from NVIDIA’s career page

Job ID

/job/China-Shanghai/Deep-Learning-Performance-Architect--CUTLASS-DSL_JR2018773
