Overview
- NVIDIA is seeking a motivated AI Acceleration & Optimization Engineer to join our Acceleration Computing, Optimization and Tools (ACOT) team.
- In this role, you will help improve the performance, scalability, and efficiency of modern AI models across NVIDIA GPU platforms.
- You will work with engineers across algorithms, systems, and hardware to support high-performance model deployment and development for real-world AI workloads.
- As part of ACOT, you will collaborate with architecture, research, CUDA, compiler, and framework teams to help bring next-generation AI workloads from research to production with strong performance and reliability.
What you will be doing
- Assist in optimizing AI models such as LLMs, VLMs, diffusion models, and multimodal models for inference and training on NVIDIA GPUs.
- Profile workloads and help identify performance bottlenecks across GPU compute, memory, networking, and storage.
- Support the development and integration of optimization techniques such as quantization, kernel fusion, parallelism, and memory efficiency improvements.
- Use tools including CUDA, TensorRT, Nsight, and NVIDIA acceleration libraries to analyze and improve model performance.
- Work with deep learning frameworks including PyTorch, JAX, and TensorFlow, as well as open-source inference frameworks like vLLM and SGLang.
- Contribute to performance benchmarking, testing, and internal tooling to improve optimization workflows.
- Partner with senior engineers and multi-functional teams to evaluate workload behavior and support future performance improvements.
What we want to see
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
- 2–4 years of experience, or strong academic/project experience, in deep learning, performance engineering, systems, or high-performance computing.
- Good understanding of deep learning fundamentals and modern AI model architectures, especially transformers.
- Familiarity with GPU architecture and parallel computing concepts such as CUDA, kernels, memory hierarchy, and streams.
- Exposure to profiling and performance analysis tools.
- Programming skills in Python.
- Experience with at least one major ML framework such as PyTorch, TensorFlow, or JAX.
Ways to stand out from the crowd
- Internship, research, or project experience optimizing AI/ML workloads on GPUs.
- Hands-on experience with TensorRT, TensorRT-LLM, vLLM, SGLang, or similar inference/runtime frameworks.
- Familiarity with quantization, sparsity, or mixed-precision techniques.
- Experience with distributed training or inference concepts.
- Contributions to open-source ML systems, performance tools, or infrastructure projects.
- Proficiency in C++, strong debugging skills, and an interest in low-level performance optimization.
Job ID: JR2015256 (/job/Vietnam-Ho-Chi-Minh-City/Deep-Learning-Algorithms-Engineer---ACOT_JR2015256)