Deep Learning Engineer - LLM and VLM Model Compression

Overview

  • We are looking for DL engineers passionate about building deep learning frameworks for large language model (LLM) and vision-language model (VLM) compression that push the boundaries of AI efficiency.
  • In this role, you’ll collaborate with world-class teams across NVIDIA to advance both the software and hardware stack that powers modern AI.
  • Join the team building software used by the entire world.
  • Work with world-class engineers and researchers to build next-generation deep learning frameworks for compressing LLMs and VLMs through pruning, distillation, and neural architecture search (NAS).
  • Work on the most powerful enterprise-grade GPU clusters, capable of hundreds of petaFLOPS, and on unreleased hardware before anyone else in the world.
  • Are you ready for this challenge?

What you’ll be doing:

  • Design and implement a deep learning framework for compressing large language and vision-language models to deliver highly optimized, high-performance AI systems used worldwide.
  • Develop and integrate new algorithms for pruning, NAS, and distillation in collaboration with NVIDIA researchers and engineers.
  • Experiment with compressing the latest LLMs and VLMs, analyzing their performance and behavior across diverse workloads.
  • Collaborate with researchers and engineers across NVIDIA, providing guidance on improving the design, usability and performance of workloads.
  • Lead best practices for building, testing, and releasing DL software.

What we need to see:

  • 8+ years of experience in deep learning and software development.
  • BSc, MS, or PhD degree in Computer Science, Computer Architecture, or a related technical field.
  • Hands-on experience with LLM or VLM model training or inference.
  • Excellent Python programming skills.
  • Extensive knowledge of at least one DL framework (PyTorch, TensorFlow, JAX, MXNet), with practical experience in PyTorch required.
  • Strong problem-solving and analytical skills.
  • Solid grasp of algorithms and DL fundamentals.

Ways to stand out from the crowd:

  • Experience applying and implementing model compression techniques such as pruning, NAS, distillation, and quantization.
  • Experience building deep learning frameworks for training, inference, model compression, or a related topic.
  • GPU programming experience (CUDA or OpenCL) is a plus but not required.
  • First-author publication in a top-tier deep learning or AI conference.
  • NVIDIA is widely considered to be one of the technology world’s most desirable employers.
  • We have some of the most brilliant and forward-thinking people in the world working for us.
  • We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
  • Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
  • For Poland: The base salary range is 292,500 PLN - 507,000 PLN for Level 4, and 375,000 PLN - 650,000 PLN for Level 5.


#deeplearning.

Sourced directly from NVIDIA’s career page

Your application goes straight to NVIDIA.

NVIDIA

4 Locations

Job ID
/job/Poland-Warsaw/Deep-Learning-Engineer---LLM-and-VLM-Model-Compression_JR2014941
