Benefits
- both your customers and the broader community.
- Build internal tools, benchmarking harnesses, and automation pipelines that raise the productivity of your teammates and customers alike — with a multiplier attitude that makes everyone around you more effective.
- Document architectures, findings, and recommendations with clarity for technical audiences, and contribute improvements back to vLLM and related open-source projects where appropriate.
What We Need to See
- Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or equivalent experience.
- 5+ years of industry experience building and operating complex, production-grade software systems, with strong instincts for how systems behave at scale.
- Hands-on experience deploying and operating LLM inference workloads — particularly with vLLM — including configuration, optimization, and debugging in real-world environments.
- Proficiency with container orchestration (Kubernetes) and HPC scheduling (Slurm) for running GPU-accelerated workloads.
- Solid understanding of LLM serving fundamentals: batching strategies (continuous batching, chunked prefill), KV cache management, and tensor/pipeline parallelism.
- Familiarity with GPU performance analysis: memory hierarchy, utilization, roofline modeling, and profiling with Nsight Systems or Nsight Compute.
- Strong written and verbal communication skills, with the ability to present technical findings clearly to both engineering teams and leadership — and to navigate ambiguous, open-ended customer problems.
Ways to Stand Out from the Crowd
- Experience with NVIDIA Dynamo or other disaggregated inference serving frameworks.
- Contributions to open-source inference or ML systems projects, particularly vLLM or SGLang — please include links to relevant pull requests or artifacts.
- Background with ML compilers or GPU kernel development (Triton, CUTLASS, TorchInductor).
- Experience building developer tools or internal platforms that meaningfully improved team productivity.
- Prior experience in a customer-facing or forward-deployed engineering capacity within a technical product organization.
- Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.
- As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/.
- Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
- The base salary range is 135,000 CAD - 185,000 CAD for Level 3, and 170,000 CAD - 220,000 CAD for Level 4.
- You will also be eligible for equity and benefits.
- Applications for this job will be accepted at least until April 14, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
Sourced directly from NVIDIA’s career page
Job ID
/job/Canada-Toronto/Senior-Software-Engineer--AI-Inference_JR2016014