Overview
- Join NVIDIA as a Senior Deep Learning Algorithms Engineer to optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs.
- Focus on world-class inference for workloads like protein structure prediction and design.
- As part of BioNeMo, you will collaborate across teams to move next-gen AI models (e.g., Boltz1/2, OpenFold2/3) from research to production serving via TensorRT-LLM and related stacks, ensuring industry-leading, scalable performance for scientists and developers.
What you will be doing
- Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFDiffusion, DiffDock, ProteinNMN, Evo2, ESM3).
- Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
- Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
- Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
- Collaborate with research to align model architecture and training with deployment constraints for smooth production transition.
What we want to see
- MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
- 5+ years professional experience in deep learning/applied ML, with a track record of deploying optimized models/inference paths in production (not research prototypes).
- Strong foundation in transformer/diffusion architectures; direct experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
- Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
- Strong Python/C++; ability to read/modify performance-critical C++/CUDA code for inference stacks and custom ops.
- Practical experience with TensorRT/TensorRT-LLM: model conversion, optimization, deployment, and performance measurement (latency/throughput) under realistic conditions.
- Familiarity with GPU performance engineering: profiling (Nsight), roofline analysis, and optimization of kernels/memory access; experience writing/extending custom GPU kernels for model hot paths is required.
Ways to stand out from the crowd
- Led or significantly contributed to large-scale LLM/VLM/biology model serving (strict SLOs, high QPS, multi-GPU/node inference, cost/perf ownership).
- Deep customization of, or substantial contributions to, TensorRT-LLM, vLLM, SGLang, or comparable stacks, including debugging and extending for novel architectures.
- End-to-end ownership of FP8/INT8 (or other formats), including calibration, regression testing, and documenting accuracy vs. speed tradeoffs on biology workloads.
- Strong familiarity with protein structure, docking, or diffusion-based design and model families (e.g., OpenFold, Boltz, ESM, RFDiffusion, DiffDock)—demonstrated by benchmarks, publications, or open-source work.
- Repeated success taking non-text architectures (geometric, multimodal, structure-centric) from research/checkpoint to optimized, production-ready inference with clear metrics, as well as examples of writing, maintaining, or upstreaming custom kernels or fused ops that produced measurable gains on real models or hardware.
Sourced directly from NVIDIA’s career page
Your application goes straight to NVIDIA.
Job ID
/job/Vietnam-Ho-Chi-Minh-City/Senior-Deep-Learning-Algorithms-Engineer---BioNeMo_JR2016601