Senior Deep Learning Algorithms Engineer - BioNeMo

Overview

Join NVIDIA as a Senior Deep Learning Algorithms Engineer to optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs.
Focus on world-class inference for workloads like protein structure prediction and design.
As part of BioNeMo, you will collaborate across teams to move next-gen AI models (e.g., Boltz1/2, OpenFold2/3) from research to production serving via TensorRT-LLM and related stacks, ensuring industry-leading, scalable performance for scientists and developers.

What you will be doing:

Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFDiffusion, DiffDock, ProteinMPNN, Evo2, ESM3).
Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
Collaborate with research to align model architecture and training with deployment constraints for smooth production transition.

What we want to see:

MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
5+ years professional experience in deep learning/applied ML, with a track record of deploying optimized models/inference paths in production (not research prototypes).
Strong foundation in transformer/diffusion architectures; direct experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
Strong Python/C++; ability to read/modify performance-critical C++/CUDA code for inference stacks and custom ops.
Practical experience with TensorRT/TensorRT-LLM: model conversion, optimization, deployment, and performance measurement (latency/throughput) under realistic conditions.
Familiarity with GPU performance engineering: profiling (Nsight), roofline analysis, and optimization of kernels/memory access; experience writing/extending custom GPU kernels for model hot paths is required.

Ways to stand out from the crowd:

Led or significantly contributed to large-scale LLM/VLM/biology model serving (strict SLOs, high QPS, multi-GPU/node inference, cost/perf ownership).
Deep customization of, or substantial contributions to, TensorRT-LLM, vLLM, SGLang, or comparable stacks, including debugging and extending for novel architectures.
End-to-end ownership of FP8/INT8 (or other formats), including calibration, regression testing, and documenting accuracy vs. speed tradeoffs on biology workloads.
Strong familiarity with protein structure, docking, or diffusion-based design and model families (e.g., OpenFold, Boltz, ESM, RFDiffusion, DiffDock)—demonstrated by benchmarks, publications, or open-source work.
Repeated success taking non-text architectures (geometric, multimodal, structure-centric) from research/checkpoint to optimized, production-ready inference with clear metrics.
Examples of writing, maintaining, or upstreaming custom kernels or fused ops that produced measurable gains on real models or hardware.

Sourced directly from NVIDIA’s career page
