Benefits
- With a highly competitive compensation and benefits package, we are widely considered to be one of the world's most desirable employers. Some of the most forward-thinking and hardworking people in the world work for us, and thanks to outstanding growth, our best-in-class engineering teams are expanding rapidly.
- If you're a creative and autonomous person with a real passion for technology, we want to hear from you.
What You Will Be Doing
- Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA's hardware and software platforms.
- Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
- Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.
- Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
- Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).
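The RAG workflows mentioned above follow a retrieve-then-generate pattern: embed the query, rank candidate documents by similarity, and prepend the best matches to the model prompt. A minimal sketch of the retrieval step, using a toy bag-of-words embedding and cosine similarity rather than any NVIDIA or production vector store:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': token -> count (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "GPU cluster scheduling and distributed training",
    "KV cache tuning for LLM inference",
    "quarterly sales report",
]
context = retrieve("LLM inference optimization", docs, k=1)
# The retrieved context is then injected into the generation prompt:
prompt = f"Answer using context: {context[0]}"
```

In a real deployment the bag-of-words encoder would be replaced by a dense embedding model and the sorted list by an approximate-nearest-neighbor index; the control flow is the same.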
What We Need to See
- Master's / Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.
- 4+ years of hands-on experience in AI, with a focus on open-source LLM training, fine-tuning, and production inference optimization.
- Deep understanding of mainstream LLM architectures and proficiency in LLM customization via PyTorch and Hugging Face Transformers.
- Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
- Competency in agentic inference design and using AI agents to solve business challenges.
- Strong communication skills, with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
Ways to Stand Out from the Crowd
- Hands-on experience with NVIDIA's generative AI ecosystem (TRT-LLM, Megatron-LM, NVIDIA NeMo).
- Advanced skills in LLM optimization (quantization, KV Cache tuning, memory footprint reduction).
- Experience with Docker, Kubernetes for containerized LLM and agent workflow deployment on-prem.
- In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.
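Quantization, one of the optimization skills listed above, trades numeric precision for memory footprint and bandwidth. A miniature sketch of symmetric per-tensor int8 weight quantization, a generic illustration of the idea rather than TRT-LLM's actual implementation:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale = max|w| / 127,
    so every weight maps into [-127, 127] with no clipping needed."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)   # small ints, 1 byte each instead of 4
w_hat = dequantize(q, s)  # close to w; error is bounded by scale / 2
```

Storing `q` plus one `scale` per tensor cuts weight memory roughly 4x versus float32; production stacks extend this with per-channel scales, calibration data, and quantization of activations and the KV cache.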
Sourced directly from NVIDIA’s career page
Job ID: /job/China-Beijing/Deep-Learning-Solution-Architect_JR2015520-1