Benefits
- With a highly competitive compensation and benefits package, we are widely considered to be one of the world's most desirable employers. Some of the most forward-thinking and hardworking people in the world work for us, and thanks to outstanding growth, our best-in-class engineering teams are expanding rapidly.
- If you're a creative and autonomous person with a real passion for technology, we want to hear from you.
What You Will Be Doing
- Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA's hardware and software platforms.
- Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
- Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.
- Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
- Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).
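The RAG workflows mentioned above follow a retrieve-then-generate pattern: embed the query, rank candidate documents by similarity, and prepend the best matches to the model prompt. A minimal sketch of the retrieval step, using a toy bag-of-words embedding and cosine similarity rather than any NVIDIA or production vector store:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': token -> count (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "GPU cluster scheduling and distributed training",
    "KV cache tuning for LLM inference",
    "quarterly sales report",
]
context = retrieve("LLM inference optimization", docs, k=1)
# The retrieved context is then injected into the generation prompt:
prompt = f"Answer using context: {context[0]}"
```

In a real deployment the bag-of-words encoder would be replaced by a dense embedding model and the sorted list by an approximate-nearest-neighbor index; the control flow is the same.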
What We Need to See
- Master's / Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.
- 4+ years of hands-on experience in AI, with a focus on open-source LLM training, fine-tuning, and production inference optimization.
- Deep understanding of mainstream LLM architectures and proficiency in LLM customization via PyTorch and Hugging Face Transformers.
- Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
- Competency in agentic inference design and using AI agents to solve business challenges.
- Strong communication skills, with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
Ways to Stand Out from the Crowd
- Hands-on experience with NVIDIA's generative AI ecosystem (TRT-LLM, Megatron-LM, NVIDIA NeMo).
- Advanced skills in LLM optimization (quantization, KV Cache tuning, memory footprint reduction).
- Experience with Docker, Kubernetes for containerized LLM and agent workflow deployment on-prem.
- In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.
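Quantization, one of the optimization skills listed above, trades numeric precision for memory footprint and bandwidth. A miniature sketch of symmetric per-tensor int8 weight quantization, a generic illustration of the idea rather than TRT-LLM's actual implementation:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale = max|w| / 127,
    so every weight maps into [-127, 127] with no clipping needed."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)   # small ints, 1 byte each instead of 4
w_hat = dequantize(q, s)  # close to w; error is bounded by scale / 2
```

Storing `q` plus one `scale` per tensor cuts weight memory roughly 4x versus float32; production stacks extend this with per-channel scales, calibration data, and quantization of activations and the KV cache.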
Sourced directly from NVIDIA’s career page
Job ID: /job/China-Beijing/Deep-Learning-Solution-Architect_JR2015520-1