Solution Architecture Intern, AI Infra - 2026

Overview

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

Join NVIDIA, a groundbreaking leader in AI computing and visual technologies, at the forefront of innovation. As an AI in Industry Solution Architecture Intern, you'll be integral to our mission of redefining industries through AI and HPC. Our Solution Architect team builds innovative AI computing platforms, analyzes applications, and delivers outstanding value to our customers. This role offers a remarkable opportunity to harness NVIDIA's newest technologies to optimize large models, develop sophisticated AI workflows, and empower our clients with advanced AI solutions.

What you will be doing

  • Develop and gain an in‑depth understanding of open‑source inference frameworks such as SGLang and vLLM; collaborate with the community on new features and operators, performance optimization, and model enablement.
  • Design and implement CUDA kernels/operators (e.g., GEMM, attention and related primitives) for efficient and scalable LLM inference and training.
  • Develop and optimize KV‑cache offloading frameworks for LLM scenarios, enabling multi‑level KV‑cache offloading and reuse on CPU/SSD/remote storage to accelerate inference (team project: https://github.com/taco-project/FlexKV).
  • Take ownership of R&D work related to compute performance in distributed training, continuously exploring methods and techniques for performance optimization.
  • Conduct in‑depth research on computational problems in machine learning, summarize common computational patterns and requirements, and develop sample code, acceleration libraries, or framework components.

What we need to see

  • Pursuing a Bachelor's, Master's, or PhD degree in Electrical Engineering, Automation, Computer Science, Computational Mathematics, or a related field.
  • Strong interest in accelerated computing, parallel computing, and heterogeneous computing, and willingness to dive deep into these areas.
  • Solid programming skills; good understanding of data structures and general concepts of computer systems.
  • Strong ability to learn and adapt, with good skills in analyzing and formulating problems and exploring solutions independently.

Ways to stand out from the crowd

  • Familiarity with heterogeneous computing, distributed training, parallel computing, or other high‑performance computing areas.
  • Experience in performance analysis, performance modeling, or performance optimization, and contributions to open‑source frameworks.
  • Strong capability in defining new problems and exploring solution spaces; this is critical for the role.
  • Proficiency with AI‑assisted programming tools.

Sourced directly from NVIDIA’s career page

NVIDIA

2 Locations

Job ID
/job/China-Beijing/Solution-Architecture-Intern--AI-Infra---2026_JR2019909