Developer Technology Engineer - AI

Overview

  • NVIDIA is seeking a passionate, world-class software engineer to join its Compute Developer Technology (DevTech) team.
  • Our team has over 150 engineers across Beijing, Shanghai, Shenzhen, Taipei, Seoul, and Sydney.
  • We understand algorithms, GPUs, and real-world applications.
  • Our mission is to connect the NVIDIA platform with developers worldwide.
  • We dive deep into customer projects to solve performance bottlenecks.
  • We use insights from workloads to guide next-generation NVIDIA hardware and software.
  • If you are driven by innovation and ambition, this is the team for you!

What you'll be doing:

  • Working directly with key application developers to understand the current and future problems they are solving.
  • You will build and optimize core parallel algorithms and data structures to deliver the most effective solutions using GPUs, through both library development and direct contribution to applications.
  • This includes training and inference optimization for large language models (LLMs) and contributing to frameworks and open-source projects in the LLM ecosystem, such as Megatron, TensorRT-LLM, SGLang, and vLLM.
  • Collaborating closely with the architecture, research, libraries, tools, and system software teams at NVIDIA to influence the design of next-generation architectures, software platforms, and programming models.
  • This includes investigating impact on application performance and developer efficiency, and turning real-world developer feedback into actionable platform improvements.
  • Engaging in deep optimization of high-performance operators, involving but not limited to GPU kernel optimization, instruction-level tuning, and compiler optimization.
  • These optimizations will directly support customers or be integrated into computation libraries such as cuDNN, cuBLAS, and CUTLASS, and into community open-source libraries such as DeepGEMM, FlashMLA, FlashAttention, and FlashInfer.
  • Improving communication for a broad range of distributed large language model workloads.
  • You will spearhead advancements in distributed training and inference by refining communication libraries (NCCL, NCCL GIN, NVSHMEM) and contributing to open-source communication libraries (such as DeepEP and NCCL EP).
  • This demands in-depth study of interconnect topologies (NVLink) and network protocols (InfiniBand/RoCE) to design efficient data transfer strategies and methods for compute-communication overlap.

What we need to see:

  • A degree or equivalent experience from a university in an engineering or computer science related field.
  • A master's or doctoral degree is preferred.
  • Two or more years of work experience.
  • Solid understanding of C, C++, Python, or Fortran.
  • Strong knowledge of software development, programming techniques, and algorithms.
  • Strong mathematical fundamentals, including linear algebra and numerical methods.
  • Background in parallel programming and accelerated computing, with comprehensive knowledge of parallel architectures and methods for performance analysis and tuning.
  • Experience in GPU programming is desirable.
  • Experience in full-stack performance analysis and optimization in at least one of these areas: large language models or high-performance computing.
  • Expertise ranging from operator-level through framework-level to algorithm-level optimization is strongly preferred.
  • Experience in distributed communication optimization is highly advantageous.
  • This involves familiarity with remote direct memory access, GPU interconnects, collective communication algorithms, and associated open-source libraries used in large-scale model training and inference.
  • Solid software engineering fundamentals and system architecture thinking, with the ability to build modules and drive engineering practices in complex systems.
  • Strong communication and cooperation abilities, with the capability to work efficiently alongside architecture, research, and software product teams to promote optimization from concept to production.
  • A continuous learning outlook, proactively following innovative technologies and adapting to a rapidly evolving landscape.

Sourced directly from NVIDIA’s career page

NVIDIA

3 Locations

Job ID
/job/China-Shanghai/Developer-Technology-Engineer---AI_JR2015720
