Overview
- NVIDIA is seeking a passionate, world-class software engineer to join its Compute Developer Technology team (DevTech).
- Our team has over 150 engineers across Beijing, Shanghai, Shenzhen, Taipei, Seoul, and Sydney.
- We have deep expertise in algorithms, GPUs, and real-world applications.
- Our mission is to connect the NVIDIA platform with developers worldwide.
- We dive deep into customer projects to solve performance bottlenecks.
- We use insights from workloads to guide next-generation NVIDIA hardware and software.
- If you are driven by innovation and ambition, this is the team for you!
What you'll be doing
- Working directly with key application developers to understand the current and future problems they are solving.
- You will build and optimize core parallel algorithms and data structures to deliver the most effective solutions using GPUs, through both library development and direct contribution to applications.
- This includes training and inference optimization for large language models (LLMs) and contributing to frameworks and open-source projects in the LLM ecosystem, such as Megatron, TRTLLM, SGLang, and vLLM.
- Collaborating closely with the architecture, research, libraries, tools, and system software teams at NVIDIA to influence the design of next-generation architectures, software platforms, and programming models.
- This includes investigating the impact on application performance and developer efficiency, and turning real-world developer feedback into actionable platform improvements.
- Engaging in deep optimization of high-performance operators, including but not limited to GPU kernel optimization, instruction-level tuning, and compiler optimization.
- These optimizations will either directly support customers or be coordinated within computation libraries and community projects, such as cuDNN, cuBLAS, and CUTLASS, and open-source libraries such as DeepGEMM, FlashMLA, FlashAttention, and FlashInfer.
- Improving communication for a broad range of distributed LLM workloads.
- You will spearhead advancements in distributed training and inference by refining communication libraries (NCCL, NCCL GIN, NVSHMEM) and contributing to open-source communication libraries (such as DeepEP and NCCL EP).
- This demands in-depth study of interconnect topologies (NVLink) and network protocols (InfiniBand/RoCE) to design efficient data transfer strategies and methods for compute-communication overlap.
What we need to see
- A university degree in engineering, computer science, or a related field, or equivalent experience.
- A master's or doctoral degree is preferred.
- Two or more years of work experience.
- Solid understanding of C, C++, Python, or Fortran.
- Strong knowledge of software development, programming techniques, and algorithms.
- Strong mathematical fundamentals, including linear algebra and numerical methods.
- Background in parallel programming and accelerated computing, with comprehensive knowledge of parallel architectures and methods for performance analysis and tuning.
- Experience in GPU programming is desirable.
- Experience in full-stack performance analysis and optimization in at least one of the following areas: large language models or high-performance computing.
- Expertise spanning operator-level, framework-level, and algorithm-level optimization is strongly preferred.
- Experience in distributed communication optimization is highly advantageous.
- This involves familiarity with remote direct memory access (RDMA), GPU interconnects, collective communication algorithms, and the associated open-source libraries used in large-scale model training and inference.
- Solid software engineering fundamentals and system architecture thinking, with the ability to build modules and drive engineering practices in complex systems.
- Strong communication and collaboration skills, with the ability to work efficiently alongside architecture, research, and software product teams to carry optimizations from concept to production.
- A continuous learning outlook, proactively following innovative technologies and adapting to a rapidly evolving landscape.
Job ID: /job/China-Shanghai/Developer-Technology-Engineer---AI_JR2015720