About This Role
- We are building a next-generation LLM inference system, spanning model optimization, inference runtime, and system-level design.
- This is a research + engineering role where you will:
  - Study cutting-edge work (LLM inference, MoE, system optimization)
  - Implement and optimize core techniques
  - Work across the stack: model → kernels → runtime → distributed system
- A key focus is GPU kernel and runtime optimization, including exploring Triton-like programming models and compiler approaches, as part of building an end-to-end AI rack software system for LLM inference.
Key Responsibilities
1. Research & Prototyping
   - Read and reproduce state-of-the-art work (LLM inference, MoE, systems)
   - Translate ideas into working, optimized implementations
   - Identify bottlenecks and iterate beyond baseline performance
2. LLM Inference Optimization
   - Implement and evaluate techniques such as continuous/dynamic batching, KV cache optimization and memory management, speculative decoding, flash/paged attention, and quantization (INT8 / FP8 / low-bit)
   - Optimize for latency, throughput, and GPU utilization
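The KV-cache idea listed above can be illustrated compactly: during autoregressive decoding, the keys and values of past tokens are stored once, so each new token attends over cached entries instead of recomputing the whole prefix. A minimal pure-Python sketch with toy scalar "vectors" (all names illustrative, not from any real framework):

```python
import math

class KVCache:
    """Toy per-sequence KV cache: each decode step appends one key/value
    and attends over the stored history, with no prefix recomputation."""

    def __init__(self):
        self.keys = []    # one float per past token (toy 1-D "vectors")
        self.values = []

    def decode_step(self, q, k, v):
        # Append this step's key/value, then attend over the whole cache.
        self.keys.append(k)
        self.values.append(v)
        scores = [q * ki for ki in self.keys]       # toy dot products
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]    # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        return sum(w * vi for w, vi in zip(weights, self.values))

cache = KVCache()
out = None
for k, v in [(1.0, 2.0), (0.5, 4.0), (2.0, 1.0)]:
    out = cache.decode_step(q=1.0, k=k, v=v)
# After three steps the cache holds three entries; each step did O(cache
# length) work rather than re-running attention over the full prefix.
```

Real systems extend this with paged allocation (as in paged attention) so cache memory is managed in fixed-size blocks rather than contiguous per-sequence buffers.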
3. MoE (Mixture-of-Experts) Systems
   - Explore efficient inference for sparse models: routing strategies and load balancing, expert parallelism and sharding, communication vs. computation trade-offs
   - Improve the scalability and efficiency of MoE inference
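The routing strategies mentioned above usually mean top-k gating: a router scores every expert per token, only the k best experts run, and their outputs are mixed by the normalized router scores. A hedged top-2 sketch in pure Python (names and toy experts are invented for illustration):

```python
import math

def top2_route(router_logits, expert_fns, x):
    """Toy top-2 MoE routing: select the two highest-scoring experts,
    softmax-normalize their logits, and mix their outputs."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]                               # only 2 experts run
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    z = sum(exps)
    gates = [e / z for e in exps]                     # gates sum to 1
    out = sum(g * expert_fns[i](x) for g, i in zip(gates, chosen))
    return out, chosen, gates

# Four toy "experts" (each just scales its input).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen, gates = top2_route([0.1, 2.0, 0.3, 1.5], experts, x=10.0)
# Experts 1 and 3 win the routing; the other two never execute.
```

The load-balancing and expert-parallelism work in the bullet above starts exactly here: if the router keeps choosing the same experts, the GPUs hosting them become hot spots while the rest idle.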
4. Kernel & Runtime Optimization
   - Develop and optimize GPU kernels using modern approaches: Triton-like programming models, CUDA, or equivalent low-level frameworks
   - Investigate memory access patterns and layout optimization, operator fusion and kernel efficiency, and compiler-style optimization for tensor workloads
   - Compare different kernel/runtime strategies and integrate them into the system
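Operator fusion, one of the investigation areas above, can be shown even without a GPU: a chain of elementwise ops either materializes an intermediate buffer per op or collapses into a single pass. A toy CPU-side sketch (illustrative only; on a GPU the payoff is reduced global-memory traffic and fewer kernel launches, which is what Triton-style and compiler approaches automate):

```python
def unfused(xs):
    """Unfused scale -> bias -> relu: every op writes a full
    intermediate list, mimicking three separate kernel launches."""
    t1 = [x * 2.0 for x in xs]         # scale (intermediate buffer 1)
    t2 = [t + 1.0 for t in t1]         # bias  (intermediate buffer 2)
    return [max(t, 0.0) for t in t2]   # relu

def fused(xs):
    """Fused version: the same math in one pass with no intermediates,
    analogous to a single fused elementwise kernel."""
    return [max(x * 2.0 + 1.0, 0.0) for x in xs]

data = [-1.5, 0.0, 2.0]
assert unfused(data) == fused(data)  # identical results, fewer passes
```

The comparison work in the bullet above is about when such fusion wins (memory-bound elementwise chains) versus when it does not (compute-bound matmuls that are better served by layout and tiling choices).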
5. End-to-End Inference System Development
   - Build and optimize a full inference stack: model execution layer (vLLM, TensorRT-LLM, or similar), runtime scheduling and batching, distributed inference across GPUs/nodes
   - Work on multi-GPU / multi-node scaling, NCCL / communication optimization, and system-level performance tuning

Qualifications

Basic Requirements
- Master's or PhD student required (CS, EE, or related field)
- Strong programming skills (Python required)
- Familiarity with PyTorch and transformer models
- Solid fundamentals in algorithms and systems
- Available for at least 6 months (shorter durations are not considered)

Preferred Experience
- Experience with one or more of: GPU programming (CUDA, Triton, or similar), LLM inference frameworks (vLLM, TensorRT-LLM, FasterTransformer), distributed systems or parallel computing
- Knowledge of GPU architecture and performance profiling, quantization or model optimization, or MoE and large-scale model systems

What We Look For
- Ability to go from paper → implementation → optimization
- Strong interest in performance and system-level problems
- Fast execution and willingness to work on deep technical challenges
- Curiosity about how LLM systems actually run at scale

Job Type: Student / Intern
Shift: Shift 1 (China)
Primary Location: PRC, Shanghai
Additional Locations: PRC, Beijing; PRC, Shenzhen

Business group: The Sales and Marketing Group (SMG) leverages the product portfolio to drive Intel's revenue growth and market expansion, blending strategic initiatives with dynamic sales efforts to capture and retain customers.
- SMG is responsible for empowering the sales force with tools and insights needed to close deals and build lasting customer relationships.
- Sales analytics and market research ensure strategies are both targeted and impactful.
- In SMG, disciplined execution, creativity, and ambition are celebrated, providing ample opportunities for career advancement and skill development.
Sourced directly from Intel’s career page
Job ID: /job/PRC-Shanghai/AI-Software-Engineer-Intern_JR0283183