About This Role
- Intel's Data Center Network Edge AI team is responsible for delivering best-in-class AI performance on Intel® architecture.
- From hyperscale data centers powered by Intel® Xeon® processors to network edge nodes, our performance engineers shape the inner loops of frameworks and operator libraries that millions of developers and customers rely on every day.
- We are seeking an intern to join our CPU performance engineering team and drive operator-level optimizations for modern AI workloads, including Transformer-based LLMs, VLM / VLA multimodal models, classical CNNs, and MLP models.
- You will design, implement, and tune high-performance CPU kernels that translate Intel architectural advantages — AVX-512, Intel® AMX, and VNNI — into measurable end-user value.
Responsibilities
- Design and hand-tune CPU kernels for Transformer operators (Attention, GEMM, LayerNorm, RMSNorm, RoPE, MoE, Softmax) and classical operators (Conv2D / Conv3D, Depthwise Conv, Winograd, im2col, Pooling, BatchNorm, RNN / LSTM / GRU).
- Develop SIMD-optimized implementations using Intel® AVX2 / AVX-512 / AMX / VNNI intrinsics, with ARM Neon / SVE as a secondary target where applicable.
- Apply parallelization strategies (OpenMP, TBB, thread-pool design) and exploit CPU micro-architectural features: cache blocking and tiling, NUMA affinity, prefetching, memory alignment, and false-sharing mitigation.
- Implement and optimize low-bit quantized kernels (INT8 / INT4 / W4A16 / W8A8) for LLM / VLM inference, leveraging Intel® AMX and VNNI for maximum throughput per watt.
- Integrate custom operators into production frameworks and runtimes, including Intel® oneDNN, PyTorch CPU backend, ONNX Runtime, llama.cpp, MLC-LLM, and XNNPACK.
- Conduct systematic performance analysis using Intel® VTune™ Profiler, Linux perf, and roofline modeling; identify bottlenecks and quantify optimization gains.
- Contribute reusable kernels, optimization templates, and best-practice documentation to Intel's internal performance libraries.
Requirements
- The candidate must have the right to work in the country of employment without restriction.
- Currently pursuing a BS (senior year), MS, or PhD in Computer Science, Electrical Engineering, Computer Engineering, Parallel Computing, or a related technical field.
- Available for a minimum of 3 months of full-time or near full-time engagement.
- Strong proficiency in C / C++ and solid understanding of computer architecture, including CPU pipelines, cache hierarchies, memory models, and SIMD execution.
- Hands-on experience with at least one of:
  - x86 SIMD intrinsics (AVX2 / AVX-512 / AMX)
  - ARM Neon / SVE intrinsics
  - OpenMP / TBB-based multi-threaded optimization
  - High-performance CPU GEMM or convolution implementation (e.g., referencing oneDNN, OpenBLAS, XNNPACK, ggml)
- Experience with performance profiling tools (Intel® VTune™ Profiler, perf) and the ability to translate profile data into concrete optimizations.
Preferred Qualifications
- Open-source contributions to projects such as oneDNN, OpenVINO™ toolkit, llama.cpp, ggml, XNNPACK, OpenBLAS, PyTorch, or ONNX Runtime.
- Familiarity with CNN inference optimizations: Winograd, im2col + GEMM, Direct Conv, NCHW / NHWC layout transforms.
- Familiarity with LLM inference optimization techniques: KV-cache management, continuous batching, speculative decoding, and low-bit quantization.
- Experience with compiler infrastructure (LLVM, MLIR, TVM) or auto-tuning frameworks (AutoTVM, Ansor).
- Edge or on-device deployment experience (ARM servers, AI PCs, embedded SoCs).
Job ID: /job/PRC-Shanghai/AI-Software-Engineer-Intern_JR0283186-1
Similar Other roles
Samsung Semiconductor
Thermal Engineer
San Jose, California, United States|Other
Samsung Semiconductor
Senior Manager, OLED Field Applications Engineering
San Jose, California, United States|Other
Samsung Semiconductor
Compensation Partner
San Jose, California, United States|Other
Micron Technology
HVM PEE Bench Operation Equipment Technician (内製修理テクニシャン)
Hiroshima - Fab 15, Japan|Other