Overview
- We’re seeking a hands-on System Administrator who thrives in complex data center environments powering next-generation AI and networking platforms.
- The role involves deploying and maintaining bare-metal and multi-node environments running NVIDIA networking, DGX, and advanced computing systems — focusing on firmware validation infrastructure, BMC management, regression lab automation, and continuous availability of critical test platforms.
- The ideal candidate brings deep Linux expertise and is comfortable solving issues at the hardware-firmware boundary such as BIOS, BMC, NIC firmware, and debug interfaces.
- They also have experience with infrastructure-as-code and monitoring at scale.
- You’ll join a team supporting rapid silicon bring-up and pre-production validation — where uptime and automation directly accelerate product delivery.
What You’ll Be Doing
- Deploy, configure, and maintain NVIDIA DGX, GB, and HPC systems within our data center.
- Monitor and ensure system health through preventive maintenance, upgrades, patching, and resolving issues in both physical and virtual environments.
- Implement and update automation for efficient AI and HPC administration via Bash and Python scripting.
- Lead integration, onboarding, and optimization for new hardware and edge technologies alongside cross-functional teams.
- Provide technical support and collaborate to enable rapid deployment and system bring-up of new technologies.
What We Need to See
- A practical electronics or software engineering diploma, any system administrator certification, or equivalent hands-on experience.
- At least 3 years' experience as a System Administrator handling large-scale data center, HPC, or AI infrastructure deployments.
- Proven background in Linux server environments and hands-on experience with platforms such as NVIDIA DGX and GB, or HPC clusters.
- Solid grasp of system architecture, networking fundamentals, and enterprise storage operations.
- Demonstrated experience automating system administration tasks and improving workflows for AI and HPC infrastructure.
Ways to Stand Out from the Crowd
- Extensive experience with cluster management, platform monitoring, and best practices in high-performance and GPU-accelerated environments.
- Certifications in system administration, Linux, or enterprise HPC/AI infrastructure.
- Practical experience with rack installation, high-density physical infrastructure, and scalable solutions for demanding workloads.
- Proven troubleshooting skills and ability to collaborate across large-scale technical environments.
Sourced directly from NVIDIA’s career page
Job ID
/job/Israel-Yokneam/System-Administrator---Advanced-Data-Center-and-AI-Infrastructure_JR2019633