What You'll Do
- Deploy and maintain Linux-based compute, GPU, and storage infrastructure across data center environments, ensuring high availability and consistent performance.
- Configure and bring up InfiniBand fabric and GPU clusters, including switch configuration, subnet management, and end-to-end validation testing.
- Install, rack, label, and cable server hardware — including CPUs, memory, NICs, HDDs, and RAID components — in line with approved design specifications and quality standards.
- Troubleshoot and resolve complex operational issues across Linux systems, GPU platforms, networking equipment, and storage infrastructure.
- Conduct daily health checks of systems and infrastructure components, proactively identifying and mitigating risks before they affect service delivery.
- Monitor the data center environment using established alerting frameworks, escalate issues appropriately, and drive timely service restoration in line with SLAs.
- Coordinate with vendors and onsite staff for hardware delivery, diagnostics, replacement, and warranty fulfilment.
- Maintain accurate operational documentation, system configurations, and runbooks to support consistency and knowledge sharing across the team.
- Participate in an on-call rotation and provide on-site or remote support during maintenance windows and operational incidents.
- Collaborate with global infrastructure and operations teams to support data center builds, migrations, refresh programmes, and process improvement initiatives.
Job Qualifications
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
- 3–6 years of hands-on experience in Linux system administration, troubleshooting, and performance validation.
- Proficiency with Linux command-line tools and shell scripting (Bash or equivalent).
- Experience with cluster bring-up, GPU server deployment, driver installation, and system-level configuration.
- Hands-on experience setting up and validating GPU servers in clustered environments, including end-to-end GPU testing in InfiniBand-based clusters.
- Working knowledge of InfiniBand networking, including switch configuration and subnet management.
- Solid understanding of networking fundamentals including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP).
- Experience installing, configuring, and troubleshooting routers, switches, and terminal servers for out-of-band management.
- Familiarity with fibre and copper cabling in IP and SAN environments.
- Strong organisational skills with meticulous attention to detail in data center environments.
- Clear verbal and written communication skills, with the ability to work effectively across cross-functional and global teams.
Additional Skills/Preferences
- Experience supporting HPC, AI, or large-scale GPU environments.
- Exposure to data center monitoring and alerting platforms.
- Experience documenting operational processes and maintaining technical runbooks.
- Familiarity with large-scale data center buildouts or refresh programmes.
Cadence is committed to equal employment opportunity and employment equity throughout all levels of the organization. We strive to attract a qualified and diverse candidate pool and encourage diversity and inclusion in the workplace.
We’re doing work that matters. Help us solve what others can’t.
Sourced directly from Cadence Design Systems’s career page
Job ID
/job/FELDKIRCHEN-Munich/Sr-Systems-Engineer--Data-Center-Operations-_R55078