Sr Systems Engineer (Data Center Operations)


What You'll Do

  • Deploy and maintain Linux-based compute, GPU, and storage infrastructure across data center environments, ensuring high availability and consistent performance.
  • Configure and bring up InfiniBand fabric and GPU clusters, including switch configuration, subnet management, and end-to-end validation testing.
  • Install, rack, label, and cable server hardware — including CPUs, memory, NICs, HDDs, and RAID components — in line with approved design specifications and quality standards.
  • Troubleshoot and resolve complex operational issues across Linux systems, GPU platforms, networking equipment, and storage infrastructure.
  • Conduct daily health checks of systems and infrastructure components, proactively identifying and mitigating risks before they affect service delivery.
  • Monitor the data center environment using established alerting frameworks, escalate issues appropriately, and drive timely service restoration in line with SLAs.
  • Coordinate with vendors and onsite staff for hardware delivery, diagnostics, replacement, and warranty fulfilment.
  • Maintain accurate operational documentation, system configurations, and runbooks to support consistency and knowledge sharing across the team.
  • Participate in an on-call rotation and provide on-site or remote support during maintenance windows and operational incidents.
  • Collaborate with global infrastructure and operations teams to support data center builds, migrations, refresh programmes, and process improvement initiatives.

Job Qualifications

  • Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
  • 3–6 years of hands-on experience in Linux system administration, troubleshooting, and performance validation.
  • Proficiency with Linux command-line tools and shell scripting (Bash or equivalent).
  • Experience with cluster bring-up, GPU server deployment, driver installation, and system-level configuration.
  • Hands-on experience setting up and validating GPU servers in clustered environments, including end-to-end GPU testing in InfiniBand-based clusters.
  • Working knowledge of InfiniBand networking, including switch configuration and subnet management.
  • Solid understanding of networking fundamentals including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP).
  • Experience installing, configuring, and troubleshooting routers, switches, and terminal servers for out-of-band management.
  • Familiarity with fibre and copper cabling in IP and SAN environments.
  • Strong organisational skills with meticulous attention to detail in data center environments.
  • Clear verbal and written communication skills, with the ability to work effectively across cross-functional and global teams.

Additional Skills / Preferences

  • Experience supporting HPC, AI, or large-scale GPU environments.
  • Exposure to data center monitoring and alerting platforms.
  • Experience documenting operational processes and maintaining technical runbooks.
  • Familiarity with large-scale data center buildouts or refresh programmes.

Cadence is committed to equal employment opportunity and employment equity throughout all levels of the organization. We strive to attract a qualified and diverse candidate pool and encourage diversity and inclusion in the workplace.

We’re doing work that matters. Help us solve what others can’t.

Sourced directly from Cadence Design Systems’s career page



Cadence Design Systems

FELDKIRCHEN (Munich)

Open roles at Cadence Design Systems: 637 positions
Job ID: /job/FELDKIRCHEN-Munich/Sr-Systems-Engineer--Data-Center-Operations-_R55078
