Senior Solutions Architect, Infiniband and Networking Ethernet - NVIS

What You'll Do

  • Build AI/HPC infrastructure for new and existing customers.
  • Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
  • Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Provide feedback to internal teams, for example by filing bugs, documenting workarounds, and suggesting improvements.

What We Need to See

  • BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
  • 5+ years of professional experience in networking, with a strong grounding in Ethernet or InfiniBand fundamentals.
  • Hands-on experience with network switch/router operating systems such as Cumulus Linux, SONiC, IOS, Junos OS, and EOS.
  • Solid working knowledge of Ethernet, InfiniBand, and RDMA core principles.
  • Proficiency in end-to-end InfiniBand/Ethernet cluster deployment, adapter configuration, and firmware maintenance, with the ability to run professional performance benchmarks using mainstream RDMA testing tools.
  • Ability to independently diagnose and troubleshoot common InfiniBand/Ethernet network anomalies, including link flapping, connection failures, and bandwidth or latency jitter.
  • Command of practical RDMA network optimization techniques such as queue pair (QP) tuning, MTU configuration, and congestion control tuning.
  • Hands-on experience with RDMA-accelerated use cases, including distributed storage and high-performance computing clusters.
  • Extensive experience delivering automated network provisioning solutions using tools like Ansible, Salt, and Python.
  • Ability to develop CI/CD pipelines for network operations.
  • Strong written, verbal, and listening skills in English are essential.

Ways to Stand Out from the Crowd

  • Familiarity with cloud networks (AWS, GCP, Azure).
  • Advanced Linux or networking certifications.
  • Experience with high-performance computing architectures.
  • Understanding of how job schedulers (Slurm, PBS) work.
  • Knowledge of cluster management technologies (bonus credit for BCM, Base Command Manager).
  • Experience with GPU (Graphics Processing Unit)-focused hardware and software.

NVIDIA

4 Locations

Job ID: JR2019584 (Pune, India)