Overview
- NVIDIA is the world leader in computer graphics, artificial intelligence, and accelerated computing.
- For over 25 years, we have been at the forefront of research and engineering around the greatest advances in technology.
- Our history of innovation drives us to solve the world's hardest problems.
- NVIDIA is looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join its NVIDIA Infrastructure Specialist Team.
- Academic and commercial groups around the world are using NVIDIA products to revolutionize deep learning and data analytics, and to power data centers.
- Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone who can work on a dynamic, customer-focused team that requires excellent interpersonal skills.
- This role will interact with customers, partners and internal teams to analyze, define and implement large-scale Networking projects.
- The scope of these efforts includes a combination of Networking, System Design and Automation, as well as being the face to the customer!
What You'll Be Doing:
- Develop and maintain continuous integration and delivery pipelines.
- Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.
- Deploy monitoring solutions for the servers, network and storage.
- Troubleshoot bottom-up, from bare metal through the operating system and software stack to the application level.
- As a technical resource, develop, refine and document standard methodologies to share with internal teams.
- Support Research & Development activities and engage in POCs/POVs for future improvements.
What We Need To See:
- BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields, with at least 8 years of work or research experience in networking fundamentals, the TCP/IP stack, and data center architecture.
- 5+ years of experience designing, implementing and maintaining large-scale HPC/AI clusters with monitoring, logging and alerting.
- Experience managing Linux job/workload schedulers and orchestration tools.
- Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software.
- Direct design, implementation and management experience with cloud computing platforms (e.g. AWS, Azure, Google Cloud).
- Experience with job scheduling workloads and orchestration technologies such as Slurm, Kubernetes and Singularity.
- Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols e.g.
- Experience with multiple storage solutions such as Lustre, GPFS, ZFS and XFS.
- Familiarity with newer and emerging storage technologies.
- Python programming and bash scripting experience.
- Comfortable with automation and configuration management tools including Jenkins, Ansible, Puppet/Chef, etc.
- Deep knowledge of Networking Protocols like InfiniBand and Ethernet.
- Deep understanding of and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix).
- Strong written, verbal, and listening skills in English are critical.
Ways To Stand Out From The Crowd:
- Knowledge of CPU and/or GPU architecture.
- Knowledge of Kubernetes and container-related microservice technologies.
- Experience with GPU-focused hardware/software (DGX, CUDA).
- Background with RDMA (InfiniBand or RoCE) fabrics.
- NVIDIA is widely considered to be one of the technology world’s most desirable employers.
- We have some of the most forward-thinking and hardworking individuals in the world working for us.
- If you're creative and autonomous, we want to hear from you.
Sourced directly from NVIDIA’s career page
Job ID
/job/India-Mumbai/Senior-Solution-Architect--Cloud-Infrastructure---DevOps_JR2017866