Overview
- We are looking for a system engineer to join our Failure Analysis engineering team under the System Product Engineering group at NVIDIA.
- As a System Failure Analysis (FA) Engineer, you are responsible for the end-to-end investigation of product failures.
- You act as the failure analysis product owner of the investigation, diagnosing complex issues that span Hardware, Software, Firmware, and Mechanical boundaries and synthesizing data from all engineering disciplines to reach a definitive root cause.
- While you provide the architectural oversight for the team, you remain deeply technical and active in the laboratory environment.
What You’ll Be Doing
- Hands-on Lab Investigation: You are active in the lab environment. You perform advanced debugging, characterize system behavior, run reproductions of failures in the lab, and utilize sophisticated lab equipment to validate hypotheses, bridging the gap between high-level data and physical hardware reality.
- Multidisciplinary Failure Analysis: Lead deep-dive investigations into system-level failures, understand and analyse how customers use the product, and diagnose how software execution, firmware logic, and hardware components interact to cause specific failure modes.
- Root Cause Ownership: Drive the investigation lifecycle from initial symptom to final physics-of-failure or logic-error identification.
- Task Force Leadership: Orchestrate and lead cross-organizational technical task forces at the company level. You align experts from HW, SW, Mechanical, and NPI teams to solve high-priority technical problems.
- Advanced Data & AI Integration: Define and utilize sophisticated data analysis tools and AI-driven methodologies. You correlate customer failure patterns with production telemetry and RMA history to identify hidden trends and systemic risks.
- Customer Quality Support: Take part in the customer interface by interacting with NVIDIA’s Customer Quality Engineers. You provide the deep technical evidence and root-cause clarity needed for quality reports and high-level technical presentations.
- Strategic Lab Direction: Define the high-level debug strategy and complex test plans for the lab. You guide hardware practical engineers on characterization requirements and system-level stress testing.
What We Need to See
- Lab Proficiency: Expert-level experience with lab equipment and the ability to conduct complex characterization on state-of-the-art hardware.
- System Engineering Depth: B.Sc/B.Tech in Electrical Engineering or a related technical field.
- Product Development Experience: 5+ years of experience in Product Development, System-Level Debugging, or Architecture. You must understand how a product is designed and manufactured to effectively analyze its failure.
- Full-Stack Debugging Skills: Proven ability to troubleshoot issues where the hardware, software, and firmware interface. You are comfortable navigating different technical domains to find a root cause.
- Data Fluency: Experience using data analysis tools and a strong interest in applying AI/Machine Learning to automate and scale failure analysis processes.
- Leadership Presence: The ability to lead technical teams through high-pressure investigations and clearly communicate findings to both engineering and quality stakeholders.
Ways to Stand Out from the Crowd
- Hybrid Technical Background: Experience in Board Design combined with SW or Firmware development.
- NPI to Mass Production Expertise: A track record of solving technical problems during the transition from prototype to high-volume manufacturing.
- Data Tooling: Experience building custom Python scripts or SQL dashboards to visualize and analyze global product failure distributions.
- Failure Avoidance Mindset: Ability to provide technical feedback to R&D teams based on FA findings to improve the robustness of future products.
- NVIDIA is widely considered to be one of the technology world’s most desirable employers.
- We have some of the most forward-thinking and hardworking people on the planet working for us.
Sourced directly from NVIDIA’s career page
Your application goes straight to NVIDIA.
Open roles at NVIDIA
2000 positions
Job ID
/job/Israel-Yokneam/Senior-System-Failure-Analysis-Engineer_JR2014028
Similar roles
Samsung Semiconductor
Staff Technical Program Manager
San Jose, California, United States|Other
Samsung Semiconductor
Associate, Executive Administration
San Jose, California, United States|Other
Micron Technology
STAFF ENGINEER GFAC SASIA - ELECTRICAL
Fab 10A, Singapore|Other
Micron Technology
TEST HBM DATA ANALYST
Taichung - MTB, Taiwan|Other