Principal Architect, Memory-Centric Computing · AI Infrastructure
Overview
- Please Note: To provide the best candidate experience amidst our high application volumes, each candidate is limited to 10 applications across all open jobs within a 6-month period.
Advancing the World’s Technology Together
- Our technology solutions power the tools you use every day, including smartphones, electric vehicles, hyperscale data centers, IoT devices, and so much more.
- Here, you’ll have an opportunity to be part of a global leader whose innovative designs are pushing the boundaries of what’s possible and powering the future. We believe innovation and growth are driven by an inclusive culture and a diverse workforce.
- We’re dedicated to empowering people to be their true selves.
- Together, we’re building a better tomorrow for our employees, customers, partners, and communities.
- The AGI (Artificial General Intelligence) Computing Lab is dedicated to solving the complex system-level challenges posed by the growing demands of future AI/ML workloads.
- Our team is committed to designing and developing scalable platforms that can effectively handle the computational and memory requirements of these workloads while minimizing energy consumption and maximizing performance.
- To achieve this goal, we collaborate closely with both hardware and software engineers to identify and address the unique challenges posed by AI/ML workloads and to explore new computing abstractions that can provide a better balance between the hardware and software components of our systems.
- Additionally, we continuously conduct research and development in emerging technologies and trends across memory, computing, interconnect, and AI/ML, ensuring that our platforms are always equipped to handle the most demanding workloads of the future.
- By working together as a dedicated and passionate team, we aim to revolutionize the way AI/ML applications are deployed and executed, ultimately contributing to the advancement of AGI in an affordable and sustainable manner.
- Join us in our passion to shape the future of computing!
- As AI models scale, memory — its capacity, bandwidth, cost, and placement — has become the central architectural constraint (a back-of-envelope sizing sketch follows this overview).
- The question is no longer whether to rethink memory system design, but how.
- A broad solution space exists: GPU-side shared memory architectures, DRAM and Flash as capacity tiers, fabric-attached pooling and disaggregation, and new interconnect approaches all represent credible paths.
- Each carries different tradeoff profiles across workloads, deployment contexts, and cost structures.
- This role exists to bring rigor to that question.
- You will build workload-grounded models that evaluate the full solution space, quantify where each approach wins and why, and translate those findings into architecture decisions that directly shape product strategy and investment.
- You will work closely with architects across compute, networking, storage, and software, and present directly to senior technical leadership.
- This is a principal individual contributor role: you personally build the models, own the conclusions, and drive the decisions.
- Location: Daily onsite presence at our San Jose, CA office / U.S. headquarters in alignment with our Flexible Work policy.
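For intuition on why capacity so quickly becomes the binding constraint, here is a minimal back-of-envelope sketch in Python, assuming a hypothetical 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, fp16); all figures are illustrative and not drawn from this posting:

```python
# Back-of-envelope KV-cache sizing for a hypothetical 70B-class model.
# All configuration numbers below are illustrative assumptions.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes of KV cache: 2 tensors (K and V) per layer, per token, per sequence."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch

# Assumed GQA configuration: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
gib = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                     seq_len=128 * 1024, batch=1) / 2**30
print(f"KV cache for one 128K-token sequence: {gib:.1f} GiB")   # ~40 GiB
```

Under these assumptions, a single 128K-token sequence carries roughly 40 GiB of KV cache before weights, activations, or batching are counted; that is the capacity pressure that DRAM/Flash tiers, pooling, and fabric-attached approaches aim to relieve.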
What You’ll Do

Architecture Strategy & Trade Studies
- Define and evaluate the memory solution space — GPU-side shared memory, DRAM and Flash capacity tiers, pooled/disaggregated memory, and fabric-attached approaches — with quantified value propositions across performance, power, cost/TCO, density, and operability
- Identify break-even conditions and decision criteria across solution approaches; produce architecture briefs and sensitivity analyses ready for executive audiences

Workload-Driven Analysis
- Ground every architectural comparison in real AI behavior: large model training/inference (including long-context and KV-cache dynamics), MoE and sparse workloads, multi-step agentic pipelines, and recommendation/embedding workloads
- Build and maintain a workload methodology — microbenchmarks, proxy models, traces — tied to throughput, latency, tail latency, utilization, and SLA impact

Memory Hierarchy & Tiered Design
- Architect and compare memory hierarchies spanning local high-bandwidth memory, DRAM capacity tiers, Flash (NVMe/NVMe-oF), pooled/remote memory, and storage-class approaches; evaluate placement, caching, prefetching, eviction, QoS, and contention policies across tiers
- Define the software exposure and operational model — runtime, OS, and library expectations — with deployability and observability as first-class requirements

Connectivity & Pooling Approaches
- Evaluate the connectivity and pooling solution space as complementary or competing answers to the memory capacity and bandwidth problem — including GPU-side shared memory (e.g., NVLink-class, Vera Rubin-style), fabric-attached pooling (e.g., CXL-class), and emerging interconnect directions (UALink/UEth-class)
- Quantify how latency, bandwidth, congestion, topology, and coherency assumptions affect end-to-end AI performance across approaches; drive cross-domain alignment on connectivity trade decisions

Hands-On Modeling & Validation
- Build and extend system simulators and trace-driven models spanning compute, memory, Flash/storage, and IO; write analysis code (Python, C/C++) to automate experiments and process results (a minimal sketch of this kind of model follows this section)
- Profile and instrument GPU/CPU/system stacks to validate model assumptions; run disciplined studies with baselines, parameter sweeps, and reproducible documentation
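To give a flavor of the trace-driven modeling described under Hands-On Modeling & Validation, here is a minimal sketch in Python: a toy two-tier model that replays an address trace through an LRU-managed fast tier and reports hit rate and average access latency. The tier names, capacities, latencies, and the synthetic trace are all illustrative assumptions; a real study would add bandwidth, queuing, prefetching, and eviction-policy detail.

```python
from dataclasses import dataclass
from collections import OrderedDict
import random

@dataclass
class Tier:
    name: str
    capacity_bytes: int
    latency_ns: float

def simulate(trace, fast, slow, line_bytes=4096):
    """Replay byte addresses through an LRU-managed fast tier backed by a slow tier.

    Returns (hit_rate, avg_latency_ns). Illustrative only: flat per-access
    latencies, no bandwidth, queuing, prefetch, or write-back modeling.
    """
    lru = OrderedDict()                          # fast-tier contents, LRU order
    max_lines = fast.capacity_bytes // line_bytes
    hits, total_ns = 0, 0.0
    for addr in trace:
        line = addr // line_bytes
        if line in lru:
            hits += 1
            total_ns += fast.latency_ns
            lru.move_to_end(line)                # refresh recency on a hit
        else:
            total_ns += slow.latency_ns          # miss: pay the slow-tier cost
            lru[line] = True
            if len(lru) > max_lines:
                lru.popitem(last=False)          # evict least-recently-used line
    return hits / len(trace), total_ns / len(trace)

# Synthetic trace: streaming accesses over 1 GiB plus heavy reuse of a 4 MiB
# hot region, against a 64 MiB "HBM-like" tier and a "remote/CXL-like" tier.
random.seed(0)
trace = [random.randrange(0, 1 << 30) for _ in range(100_000)]
trace += [random.randrange(0, 4 << 20) for _ in range(100_000)]
random.shuffle(trace)

hit_rate, avg_ns = simulate(trace, Tier("fast", 64 << 20, 150.0), Tier("slow", 1 << 40, 600.0))
print(f"hit rate {hit_rate:.1%}, average access latency {avg_ns:.0f} ns")
```

Even a toy model of this shape makes break-even questions concrete: average latency falls linearly with hit rate, so the fast-tier capacity needed to capture a workload's reuse pattern drops directly out of a parameter sweep.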
What You Bring

- Cross-domain reasoning. You connect AI workload behavior, memory hierarchy (including DRAM and Flash tiers), connectivity/fabric, and storage/IO into coherent, quantified arguments — evaluating a broad solution space rather than advocating for any single technology.
- 12+ years in system architecture, performance engineering, or infrastructure modeling with a track record of studies that influenced product direction, investment decisions, or platform strategy.
- AI infrastructure fluency. Working knowledge of training and inference bottlenecks, data movement patterns, and memory pressure across transformers, MoE, and recommendation workloads. Engineering literacy required; researcher depth is not.
- Memory and storage grounding. Solid understanding of memory hierarchy and tiering principles across DRAM and Flash; storage/IO fundamentals including tail latency, QoS, and NVMe/NVMe-oF behavior; and connectivity/fabric options for shared, pooled, and disaggregated memory.
- Credible quantitative modeling, clean experimental methodology, and the ability to defend assumptions under scrutiny from both hardware and software engineers.
- Communication that moves decisions. Converts complex multi-domain analysis into clear recommendations for engineering and executive audiences, written and verbal.
- Hands-on experience with GPU-side shared memory architectures, DRAM/Flash tiering for AI workloads, or fabric-attached memory pooling/disaggregation.
- Familiarity with NVLink-class fabrics, CXL-class pooling, or emerging interconnect standards (UALink/UEth-class).
- Prior ownership of benchmarking strategy for memory-intensive or storage-tiered AI workloads.
- Familiarity with inference caching, KV-cache management, or Flash-backed serving at scale.
- Experience with discrete-event or trace-driven system simulation.
- You’re inclusive, adapting your style to the situation and to the diverse global norms of our people.
- An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
- You’re collaborative, building relationships, humbly offering support, and openly welcoming different approaches.
- Innovative and creative, you proactively explore new ideas and adapt quickly to change.