Principal Architect, Memory-Centric Computing · AI Infrastructure
Overview
- Please Note: To provide the best candidate experience amidst our high application volumes, each candidate is limited to 10 applications across all open jobs within a 6-month period.
Advancing the World’s Technology Together
- Our technology solutions power the tools you use every day, including smartphones, electric vehicles, hyperscale data centers, IoT devices, and so much more.
- Here, you’ll have an opportunity to be part of a global leader whose innovative designs are pushing the boundaries of what’s possible and powering the future. We believe innovation and growth are driven by an inclusive culture and a diverse workforce.
- We’re dedicated to empowering people to be their true selves.
- Together, we’re building a better tomorrow for our employees, customers, partners, and communities.
- The AGI (Artificial General Intelligence) Computing Lab is dedicated to solving the complex system-level challenges posed by the growing demands of future AI/ML workloads.
- Our team is committed to designing and developing scalable platforms that can effectively handle the computational and memory requirements of these workloads while minimizing energy consumption and maximizing performance.
- To achieve this goal, we collaborate closely with both hardware and software engineers to identify and address the unique challenges posed by AI/ML workloads and to explore new computing abstractions that can provide a better balance between the hardware and software components of our systems.
- Additionally, we continuously conduct research and development in emerging technologies and trends across memory, computing, interconnect, and AI/ML, ensuring that our platforms are always equipped to handle the most demanding workloads of the future.
- By working together as a dedicated and passionate team, we aim to revolutionize the way AI/ML applications are deployed and executed, ultimately contributing to the advancement of AGI in an affordable and sustainable manner.
- Join us in our passion to shape the future of computing!
- As AI models scale, memory — its capacity, bandwidth, cost, and placement — has become the central architectural constraint (a back-of-envelope sizing sketch follows this overview).
- The question is no longer whether to rethink memory system design, but how.
- A broad solution space exists: GPU-side shared memory architectures, DRAM and Flash as capacity tiers, fabric-attached pooling and disaggregation, and new interconnect approaches all represent credible paths.
- Each carries different tradeoff profiles across workloads, deployment contexts, and cost structures.
- This role exists to bring rigor to that question.
- You will build workload-grounded models that evaluate the full solution space, quantify where each approach wins and why, and translate those findings into architecture decisions that directly shape product strategy and investment.
- You will work closely with architects across compute, networking, storage, and software, and present directly to senior technical leadership.
- This is a principal individual contributor role: you personally build the models, own the conclusions, and drive the decisions.
- Location: Daily onsite presence at our San Jose, CA office / U.S. headquarters in alignment with our Flexible Work policy.
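For intuition on why capacity so quickly becomes the binding constraint, here is a minimal back-of-envelope sketch in Python, assuming a hypothetical 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, fp16); all figures are illustrative and not drawn from this posting:

```python
# Back-of-envelope KV-cache sizing for a hypothetical 70B-class model.
# All configuration numbers below are illustrative assumptions.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes of KV cache: 2 tensors (K and V) per layer, per token, per sequence."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch

# Assumed GQA configuration: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
gib = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                     seq_len=128 * 1024, batch=1) / 2**30
print(f"KV cache for one 128K-token sequence: {gib:.1f} GiB")   # ~40 GiB
```

Under these assumptions, a single 128K-token sequence carries roughly 40 GiB of KV cache before weights, activations, or batching are counted; that is the capacity pressure that DRAM/Flash tiers, pooling, and fabric-attached approaches aim to relieve.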
What You’ll Do

Architecture Strategy & Trade Studies
- Define and evaluate the memory solution space — GPU-side shared memory, DRAM and Flash capacity tiers, pooled/disaggregated memory, and fabric-attached approaches — with quantified value propositions across performance, power, cost/TCO, density, and operability
- Identify break-even conditions and decision criteria across solution approaches; produce architecture briefs and sensitivity analyses ready for executive audiences

Workload-Driven Analysis
- Ground every architectural comparison in real AI behavior: large model training/inference (including long-context and KV-cache dynamics), MoE and sparse workloads, multi-step agentic pipelines, and recommendation/embedding workloads
- Build and maintain a workload methodology — microbenchmarks, proxy models, traces — tied to throughput, latency, tail latency, utilization, and SLA impact

Memory Hierarchy & Tiered Design
- Architect and compare memory hierarchies spanning local high-bandwidth memory, DRAM capacity tiers, Flash (NVMe/NVMe-oF), pooled/remote memory, and storage-class approaches; evaluate placement, caching, prefetching, eviction, QoS, and contention policies across tiers
- Define the software exposure and operational model — runtime, OS, and library expectations — with deployability and observability as first-class requirements

Connectivity & Pooling Approaches
- Evaluate the connectivity and pooling solution space as complementary or competing answers to the memory capacity and bandwidth problem — including GPU-side shared memory (e.g., NVLink-class, Vera Rubin-style), fabric-attached pooling (e.g., CXL-class), and emerging interconnect directions (UALink/UEth-class)
- Quantify how latency, bandwidth, congestion, topology, and coherency assumptions affect end-to-end AI performance across approaches; drive cross-domain alignment on connectivity trade decisions

Hands-On Modeling & Validation
- Build and extend system simulators and trace-driven models spanning compute, memory, Flash/storage, and IO; write analysis code (Python, C/C++) to automate experiments and process results (a minimal sketch of this kind of model follows this section)
- Profile and instrument GPU/CPU/system stacks to validate model assumptions; run disciplined studies with baselines, parameter sweeps, and reproducible documentation
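To give a flavor of the trace-driven modeling described under Hands-On Modeling & Validation, here is a minimal sketch in Python: a toy two-tier model that replays an address trace through an LRU-managed fast tier and reports hit rate and average access latency. The tier names, capacities, latencies, and the synthetic trace are all illustrative assumptions; a real study would add bandwidth, queuing, prefetching, and eviction-policy detail.

```python
from dataclasses import dataclass
from collections import OrderedDict
import random

@dataclass
class Tier:
    name: str
    capacity_bytes: int
    latency_ns: float

def simulate(trace, fast, slow, line_bytes=4096):
    """Replay byte addresses through an LRU-managed fast tier backed by a slow tier.

    Returns (hit_rate, avg_latency_ns). Illustrative only: flat per-access
    latencies, no bandwidth, queuing, prefetch, or write-back modeling.
    """
    lru = OrderedDict()                          # fast-tier contents, LRU order
    max_lines = fast.capacity_bytes // line_bytes
    hits, total_ns = 0, 0.0
    for addr in trace:
        line = addr // line_bytes
        if line in lru:
            hits += 1
            total_ns += fast.latency_ns
            lru.move_to_end(line)                # refresh recency on a hit
        else:
            total_ns += slow.latency_ns          # miss: pay the slow-tier cost
            lru[line] = True
            if len(lru) > max_lines:
                lru.popitem(last=False)          # evict least-recently-used line
    return hits / len(trace), total_ns / len(trace)

# Synthetic trace: streaming accesses over 1 GiB plus heavy reuse of a 4 MiB
# hot region, against a 64 MiB "HBM-like" tier and a "remote/CXL-like" tier.
random.seed(0)
trace = [random.randrange(0, 1 << 30) for _ in range(100_000)]
trace += [random.randrange(0, 4 << 20) for _ in range(100_000)]
random.shuffle(trace)

hit_rate, avg_ns = simulate(trace, Tier("fast", 64 << 20, 150.0), Tier("slow", 1 << 40, 600.0))
print(f"hit rate {hit_rate:.1%}, average access latency {avg_ns:.0f} ns")
```

Even a toy model of this shape makes break-even questions concrete: average latency falls linearly with hit rate, so the fast-tier capacity needed to capture a workload's reuse pattern drops directly out of a parameter sweep.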
What You Bring

- Cross-domain reasoning. You connect AI workload behavior, memory hierarchy (including DRAM and Flash tiers), connectivity/fabric, and storage/IO into coherent, quantified arguments — evaluating a broad solution space rather than advocating for any single technology.
- 12+ years in system architecture, performance engineering, or infrastructure modeling with a track record of studies that influenced product direction, investment decisions, or platform strategy.
- AI infrastructure fluency. Working knowledge of training and inference bottlenecks, data movement patterns, and memory pressure across transformers, MoE, and recommendation workloads. Engineering literacy required; researcher depth is not.
- Memory and storage grounding. Solid understanding of memory hierarchy and tiering principles across DRAM and Flash; storage/IO fundamentals including tail latency, QoS, and NVMe/NVMe-oF behavior; and connectivity/fabric options for shared, pooled, and disaggregated memory.
- Credible quantitative modeling, clean experimental methodology, and the ability to defend assumptions under scrutiny from both hardware and software engineers.
- Communication that moves decisions. Converts complex multi-domain analysis into clear recommendations for engineering and executive audiences, written and verbal.
- Hands-on experience with GPU-side shared memory architectures, DRAM/Flash tiering for AI workloads, or fabric-attached memory pooling/disaggregation.
- Familiarity with NVLink-class fabrics, CXL-class pooling, or emerging interconnect standards (UALink/UEth-class).
- Prior ownership of benchmarking strategy for memory-intensive or storage-tiered AI workloads.
- Familiarity with inference caching, KV-cache management, or Flash-backed serving at scale.
- Experience with discrete-event or trace-driven system simulation.
- You’re inclusive, adapting your style to the situation and to the diverse global norms of our people.
- An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
- You’re collaborative, building relationships, humbly offering support, and openly welcoming different approaches.
- Innovative and creative, you proactively explore new ideas and adapt quickly to change.