AI/ML Data Engineer

Santa Clara, CA Other

What You'll Do

Architect and deliver production-grade ELT/ETL pipelines across Databricks and Snowflake for ML training, validation, and inference workflows
Build and maintain AI-ready datasets optimized for both ML model consumption and Gen AI use cases: clean, versioned, and reproducible
Curate and structure high-quality datasets for RAG pipelines and embedding generation; design document chunking strategies, metadata schemas, and grounding data layers that directly improve retrieval accuracy and Gen AI application performance
Implement data quality frameworks and data contracts at pipeline boundaries to protect model and application integrity
Build and manage vector-ready data assets, integrating with vector stores and embedding infrastructure for Gen AI applications
Establish DataOps best practices: CI/CD for pipelines, data lineage, versioning, and cost observability across platforms
Develop Streamlit applications and React-based UIs to surface model outputs, data products, and internal AI tooling
Partner with ML Engineers, Data Scientists, and AI Engineers to translate modeling and application requirements into reliable data products
Contribute to lakehouse architecture decisions, storage optimization, and compute efficiency across the AI/ML data platform

What We're Looking For

Required Skills

Databricks: Spark, Delta Lake, Databricks Workflows, Unity Catalog; production-grade experience required
Snowflake: advanced SQL, data modeling, performance tuning, cost management
Python: strong engineering fundamentals; PySpark, pandas, pipeline frameworks (dbt, Airflow, or equivalent)
SQL: expert level; complex transformations, query optimization, schema design
Front-End Development: React, JavaScript/TypeScript, REST API integration, and Streamlit for rapid AI/ML application prototyping and internal tooling
Solid understanding of ML lifecycle: feature stores, training pipelines, inference data patterns
Cloud-native experience on AWS, Azure, or GCP
Data quality and observability tooling

Nice to Have

Hands-on experience with MLflow, Feast, LangChain, or LlamaIndex
Exposure to graph databases (Neo4j, Neptune, or equivalent)
Exposure to vector databases (Pinecone, Weaviate, pgvector, or equivalent)
Experience with streaming pipelines (Kafka, Kinesis, Spark Structured Streaming)
Familiarity with LLM evaluation frameworks and dataset benchmarking

Expected Base Pay Range (USD)

$105,200 - $157,600 per annum

The successful candidate’s starting base pay will be determined based on job-related skills, experience, qualifications, work location and market conditions.
The expected base pay range for this role may be modified based on market conditions.

Additional Compensation and Benefit Elements

Marvell is committed to providing exceptional, comprehensive benefits that support our employees at every stage - from internship to retirement and through life’s most important moments.
Our offerings are built around four key pillars: financial well-being, family support, mental and physical health, and recognition.
Highlights include an employee stock purchase plan with a 2-year look back, family support programs to help balance work and home life, robust mental health resources to prioritize emotional well-being, and recognition and service awards to celebrate contributions and milestones.
We look forward to sharing more with you during the interview process.
Any applicant who requires a reasonable accommodation during the selection process should contact the Marvell HR Helpdesk at TAOps@marvell.com.

Interview Integrity

To support fair and authentic hiring practices, candidates are not permitted to use AI tools (such as transcription apps, real-time answer generators like ChatGPT or Copilot, or automated note-taking bots) during interviews.
These tools must not be used to record, assist with, or enhance responses in any way.
Our interviews are designed to evaluate your individual experience, thought process, and communication skills in real time.
Use of AI tools without prior instruction from the interviewer will result in disqualification from the hiring process.
This position may require access to technology and/or software subject to U.S. export control laws and regulations, including the Export Administration Regulations (EAR).
As such, applicants must be eligible to access export-controlled information as defined under applicable law.
Marvell may be required to obtain export licensing approval from the U.S. Department of Commerce and/or the U.S. Department of State.
Except for U.S. citizens, lawful permanent residents, or protected individuals as defined by 8 U.S.C. 1324b(a)(3), all applicants may be subject to an export license review process prior to employment.

Sourced directly from Marvell Technology’s career page
