● ML Engineer · RecSys · Reliability · Inference · Duke ECE PhD (2026)

Recommendation, reliability & inference — shipped.

ML Engineer — Recommendation · LLM/VLM Reliability · Inference

Duke ECE PhD and production ML engineer. I ship large-scale retrieval & ranking (GraphSAGE + Faiss retrieval with a ~1% revenue uplift in A/B testing at Pinterest, plus a multitask BERT broad-match model), build LLM/VLM reliability methods (cross-modal hallucination detection and selective prediction; TMLR 2026), and optimize inference & serving (quantization, LoRA, distillation, vLLM / SGLang) under real cost and latency budgets.


3

Patents

Publications

~1%

Revenue uplift @ Pinterest

3.4×

Cheaper memory (SAGE)

Open to

Where I can plug in

Recommendation & retrieval, LLM/VLM reliability, or agent memory — I bring production experience and research depth to all three.

Recommendation & Retrieval

Candidate generation, embedding retrieval, and ranking at production scale.

Pinterest GraphSAGE + Faiss · ~1% revenue uplift · BERT broad-match CTR gains

LLM / VLM Reliability

Hallucination detection, uncertainty, and trust & safety for foundation models.

TMLR 2026 cross-modal consistency · GPT-4V, Qwen-VL, LLaMA-VL benchmarks

Agent Memory

Memory infrastructure for LLM agents across the write / retrieve / forget lifecycle.

SAGE: beats Mem0 on 7/7 settings · 3.4× cheaper · code public

The thesis

Recommendation, reliability, inference.

From ranking ads to serving agents, one engineer owns three surfaces — what to retrieve, whether to trust the output, and how to serve it within budget.

Recommendation

Retrieval & ranking at scale

GraphSAGE advertiser-similarity embeddings + Faiss ANN and a multitask BERT broad-match model, shipped into Pinterest's production Spinner auto-targeting — owned end-to-end from indexing to online serving and eval.

Pinterest · ~1% revenue uplift (A/B)

Reliability

Trustworthy LLM/VLM outputs

Cross-modal consistency for hallucination detection and selective prediction — knowing when a model should abstain or escalate rather than answer wrongly, benchmarked across GPT-4V, Qwen-VL, and LLaMA-VL.

TMLR 2026 · epistemic uncertainty

Inference

Efficient serving

Quantization, LoRA, and distillation with vLLM / SGLang serving. SAGE cuts agent-memory add-phase API cost by ~3.4× and latency by ~2.5×, and skips ~16–18% of LLM calls under explicit budgets.

SAGE · 3.4× lower cost · 2.5× lower latency

Retrieve the right candidates, trust the output, serve it within budget — recommendation, reliability, and inference are one ML-engineering loop, and SAGE runs all three.

How I got here

The roadmap to agent memory

2019–2022 → the write path
Continual Learning
Research Fellow / Deep Learning Research Intern · Samsung Semiconductor · SOC R&D Lab
- Led continual & federated learning research — 3 patents, 2 publications.
- Communication-efficient federated learning via global-model quantization; server-side refinement without client data access.
- Sustainable continual learning: task-similarity detection + encoder reuse — the same problem class as bounded memory growth & forgetting in agent memory.
- GAN Memory with No Forgetting (NeurIPS 2020) — parameter-efficient generative replay.
2022 & 2023 → the read path
Recommendation Systems
Research Intern — Ads Retrieval & Targeting · Pinterest Labs
- 2023: Shipped a graph-based advertiser-similarity retrieval pipeline (GraphSAGE embeddings + Faiss ANN) into Pinterest's auto-targeting; ~1% revenue uplift in A/B testing.
- 2022: Multitask BERT model for broad match — improved ad-query relevance with measurable CTR gains.
- Owned large-scale retrieval features end-to-end: ingestion, embedding indexing, candidate scoring, online serving, eval.
2020–Present → the serving path
Efficient & Reliable ML
PhD Research — Duke University · Advisor: Prof. Ricardo Henao
- Cross-modal consistency for hallucination detection in VLMs (TMLR 2026) — reliable identification of low-confidence / “unknown” predictions.
- Multi-source data-free transfer learning (IEEE MLSP 2025 Oral) — efficient model recycling under white-box & black-box access.
- Sustainable continual learning (IEEE MLSP 2025) — parameter reuse against superlinear model growth.
2025–Now ◀ where it all leads
Agentic Memory
Memory Management System for AI Agents · Duke University · the convergence
- SAGE — a novelty gate for efficient memory evolution in agentic LLMs (ARR under review).
- Cost-efficient, low-latency memory database updates: when to write, summarize, compress, or forget.
- Beats Mem0 on 7/7 settings · 3.4× lower cost · 2.5× lower latency. Repo public & live.

Selected work

Projects & research

Pinterest Ads Retrieval

Retrieval / RecSys · Shipped to production

Graph-based advertiser-similarity retrieval (GraphSAGE + Faiss ANN) plus a multitask BERT broad-match model, integrated into Pinterest's Spinner workflow.

~1% revenue uplift (A/B)
Measurable CTR improvement
End-to-end: indexing → serving → eval

VLM Reliability

Trust & Safety · TMLR 2026

A cross-modal consistency framework that detects hallucinations in vision-language models by comparing visual- and text-grounded reasoning paths.

Benchmarked GPT-4V, Qwen-VL, LLaMA-VL
Quantified epistemic uncertainty
Fallback-enabled closed-set classification

SAGE

Agent Memory · ARR — under review · code public

A novelty gate for efficient memory evolution in agentic LLMs. Frames memory evolution as novelty detection via density estimation, so the system writes or consolidates only information that is novel relative to what is already stored.

Beats Mem0 on 7/7 settings
3.4× lower cost · 2.5× lower latency
Balances memory freshness vs. compute overhead

Continual & Federated Learning

Continual Learning · Samsung · 3 patents

Communication-efficient federated learning, sustainable continual learning, and continual few-shot learning — the write-path backbone of agent memory.

3 patents filed
GAN Memory w/ No Forgetting (NeurIPS 2020)
Bounded memory growth & anti-forgetting

Toolkit

Skills & publications

Memory & Retrieval

Embedding retrieval
Retrieval & ranking
Faiss (IVF / HNSW)
Graph indexing
Conflict resolution
Summarization & fusion
Memory lifecycle (write/update/compress/forget)
RAG pipelines

LLM & VLM

Prompt engineering
In-context learning
Hallucination detection & conflict resolution
Quantization · LoRA · distillation
Self-supervised learning
VLMs (LLaMA-VL, Qwen-VL, GPT-class)

ML Foundations

Representation learning
Generative models
Continual / federated learning
Domain adaptation
Interpretable ML
Large-scale recommendation

Systems & Infra

Production ML pipelines
Online inference
A/B testing
Distributed training (Slurm)
Docker · AWS · Spark
Hugging Face

Languages & Tools

Python
C++
SQL
Bash
PyTorch
HF Transformers
Git
Linux

Selected publications

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs
ARR (under review)
Fallback-Enabled Closed-Set Classification: Cross-Modal Consistency in Vision-Language Models
TMLR 2026
GAN Memory with No Forgetting
NeurIPS 2020
Model Recycling Framework for Multi-source Data-free Supervised Transfer Learning
IEEE MLSP 2025 (Oral)
Toward Sustainable Continual Learning: Detection and Knowledge Repurposing of Similar Tasks
IEEE MLSP 2025
A Holistic Approach to Interpretability in Financial Lending
Decision Support Systems 2022

Let's talk

Hiring an ML engineer for RecSys, reliability, or inference?

Available June 2026. Production retrieval/ranking experience plus LLM/VLM reliability and efficient inference. Reach out.

scarlett.95.wang@gmail.com · Résumé