Agent Memory & Systems · Duke ECE PhD (2026)

Memory infrastructure for LLM agents.

ML Engineer / Researcher — Agent Memory · Retrieval · Production Systems

I build the write/retrieve/forget lifecycle for agent memory. SAGE, a novelty gate for efficient memory evolution, beats Mem0 on 7 of 7 settings of the LoCoMo benchmark at ~3.4× lower cost and ~2.5× lower latency. The arXiv preprint is public, and the code reproduces with a single command.


Patents: 3

Publications

Revenue uplift @ Pinterest: 1%

Cheaper memory (SAGE): 3.4×

Open to

Where I can plug in

Recommendation & retrieval, LLM/VLM reliability, or agent memory — I bring production experience and research depth to all three.

Recommendation & Retrieval

Candidate generation, embedding retrieval, and ranking at production scale.

Pinterest GraphSAGE + Faiss · 1% revenue uplift · BERT broad-match CTR

LLM / VLM Reliability

Hallucination detection, uncertainty, and trust & safety for foundation models.

TMLR 2026 cross-modal consistency · GPT-4V, Qwen-VL, LLaMA-VL benchmarks

Agent Memory

Memory infrastructure for LLM agents across the write / retrieve / forget lifecycle.

SAGE · beats Mem0 on 7/7 settings · 3.4× cheaper · code public

The thesis

Three vertices. One machine.

Agent memory is what holds my work together — the same machinery I've shipped and researched for years, wearing an agent costume.

Read path

Retrieval & ranking

A memory system embeds and stores entries, retrieves the top-k candidates under latency and cost constraints, then ranks them by relevance to the current context. That is a two-tower / ANN problem, the same problem class as the Pinterest GraphSAGE + Faiss pipeline I shipped to production.

Pinterest · GraphSAGE + Faiss · 1% revenue uplift
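
As a rough illustration of the read path described above, here is a minimal sketch of embed-store-retrieve-rank. It is not the Pinterest or SAGE code: `MemoryStore`, its methods, and the toy three-dimensional embeddings are hypothetical, and it uses exact brute-force cosine search where a production system would use an ANN index such as Faiss.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory store: exact search, no ANN index."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def write(self, text: str, embedding: List[float]) -> None:
        # Store the memory text next to its (precomputed) embedding.
        self._items.append((text, embedding))

    def retrieve(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored memory, then keep the k most similar.
        scored = [(text, cosine(query, emb)) for text, emb in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = MemoryStore()
store.write("user prefers dark mode", [1.0, 0.0, 0.0])
store.write("user is vegetarian", [0.0, 1.0, 0.0])
store.write("user likes dark themes", [0.9, 0.1, 0.0])

top = store.retrieve([1.0, 0.0, 0.0], k=2)
for text, score in top:
    print(f"{score:.3f}  {text}")
```

The brute-force scan costs one similarity computation per stored memory per query, which is the cost an ANN index is meant to cut at production scale.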

Write path

Continual learning

Deciding what to write, consolidate, and forget, and how to avoid catastrophic interference with old knowledge, is the continual-learning problem almost verbatim. My Samsung work (continual + federated learning, 3 patents) is the backbone of the write path.

Samsung · continual + federated · 3 patents

Serving path

Production systems & reliability

Memory at production scale means latency budgets, cost per query, and reliable serving. SAGE's cost and latency reductions are systems results — and my reliability research (TMLR 2026) keeps the read/write paths trustworthy.

SAGE · cost + latency wins · TMLR 2026 reliability

Retrieval (read) + continual learning (write) + systems (serve) = memory infrastructure. I have real artifacts at all three vertices — and they converge on SAGE.

How I got here

The roadmap to agent memory

2019–2022 → the write path
Continual Learning
Research Fellow / Deep Learning Research Intern · Samsung Semiconductor · SOC R&D Lab
- Led continual & federated learning research — 3 patents, 2 publications.
- Communication-efficient federated learning via global-model quantization; server-side refinement without client data access.
- Sustainable continual learning: task-similarity detection + encoder reuse — the same problem class as bounded memory growth & forgetting in agent memory.
- GAN Memory with No Forgetting (NeurIPS 2020) — parameter-efficient generative replay.
2022 & 2023 → the read path
Recommendation Systems
Research Intern — Ads Retrieval & Targeting · Pinterest Labs
- 2023: Shipped a graph-based advertiser-similarity retrieval pipeline (GraphSAGE embeddings + Faiss ANN) into Pinterest's auto-targeting; 1% revenue uplift in A/B testing.
- 2022: Multitask BERT model for broad match — improved ad-query relevance with measurable CTR gains.
- Owned large-scale retrieval features end-to-end: ingestion, embedding indexing, candidate scoring, online serving, eval.
2020–Present → the serving path
Efficient & Reliable ML
PhD Research — Duke University · Advisor: Prof. Ricardo Henao
- Cross-modal consistency for hallucination detection in VLMs (TMLR 2026) — reliable identification of low-confidence / “unknown” predictions.
- Multi-source data-free transfer learning (IEEE MLSP 2025 Oral) — efficient model recycling under white-box & black-box access.
- Sustainable continual learning (IEEE MLSP 2025) — parameter reuse against superlinear model growth.
2025–Now ◀ where it all leads
Agentic Memory
Memory Management System for AI Agents · Duke University · the convergence
- SAGE — a novelty gate for efficient memory evolution in agentic LLMs (ARR under review).
- Cost-efficient, low-latency memory database updates: when to write, summarize, compress, or forget.
- Beats Mem0 on 7/7 settings · 3.4× lower cost · 2.5× lower latency. Repo public & live.

Selected work

Projects & research

SAGE

Agent Memory · ARR (under review) · code public

A novelty gate for efficient memory evolution in agentic LLMs. Frames memory evolution as novelty detection via density estimation, so the system writes/consolidates only what matters.

Beats Mem0 on 7/7 settings
3.4× lower cost · 2.5× lower latency
Balances memory freshness vs. compute overhead
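
To make the "novelty detection via density estimation" framing concrete, here is a minimal sketch of a density-based write gate. It is not SAGE's actual estimator: the Gaussian kernel, the `bandwidth` and `threshold` values, and the function names are all illustrative assumptions. The idea shown is only that a new embedding is written when the estimated density of existing memories around it is low.

```python
import math
from typing import List


def kde_density(x: List[float], memory: List[List[float]], bandwidth: float = 0.5) -> float:
    """Mean Gaussian-kernel value between x and each stored vector (unnormalized)."""
    if not memory:
        return 0.0  # an empty memory makes everything novel
    total = 0.0
    for m in memory:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, m))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(memory)


def gated_write(x: List[float], memory: List[List[float]],
                threshold: float = 0.5, bandwidth: float = 0.5) -> bool:
    """Append x only if it falls in a low-density region; return whether it was written."""
    if kde_density(x, memory, bandwidth) < threshold:
        memory.append(x)
        return True
    return False


memory = [[0.0, 0.0], [0.1, 0.0]]
wrote_near = gated_write([0.05, 0.0], memory)  # close to existing memories
wrote_far = gated_write([3.0, 3.0], memory)    # far from everything stored
print(wrote_near, wrote_far, len(memory))
```

Skipping the write for near-duplicates is where a gate like this could save cost, since redundant entries never trigger a downstream update or consolidation step.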

Continual & Federated Learning

Continual Learning · Samsung · 3 patents

Communication-efficient federated learning, sustainable continual learning, and continual few-shot learning — the write-path backbone of agent memory.

3 patents filed
GAN Memory w/ No Forgetting (NeurIPS 2020)
Bounded memory growth & anti-forgetting
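
For readers unfamiliar with why quantizing the global model reduces communication, here is a minimal sketch of generic uniform 8-bit quantization of a weight vector. It is a textbook scheme chosen for illustration, not the patented Samsung method; `quantize`, `dequantize`, and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1]; returns (codes, scale, minimum)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_err)
```

Each weight is sent as one byte plus a shared `scale` and `lo`, instead of a 32-bit float, and the reconstruction error per weight is bounded by half a quantization step.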

Pinterest Ads Retrieval

Retrieval / RecSys · Shipped to production

Graph-based advertiser-similarity retrieval (GraphSAGE + Faiss ANN) plus a multitask BERT broad-match model, integrated into Pinterest's Spinner workflow.

1% revenue uplift (A/B)
Measurable CTR improvement
End-to-end: indexing → serving → eval

VLM Reliability

Trust & Safety · TMLR 2026

A cross-modal consistency framework that detects hallucinations in vision-language models by comparing visual- and text-grounded reasoning paths.

Benchmarked GPT-4V, Qwen-VL, LLaMA-VL
Quantified epistemic uncertainty
Fallback-enabled closed-set classification
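
The consistency idea above can be sketched as a simple agreement check with a fallback label. This is a simplified illustration, not the method from the TMLR paper: the two reasoning paths are represented by hypothetical probability dictionaries, and the agreement rule and `min_conf` threshold are assumptions.

```python
from typing import Dict

UNKNOWN = "unknown"  # fallback label for low-trust predictions


def argmax_label(probs: Dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def consistent_predict(visual_probs: Dict[str, float],
                       text_probs: Dict[str, float],
                       min_conf: float = 0.6) -> str:
    """Return a class only if both paths agree and both are confident; else fall back."""
    v = argmax_label(visual_probs)
    t = argmax_label(text_probs)
    if v != t:
        return UNKNOWN  # the two paths disagree
    if min(visual_probs[v], text_probs[t]) < min_conf:
        return UNKNOWN  # they agree, but at least one path is unsure
    return v


agree = consistent_predict({"cat": 0.9, "dog": 0.1}, {"cat": 0.8, "dog": 0.2})
disagree = consistent_predict({"cat": 0.9, "dog": 0.1}, {"cat": 0.3, "dog": 0.7})
unsure = consistent_predict({"cat": 0.55, "dog": 0.45}, {"cat": 0.9, "dog": 0.1})
print(agree, disagree, unsure)
```

Returning a fallback label instead of a forced guess is what keeps the classifier closed-set while still letting it signal that a prediction should not be trusted.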

Toolkit

Skills & publications

Memory & Retrieval

Embedding retrieval
Retrieval & ranking
Faiss (IVF / HNSW)
Graph indexing
Conflict resolution
Summarization & fusion
Memory lifecycle (write/update/compress/forget)
RAG pipelines

LLM & VLM

Prompt engineering
In-context learning
Hallucination detection & conflict resolution
Quantization · LoRA · distillation
Self-supervised learning
VLMs (LLaMA, Qwen-VL, GPT-class)

ML Foundations

Representation learning
Generative models
Continual / federated learning
Domain adaptation
Interpretable ML
Large-scale recommendation

Systems & Infra

Production ML pipelines
Online inference
A/B testing
Distributed training (Slurm)
Docker · AWS · Spark
Hugging Face

Languages & Tools

Python
C++
SQL
Bash
PyTorch
HF Transformers
Git
Linux

Selected publications

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs · ARR (under review)
Fallback-Enabled Closed-Set Classification: Cross-Modal Consistency in Vision-Language Models · TMLR 2026
GAN Memory with No Forgetting · NeurIPS 2020
Model Recycling Framework for Multi-source Data-free Supervised Transfer Learning · IEEE MLSP 2025 (Oral)
Toward Sustainable Continual Learning: Detection and Knowledge Repurposing of Similar Tasks · IEEE MLSP 2025
A Holistic Approach to Interpretability in Financial Lending · Decision Support Systems 2022

Let's talk

Building agent memory or retrieval infra?

Available June 2026. Production systems experience plus research depth across retrieval, continual learning, and reliability. Reach out.

Email: scarlett.95.wang@gmail.com