ML Engineer · RecSys · Reliability · Inference · Duke ECE PhD (2026)

Recommendation, reliability & inference — shipped.

ML Engineer — Recommendation · LLM/VLM Reliability · Inference

Duke ECE PhD and production ML engineer. I ship large-scale retrieval & ranking (GraphSAGE + Faiss, multitask BERT broad match — ~1% revenue uplift at Pinterest), build LLM/VLM reliability (cross-modal hallucination detection and selective prediction; TMLR 2026), and optimize inference & serving (quantization, LoRA, distillation, vLLM / SGLang) under real cost and latency budgets.

3 Patents
7+ Publications
1% Revenue uplift @ Pinterest
3.4× Cheaper memory (SAGE)

Open to

Where I can plug in

Recommendation & retrieval, LLM/VLM reliability, or agent memory — I bring production experience and research depth to all three.

01

Recommendation & Retrieval

Candidate generation, embedding retrieval, and ranking at production scale.

Pinterest GraphSAGE + Faiss · 1% revenue uplift · BERT broad-match CTR

02

LLM / VLM Reliability

Hallucination detection, uncertainty, and trust & safety for foundation models.

TMLR 2026 cross-modal consistency · GPT-4V, Qwen-VL, LLaMA-VL benchmarks

03

Agent Memory

Memory infrastructure for LLM agents across the write / retrieve / forget lifecycle.

SAGE: beats Mem0 on 7/7 settings · 3.4× cheaper · code public

The thesis

Recommendation, reliability, inference.

From ranking ads to serving agents, one engineer owns three surfaces — what to retrieve, whether to trust the output, and how to serve it within budget.

Recommendation

Retrieval & ranking at scale

GraphSAGE advertiser-similarity embeddings + Faiss ANN and a multitask BERT broad-match model, shipped into Pinterest's production Spinner auto-targeting — owned end-to-end from indexing to online serving and eval.

Pinterest · ~1% revenue uplift (A/B)

Reliability

Trustworthy LLM/VLM outputs

Cross-modal consistency for hallucination detection and selective prediction — knowing when a model should abstain or escalate rather than answer wrongly, benchmarked across GPT-4V, Qwen-VL, and LLaMA-VL.

TMLR 2026 · epistemic uncertainty

Inference

Efficient serving

Quantization, LoRA, and distillation with vLLM / SGLang serving. SAGE cuts agent-memory add-phase API cost by ~3.4× and latency by ~2.5×, and skips ~16–18% of LLM calls under explicit budgets.

SAGE · 3.4× lower cost · 2.5× lower latency

Retrieve the right candidates, trust the output, serve it within budget: recommendation, reliability, and inference are one ML-engineering loop, and SAGE runs all three.

How I got here

The roadmap to agent memory

  1. 2019–2022 → the write path

    Continual Learning

    Research Fellow / Deep Learning Research Intern · Samsung Semiconductor · SOC R&D Lab

    • Led continual & federated learning research — 3 patents, 2 publications.
    • Communication-efficient federated learning via global-model quantization; server-side refinement without client data access.
    • Sustainable continual learning: task-similarity detection + encoder reuse — the same problem class as bounded memory growth & forgetting in agent memory.
    • GAN Memory with No Forgetting (NeurIPS 2020) — parameter-efficient generative replay.
  2. 2022 & 2023 → the read path

    Recommendation Systems

    Research Intern — Ads Retrieval & Targeting · Pinterest Labs

    • 2023: Shipped a graph-based advertiser-similarity retrieval pipeline (GraphSAGE embeddings + Faiss ANN) into Pinterest's auto-targeting; 1% revenue uplift in A/B testing.
    • 2022: Multitask BERT model for broad match — improved ad-query relevance with measurable CTR gains.
    • Owned large-scale retrieval features end-to-end: ingestion, embedding indexing, candidate scoring, online serving, eval.
  3. 2020–Present → the serving path

    Efficient & Reliable ML

    PhD Research — Duke University · Advisor: Prof. Ricardo Henao

    • Cross-modal consistency for hallucination detection in VLMs (TMLR 2026) — reliable identification of low-confidence / “unknown” predictions.
    • Multi-source data-free transfer learning (IEEE MLSP 2025 Oral) — efficient model recycling under white-box & black-box access.
    • Sustainable continual learning (IEEE MLSP 2025) — parameter reuse against superlinear model growth.
  4. 2025–Now ◀ where it all leads

    Agentic Memory

    Memory Management System for AI Agents · Duke University · the convergence

    • SAGE — a novelty gate for efficient memory evolution in agentic LLMs (ARR under review).
    • Cost-efficient, low-latency memory database updates: when to write, summarize, compress, or forget.
    • Beats Mem0 on 7/7 settings · 3.4× lower cost · 2.5× lower latency. Repo public & live.

Selected work

Projects & research

Pinterest Ads Retrieval

Retrieval / RecSys · Shipped to production

Graph-based advertiser-similarity retrieval (GraphSAGE + Faiss ANN) plus a multitask BERT broad-match model, integrated into Pinterest's Spinner workflow.

  • 1% revenue uplift (A/B)
  • Measurable CTR improvement
  • End-to-end: indexing → serving → eval
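
To make the retrieval step concrete, here is a toy sketch of embedding-similarity lookup. It is not the production pipeline: the advertiser IDs and 3-d vectors are invented for illustration, and an exhaustive cosine scan stands in for the Faiss ANN index, which would approximate the same ranking without visiting every vector.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k_similar(query, index, k=2):
    """Return the k (advertiser_id, score) pairs most similar to the query.

    Exhaustive search: an ANN index would approximate this ranking
    while avoiding a scan over every stored vector.
    """
    q = normalize(query)
    scored = [
        (adv_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for adv_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical advertiser embeddings (toy values, not real data).
advertiser_index = {
    "adv_shoes": [0.9, 0.1, 0.0],
    "adv_sneakers": [0.8, 0.2, 0.1],
    "adv_furniture": [0.0, 0.1, 0.9],
}

neighbors = top_k_similar([1.0, 0.0, 0.0], advertiser_index, k=2)
print([adv_id for adv_id, _ in neighbors])
```

In this toy index the two footwear advertisers rank above the furniture advertiser, which is the behavior a similarity-based candidate generator relies on.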

VLM Reliability

Trust & Safety · TMLR 2026

A cross-modal consistency framework that detects hallucinations in vision-language models by comparing visual- and text-grounded reasoning paths.

  • Benchmarked GPT-4V, Qwen-VL, LLaMA-VL
  • Quantified epistemic uncertainty
  • Fallback-enabled closed-set classification
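
The abstain-or-answer idea can be sketched in a few lines. This is a simplified illustration, not the method from the paper: the two probability dictionaries stand in for the outputs of a visual-grounded and a text-grounded reasoning path, and the agreement rule, the `min_confidence` threshold, and the `"unknown"` fallback label are all assumptions made for the example.

```python
def selective_predict(visual_probs, text_probs, min_confidence=0.6):
    """Answer only when both paths agree on a label with enough confidence.

    visual_probs / text_probs: dicts mapping label -> probability.
    Returns the agreed label, or "unknown" as the fallback (abstain) option.
    """
    visual_label = max(visual_probs, key=visual_probs.get)
    text_label = max(text_probs, key=text_probs.get)
    if visual_label != text_label:
        return "unknown"  # paths disagree: treat as a likely hallucination
    agreed_confidence = min(visual_probs[visual_label], text_probs[text_label])
    if agreed_confidence < min_confidence:
        return "unknown"  # paths agree, but at least one of them is unsure
    return visual_label

consistent = selective_predict({"cat": 0.9, "dog": 0.1}, {"cat": 0.8, "dog": 0.2})
conflicting = selective_predict({"cat": 0.9, "dog": 0.1}, {"cat": 0.3, "dog": 0.7})
unsure = selective_predict({"cat": 0.55, "dog": 0.45}, {"cat": 0.9, "dog": 0.1})
print(consistent, conflicting, unsure)
```

Only the first call returns a class label. The other two fall back to `"unknown"`, so a downstream system can escalate instead of acting on a doubtful answer.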

SAGE

Agent Memory · ARR — under review · code public

A novelty gate for efficient memory evolution in agentic LLMs. Frames memory evolution as novelty detection via density estimation, so the system writes/consolidates only what matters.

  • Beats Mem0 on 7/7 settings
  • 3.4× lower cost · 2.5× lower latency
  • Balances memory freshness vs. compute overhead
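
A minimal sketch of what a density-based novelty gate could look like. This is an illustration under stated assumptions, not SAGE's implementation: the unnormalized Gaussian kernel score, the `bandwidth` and `density_threshold` values, the class name, and the 2-d embeddings are all hypothetical.

```python
import math

def kernel_density(candidate, stored, bandwidth=0.5):
    """Mean Gaussian-kernel score between a candidate and stored embeddings.

    Unnormalized: the value lies in [0, 1] and is high when the candidate
    sits in a region the memory already covers.
    """
    if not stored:
        return 0.0  # empty memory: everything counts as novel
    total = 0.0
    for vec in stored:
        sq_dist = sum((a - b) ** 2 for a, b in zip(candidate, vec))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(stored)

class NoveltyGatedMemory:
    """Stores an item only when it falls in a low-density region of memory."""

    def __init__(self, density_threshold=0.5):
        self.density_threshold = density_threshold
        self.embeddings = []
        self.texts = []

    def maybe_write(self, text, embedding):
        """Return True if the item was written, False if the gate skipped it."""
        density = kernel_density(embedding, self.embeddings)
        if density >= self.density_threshold:
            return False  # already well covered: skip the costly write path
        self.embeddings.append(list(embedding))
        self.texts.append(text)
        return True

memory = NoveltyGatedMemory(density_threshold=0.5)
first = memory.maybe_write("user likes hiking", [1.0, 0.0])
duplicate = memory.maybe_write("user enjoys hiking", [0.95, 0.05])
different = memory.maybe_write("user is vegetarian", [0.0, 1.0])
print(first, duplicate, different, memory.texts)
```

The near-duplicate is rejected before any expensive update runs, which is where savings of this kind would come from. The real system decides among writing, consolidating, and forgetting, which this sketch does not model.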

Continual & Federated Learning

Continual Learning · Samsung · 3 patents

Communication-efficient federated learning, sustainable continual learning, and continual few-shot learning — the write-path backbone of agent memory.

  • 3 patents filed
  • GAN Memory w/ No Forgetting (NeurIPS 2020)
  • Bounded memory growth & anti-forgetting

Toolkit

Skills & publications

Memory & Retrieval

  • Embedding retrieval
  • Retrieval & ranking
  • Faiss (IVF / HNSW)
  • Graph indexing
  • Conflict resolution
  • Summarization & fusion
  • Memory lifecycle (write/update/compress/forget)
  • RAG pipelines

LLM & VLM

  • Prompt engineering
  • In-context learning
  • Hallucination detection & conflict resolution
  • Quantization · LoRA · distillation
  • Self-supervised learning
  • VLMs (LLaMA, Qwen-VL, GPT-class)

ML Foundations

  • Representation learning
  • Generative models
  • Continual / federated learning
  • Domain adaptation
  • Interpretable ML
  • Large-scale recommendation

Systems & Infra

  • Production ML pipelines
  • Online inference
  • A/B testing
  • Distributed training (Slurm)
  • Docker · AWS · Spark
  • Hugging Face

Languages & Tools

  • Python
  • C++
  • SQL
  • Bash
  • PyTorch
  • HF Transformers
  • Git
  • Linux

Selected publications

  1. SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

    ARR (under review)

  2. Fallback-Enabled Closed-Set Classification: Cross-Modal Consistency in Vision-Language Models

    TMLR 2026

  3. GAN Memory with No Forgetting

    NeurIPS 2020

  4. Model Recycling Framework for Multi-source Data-free Supervised Transfer Learning

    IEEE MLSP 2025 (Oral)

  5. Toward Sustainable Continual Learning: Detection and Knowledge Repurposing of Similar Tasks

    IEEE MLSP 2025

  6. A Holistic Approach to Interpretability in Financial Lending

    Decision Support Systems 2022

Let's talk

Hiring an ML engineer for RecSys, reliability, or inference?

Available June 2026. Production retrieval/ranking experience plus LLM/VLM reliability and efficient inference. Reach out.