ML Engineer — Recommendation · LLM/VLM Reliability · Inference
Duke ECE PhD and production ML engineer. I ship large-scale retrieval & ranking (GraphSAGE + Faiss, multitask BERT broad match; ~1% revenue uplift at Pinterest), work on LLM/VLM reliability (cross-modal hallucination detection and selective prediction; TMLR 2026), and optimize inference & serving (quantization, LoRA, distillation, vLLM / SGLang) under real cost and latency budgets.
Recommendation & retrieval, LLM/VLM reliability, and agent memory: I bring production experience and research depth to all three.
Candidate generation, embedding retrieval, and ranking at production scale.
Hallucination detection, uncertainty, and trust & safety for foundation models.
Memory infrastructure for LLM agents across the write / retrieve / forget lifecycle.
From ranking ads to serving agents, one engineer owns three surfaces — what to retrieve, whether to trust the output, and how to serve it within budget.
GraphSAGE advertiser-similarity embeddings + Faiss ANN and a multitask BERT broad-match model, shipped into Pinterest's production Spinner auto-targeting — owned end-to-end from indexing to online serving and eval.
Cross-modal consistency for hallucination detection and selective prediction — knowing when a model should abstain or escalate rather than answer wrongly, benchmarked across GPT-4V, Qwen-VL, and LLaMA-VL.
Quantization, LoRA, and distillation with vLLM / SGLang serving. SAGE, a novelty gate for agent memory, cuts add-phase API cost by ~3.4× and latency by ~2.5×, and skips ~16–18% of LLM calls under explicit budgets.
Research Fellow / Deep Learning Research Intern · Samsung Semiconductor · SOC R&D Lab
Research Intern — Ads Retrieval & Targeting · Pinterest Labs
PhD Research — Duke University · Advisor: Prof. Ricardo Henao
Memory Management System for AI Agents · Duke University · the convergence
Graph-based advertiser-similarity retrieval (GraphSAGE + Faiss ANN) plus a multitask BERT broad-match model, integrated into Pinterest's Spinner workflow.
A cross-modal consistency framework that detects hallucinations in vision-language models by comparing visual- and text-grounded reasoning paths.
A novelty gate for efficient memory evolution in agentic LLMs. Frames memory evolution as novelty detection via density estimation, so the system writes/consolidates only what matters.
Communication-efficient federated learning, sustainable continual learning, and continual few-shot learning — the write-path backbone of agent memory.
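To make the novelty-gate idea from the SAGE project concrete, here is a minimal sketch of gating memory writes by density. It is an illustration under stated assumptions, not the SAGE implementation: the Gaussian kernel density estimator, the function names (`kde_log_density`, `novelty_gate`, `add_memory`), and the `bandwidth` and `threshold` values are all hypothetical choices made for this example.

```python
import math

def kde_log_density(x, memory, bandwidth=0.5):
    """Log of a Gaussian kernel density estimate at x.

    The kernel's normalizing constant is dropped, so the value is only
    meaningful relative to a threshold chosen on the same scale.
    """
    if not memory:
        return float("-inf")  # empty memory: nothing is familiar yet
    # Scaled negative squared distance from x to each stored embedding.
    exps = [
        -sum((a - b) ** 2 for a, b in zip(x, m)) / (2 * bandwidth ** 2)
        for m in memory
    ]
    # log-mean-exp, shifted by the max for numerical stability.
    top = max(exps)
    return top + math.log(sum(math.exp(e - top) for e in exps) / len(exps))

def novelty_gate(x, memory, threshold=-2.0, bandwidth=0.5):
    """Return True when x lies in a low-density region, i.e. looks novel."""
    return kde_log_density(x, memory, bandwidth) < threshold

def add_memory(x, memory, **gate_kwargs):
    """Write x only if the gate considers it novel; report whether it was written."""
    if novelty_gate(x, memory, **gate_kwargs):
        memory.append(x)
        return True
    return False  # near-duplicate: skip the write (and any downstream LLM call)

memory = []
print(add_memory([0.0, 0.0], memory))  # first item, memory is empty
print(add_memory([0.1, 0.0], memory))  # close to an existing item
print(add_memory([3.0, 3.0], memory))  # far from everything stored
```

In this toy setup, the gate decides whether an incoming embedding triggers a write at all, which is where skipped calls would come from. A real system would tune the threshold against a cost or latency budget rather than hard-coding it.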
SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs
ARR (under review)
Fallback-Enabled Closed-Set Classification: Cross-Modal Consistency in Vision-Language Models
TMLR 2026
GAN Memory with No Forgetting
NeurIPS 2020
Model Recycling Framework for Multi-source Data-free Supervised Transfer Learning
IEEE MLSP 2025 (Oral)
Toward Sustainable Continual Learning: Detection and Knowledge Repurposing of Similar Tasks
IEEE MLSP 2025
A Holistic Approach to Interpretability in Financial Lending
Decision Support Systems 2022
Available June 2026. Production retrieval/ranking experience plus LLM/VLM reliability and efficient inference. Reach out.