Research Scientist / Postdoc · Duke ECE PhD (May 2026)

Teaching learning systems what to keep, what to question, and when to say “I don't know.”

Research Scientist / Postdoc — Reliability · Continual Learning · Memory

Almost everything I've published is one question in different costumes: how should a learning system manage its own knowledge — what to keep, what to merge, what to discard, and when to admit it doesn't know? I call it epistemic self-modeling, and it runs from credit-risk interpretability to agent memory.


Patents

Publications

Revenue uplift @ Pinterest

3.4×

Cheaper memory (SAGE)

The thread

One obsession, in different costumes.

I didn't set out with a grand thesis. I kept circling the same question and only now can name it. The two pillars usually pitched as separate interests, agent memory and LLM reliability, are one problem seen from opposite ends.

A reliable agent has to know what it knows. A good memory is what it knows. SAGE governs what gets into the knowledge store; the reliability work governs what comes out as an answer. Both are acts of epistemic control.

One question, eight years

The research journey

2018 · FICO · Prof. Cynthia Rudin
Reasoning a model can answer for
Globally consistent explanations for credit decisions: full interpretability with no loss in accuracy. The seed wasn't “interpretability” as a topic. It was the instinct that a model should be accountable for why it believes what it believes.
NeurIPS 2018 Workshop · DSS 2022
2020–22 · GAN Memory · Sustainable Continual Learning
What to keep
Catastrophic forgetting is, underneath, a what-to-keep problem. Task-similarity detection asks: is this task genuinely novel, or similar enough to reuse what I already have? — answered with a lightweight test instead of retraining everything.
NeurIPS 2020 · IEEE MLSP 2025
2019–22 · Federated Learning · Samsung
Minimal sufficient knowledge
Learning under hard constraints: compress the global model, preserve accuracy, never see the client's data. The discipline of keeping only what is sufficient — and nothing more.
3 patents filed
2025 · Cross-Modal Consistency · TMLR
What to refuse
The clean inversion — not what to keep, but what to reject. VLMs hand out a confident in-set label even when the image belongs to no category they were given. The fix: accept an answer only when the visual and textual arms agree — a cheap, principled rule for knowing when you don't know.
TMLR 2026
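
The agreement rule can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each arm has already produced one score per candidate label, and the names `argmax_label` and `accept_if_arms_agree` are hypothetical.

```python
from typing import Optional, Sequence


def argmax_label(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose score is highest."""
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


def accept_if_arms_agree(
    visual_scores: Sequence[float],
    textual_scores: Sequence[float],
    labels: Sequence[str],
) -> Optional[str]:
    """Accept a label only when both arms pick the same one.

    Returns None to signal "I don't know" when the arms disagree.
    How each arm computes its scores is an assumption left to the caller.
    """
    visual_pick = argmax_label(visual_scores, labels)
    textual_pick = argmax_label(textual_scores, labels)
    return visual_pick if visual_pick == textual_pick else None


labels = ["cat", "dog"]
agreed = accept_if_arms_agree([0.1, 0.9], [0.2, 0.8], labels)    # both arms pick "dog"
rejected = accept_if_arms_agree([0.9, 0.1], [0.2, 0.8], labels)  # arms disagree, so abstain
```

Returning `None` rather than a low-confidence label keeps the abstention explicit, so the caller has to decide what to do with an image that fits no given category.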
2025–now · SAGE
The lineage snaps into focus
The field poured itself into the read path — retrieval, vector indexes, knowledge graphs — and left the write decision to an expensive LLM call per fact. I framed memory evolution as novelty detection: a closed-form von Mises–Fisher gate routes clearly-new facts to ADD, redundant ones to NOOP, and sends only the ambiguous cases to the LLM. It is the continual-learning task-similarity test, reincarnated for agent memory.
ACL ARR 2026 · 7/7 over Mem0 · ~3.4× cheaper
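
A rough sketch of the three-way routing, under stated assumptions. It is not SAGE's actual gate: it uses the nearest stored embedding as the von Mises–Fisher mean direction, scores a new fact by the vMF log-density relative to its mode, `kappa * (cos - 1)`, and routes on two cut-offs. The function `route_write`, the concentration `kappa`, and both thresholds are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def route_write(
    fact: Sequence[float],
    store: Sequence[Sequence[float]],
    kappa: float = 20.0,       # assumed concentration, not a published value
    add_below: float = -10.0,  # assumed cut-off: clearly new
    noop_above: float = -1.0,  # assumed cut-off: clearly redundant
) -> str:
    """Route a candidate fact to "ADD", "NOOP", or "LLM"."""
    if not store:
        return "ADD"  # nothing stored yet, so the fact is new by definition
    f = unit(fact)
    best_cos = max(sum(a * b for a, b in zip(f, unit(m))) for m in store)
    # vMF log-density relative to its mode: 0 at the mean direction, negative elsewhere.
    score = kappa * (best_cos - 1.0)
    if score <= add_below:
        return "ADD"
    if score >= noop_above:
        return "NOOP"
    return "LLM"  # ambiguous: only these cases pay for an LLM call


store = [[1.0, 0.0]]
decisions = [route_write(f, store) for f in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```

With these toy vectors, the duplicate is skipped, the orthogonal fact is added, and the in-between one is deferred to the LLM. The cost saving comes from how rarely the third branch fires.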

The throughline is a taste: faced with an expensive deliberation, I reach for a lightweight, principled decision rule — not a bigger model or more API calls. The same geometric, statistics-grounded instinct, eight years apart.

The program

What I'm building toward

One research program converging from two directions — toward memory that knows how sure it is.

Direction I

Memory as a control problem

extending SAGE

Forgetting & compression as decisions, not curves — what should a long-horizon agent let decay, and on what evidence?
Conflict resolution when stored memories contradict — which wins, and how to represent “this used to be true”?
Calibrated thresholds that track their own error rate, not just the store's geometry.

Direction II

Reliability for agents, not just classifiers

extending TMLR

Selective action — an agent that abstains, asks a clarifying question, or escalates instead of confabulating a tool call.
Cross-path consistency: accept an action only when independent reasoning routes agree.
Calibration of operations — knowing when a write or a retrieval is uncertain, not just a final answer.

The endpoint is uncertainty-aware memory: a store where every item carries a calibrated confidence, and novelty, trust, and abstention share one coherent epistemic state. A memory item that knows it might be wrong; a retrieval that propagates the doubt; an agent that abstains because its memory is unsure.

Evaluation as a thread

Evaluation is its own thread. Today's benchmarks (LoCoMo, LongMemEval) score end-task QA — they can't isolate write-decision quality or memory-conflict handling. Building evaluation that measures the thing directly is part of the program.

What grounds it

Two load-bearing papers and a coherent lineage — not a decade of agent-memory hype. And research that becomes a runnable, reproducible artifact, sometimes a shipped system with measured impact (Pinterest, ~1% revenue uplift).

The record

Selected publications

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs
ARR (under review)
Fallback-Enabled Closed-Set Classification: Cross-Modal Consistency in Vision-Language Models
TMLR 2026
GAN Memory with No Forgetting
NeurIPS 2020
Model Recycling Framework for Multi-source Data-free Supervised Transfer Learning
IEEE MLSP 2025 (Oral)
Toward Sustainable Continual Learning: Detection and Knowledge Repurposing of Similar Tasks
IEEE MLSP 2025
A Holistic Approach to Interpretability in Financial Lending
Decision Support Systems 2022

Let's talk

Hiring a research scientist or postdoc?

Available June 2026 (flexible through end of 2026). I'm looking for research roles where I can own problems end-to-end — formulation, evaluation design, and reproducible artifacts. Let's talk.

scarlett.95.wang@gmail.com Résumé
