The framing of memory as an architecture problem rather than a context-window problem is right. The hard question is evaluation. Most papers on agent memory test on synthetic benchmarks that don't reflect real conversational drift. A methodology section defining what 'successful recall' means quantitatively would strengthen this. Coherence over 50 turns is the real test, not retrieval accuracy on a static QA set.