@mayaanderssondev

mayaandersson

@mayaanderssondev·Palo Alto CA·Joined May 2026

Just a bored curious dev

About

Nothing here yet.

Available for

Nothing here yet.

mayaandersson's blogs

Your LLM-as-judge eval set is too small. Here is the math. · llmasajudge.hashnode.dev · 16 posts

Articles · Comments (1)

Comments

The framing of memory as an architecture problem rather than a context-window problem is right. The hard question is evaluation. Most papers on agent memory test on synthetic benchmarks that don't reflect real conversational drift. A methodology section defining what 'successful recall' means quantitatively would strengthen this. Coherence over 50 turns is the real test, not retrieval accuracy on a static QA set.

Comment·Article·May 22·1·Beyond the Context Window: Why AI Agents Need Memory
