Agent Memory And Retrieval - Agent Systems Handbook by Prompthon

Summary

Agent memory and retrieval are the persistence layer around a model call. They decide what survives beyond the current turn, what can be pulled back later, and what should stay outside the prompt until it is needed.

Why It Matters

Without memory, agents reset to zero every time they act. Without retrieval, they are limited to whatever happened to fit inside the current prompt window. That makes long-running work brittle, personalization shallow, and factual grounding weak. A good design separates three jobs:

Keep the current task coherent.
Preserve durable signals worth reusing later.
Fetch external knowledge only when it improves the answer or action.

Mental Model

Treat memory and retrieval as different but related systems.

working memory holds the active state of the current task or session.
episodic memory stores bounded events, outcomes, and experiences.
semantic memory stores distilled facts, rules, and relationships that stay useful across tasks.
retrieval reaches into external knowledge sources that do not belong inside the agent itself.

The practical distinction is not just short-term versus long-term. It is also about ownership.

Session memory belongs to the running task.
Durable memory belongs to the agent’s operating history.
Retrieval indexes belong to external documents, databases, or data products.
Notes and artifacts belong to explicit work products such as TODO lists, summaries, reports, or decision logs.

That last category matters because many systems fail by forcing everything into “memory”. A research note, a task checklist, or a generated report is usually better treated as an artifact than as an invisible memory entry.

Architecture Diagram

The design goal is not to stuff more information into the model. It is to stage the right information at the right time.

Tool Landscape

Common patterns show up across most agent systems:

Working memory often uses lightweight in-process state with capacity and time limits.
Episodic memory usually combines structured metadata with similarity search so events remain searchable by both meaning and recency.
Semantic memory often benefits from richer normalization because durable facts become more valuable when duplicates are merged and relationships are explicit.
Retrieval systems usually start with chunking, indexing, and relevance scoring, then add reranking, expansion, or query rewriting only when the base pipeline is not enough.
Notes and artifacts are best stored in human-readable forms when humans may need to inspect, edit, or approve them later.
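
The working-memory pattern in the list above (in-process state with capacity and time limits) can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `WorkingMemory`, its parameters, and the eviction policy (drop the oldest entry first) are assumptions chosen for the example.

```python
import time
from collections import OrderedDict


class WorkingMemory:
    """Hypothetical in-process session state bounded by item count and age."""

    def __init__(self, capacity=3, ttl_seconds=60.0, clock=time.monotonic):
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._items = OrderedDict()  # key -> (stored_at, value), oldest first

    def put(self, key, value):
        self._evict_expired()
        self._items.pop(key, None)  # re-inserting refreshes position and timestamp
        self._items[key] = (self.clock(), value)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # capacity limit: drop the oldest entry

    def get(self, key, default=None):
        self._evict_expired()
        entry = self._items.get(key)
        return default if entry is None else entry[1]

    def _evict_expired(self):
        # Time limit: anything stored before the cutoff is removed.
        cutoff = self.clock() - self.ttl_seconds
        expired = [k for k, (stored_at, _) in self._items.items() if stored_at < cutoff]
        for k in expired:
            del self._items[k]

    def __len__(self):
        self._evict_expired()
        return len(self._items)
```

Because the state lives in the process, it vanishes on restart. Anything worth keeping has to be written elsewhere on purpose.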

In practice, hybrid retrieval is common because no single method handles every case well. Keyword matching helps with exact entities and literals. Dense retrieval helps with semantic similarity. Structured filters help with time range, user scope, or document type. The right system usually combines them.
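
One way to picture that combination is the sketch below: a structured filter narrows the candidates, keyword and dense rankings are computed separately, and the two rankings are merged. The merge step uses reciprocal rank fusion, which is only one common option and is not prescribed by this handbook. The toy corpus, the two-dimensional vectors, and every function name here are hypothetical.

```python
import math

# Toy corpus: the "vec" values stand in for real embeddings.
DOCS = [
    {"id": "d1", "type": "runbook", "text": "restart the billing worker after error E4012", "vec": [0.9, 0.1]},
    {"id": "d2", "type": "runbook", "text": "how to recover a stuck payment job", "vec": [0.8, 0.3]},
    {"id": "d3", "type": "memo", "text": "billing error E4012 postmortem", "vec": [0.7, 0.2]},
]


def keyword_score(query, text):
    """Count shared lowercase tokens: good for exact entities and literals."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def cosine(a, b):
    """Cosine similarity: a stand-in for dense semantic matching."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, query_vec, doc_type, k=60):
    # Structured filter first: restrict scope by document type.
    candidates = [d for d in DOCS if d["type"] == doc_type]
    by_keyword = sorted(candidates, key=lambda d: keyword_score(query, d["text"]), reverse=True)
    by_dense = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    # Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document.
    fused = {}
    for ranking in (by_keyword, by_dense):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc["id"]] = fused.get(doc["id"], 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


print(hybrid_search("error E4012", [0.9, 0.1], "runbook"))
```

Fusing ranks rather than raw scores avoids having to put keyword counts and cosine similarities on the same scale.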

Retrieval Boundaries

Current hosted file-search tools make the memory-versus-retrieval split easier to explain. A managed file-search store is not the agent’s memory. It is an external corpus with its own indexing rules, filters, and citation surface. Useful distinctions:

Use memory for continuity across tasks, decisions, and prior actions.
Use retrieval for external evidence that should stay outside the prompt until a question or task asks for it.
Use metadata filters to narrow scope before semantic matching, not as a replacement for relevance ranking.
Use citations as part of the output artifact when users need to verify what document or page grounded the answer.
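
The last two distinctions can be sketched together. In the example below a metadata filter narrows scope, relevance ranking still decides the order within that scope, and the result carries document and page citations alongside the retrieved text. The `Chunk` type, the toy corpus, and the token-overlap scorer are all assumptions made for illustration. A hosted file-search tool would expose its own filter and citation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc: str
    page: int
    team: str  # hypothetical metadata field used for scoping
    text: str


CORPUS = [
    Chunk("oncall-guide.pdf", 4, "payments", "rotate the api key before the quarterly audit"),
    Chunk("oncall-guide.pdf", 9, "payments", "escalate failed refunds to the finance lead"),
    Chunk("style-guide.pdf", 2, "design", "rotate the logo only on dark backgrounds"),
]


def overlap(query, text):
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def grounded_search(query, team, top_k=1):
    # The metadata filter narrows scope; it does not rank anything.
    scoped = [c for c in CORPUS if c.team == team]
    # Relevance ranking still orders whatever survives the filter.
    ranked = sorted(scoped, key=lambda c: overlap(query, c.text), reverse=True)
    hits = [c for c in ranked[:top_k] if overlap(query, c.text) > 0]
    # Citations travel with the passages so a reader can verify the grounding.
    return {
        "passages": [c.text for c in hits],
        "citations": [{"doc": c.doc, "page": c.page} for c in hits],
    }
```

Calling `grounded_search("rotate the api key", "payments")` never considers the design chunk, even though it shares tokens with the query, because the filter removed it before ranking.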

The May 2026 Gemini File Search update is a concrete example of the direction. Google now exposes multimodal file search with gemini-embedding-2, custom metadata filtering, and page-level citations. The docs also make the lifecycle boundary explicit: raw uploaded files expire on a shorter schedule, while data imported into a file-search store persists until it is deleted. That is useful for verifiable RAG, but it is still retrieval, not memory.

Managed file-search products also impose architecture choices. Google’s current docs note that File Search cannot yet be combined with tools like Google Search or URL Context in the same call, and that audio and video are not currently supported. Teams still need orchestration above the retrieval layer to decide which tool, corpus, and evidence path fit the task.

Tradeoffs

Session memory is fast and cheap, but it disappears on restart and should not be trusted as the durable record.
Durable memory improves continuity, but it introduces write-quality problems: a bad memory is expensive because it keeps coming back.
Retrieval gives freshness and breadth, but it also creates latency, ranking errors, and citation risk.
Managed file-search tools reduce infrastructure work, but they constrain corpus lifecycle, tool composition, and storage shape.
Notes and artifacts improve traceability, but they require governance so the agent does not create an unbounded pile of stale documents.

Three design choices matter repeatedly:

promotion: not every working-memory item should become durable.
forgetting: low-value or stale memory must decay, expire, or be archived.
boundary: not every knowledge problem is a RAG problem.
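
The promotion and forgetting choices can be made concrete with a small sketch. Everything here is an assumption for illustration: the `importance` score, the promotion threshold, and the exponential half-life decay are one plausible policy among many, not the handbook's prescribed method.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float  # 0..1, hypothetical score assigned at write time
    last_used: float   # timestamp in seconds on the caller's clock


def promote(working_items, threshold=0.7):
    """Promotion: only items above the threshold become durable."""
    return [m for m in working_items if m.importance >= threshold]


def forget(durable_items, now, half_life=86400.0, floor=0.2):
    """Forgetting: strength halves every `half_life` seconds since last use.

    Items whose decayed strength falls below `floor` are archived, not deleted,
    so they leave the active set but remain recoverable.
    """
    kept, archived = [], []
    for m in durable_items:
        age = max(0.0, now - m.last_used)
        strength = m.importance * 0.5 ** (age / half_life)
        (kept if strength >= floor else archived).append(m)
    return kept, archived
```

Gating writes at promotion time is the cheaper control, since a bad memory that never becomes durable never has to be forgotten.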

RAG is not enough when the agent needs durable state, action history, or explicit task checkpoints. Retrieval can answer “what do the documents say?”, but it does not replace task memory, decision logs, or artifact management.

Citations

Official source: Gemini API File Search is now multimodal
Official source: Gemini API File Search documentation
Official source: OpenAI Responses API tools and remote MCP support
High-signal repository: google-gemini/cookbook

Update Log

2026-05-06: Added a retrieval-boundary refresh covering multimodal file search, metadata filters, citation-bearing RAG, and the difference between managed retrieval stores and agent memory.
2026-04-21: Initial repo-native draft based on imported reference material and lab rewrite rules.
