Summary
Agent memory and retrieval are the persistence layer around a model call. They decide what survives beyond the current turn, what can be pulled back later, and what should stay outside the prompt until it is needed.
Why It Matters
Without memory, agents reset to zero every time they act. Without retrieval, they are limited to whatever happened to fit inside the current prompt window. That makes long-running work brittle, personalization shallow, and factual grounding weak. A good design separates three jobs:
- Keep the current task coherent.
- Preserve durable signals worth reusing later.
- Fetch external knowledge only when it improves the answer or action.
Mental Model
Treat memory and retrieval as different but related systems.
- Working memory holds the active state of the current task or session.
- Episodic memory stores bounded events, outcomes, and experiences.
- Semantic memory stores distilled facts, rules, and relationships that stay useful across tasks.
- Retrieval reaches into external knowledge sources that do not belong inside the agent itself.
Ownership differs by layer:
- Session memory belongs to the running task.
- Durable memory belongs to the agent’s operating history.
- Retrieval indexes belong to external documents, databases, or data products.
- Notes and artifacts belong to explicit work products such as TODO lists, summaries, reports, or decision logs.
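A minimal Python sketch of the layers as separate structures, assuming a single-process agent. `AgentState`, `Episode`, and `stage_context` are hypothetical names for illustration, and the keyword match against facts is a deliberately naive stand-in for real lookup. The point is structural: retrieval is passed in as a callable over an external corpus rather than stored inside the agent, and it is only invoked when the caller asks for it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Episode:
    """A bounded event: what happened and how it turned out."""
    summary: str
    outcome: str


@dataclass
class AgentState:
    # Working memory: active state of the current task; discarded when the session ends.
    working: dict[str, str] = field(default_factory=dict)
    # Episodic memory: bounded events from the agent's operating history.
    episodes: list[Episode] = field(default_factory=list)
    # Semantic memory: distilled facts keyed by subject, reusable across tasks.
    facts: dict[str, str] = field(default_factory=dict)


# Retrieval is external: the agent holds a handle to a corpus, not the corpus itself.
Retriever = Callable[[str], list[str]]


def stage_context(state: AgentState, query: str, retrieve: Retriever, use_retrieval: bool) -> list[str]:
    """Assemble prompt context from each layer, fetching external evidence only on demand."""
    context = [f"task:{k}={v}" for k, v in state.working.items()]
    # Naive substring match on fact keys, for illustration only.
    context += [f"fact:{k}={v}" for k, v in state.facts.items() if k in query]
    # Only the most recent episodes are staged, keeping the prompt bounded.
    context += [f"episode:{e.summary}->{e.outcome}" for e in state.episodes[-3:]]
    if use_retrieval:
        context += [f"doc:{d}" for d in retrieve(query)]
    return context
```

Keeping `use_retrieval` as an explicit decision mirrors the third job above: external knowledge is fetched only when it improves the answer or action.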
Architecture Diagram
The design goal is not to stuff more information into the model. It is to stage the right information at the right time.
Tool Landscape
Common patterns show up across most agent systems:
- Working memory often uses lightweight in-process state with capacity and time limits.
- Episodic memory usually combines structured metadata with similarity search so events remain searchable by both meaning and recency.
- Semantic memory often benefits from richer normalization because durable facts become more valuable when duplicates are merged and relationships are explicit.
- Retrieval systems usually start with chunking, indexing, and relevance scoring, then add reranking, expansion, or query rewriting only when the base pipeline is not enough.
- Notes and artifacts are best stored in human-readable forms when humans may need to inspect, edit, or approve them later.
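The working-memory pattern in the first item (in-process state with capacity and time limits) could look something like the sketch below. It assumes least-recently-written eviction and a monotonic clock; `WorkingMemory`, its method names, and any capacity or TTL values are illustrative choices, not a specific library's API. The clock is injectable so expiry can be exercised without real waiting.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class WorkingMemory:
    """Session-scoped key/value state with a capacity limit and a time-to-live."""

    def __init__(self, capacity: int, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        # Insertion order doubles as write recency: oldest writes sit at the front.
        self._items: "OrderedDict[str, tuple[float, str]]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        # Rewriting a key refreshes both its timestamp and its position.
        self._items.pop(key, None)
        self._items[key] = (self.clock(), value)
        # Capacity limit: evict the oldest writes first.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        written_at, value = entry
        if self.clock() - written_at > self.ttl:
            # Time limit: expired state is dropped instead of leaking into the prompt.
            del self._items[key]
            return None
        return value

    def snapshot(self) -> dict[str, str]:
        """Purge expired entries and return the live state."""
        now = self.clock()
        for key in [k for k, (t, _) in self._items.items() if now - t > self.ttl]:
            del self._items[key]
        return {k: v for k, (_, v) in self._items.items()}
```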
Retrieval Boundaries
Current hosted file-search tools make the memory-versus-retrieval split easier to explain. A managed file-search store is not the agent’s memory. It is an external corpus with its own indexing rules, filters, and citation surface. Useful distinctions:
- Use memory for continuity across tasks, decisions, and prior actions.
- Use retrieval for external evidence that should stay outside the prompt until a question or task asks for it.
- Use metadata filters to narrow scope before semantic matching, not as a replacement for relevance ranking.
- Use citations as part of the output artifact when users need to verify what document or page grounded the answer.
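To make the filter-then-rank ordering concrete, here is a small sketch over an in-memory corpus. Bag-of-words cosine similarity stands in for an embedding model, and `Chunk`, `search`, and the `doc#page=N` citation format are hypothetical; a managed file-search store would perform filtering, ranking, and citation internally under its own rules.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc: str
    page: int
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def _vector(text: str) -> Counter:
    # Bag-of-words term counts: a stand-in for an embedding model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(chunks: list[Chunk], query: str, filters: dict[str, str], top_k: int = 3) -> list[tuple[str, str]]:
    # 1. Metadata filters narrow scope: a chunk must match every filter to stay in play.
    scoped = [c for c in chunks if all(c.metadata.get(k) == v for k, v in filters.items())]
    # 2. Relevance ranking still orders the scoped set; passing a filter says nothing about fit.
    q = _vector(query)
    scored = sorted(((_cosine(q, _vector(c.text)), c) for c in scoped), key=lambda pair: pair[0], reverse=True)
    # 3. Keep only chunks with some relevance, and attach a citation to each.
    return [(c.text, f"{c.doc}#page={c.page}") for score, c in scored[:top_k] if score > 0]


if __name__ == "__main__":
    corpus = [
        Chunk("policy.pdf", 2, "refunds are issued within 30 days", {"team": "support"}),
        Chunk("policy.pdf", 5, "shipping takes 5 days", {"team": "support"}),
        Chunk("eng.md", 1, "refunds service deploys nightly", {"team": "eng"}),
    ]
    print(search(corpus, "how are refunds issued", {"team": "support"}))
    # prints [('refunds are issued within 30 days', 'policy.pdf#page=2')]
```

The engineering chunk also mentions refunds, but the `team` filter removes it before ranking; inside the scoped set, relevance scoring decides that the shipping chunk is not worth returning.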
gemini-embedding-2, custom metadata filtering, and page-level citations. The docs also make the lifecycle boundary explicit: raw uploaded files expire on a shorter schedule, while data imported into a file-search store persists until it is deleted. That is useful for verifiable RAG, but it is still retrieval, not memory.
Managed file-search products also impose architecture choices. Google’s current docs note that File Search cannot yet be combined with tools like Google Search or URL Context in the same call, and that audio and video are not currently supported. Teams still need orchestration above the retrieval layer to decide which tool, corpus, and evidence path fits the task.
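One small piece of that orchestration is packing tool requests into calls without combining tools a provider refuses to mix. The sketch below assumes hypothetical tool identifiers and a hand-maintained incompatibility table modeled on the restriction described above; it is not the Gemini API's tool schema, and because "cannot yet" suggests the restriction may change, such a table would need to track the provider's current docs.

```python
# Hypothetical tool identifiers; not the Gemini API's actual tool names.
INCOMPATIBLE = {
    frozenset({"file_search", "web_search"}),
    frozenset({"file_search", "url_context"}),
}


def plan_calls(tools: list[str]) -> list[list[str]]:
    """Greedily pack tools into as few calls as possible without pairing incompatible tools."""
    calls: list[list[str]] = []
    for tool in tools:
        for call in calls:
            # Join an existing call only if the tool conflicts with nothing already in it.
            if all(frozenset({tool, other}) not in INCOMPATIBLE for other in call):
                call.append(tool)
                break
        else:
            # No compatible call exists yet, so this tool starts a new one.
            calls.append([tool])
    return calls


if __name__ == "__main__":
    print(plan_calls(["file_search", "web_search", "url_context"]))
    # prints [['file_search'], ['web_search', 'url_context']]
```

Greedy packing is simple but not always minimal for arbitrary conflict tables; with a handful of tools and a few rules, it is usually enough.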
Tradeoffs
- Session memory is fast and cheap, but it disappears on restart and should not be trusted as the durable record.
- Durable memory improves continuity, but it introduces write-quality problems: bad memories are expensive because they keep returning.
- Retrieval gives freshness and breadth, but it also creates latency, ranking errors, and citation risk.
- Managed file-search tools reduce infrastructure work, but they constrain corpus lifecycle, tool composition, and storage shape.
- Notes and artifacts improve traceability, but they require governance so the agent does not create an unbounded pile of stale documents.
Three design rules follow from these tradeoffs:
- Promotion: not every working-memory item should become durable.
- Forgetting: low-value or stale memory must decay, expire, or be archived.
- Boundary: not every knowledge problem is a RAG problem.
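The promotion and forgetting rules can be sketched as two small policies. Everything here is an assumption for illustration: the `MemoryItem` fields, the reuse and importance thresholds, and the exponential half-life decay are one reasonable choice among many, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float  # 0..1, assumed to be assigned at write time (heuristic or model score)
    hits: int          # how many times the item was reused during the session
    age_days: float    # time since the item was last used


def should_promote(item: MemoryItem, min_hits: int = 2, min_importance: float = 0.5) -> bool:
    # Promotion: only items that were reused and judged important become durable.
    return item.hits >= min_hits and item.importance >= min_importance


def retention_score(item: MemoryItem, half_life_days: float = 30.0) -> float:
    # Forgetting: importance halves every `half_life_days` without use.
    return item.importance * 0.5 ** (item.age_days / half_life_days)


def sweep(durable: list[MemoryItem], archive_below: float = 0.1) -> tuple[list[MemoryItem], list[MemoryItem]]:
    """Split durable memory into items to keep and items to archive."""
    keep = [m for m in durable if retention_score(m) >= archive_below]
    archive = [m for m in durable if retention_score(m) < archive_below]
    return keep, archive
```

Archiving instead of deleting keeps a recovery path if a decayed memory turns out to matter, while still removing it from what the agent retrieves by default.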
Citations
- Official source: Gemini API File Search is now multimodal
- Official source: Gemini API File Search documentation
- Official source: OpenAI Responses API tools and remote MCP support
- High-signal repository: google-gemini/cookbook
Reading Extensions
- Agent Memory Retrieval Starter
- Context Engineering
- Customer Support Agents
- Deep Research Agents
- Patterns Overview
Update Log
- 2026-05-06: Added a retrieval-boundary refresh covering multimodal file search, metadata filters, citation-bearing RAG, and the difference between managed retrieval stores and agent memory.
- 2026-04-21: Initial repo-native draft based on imported reference material and lab rewrite rules.
