{"id":333366,"date":"2025-12-07T19:15:19","date_gmt":"2025-12-07T19:15:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/333366\/"},"modified":"2025-12-07T19:15:19","modified_gmt":"2025-12-07T19:15:19","slug":"architecting-efficient-context-aware-multi-agent-framework-for-production","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/333366\/","title":{"rendered":"Architecting efficient context-aware multi-agent framework for production"},"content":{"rendered":"<p data-block-key=\"rnr6y\">The landscape of AI agent development is shifting fast. We\u2019ve moved beyond prototyping single-turn chatbots. Today, organizations are deploying sophisticated, autonomous agents to handle long-horizon tasks: automating workflows, conducting deep research, and maintaining complex codebases.<\/p>\n<p data-block-key=\"1q47u\">That ambition immediately runs into a bottleneck: context.<\/p>\n<p data-block-key=\"ataek\">As agents run longer, the amount of information they need to track\u2014chat history, tool outputs, external documents, intermediate reasoning\u2014explodes. The prevailing \u201csolution\u201d has been to lean on ever-larger context windows in foundation models. But simply giving agents more space to paste text cannot be the sole scaling strategy.<\/p>\n<p data-block-key=\"5qg6b\">To build production-grade agents that are reliable, efficient, and debuggable, the industry is exploring a new discipline:<\/p>\n<p data-block-key=\"45itv\">Context engineering \u2014 treating context as a first-class system with its own architecture, lifecycle, and constraints.<\/p>\n<p data-block-key=\"855nn\">Based on our experience scaling complex single- and multi-agent systems, we designed and evolved the context stack in <a href=\"https:\/\/github.com\/google\/adk-python\" rel=\"nofollow noopener\" target=\"_blank\">Google Agent Development Kit (ADK)<\/a> to support that discipline. 
ADK is an open-source, multi-agent-native framework built to make active context engineering achievable in real systems.<\/p>\n<p>The scaling bottleneck<\/p>\n<p data-block-key=\"4ch28\">A large context window helps with context-related problems, but it won&#8217;t solve all of them. In practice, the naive pattern\u2014append everything into one giant prompt\u2014collapses under a three-way pressure:<\/p>\n<ul>\n<li>Cost and latency spirals: Model cost and time-to-first-token grow quickly with context size. &#8220;Shoveling&#8221; raw history and verbose tool payloads into the window makes agents prohibitively slow and expensive.<\/li>\n<li>Signal degradation (\u201clost in the middle\u201d): A context window flooded with irrelevant logs, stale tool outputs, or deprecated state can distract the model, causing it to fixate on past patterns rather than the immediate instruction. To ensure robust decision-making, we must maximize the density of relevant information.<\/li>\n<li>Physical limits: Real-world workloads\u2014involving full RAG results, intermediate artifacts, and long conversation traces\u2014eventually overflow even the largest fixed windows.<\/li>\n<\/ul>\n<p data-block-key=\"ba04n\">Throwing more tokens at the problem buys time, but it doesn\u2019t change the shape of the curve. To scale, we need to change how context is represented and managed, not just how much of it we can cram into a single call.<\/p>\n<p>The design thesis: context as a compiled view<\/p>\n<p data-block-key=\"3vcre\">In the previous generation of agent frameworks, context was treated like a mutable string buffer. 
ADK is built around a different thesis: Context is a compiled view over a richer stateful system.<\/p>\n<p data-block-key=\"7dd7n\">In that view:<\/p>\n<ul>\n<li>Sessions, memory, and artifacts (files) are the sources \u2013 the full, structured state of the interaction and its data.<\/li>\n<li>Flows and processors are the compiler pipeline \u2013 a sequence of passes that transform that state.<\/li>\n<li>The working context is the compiled view you ship to the LLM for this one invocation.<\/li>\n<\/ul>\n<p data-block-key=\"3os7n\">Once you adopt this mental model, context engineering stops being prompt gymnastics and starts looking like systems engineering. You are forced to ask standard systems questions: What is the intermediate representation? Where do we apply compaction? How do we make transformations observable?<\/p>\n<p data-block-key=\"8qnbe\">ADK\u2019s architecture answers these questions via three design principles:<\/p>\n<ul>\n<li>Separate storage from presentation: We distinguish between durable state (Sessions) and per-call views (working context). This allows you to evolve storage schemas and prompt formats independently.<\/li>\n<li>Explicit transformations: Context is built through named, ordered processors, not ad-hoc string concatenation. This makes the &#8220;compilation&#8221; step observable and testable.<\/li>\n<li>Scope by default: Every model call and sub-agent sees the minimum context required. Agents must reach for more information explicitly via tools, rather than being flooded by default.<\/li>\n<\/ul>\n<p data-block-key=\"4jdr3\">ADK\u2019s tiered structure, its relevance mechanisms, and its multi-agent handoff semantics are essentially an application of this &#8220;compiler&#8221; thesis and these three principles:<\/p>\n<ul>\n<li>Structure \u2013 a tiered model that separates how information is stored from what the model sees.<\/li>\n<li>Relevance \u2013 agentic and human controls that decide what matters now.<\/li>\n<li>Multi-agent context \u2013 explicit semantics for handing off the right slice of context between agents.<\/li>\n<\/ul>\n<p data-block-key=\"b55h9\">The next sections walk through each of these pillars in turn.<\/p>\n<p>1. Structure: The tiered model<\/p>\n<p data-block-key=\"8jcfi\">Most early agent systems implicitly assume a single window of context. ADK goes the other way. It separates storage from presentation and organizes context into distinct layers, each with a specific job:<\/p>\n<ul>\n<li>Working context \u2013 the immediate prompt for this model call: system instructions, agent identity, selected history, tool outputs, optional memory results, and references to artifacts.<\/li>\n<li>Session \u2013 the durable log of the interaction: every user message, agent reply, tool call, tool result, control signal, and error, captured as structured Event objects.<\/li>\n<li>Memory \u2013 long-lived, searchable knowledge that outlives a single session: user preferences and past conversations.<\/li>\n<li>Artifacts \u2013 large binary or textual data associated with the session or user (files, logs, images), addressed by name and version rather than pasted into the prompt.<\/li>\n<\/ul>\n<p>1.1 Working context as a recomputed view<\/p>\n<p data-block-key=\"9rl2f\">For each invocation, ADK rebuilds the Working Context from the underlying state. It starts with instructions and identity, pulls in selected Session events, and optionally attaches memory results. 
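<\/p>
<p>As a rough illustration, the four tiers can be sketched as plain data containers plus a function that recompiles the working context on each call. All names here (Event, Session, ContextStores, compile_working_context) are hypothetical stand-ins, not the ADK API:<\/p>

```python
# Hedged sketch of the tiered context model: durable Session events,
# long-lived memory, large artifacts, and an ephemeral per-call view.
from dataclasses import dataclass, field


@dataclass
class Event:
    author: str    # "user", "agent", or a tool name
    content: str


@dataclass
class Session:
    # Durable ground truth: the chronological event log.
    events: list = field(default_factory=list)


@dataclass
class ContextStores:
    session: Session
    memory: dict       # long-lived knowledge, searchable across sessions
    artifacts: dict    # large named blobs, never inlined into prompts


def compile_working_context(stores, instructions):
    """Recompute the ephemeral per-call view from the underlying state."""
    view = [f"[system] {instructions}"]
    view += [f"[{e.author}] {e.content}" for e in stores.session.events]
    # Artifacts appear only as lightweight references (name + size).
    view += [f"[artifact-ref] {name} ({len(data)} bytes)"
             for name, data in stores.artifacts.items()]
    return view


stores = ContextStores(
    session=Session(events=[Event("user", "Summarise the report")]),
    memory={"tone": "formal"},
    artifacts={"report.pdf": b"\x00" * 1024},
)
ctx = compile_working_context(stores, "You are a helpful analyst.")
print(ctx)
```

<p>The point of the sketch: the session is durable, the view is recomputed every call, and a large artifact surfaces only as a named reference rather than a kilobyte of raw bytes.<\/p>
<p>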
This view is ephemeral (thrown away after the call), configurable (you can change formatting without migrating storage), and model-agnostic.<\/p>\n<p data-block-key=\"5boas\">This flexibility is the first win of the compiler view: you stop hard-coding &#8220;the prompt&#8221; and start treating it as a derived representation you can iterate on.<\/p>\n<p>1.2 Flows and processors: context processing as a pipeline<\/p>\n<p data-block-key=\"b87td\">Once you separate storage from presentation, you need machinery to &#8220;compile&#8221; one into the other. In ADK, every LLM-based agent is backed by an LLM Flow, which maintains ordered lists of processors.<\/p>\n<p data-block-key=\"c77qc\">A (simplified) SingleFlow runs an ordered list of request processors\u2014covering instructions, agent identity, and conversation contents\u2014before each model call, followed by response processors afterward.<\/p>\n<p data-block-key=\"rnr6y\">These flows are ADK&#8217;s machinery to compile context. The order matters: each processor builds on the outputs of the previous steps. This gives you natural insertion points for custom filtering, compaction strategies, caching, and multi-agent routing. You are no longer rewriting giant &#8220;prompt templates&#8221;; you\u2019re just adding or reordering processors.<\/p>\n<p>1.3 Session and events: structured, language-agnostic history<\/p>\n<p data-block-key=\"3psje\">An ADK Session represents the definitive state of a conversation or workflow instance. Concretely, it acts as a container for session metadata (IDs, app names), a state scratchpad for structured variables, and\u2014most importantly\u2014a chronological list of Events.<\/p>\n<p data-block-key=\"det3d\">Instead of storing raw prompt strings, ADK captures every interaction\u2014user messages, agent replies, tool calls, results, control signals, and errors\u2014as strongly-typed Event records. 
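<\/p>
<p>The processor pipeline from section 1.2 can be sketched as an ordered list of functions mutating a shared request object. The shapes here (LlmRequest, SingleFlow, the three processors) are hedged illustrations, not ADK&#8217;s actual implementation:<\/p>

```python
# Hedged sketch of a flow as an ordered processor pipeline
# (illustrative names, not the real ADK SingleFlow).
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LlmRequest:
    instructions: list = field(default_factory=list)
    contents: list = field(default_factory=list)


def instructions_processor(req: LlmRequest) -> None:
    req.instructions.append("You are a research agent.")


def identity_processor(req: LlmRequest) -> None:
    req.instructions.append("Agent name: researcher")


def contents_processor(req: LlmRequest) -> None:
    # Would normally select and format Session events; see section 1.3.
    req.contents.append("user: find recent papers on context caching")


@dataclass
class SingleFlow:
    # Order matters: each processor builds on the previous ones' output.
    request_processors: list

    def compile(self) -> LlmRequest:
        req = LlmRequest()
        for proc in self.request_processors:
            proc(req)
        return req


flow = SingleFlow([instructions_processor, identity_processor, contents_processor])
request = flow.compile()
print(request.instructions)
print(request.contents)
```

<p>Adding a custom filtering or caching step is then just inserting another function into the list, which is exactly the extensibility the pipeline view is meant to buy.<\/p>
<p>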
This structural choice yields three distinct advantages:<\/p>\n<ul>\n<li>Model agnosticism: You can swap underlying models without rewriting the history, as the storage format is decoupled from the prompt format.<\/li>\n<li>Rich operations: Downstream components like compaction, time-travel debugging, and memory ingestion can operate over a rich event stream rather than parsing opaque text.<\/li>\n<li>Observability: It provides a natural surface for analytics, allowing you to inspect precise state transitions and actions.<\/li>\n<\/ul>\n<p data-block-key=\"27p9o\">The bridge between this session and the working context is the contents processor. It performs the heavy lifting of transforming the Session into the history portion of the working context by executing three critical steps:<\/p>\n<ul>\n<li>Selection: It filters the event stream to drop irrelevant events, partial events, and framework noise that shouldn&#8217;t reach the model.<\/li>\n<li>Transformation: It flattens the remaining events into Content objects with the correct roles (user\/assistant\/tool) and annotations for the specific model API being used.<\/li>\n<li>Injection: It writes the formatted history into llm_request.contents, ensuring downstream processors\u2014and the model itself\u2014receive a clean, coherent conversational trace.<\/li>\n<\/ul>\n<p data-block-key=\"9qh2s\">In this architecture, the Session is your ground truth; the working context is merely a computed projection that you can refine and optimize over time.<\/p>\n<p>1.4 Context compaction and filtering at the session layer<\/p>\n<p data-block-key=\"56ua8\">If you keep appending raw events indefinitely, latency and token usage will inevitably spiral out of control. ADK\u2019s Context Compaction feature attacks this problem at the Session layer.<\/p>\n<p data-block-key=\"d0hcc\">When a configurable threshold (such as the number of invocations) is reached, ADK triggers an asynchronous process. 
It uses an LLM to summarize older events over a sliding window\u2014defined by compaction intervals and overlap size\u2014and writes the resulting summary back into the Session as a new event with a &#8220;compaction&#8221; action. Crucially, this allows the system to prune or de-prioritize the raw events that were summarized.<\/p>\n<p data-block-key=\"376qf\">Because compaction operates on the Event stream itself, the benefits cascade downstream:<\/p>\n<ul>\n<li>Scalability: Sessions remain physically manageable even for extremely long-running conversations.<\/li>\n<li>Clean views: The contents processor automatically works over a history that is already compacted, requiring no complex logic at query time.<\/li>\n<li>Decoupling: You can tune compaction prompts and strategies without touching a single line of agent code or template logic.<\/li>\n<\/ul>\n<p data-block-key=\"dib0p\">This creates a scalable lifecycle for long contexts. For strictly rule-based reduction, ADK offers a sibling operation\u2014Filtering\u2014where prebuilt plugins can globally drop or trim context based on deterministic rules before it ever reaches the model.<\/p>\n<p>1.5 Context caching<\/p>\n<p data-block-key=\"cmvos\">Modern models support context caching (prefix caching), which allows the inference engine to reuse attention computation across calls. ADK\u2019s separation of &#8220;Session&#8221; (storage) and &#8220;Working Context&#8221; (view) provides a natural substrate for this optimization.<\/p>\n<p data-block-key=\"dfeqm\">The architecture effectively divides the context window into two zones:<\/p>\n<ul>\n<li>Stable prefixes: System instructions, agent identity, and long-lived summaries.<\/li>\n<li>Variable suffixes: The latest user turn, new tool outputs, and small incremental updates.<\/li>\n<\/ul>\n<p data-block-key=\"bqhkq\">Because ADK flows and processors are explicit, you can treat cache-friendliness as a hard design constraint. 
You can order your pipeline to keep frequently reused segments stable at the front of the context window, while pushing highly dynamic content toward the end. To enforce this rigor, we introduced static instruction, a primitive that guarantees immutability for system prompts, ensuring that the cache prefix remains valid across invocations.<\/p>\n<p data-block-key=\"4apgu\">This is a prime example of context engineering acting as systems work across the full stack: you are not only deciding what the model sees, but optimizing how often the hardware has to re-compute the underlying tensor operations.<\/p>\n<p>2. Relevance: Agentic management of what matters now<\/p>\n<p data-block-key=\"ap84v\">Once the structure is established, the core challenge shifts to relevance: Given a tiered context architecture, what specific information belongs in the model\u2019s active window right now?<\/p>\n<p data-block-key=\"73aeh\">ADK answers this through a collaboration between human domain knowledge and agentic decision-making. Relying solely on hard-coded rules is cost-effective but rigid; relying solely on the agent to browse everything is flexible but prohibitively expensive and unstable.<\/p>\n<p data-block-key=\"4tfmt\">An optimal Working Context is a negotiation between the two. Human engineers define the architecture\u2014where data lives, how it is summarized, and what filters apply. The Agent then provides the intelligence, deciding dynamically when to &#8220;reach&#8221; for specific memory blocks or artifacts to satisfy the immediate user request.<\/p>\n<p>2.1 Artifacts: externalizing large state<\/p>\n<p data-block-key=\"eqnlm\">Early agent implementations often fall into the &#8220;context dumping&#8221; trap: placing large payloads\u2014a 5MB CSV, a massive JSON API response, or a full PDF transcript\u2014directly into the chat history. 
This creates a permanent tax on the session; every subsequent turn drags that payload along, burying critical instructions and inflating costs.<\/p>\n<p data-block-key=\"1d4fi\">ADK solves this by treating large data as Artifacts: named, versioned binary or text objects managed by an ArtifactService.<\/p>\n<p data-block-key=\"74old\">Conceptually, ADK applies a handle pattern to large data. Large data lives in the artifact store, not the prompt. By default, agents see only a lightweight reference (a name and summary) via the request processor. When\u2014and only when\u2014an agent requires the raw data to answer a question, it uses the LoadArtifactsTool. This action temporarily loads the content into the Working Context.<\/p>\n<p data-block-key=\"cbo9h\">Crucially, ADK supports ephemeral expansion. Once the model call or task is complete, the artifact is offloaded from the working context by default. This turns &#8220;5MB of noise in every prompt&#8221; into a precise, on-demand resource. The data can be huge, but the context window remains lean.<\/p>\n<p>2.2 Memory: long-term knowledge, retrieved on demand<\/p>\n<p data-block-key=\"cp1f0\">Where Artifacts handle discrete, large objects, ADK&#8217;s Memory layer manages long-lived, semantic knowledge that extends beyond a single session\u2014user preferences, past decisions, and domain facts.<\/p>\n<p data-block-key=\"88p86\">We designed the MemoryService around two principles: memory must be searchable (not permanently pinned), and retrieval should be agent-directed.<\/p>\n<p data-block-key=\"fvjl5\">The MemoryService ingests data\u2014often from finished Sessions\u2014into a vector or keyword corpus. 
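<\/p>
<p>The handle pattern from section 2.1 can be sketched as follows. ArtifactService here is a minimal hypothetical stand-in, not ADK&#8217;s real service or its API:<\/p>

```python
# Hedged sketch of the artifact handle pattern: the context carries
# only a named reference by default; raw bytes are expanded on demand
# and offloaded again after the call.
class ArtifactService:
    """Stores large blobs by name (illustrative stand-in)."""

    def __init__(self):
        self._store = {}

    def save(self, name, data):
        self._store[name] = data

    def load(self, name):
        return self._store[name]

    def names(self):
        return list(self._store)


def build_context(service, loaded):
    """Show references by default; expand only explicitly loaded artifacts."""
    ctx = []
    for name in service.names():
        if name in loaded:
            # Ephemeral expansion: raw content only for this one call.
            ctx.append(f"[artifact {name}] {service.load(name).decode()}")
        else:
            ctx.append(f"[artifact-ref {name}]")  # lightweight handle
    return ctx


svc = ArtifactService()
svc.save("sales.csv", b"region,revenue\nEMEA,120")

lean = build_context(svc, loaded=set())            # reference only
expanded = build_context(svc, loaded={"sales.csv"})  # expanded this call
print(lean)
print(expanded)
```

<p>The store can hold megabytes, but the default view stays one short reference line; the payload appears only in the single call that asked for it.<\/p>
<p>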
Agents then access this knowledge via two distinct patterns:<\/p>\n<ul>\n<li>Reactive recall: The agent recognizes a knowledge gap (&#8220;What is the user&#8217;s dietary restriction?&#8221;) and explicitly calls the load_memory_tool to search the corpus.<\/li>\n<li>Proactive recall: The system uses a pre-processor to run a similarity search based on the latest user input, injecting likely relevant snippets via the preload_memory_tool before the model is even invoked.<\/li>\n<\/ul>\n<p data-block-key=\"3835r\">This approach replaces the &#8220;context stuffing&#8221; anti-pattern with a &#8220;memory-based&#8221; workflow. Agents recall exactly the snippets they need for the current step, rather than carrying the weight of every conversation they have ever had.<\/p>\n<p>3. Multi-agent context: who sees what, when<\/p>\n<p data-block-key=\"2u7ja\">Single-agent systems struggle with context bloat; multi-agent systems amplify it. If a root agent passes its full history to a sub-agent, and that sub-agent does the same, you trigger a context explosion. The token count skyrockets, and sub-agents get confused by irrelevant conversational history.<\/p>\n<p data-block-key=\"ebahj\">Whenever an agent invokes another agent, ADK lets you explicitly scope what the callee sees\u2014maybe just the latest user query and one artifact\u2014while suppressing most of the ancestral history.<\/p>\n<p>3.1 Two multi-agent interaction patterns<\/p>\n<p data-block-key=\"df3gg\">At a high level, ADK maps multi-agent interactions into two distinct architectural patterns.<\/p>\n<p data-block-key=\"aag1q\">The first is Agents as Tools. Here, the root agent treats a specialized agent strictly as a function: call it with a focused prompt, get a result, and move on. The callee sees only the specific instructions and necessary artifacts\u2014no history.<\/p>\n<p data-block-key=\"bcv7n\">The second is Agent Transfer (Hierarchy). Here, control is fully handed off to a sub-agent to continue the conversation. 
The sub-agent inherits a view over the Session and can drive the workflow, calling its own tools or transferring control further down the chain.<\/p>\n<p>3.2 Scoped handoffs for agent transfer<\/p>\n<p data-block-key=\"mqs6\">Handoff behavior is controlled by knobs like include_contents on the callee, which determine how much context flows from the root agent to a sub-agent. In the default mode, ADK passes the full contents of the caller\u2019s working context\u2014useful when the sub-agent genuinely benefits from the entire history. In none mode, the sub-agent sees no prior history; it only receives the new prompt you construct for it (for example, the latest user turn plus a couple of tool calls and responses). Specialized agents get the minimal context they need, rather than inheriting a giant transcript by default.<\/p>\n<p data-block-key=\"1mc9t\">Because a sub-agent\u2019s context is also built via processors, these handoff rules plug into the same flow pipeline as single-agent calls. You don\u2019t need a separate multi-agent machinery layer; you\u2019re just changing how much upstream state the existing context compiler is allowed to see.<\/p>\n<p>3.3 Translating conversations for agent transfer<\/p>\n<p data-block-key=\"9g8gn\">Foundation models operate on a fixed role schema: system, user, and assistant. They do not natively understand &#8220;Assistant A&#8221; vs. &#8220;Assistant B.&#8221;<\/p>\n<p data-block-key=\"8q23n\">When ADK transfers control, it must often reframe the existing conversation so the new agent sees a coherent working context. 
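<\/p>
<p>The scoped-handoff behavior from section 3.2 can be sketched as follows; this is a hedged approximation of an include_contents-style knob, not ADK&#8217;s exact transfer mechanics:<\/p>

```python
# Hedged sketch of scoped handoffs: "default" forwards the caller's
# full working context, "none" forwards only a freshly built prompt.
def build_subagent_context(caller_history, new_prompt, include_contents="default"):
    if include_contents == "default":
        # Sub-agent genuinely benefits from the entire history.
        return caller_history + new_prompt
    if include_contents == "none":
        # Minimal scope: only what the caller constructs for the callee.
        return list(new_prompt)
    raise ValueError(f"unknown include_contents mode: {include_contents}")


history = [
    "user: plan my trip to Kyoto",
    "assistant: booking flights first",
    "tool: flights booked",
]
prompt = ["user: now find a hotel near Gion"]

full = build_subagent_context(history, prompt, "default")
scoped = build_subagent_context(history, prompt, "none")
print(len(full), len(scoped))
```

<p>The scoped callee receives one line instead of four; everything else stays in the caller&#8217;s Session, retrievable via tools if the sub-agent genuinely needs it.<\/p>
<p>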
If the new agent simply sees a stream of &#8220;Assistant&#8221; messages from the previous agent, it will hallucinate that it performed those actions.<\/p>\n<p data-block-key=\"adafn\">To prevent this, ADK performs an active translation during handoff:<\/p>\n<ul>\n<li>Narrative casting: Prior &#8220;Assistant&#8221; messages may be re-cast as narrative context (e.g., modifying the role or injecting a tag like [For context]: Agent B said&#8230;) rather than appearing as the new agent\u2019s own outputs.<\/li>\n<li>Action attribution: Tool calls from other agents are marked or summarized so the new agent acts on the results without confusing the execution with its own capabilities.<\/li>\n<\/ul>\n<p data-block-key=\"f8k74\">Effectively, ADK builds a fresh Working Context from the sub-agent\u2019s point of view, while preserving the factual history in the Session. This ensures correctness, allowing each agent to assume the &#8220;Assistant&#8221; role without misattributing the broader system&#8217;s history to itself.<\/p>\n<p>Conclusion<\/p>\n<p data-block-key=\"79jgi\">As we push agents to tackle longer horizons, &#8220;context management&#8221; can no longer mean &#8220;string manipulation.&#8221; It must be treated as an architectural concern alongside storage and compute.<\/p>\n<p data-block-key=\"c4qf6\">ADK\u2019s context architecture\u2014tiered storage, compiled views, pipeline processing, and strict scoping\u2014is our answer to this challenge. It encapsulates the rigorous systems engineering required to move agents from interesting prototypes to scalable, reliable production systems.<\/p>\n","protected":false},"excerpt":{"rendered":"The landscape of AI agent development is shifting fast. We\u2019ve moved beyond prototyping single-turn chatbots. 
Today, organizations are&hellip;\n","protected":false},"author":2,"featured_media":333367,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-333366","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/333366","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=333366"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/333366\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/333367"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=333366"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=333366"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=333366"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}