{"id":221004,"date":"2025-10-13T07:46:19","date_gmt":"2025-10-13T07:46:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/221004\/"},"modified":"2025-10-13T07:46:19","modified_gmt":"2025-10-13T07:46:19","slug":"agentic-context-engineering-ace-self-improving-llms-via-evolving-contexts-not-fine-tuning","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/221004\/","title":{"rendered":"Agentic Context Engineering (ACE): Self-Improving LLMs via Evolving Contexts, Not Fine-Tuning"},"content":{"rendered":"<p>TL;DR: A team of researchers from Stanford University, SambaNova Systems and UC Berkeley introduces the <a href=\"https:\/\/arxiv.org\/abs\/2510.04618\" rel=\"nofollow noopener\" target=\"_blank\">ACE framework<\/a>, which improves LLM performance by editing and growing the input context instead of updating model weights. Context is treated as a living \u201cplaybook\u201d maintained by three roles (Generator, Reflector, Curator), with small delta items merged incrementally to avoid brevity bias and context collapse. Reported gains: +10.6% on AppWorld agent tasks, +8.6% on finance reasoning, and ~86.9% average latency reduction vs strong context-adaptation baselines. 
On the AppWorld leaderboard snapshot (Sept 20, 2025), ReAct+ACE (59.4%) roughly matches IBM CUGA (60.3%, GPT-4.1) while using DeepSeek-V3.1.<\/p>\n<p><img decoding=\"async\" data-attachment-id=\"75241\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/10\/agentic-context-engineering-ace-self-improving-llms-via-evolving-contexts-not-fine-tuning\/screenshot-2025-10-10-at-4-38-44-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.38.44-AM-1.png\" data-orig-size=\"1974,856\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot 2025-10-10 at 4.38.44\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.38.44-AM-1-300x130.png\" data-large-file=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.38.44-AM-1-1024x444.png\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.38.44-AM-1-1024x444.png\" alt=\"\" class=\"wp-image-75241 lazyload\" style=\"width:940px;height:auto\"\/>https:\/\/arxiv.org\/pdf\/2510.04618<\/p>\n<p>What does ACE change?<\/p>\n<p>ACE positions \u201ccontext engineering\u201d as a first-class alternative to parameter updates. 
Instead of compressing instructions into short prompts, ACE accumulates and organizes domain-specific tactics over time, on the argument that denser context improves performance on agentic tasks where tools, multi-turn state, and failure modes matter.<\/p>\n<p>Method: Generator \u2192 Reflector \u2192 Curator<\/p>\n<p>The Generator executes tasks and produces trajectories (reasoning\/tool calls), exposing helpful vs harmful moves.<\/p>\n<p>The Reflector distills concrete lessons from those traces.<\/p>\n<p>The Curator converts lessons into typed delta items (with helpful\/harmful counters) and merges them deterministically, with de-duplication and pruning to keep the playbook targeted.<\/p>\n<p>Two design choices, incremental delta updates and grow-and-refine, preserve useful history and prevent \u201ccontext collapse\u201d from monolithic rewrites. To isolate context effects, the research team uses the same base LLM (non-thinking DeepSeek-V3.1) for all three roles.<\/p>\n<p>Benchmarks<\/p>\n<p>AppWorld (agents): Built on the official ReAct baseline, ReAct+ACE outperforms strong baselines (ICL, GEPA, Dynamic Cheatsheet), with a +10.6% average gain over selected baselines and roughly +7.6% over Dynamic Cheatsheet in online adaptation. On the Sept 20, 2025 leaderboard, ReAct+ACE scores 59.4% vs IBM CUGA\u2019s 60.3% (GPT-4.1); ACE surpasses CUGA on the harder test-challenge split while using a smaller open-source base model.<\/p>\n<p>Finance (XBRL): On FiNER token tagging and XBRL Formula numerical reasoning, ACE reports a +8.6% average gain over baselines when ground-truth labels are available for offline adaptation; it also works with execution-only feedback, though the quality of the feedback signals matters.
<\/p>\n<p><img decoding=\"async\" data-attachment-id=\"75243\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/10\/agentic-context-engineering-ace-self-improving-llms-via-evolving-contexts-not-fine-tuning\/screenshot-2025-10-10-at-4-39-48-am-2\/\" data-orig-file=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.39.48-AM-1.png\" data-orig-size=\"1670,870\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot 2025-10-10 at 4.39.48\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.39.48-AM-1-300x156.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.39.48-AM-1-1024x533.png\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.39.48-AM-1.png\" alt=\"\" class=\"wp-image-75243 lazyload\" style=\"width:780px;height:auto\"\/>https:\/\/arxiv.org\/pdf\/2510.04618<\/p>\n<p><img decoding=\"async\" data-attachment-id=\"75245\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/10\/agentic-context-engineering-ace-self-improving-llms-via-evolving-contexts-not-fine-tuning\/screenshot-2025-10-10-at-4-40-07-am-2\/\" data-orig-file=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.07-AM-1.png\" data-orig-size=\"1660,990\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot 2025-10-10 at 4.40.07\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.07-AM-1-300x179.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.07-AM-1-1024x611.png\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.07-AM-1.png\" alt=\"\" class=\"wp-image-75245 lazyload\" style=\"width:786px;height:auto\"\/>https:\/\/arxiv.org\/pdf\/2510.04618<\/p>\n<p>Cost and latency<\/p>\n<p>ACE\u2019s non-LLM merges plus localized updates reduce adaptation overhead substantially:<\/p>\n<p>Offline (AppWorld): \u221282.3% latency and \u221275.1% rollouts vs GEPA.<\/p>\n<p>Online (FiNER): \u221291.5% latency and \u221283.6% token cost vs Dynamic Cheatsheet.<\/p>\n<p><img decoding=\"async\" data-attachment-id=\"75247\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/10\/agentic-context-engineering-ace-self-improving-llms-via-evolving-contexts-not-fine-tuning\/screenshot-2025-10-10-at-4-40-26-am\/\" data-orig-file=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.26-AM.png\" data-orig-size=\"1686,1110\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot 2025-10-10 at 4.40.26\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.26-AM-300x198.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.26-AM-1024x674.png\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-10-at-4.40.26-AM.png\" alt=\"\" class=\"wp-image-75247 lazyload\" style=\"width:718px;height:auto\"\/>https:\/\/arxiv.org\/pdf\/2510.04618<\/p>\n<p>Key Takeaways<\/p>\n<p>ACE = context-first adaptation: Improves LLMs by incrementally editing an evolving \u201cplaybook\u201d (delta items) curated by Generator\u2192Reflector\u2192Curator, using the same base LLM (non-thinking DeepSeek-V3.1) to isolate context effects and avoid collapse from monolithic rewrites. <\/p>\n<p>Measured gains: ReAct+ACE reports +10.6% over strong baselines on AppWorld and achieves 59.4% vs IBM CUGA 60.3% (GPT-4.1) on the Sept 20, 2025 leaderboard snapshot; finance benchmarks (FiNER + XBRL Formula) show +8.6% average over baselines. <\/p>\n<p>Lower overhead than reflective-rewrite baselines: ACE reduces adaptation latency by ~82\u201392% and rollouts\/token cost by ~75\u201384%, contrasting with Dynamic Cheatsheet\u2019s persistent memory and GEPA\u2019s Pareto prompt evolution approaches. 
<\/p>\n<p>Conclusion<\/p>\n<p>ACE positions context engineering as a first-class alternative to weight updates: maintain a persistent, curated playbook that accumulates task-specific tactics, yielding measurable gains on AppWorld and finance reasoning while cutting adaptation latency and token rollouts versus reflective-rewrite baselines. The approach is practical\u2014deterministic merges, delta items, and long-context\u2013aware serving\u2014and its limits are clear: outcomes track feedback quality and task complexity. If adopted, agent stacks may \u201cself-tune\u201d primarily through evolving context rather than new checkpoints.<\/p>\n<p>Check out the <a href=\"https:\/\/arxiv.org\/abs\/2510.04618\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">paper here<\/a>.<\/p>\n
<p><a class=\"m-a-box-avatar-url\" href=\"https:\/\/www.marktechpost.com\/author\/6flvq\/\" rel=\"nofollow noopener\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/Screen-Shot-2021-09-14-at-9.02.24-AM-150x150.png\" class=\"avatar avatar-150 photo\" alt=\"\"   data-attachment-id=\"17663\" data-permalink=\"https:\/\/www.marktechpost.com\/?attachment_id=17663\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2019\/06\/Screen-Shot-2021-09-14-at-9.02.24-AM.png\" data-orig-size=\"832,778\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screen Shot 2021-09-14 at 9.02.24 AM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2019\/06\/Screen-Shot-2021-09-14-at-9.02.24-AM-300x281.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2019\/06\/Screen-Shot-2021-09-14-at-9.02.24-AM.png\"\/><\/a><\/p>\n<p>Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. 
His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.<\/p>\n","protected":false},"excerpt":{"rendered":"TL;DR: A team of researchers from Stanford University, SambaNova Systems and UC Berkeley introduce ACE framework that improves&hellip;\n","protected":false},"author":2,"featured_media":221005,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[182,181,507,74],"class_list":{"0":"post-221004","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/221004","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=221004"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/221004\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/221005"}],"wp:attachment":[{"href":"https:\
/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=221004"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=221004"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=221004"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}