{"id":329901,"date":"2025-12-05T18:09:13","date_gmt":"2025-12-05T18:09:13","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/329901\/"},"modified":"2025-12-05T18:09:13","modified_gmt":"2025-12-05T18:09:13","slug":"helping-ai-have-long-term-memory","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/329901\/","title":{"rendered":"Helping AI have long-term memory"},"content":{"rendered":"<p data-block-key=\"il1w2\">The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Transformer_(deep_learning)\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Transformer architecture<\/a> revolutionized <a href=\"https:\/\/medium.com\/machine-learning-basics\/sequence-modelling-b2cdf244c233\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">sequence modeling<\/a> with its introduction of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Attention_%28machine_learning%29\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">attention<\/a>, a mechanism by which models look back at earlier inputs to prioritize relevant input data. However, computational cost increases drastically with sequence length, which limits the ability to scale Transformer-based models to extremely long contexts, such as those required for full-document understanding or genomic analysis.<\/p>\n<p data-block-key=\"36kb5\">The research community explored various approaches for solutions, such as efficient linear <a href=\"https:\/\/www.d2l.ai\/chapter_recurrent-modern\/index.html\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">recurrent neural networks<\/a> (RNNs) and <a href=\"https:\/\/huggingface.co\/blog\/lbourdois\/get-on-the-ssm-train\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">state space models<\/a> (SSMs) like <a href=\"https:\/\/arxiv.org\/pdf\/2405.21060\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Mamba-2<\/a>. These models offer fast, linear scaling by compressing context into a fixed-size. 
However, this fixed-size compression cannot adequately capture the rich information in very long sequences.<\/p>\n<p data-block-key=\"40m00\">In two new papers, <a href=\"https:\/\/arxiv.org\/abs\/2501.00663\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Titans<\/a> and <a href=\"https:\/\/arxiv.org\/pdf\/2504.13173\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">MIRAS<\/a>, we introduce an architecture and theoretical blueprint that combine the speed of RNNs with the accuracy of Transformers. Titans is the specific architecture (the tool), and MIRAS is the theoretical framework (the blueprint) for generalizing these approaches. Together, they advance the concept of test-time memorization: the ability of an AI model to maintain long-term memory by incorporating more powerful \u201csurprise\u201d metrics (i.e., measures of how unexpected a piece of information is) while the model is running, without dedicated offline retraining.<\/p>\n<p data-block-key=\"eic3n\">The MIRAS framework, as demonstrated by Titans, introduces a meaningful shift toward real-time adaptation. Instead of compressing information into a static state, this architecture actively learns and updates its own parameters as data streams in. 
This crucial mechanism enables the model to incorporate new, specific details into its core knowledge instantly.<\/p>\n","protected":false},"excerpt":{"rendered":"The Transformer architecture revolutionized sequence modeling with its introduction of attention, a mechanism by which models look back&hellip;\n","protected":false},"author":2,"featured_media":42712,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-329901","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/329901","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=329901"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/329901\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/42712"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=329901"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=329901"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=329901"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}