{"id":233406,"date":"2026-01-12T00:18:15","date_gmt":"2026-01-12T00:18:15","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/233406\/"},"modified":"2026-01-12T00:18:15","modified_gmt":"2026-01-12T00:18:15","slug":"use-multiple-models-by-nathan-lambert","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/233406\/","title":{"rendered":"Use multiple models &#8211; by Nathan Lambert"},"content":{"rendered":"<p>I\u2019ll start by explaining my current AI stack and how it\u2019s changed in recent months. For chat, I\u2019m using a mix of:<\/p>\n<p>GPT 5.2 Thinking \/ Pro: My most frequent AI use is getting information. This is often a detail about a paper I\u2019m remembering, a method I\u2019m verifying for my <a href=\"https:\/\/rlhfbook.com\/\" rel=\"nofollow noopener\" target=\"_blank\">RLHF Book<\/a>, or some other niche fact. I know GPT 5.2 can find it if it exists, and I use Thinking for queries I expect to be easier and Pro when I want to make sure the answer is right. GPT Pro in particular has been the indisputable king for research for quite some time \u2014 Simon Willison\u2019s coining of it as his \u201c<a href=\"https:\/\/simonwillison.net\/2025\/Sep\/6\/research-goblin\/\" rel=\"nofollow noopener\" target=\"_blank\">research goblin<\/a>\u201d still feels right.<\/p>\n<p>I never use GPT 5 without thinking, nor OpenAI\u2019s other chat models. Maybe I need to invest more in custom instructions, but the non-thinking models always come across as a bit sloppy relative to the competition out there, and I quickly churn. I\u2019ve heard gossip that the Thinking and non-Thinking GPT models are even developed by different teams, so it would make sense that they can end up being meaningfully different.<\/p>\n<p>I also rarely use Deep Research from any provider, opting for GPT 5.2 Pro and more specific instructions. 
In the first half of 2025 I almost exclusively used ChatGPT\u2019s thinking models \u2014 Anthropic and Google have done good work to win back some of my attention.<\/p>\n<p>Claude 4.5 Opus: Chatting with Claude is where I go for basic code questions, visualizing simple data, and getting richer feedback on my work or decisions. Opus\u2019s tone is particularly refreshing when trying to push the models a bit (in a way that GPT 4.5 used to provide for me, as I was a power user of that model in H1 2025). Claude Opus 4.5 isn\u2019t particularly fast relative to a lot of models out there, but if, like me, you\u2019re used to the GPT Thinking models, it feels way faster (even with extended thinking always on, as I have it) and sufficient for this type of work.<\/p>\n<p>Gemini 3 Pro: Gemini is for everything else \u2014 explaining concepts I know are well covered in the training data (where minor hallucinations are okay, e.g. my former Google rabbit holes), multimodality, and sometimes very long-context work (though GPT 5.2 Thinking took a big step here, so the gap has narrowed). I still open and use the Gemini app regularly, but I\u2019m a bit less locked in to it than the other two.<\/p>\n<p>Relative to ChatGPT, the search mode of Gemini sometimes feels a bit off. It could be a product decision in how the information is presented to the user, but GPT\u2019s thorough, repeated search over multiple sources instills a confidence I don\u2019t get from Gemini for recent or research information.<\/p>\n<p>Grok 4: I use Grok roughly monthly to try to find some piece of AI news or alpha I recall from browsing X. Grok is likely underrated in terms of its intelligence (Grok 4 in particular was an impressive technical release), but it hasn\u2019t had sticky product or differentiating features for me.<\/p>\n<p>For images I\u2019m using mostly Nano Banana Pro, and sometimes GPT Image 1.5 when Gemini can\u2019t quite get it. 
<\/p>\n<p>For coding, I\u2019m primarily using Claude Opus 4.5 in Claude Code, but I still sometimes find myself needing OpenAI\u2019s Codex or even multi-LLM setups like <a href=\"https:\/\/ampcode.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Amp<\/a>. Over the holiday break, Claude Opus helped me update all the plots for <a href=\"https:\/\/atomproject.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">The ATOM Project<\/a> (which included substantial processing of our raw data from scraping HuggingFace), perform substantive edits for the RLHF Book (where I found it a quite good editor when provided with detailed instructions on what to do), and handle other side projects and life-organization tasks. I recently published a piece explaining my current obsession with Claude Opus 4.5; I recommend reading it if you haven\u2019t had the chance:<\/p>\n<p><a native=\"true\" href=\"https:\/\/www.interconnects.ai\/p\/claude-code-hits-different?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web\" rel=\"nofollow noopener\" class=\"embedded-post\" target=\"_blank\"><\/p>\n<p>Claude Code Hits Different<\/p>\n<p>There is an incredible amount of hype for Claude Code with Opus 4.5 across the web right now, which I for better or worse entirely agree with. Having used coding agents extensively for the past 6-9 months, where it felt like sometimes OpenAI\u2019s Codex was the best and sometimes Claude, there was some meaningful jump over the last few weeks. The jump is we\u2026<\/p>\n<p><\/a><\/p>\n<p>In summary: I pay for the best models and greatly value the marginal intelligence over speed \u2014 particularly because, for a lot of the tasks I do, the models are just starting to be able to do them well. 
As these capabilities diffuse in 2026, speed will become more of a determining factor in model selection.<\/p>\n<p><a href=\"https:\/\/open.substack.com\/users\/5933616-peter-wildeford?utm_source=mentions\" target=\"_blank\" rel=\"noopener nofollow\" data-attrs=\"{&quot;name&quot;:&quot;Peter Wildeford&quot;,&quot;id&quot;:5933616,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe19fc707-675c-45ca-bc5e-22de9b6d4bfa_250x320.png&quot;,&quot;uuid&quot;:&quot;dcde3240-5392-4cab-855f-694fbbdcab9d&quot;}\" data-component-name=\"MentionUser\" class=\"mention-pnpTE1\">Peter Wildeford<\/a> had a post on X with a nice graphic that reflected a very similar usage pattern:<\/p>\n<p><a href=\"https:\/\/x.com\/peterwildeford\/status\/2009287226925121947\" target=\"_blank\" rel=\"noopener noreferrer nofollow\" data-component-name=\"Twitter2ToDOM\" class=\"pencraft pc-display-contents pc-reset\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/https:\/\/pbs.substack.com\/profile_images\/2001373063951114240\/_zA9rfjb.jpg\"  alt=\"X avatar for @peterwildeford\"  width=\"40\" height=\"40\" draggable=\"false\" class=\"img-OACg1c object-fit-cover-u4ReeV pencraft pc-reset\"\/><\/p>\n<p>Peter Wildeford\ud83c\uddfa\ud83c\uddf8\ud83d\ude80@peterwildeford<\/p>\n<p>Here&#8217;s currently how I&#8217;m using each of the LLMs <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/G-HHYUVW4AA00Vj.jpg\" class=\"image-c_FmAR\"\/><\/p>\n<p>3:33 PM \u00b7 Jan 8, 2026<\/p>\n<p><\/a><\/p>\n<p>Across all of these categories, it doesn\u2019t feel like I could get away with using just one of these models without taking a substantial haircut in capabilities. This is a very strong endorsement of the notion that AI is <a href=\"https:\/\/helentoner.substack.com\/p\/taking-jaggedness-seriously\" rel=\"nofollow noopener\" target=\"_blank\">jagged<\/a> \u2014 i.e. with very strong capabilities spread out unevenly \u2014 while also being a bit of an unusual way to have to use a product. Each model is jagged in its own way. Through 2023, 2024, and the earlier days of modern AI, it quite often felt like there was just one winning model, and keeping up was easier. Today, it takes a lot of work and fiddling to make sure you\u2019re not missing out on capabilities.<\/p>\n<p>The working pattern I\u2019ve formed that most reinforces this multiple-models era is how often my problem with one AI model is solved by passing the same query to a peer model. Models get stuck, some can\u2019t find bugs, some coding agents fixate on a weird, suboptimal approach, and so on. 
In these cases, it feels quite common to boot up a peer model or agent and get it to unblock the project.<\/p>\n<p>If this multi-model approach or agent-switching happened only occasionally, it would be what I\u2019d expect, but it happening regularly means that the models are actually all quite close to being able to solve the tasks I\u2019m throwing at them \u2014 they\u2019re just not quite there. The intuition here is that if we view each task as having a per-model probability of success, and that probability were low for every model, switching would almost always fail. For switching to regularly solve the task, each model must have a fairly high probability of success. (For example, with two independent models that each succeed 20% of the time, switching rescues a failed task only 20% of the time; at 70% each, it rescues 70% of failures.)<\/p>\n<p>For the time being, it seems like tasks at the frontier of AI capabilities will always sustain this model-switching meta, but it\u2019s a moving suite of capabilities. The things I need to switch on now will soon be solved by all of the next generation of models.<\/p>\n<p>I\u2019m very happy with the value I\u2019m getting out of my hundreds of dollars of AI subscriptions, and you should likely consider doing the same if you work in a domain that sounds similar to mine.<\/p>\n<p>On the opposite side of the frontier models pushing to make current cutting-edge tasks 100% reliable are open models pushing to undercut the frontier models\u2019 prices. Coding plans on open models tend to cost a tenth (or less) of the frontier labs\u2019 plans. It\u2019s a boring take, but for the next few years I expect this gap to largely hold steady, with a lot of people getting insane value out of the cutting edge of models. It\u2019ll take longer for the open-model undercut to hit the frontier labs, even though from first principles they look to be in a precarious position in terms of the costs of R&amp;D and deployment. 
Open models haven\u2019t been remotely close to Claude 4.5 Opus or GPT 5.2 Thinking in my use.<\/p>\n<p>The other factor is that 2025 gave us Deep Research agents, code\/CLI agents, and search (and Pro) tool-use models, and there will almost certainly be new form factors released in 2026 that we end up using almost every day. Historically, the closed labs have been better at shipping new products into the world, but with better open models this should spread, as good product capabilities are widely distributed across the tech ecosystem. To capitalize on this, you need to invest time (and money) trying all the cutting-edge AI tools you can get your hands on. Don\u2019t be loyal to one provider.<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!pt_9!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87c6753-015d-496a-913c-9fa03b0d14eb_2848x1504.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/01\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/a87c6753-015d-496a-913c-9fa03b0d14eb_2848.jpeg\" width=\"1456\" height=\"769\" 
data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/a87c6753-015d-496a-913c-9fa03b0d14eb_2848x1504.png&quot;,&quot;srcNoWatermark&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/265feb30-690d-4064-a5b0-0d98c44ad58e_2848x1504.jpeg&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:769,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2055244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/www.interconnects.ai\/i\/183585383?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265feb30-690d-4064-a5b0-0d98c44ad58e_2848x1504.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"I\u2019ll start by explaining my current AI stack and how it\u2019s changed in recent months. 
For chat, I\u2019m&hellip;\n","protected":false},"author":2,"featured_media":233407,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[345,343,344,85,46,125],"class_list":{"0":"post-233406","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-il","12":"tag-israel","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/233406","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=233406"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/233406\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media\/233407"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=233406"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=233406"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=233406"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}