{"id":167207,"date":"2025-11-30T07:14:12","date_gmt":"2025-11-30T07:14:12","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/167207\/"},"modified":"2025-11-30T07:14:12","modified_gmt":"2025-11-30T07:14:12","slug":"how-to-scale-your-llm-usage","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/167207\/","title":{"rendered":"How to Scale Your LLM Usage"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Scaling has perhaps been the most important word for Large Language Models (LLMs) since the release of ChatGPT. ChatGPT became so successful largely because of the scaled pre-training OpenAI did, which made it a powerful language model.<\/p>\n<p class=\"wp-block-paragraph\">Following that, frontier LLM labs started scaling\u00a0post-training,\u00a0with supervised fine-tuning and RLHF, where models got increasingly better at following instructions and performing complex tasks.<\/p>\n<p class=\"wp-block-paragraph\">And just when we thought LLMs were about to plateau, we started doing inference-time scaling with the release of reasoning models, where spending thinking tokens gave huge improvements in output quality.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/image-309.png\" alt=\"Infographic: Scaling LLM Usage\" class=\"wp-image-633078\"\/>This infographic highlights the main contents of this article. I\u2019ll first discuss why you should scale your LLM usage, highlighting how it can lead to increased productivity. I\u2019ll then specify how you can increase your LLM usage, covering techniques like running parallel coding agents and using deep research mode in Gemini 3 Pro. 
Image by Gemini<\/p>\n<p class=\"wp-block-paragraph\">I now argue we should continue this scaling with a new scaling paradigm: usage-based scaling, where you scale how much you\u2019re using LLMs:<\/p>\n<p>Run more coding agents in parallel<\/p>\n<p>Always have a deep research running on a topic of interest<\/p>\n<p>Run information-fetching workflows<\/p>\n<p class=\"wp-block-paragraph\">If you\u2019re not firing off an agent before going to lunch, or before going to sleep, you\u2019re wasting time.<\/p>\n<p class=\"wp-block-paragraph\">In this article, I\u2019ll discuss why scaling LLM usage can lead to increased productivity, especially when working as a programmer. Furthermore, I\u2019ll discuss specific techniques you can use to scale your LLM usage, both personally and for the companies you work for. I\u2019ll keep this article high-level, aiming to inspire you to maximally utilize AI to your advantage.<\/p>\n<p>Why you should scale LLM usage<\/p>\n<p class=\"wp-block-paragraph\">We have already seen scaling prove incredibly powerful with:<\/p>\n<p>pre-training<\/p>\n<p>post-training<\/p>\n<p>inference-time scaling<\/p>\n<p class=\"wp-block-paragraph\">The reason is that the more computing power you spend on something, the better the output quality you\u2019ll achieve. This, of course, assumes you\u2019re able to spend the compute effectively. For example, for pre-training, being able to scale compute relies on:<\/p>\n<p>Large enough models (enough weights to train)<\/p>\n<p>Enough data to train on<\/p>\n<p class=\"wp-block-paragraph\">If you scale compute without these two components, you won\u2019t see improvements. However, if you scale all three, you get amazing results, like the frontier LLMs we\u2019re seeing now, for example with the release of Gemini 3.<\/p>\n<p class=\"wp-block-paragraph\">I thus argue you should look to scale your own LLM usage as much as possible. 
This could, for example, be firing off several agents to code in parallel, or starting Gemini deep research on a topic you\u2019re interested in.<\/p>\n<p class=\"wp-block-paragraph\">Of course, the usage must still be of value. There\u2019s no point in starting a coding agent on some obscure task you have no need for. Rather, you should start a coding agent on:<\/p>\n<p>A Linear issue you never felt you had time to sit down and do yourself<\/p>\n<p>A quick feature that was requested in the last sales call<\/p>\n<p>Some UI improvements you know today\u2019s coding agents handle easily<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/image-306.png\" alt=\"\" class=\"wp-image-633072\"\/> This image shows scaling laws, illustrating how performance improves with increased scale. I argue the same thing will happen when we scale our LLM usage. Image from <a href=\"https:\/\/www.node-masters.com\/blog\/why-openai-is-so-confident-on-future-ai-progress\" rel=\"nofollow noopener\" target=\"_blank\">NodeMasters.<\/a><\/p>\n<p class=\"wp-block-paragraph\">In a world with an abundance of resources, we should look to maximize our use of them.<\/p>\n<p class=\"wp-block-paragraph\">My main point here is that the threshold for performing tasks has decreased significantly since the release of LLMs. Previously, when you got a bug report, you had to sit down for 2 hours in deep concentration, thinking about how to solve that bug.<\/p>\n<p class=\"wp-block-paragraph\">However, today, that\u2019s no longer the case. Instead, you can go into Cursor, paste in the bug report, and ask Claude Sonnet 4.5 to attempt to fix it. 
You can then come back 10 minutes later, test whether the problem is fixed, and create the pull request.<\/p>\n<p class=\"wp-block-paragraph\">How many tokens can you spend while still doing something useful with them?<\/p>\n<p>How to scale LLM usage<\/p>\n<p class=\"wp-block-paragraph\">I talked about why you should scale LLM usage by running more coding agents, deep research agents, and other AI agents. However, it can be hard to think of exactly which agents you should fire off. Thus, in this section, I\u2019ll discuss specific agents you can fire off to scale your LLM usage.<\/p>\n<p>Parallel coding agents<\/p>\n<p class=\"wp-block-paragraph\">Parallel coding agents are one of the simplest ways to scale LLM usage for any programmer. Instead of only working on one problem at a time, you start two or more agents at the same time, using Cursor agents, Claude Code, or any other agentic coding tool. This is typically made very easy by utilizing Git worktrees.<\/p>\n<p class=\"wp-block-paragraph\">For example, I typically have one main task or project that I\u2019m working on, where I\u2019m sitting in Cursor and programming. However, sometimes a bug report comes in, and I automatically route it to Claude Code to make it search for why the problem is happening and fix it if possible. Sometimes this works out of the box; sometimes I have to help it a bit.<\/p>\n<p class=\"wp-block-paragraph\">However, the cost of starting this bug-fixing agent is super low (I can literally just copy the Linear issue into Cursor, which can read the issue using the Linear MCP). Similarly, I also have a script automatically researching relevant prospects, which runs in the background.<\/p>\n<p>Deep research<\/p>\n<p class=\"wp-block-paragraph\">Deep research is a functionality available from any of the frontier model providers, like Google Gemini, OpenAI ChatGPT, and Anthropic\u2019s Claude. 
I prefer Gemini 3 deep research, though there are many other solid deep research tools out there. <\/p>\n<p class=\"wp-block-paragraph\">Whenever I\u2019m interested in learning more about a topic, finding information, or anything similar, I fire off a deep research agent with Gemini.<\/p>\n<p class=\"wp-block-paragraph\">For example, I was interested in finding some prospects given a specific ICP (ideal customer profile). I quickly pasted the ICP information into Gemini, gave it some contextual information, and had it start researching, so that it could run while I was working on my main programming project.<\/p>\n<p class=\"wp-block-paragraph\">After 20 minutes, I had a report from Gemini, which turned out to contain loads of useful information.<\/p>\n<p>Creating workflows with n8n<\/p>\n<p class=\"wp-block-paragraph\">Another way to scale LLM usage is to create workflows with n8n or any similar workflow-building tool. With n8n, you can build specific workflows that, for example, read Slack messages and perform some action based on them.<\/p>\n<p class=\"wp-block-paragraph\">You could, for instance, have a workflow that reads a bug-report channel on Slack and automatically starts a Claude Code agent for a given bug report. Or you could create a workflow that aggregates information from many different sources and provides it to you in an easily readable format. There are essentially endless opportunities with workflow-building tools.<\/p>\n<p>More<\/p>\n<p class=\"wp-block-paragraph\">There are many other techniques you can use to scale your LLM usage. I\u2019ve only listed the first few that come to mind from my own work with LLMs. I recommend always keeping in mind what you can automate using AI, and how you can leverage it to become more effective. 
How to scale LLM usage will vary widely across companies, job titles, and many other factors.<\/p>\n<p>Conclusion<\/p>\n<p class=\"wp-block-paragraph\">In this article, I\u2019ve discussed how to scale your LLM usage to become a more effective engineer. I argue that we\u2019ve seen scaling work incredibly well in the past, and it\u2019s highly likely we\u2019ll see increasingly powerful results by scaling our own usage of LLMs. This could be firing off more coding agents in parallel, or running deep research agents while eating lunch. In general, I believe that by increasing our LLM usage, we can become increasingly productive.<\/p>\n<p class=\"wp-block-paragraph\">\ud83d\udc49 Find me on socials:<\/p>\n<p class=\"wp-block-paragraph\">\ud83d\udcda\u00a0<a href=\"https:\/\/eivindkjosbakken.com\/ebook\" rel=\"nofollow noopener\" target=\"_blank\">Get my free Vision Language Models ebook<\/a><\/p>\n<p class=\"wp-block-paragraph\">\ud83d\udcbb <a href=\"https:\/\/www.eivindkjosbakken.com\/webinar\" rel=\"nofollow noopener\" target=\"_blank\">My webinar on Vision Language Models<\/a><\/p>\n<p class=\"wp-block-paragraph\">\ud83d\udce9\u00a0<a href=\"https:\/\/eivindkjosbakken.com\/newsletter\" rel=\"nofollow noopener\" target=\"_blank\">Subscribe to my newsletter<\/a><\/p>\n<p class=\"wp-block-paragraph\">\ud83e\uddd1\u200d\ud83d\udcbb\u00a0<a href=\"https:\/\/eivindkjosbakken.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Get in touch<\/a><\/p>\n<p class=\"wp-block-paragraph\">\ud83d\udd17 <a href=\"https:\/\/www.linkedin.com\/in\/eivind-kjosbakken\/\" rel=\"nofollow noopener\" target=\"_blank\">LinkedIn<\/a><\/p>\n<p class=\"wp-block-paragraph\">\ud83d\udc26 <a href=\"https:\/\/x.com\/EivindKjos\" rel=\"nofollow\">X \/ Twitter<\/a><\/p>\n<p class=\"wp-block-paragraph\">\u270d\ufe0f <a href=\"https:\/\/oieivind.medium.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Medium<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Scaling has perhaps been the most 
important word when it comes to Large Language Models (LLMs), with the release&hellip;\n","protected":false},"author":2,"featured_media":167208,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[220,92833,218,219,2670,2210,61,60,17544,92834,80],"class_list":{"0":"post-167207","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-ai-scaling","10":"tag-artificial-intelligence","11":"tag-artificialintelligence","12":"tag-chatgpt","13":"tag-gemini","14":"tag-ie","15":"tag-ireland","16":"tag-llm","17":"tag-llm-agents","18":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/167207","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=167207"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/167207\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/167208"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=167207"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/categories?post=167207"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=167207"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}