{"id":251364,"date":"2025-10-25T20:34:11","date_gmt":"2025-10-25T20:34:11","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/251364\/"},"modified":"2025-10-25T20:34:11","modified_gmt":"2025-10-25T20:34:11","slug":"burning-out-by-nathan-lambert","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/251364\/","title":{"rendered":"Burning out &#8211; by Nathan Lambert"},"content":{"rendered":"<p>One of the obvious topics of the Valley today is <a href=\"https:\/\/www.nytimes.com\/2025\/09\/28\/business\/996-hustle-culture-tech.html\" rel=\"nofollow noopener\" target=\"_blank\">how hard everyone works<\/a>. We\u2019re inundated with comments on \u201cThe Great Lock In\u201d, <a href=\"https:\/\/jasmi.news\/i\/174127733\/\" rel=\"nofollow noopener\" target=\"_blank\">996<\/a>, 997, and now even a snarky 002 (midnight to midnight with a 2-hour break). Plenty of this is performative flexing on social media, but enough of it is real and reflects how trends are unfolding in the LLM space. I\u2019m affected. My friends are affected.<\/p>\n<p>All of this hard work is downstream of ever-increasing pressure to stay relevant in the most exciting technology of our generation. This is all reflective of the LLM game changing. The time window to be a player at the most cutting edge is actually closing, not just something that feels like it is. There are many different sizes and types of models that matter, but as the market is now more fleshed out with resources, all of them face a constantly rising bar in the quality of technical output. People are racing to stay above the rising tide \u2014 often damning any hope of life balance.<\/p>\n<p>AI is going down the path that other industries have before, but on steroids. There\u2019s a famous section of the book Apple in China where the author Patrick McGee describes the programs Apple put in place to save the marriages of engineers traveling so much to China and working incredible hours. 
In an <a href=\"https:\/\/www.chinatalk.media\/p\/apple-in-china\" rel=\"nofollow noopener\" target=\"_blank\">interview on ChinaTalk<\/a>, McGee added \u201cNever mind the divorces, you need to look at the deaths.\u201d This is a grim reality that is surely playing out in AI.<\/p>\n<p>The Wall Street Journal recently published a piece on how <a href=\"https:\/\/www.wsj.com\/tech\/ai\/ai-race-tech-workers-schedule-1ea9a116\" rel=\"nofollow noopener\" target=\"_blank\">AI Workers Are Putting In 100-Hour Workweeks to Win the New Tech Arms Race<\/a>. The opening of the article perfectly captures how the last year or two has felt if you\u2019re participating in the dance:<\/p>\n<p>Josh Batson no longer has time for social media. The AI researcher\u2019s only comparable dopamine hit these days is on Anthropic\u2019s Slack workplace-messaging channels, where he explores chatter about colleagues\u2019 theories and experiments on large language models and architecture.<\/p>\n<p>Work addicts abound in AI. I often count myself among them, but I put a lot of effort into making work expand to fill the time I make available, rather than fitting everything else in around work. The WSJ article had a bunch of wild comments that show the mental limits of individuals and the culture they operate in, such as:<\/p>\n<p>Several top researchers compared the circumstances to war.<\/p>\n<p>Comparing current AI research to war is out of touch (especially with actual wars happening at the same time as the AI race!). What these researchers are really learning is that pursuing an activity in a collective environment at an elite level over multiple years is incredibly hard. It is! War is that and more.<\/p>\n<p>In the last few months I\u2019ve been making an increasing number of analogies between working at the sharp end of LLMs today and training with a team as an elite athlete. 
The goals are far out and often singular, there are incredibly fine margins between success and failure, much of the grind is over tiny tasks that add up over time but that you don\u2019t want to do in the moment, and you can never quite know how well your process is working until you compare your outputs with your top competition, which only happens a few times a year in both sports and language modeling.<\/p>\n<p>In college I was a D1 lightweight rower at Cornell University. I walked onto the team and we ended up winning 3 championships in 4 years. Much of this was happenstance, as much greatness is, but it\u2019s a crucial example in understanding how similar mentalities can apply in different domains across a life. My mindset around the LLM work I do today feels incredibly similar \u2014 complete focus and buy-in \u2014 but I don\u2019t think I\u2019ve yet found a work environment where the culture is as cohesive as athletics. While OpenAI\u2019s culture is often described as culty, there are many signs that the core team members there absolutely love it, even if they\u2019re working 996, 997, or 002. When you love it, it doesn\u2019t feel like work. It\u2019s the same reason training 20 hours a week as a full-time student can feel easy.<\/p>\n<p>Many AI researchers could learn from athletics and appreciate the value of rest. Your mental acuity drops off faster than your peak physical performance does when you\u2019re not rested. Working too hard forces you into narrower and less creative approaches. The deeper into the hole of burnout I get in trying to make the next Olmo model for you, the worse my writing gets. My ability to spot technical dead ends goes with it. Even if the intellectual payoffs of rest are hard to see, a schedule without it leaves no space for creativity and insight.<\/p>\n<p>Crafting the team culture in both of these environments is incredibly difficult. 
It\u2019s the quality of the team culture that determines the outcome more than the individual components. Yes, with LLMs you can take brief shortcuts by hiring talent with years of experience from another frontier lab, but that doesn\u2019t change the long-term dynamic. Yes, you obviously need as much compute as you can get. At the same time, culture is incredibly fickle. It\u2019s easier to lose than it is to build.<\/p>\n<p>Some argue that starting a new lab today can be an advantage against the established labs because you get to start from scratch with a cleaner codebase, but this is cope. There are three core ingredients of training: internal tools (recipes, codebases, etc.), resources (compute, data), and personnel. Leadership sets the direction and culture, while management executes on that direction. All elements are crucial and cannot be overlooked. The further along the best models get, the harder starting from scratch is going to become. Eventually, this dynamic will shift back in favor of starting from scratch, because public knowhow and tooling will catch up, but in the meantime the closed tools are getting better at a far faster rate than the fully open tools.<\/p>\n<p>The likes of SSI, Thinky, and Reflection are likely the last efforts that are capitalized enough to maybe catch up in the near term, but the odds are not on their side. 
Getting infinite compute into a new company is meaningless if you don\u2019t already have your code, data, and pretraining architectures ready. Eventually the clock will run out on plans that amount to catching up to the frontier and then figuring it out from there. The more these companies raise, the higher the expectations on their first output will be. It\u2019s not an enviable position, but it\u2019s certainly ambitious.<\/p>\n<p>In many ways I see the culture of Chinese technology companies (and education systems) as better suited for this sort of catch-up work. Many top AI researchers trained in the US want to work on a masterpiece, while what it takes in language modeling is often extended grinding to stabilize and replicate something that you know can definitely work.<\/p>\n<p>I used to think that the AI bubble would pop financially, playing out through a series of mergers, acquisitions, and similar deals. I\u2019m shifting to see the limitations as more about the human capital than the financial capital thrown at today\u2019s AI companies. As the technical standard of relevance increases (i.e. how good the models people want to use are, or the best open model of a given size category), it simply takes more focused work to get a model there. This work is hard to shortcut in time.<\/p>\n<p>This all relates to how I, and other researchers, always comment on the low-hanging fruit we see to keep improving the models. As the models have gotten better, our systems to build them have gotten more refined, complex, intricate, and numerically sensitive. While I see a similar amount of low-hanging fruit today as I did a year ago, the effort (or physical resources, GPUs) it can take to unlock it has increased. This pushes people one step closer to their limits, piling on more burnout. 
This is also why the WSJ reported that top researchers \u201csaid repeatedly that they work long hours by choice.\u201d The best feel like they need to do this work or they\u2019ll fall behind. It\u2019s running one more experiment, running one more vibe test, reviewing one more colleague\u2019s PR, reading one more paper, chasing down one more data contract. The to-do list is never empty.<\/p>\n<p>The amount of context that you need to keep in your brain to perform well in many LM training settings is ever-increasing. For example, leading post-training pipelines around the launch of ChatGPT looked like two or maybe three well-separated training stages. Now there are tons of checkpoints flying around getting merged, sequenced, and chopped apart as part of the final project. Processes that used to be managed by one or two people now have teams coordinating many data and algorithmic efforts that are trying to land in just a few models a year. I\u2019ve personally transitioned from a normal researcher to something like a tech lead who is always trying to predict blockers before they come up (at any point in the post-training process) and get resources to fix them. I bounce in and out of problems to wherever the most risk is.<\/p>\n<p>Cramming and keeping technical context pushes out hobbies and peace of mind.<\/p>\n<p>Training general language models you hope others will adopt \u2014 via open weights or API \u2014 is becoming very much an all-in or all-out domain. Half-assing it is becoming an expensive way to make a model that no one will use. This wasn\u2019t the case two years ago, when playing around with a certain part of the pipeline was legitimately impactful.<\/p>\n<p>Culture is a fine line between performance and toxicity, and it\u2019s often hard to know which side you\u2019re on until you reach a major deliverable and can check in against competitors.<\/p>\n<p>Personally, I\u2019m fighting off a double-edged version of this. 
I feel immense responsibility to make all the future Olmo models of the world great, while also trying to do a substantial amount of ecosystem work to create an informed discussion around the state of open models. My goal for this discussion is for more real things to be built. The <a href=\"https:\/\/www.atomproject.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">ATOM Project<\/a> is a manifestation of my feeling that both the U.S. ecosystem generally and the Olmo project are falling behind.<\/p>\n<p>It doesn\u2019t really seem like there will be an immediate fix or end point to this, but looking back I\u2019m sure it\u2019ll be clear what the key moments were and whether or not my efforts here and elsewhere met my goals.<\/p>\n<p>Will it all be worth it? How long do you plan to go on like this? It\u2019s not like we\u2019re really going to suddenly reach AGI and then all pack it up and go home. AI progress is a long haul now.<\/p>\n<p>For me, the only reason to keep going is to try to make AI a wonderful technology for the world. Some feel the same. Others keep going because they\u2019re locked in on a path to generational wealth. 
Plenty don\u2019t have either of these alignments, and the wall of effort comes sooner.<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!aO6G!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d40d7-f57d-457f-ac63-aec8bac096cd_2400x1600.jpeg\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/d20d40d7-f57d-457f-ac63-aec8bac096cd_2400.jpeg\" width=\"1456\" height=\"971\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/d20d40d7-f57d-457f-ac63-aec8bac096cd_2400x1600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377931,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/www.interconnects.ai\/i\/177056592?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d40d7-f57d-457f-ac63-aec8bac096cd_2400x1600.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>Thanks to Ross Taylor, Jordan Schneider, and Jasmine Sun for feedback on this post.<\/p>\n","protected":false},"excerpt":{"rendered":"One of the obvious topics of the Valley today is how hard everyone works. 
We\u2019re inundated with comments&hellip;\n","protected":false},"author":2,"featured_media":251365,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[182,181,507,74],"class_list":{"0":"post-251364","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/251364","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=251364"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/251364\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/251365"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=251364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=251364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=251364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}