{"id":62447,"date":"2025-10-09T00:32:19","date_gmt":"2025-10-09T00:32:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/62447\/"},"modified":"2025-10-09T00:32:19","modified_gmt":"2025-10-09T00:32:19","slug":"a-cloud-built-for-python-data-scientists-not-infrastructure-engineers","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/62447\/","title":{"rendered":"A Cloud Built for Python Data Scientists, Not Infrastructure Engineers"},"content":{"rendered":"<p>The cloud is incredibly useful \u2014 but what if you\u2019re a Python-loving data scientist?<\/p>\n<p>The prevailing advice has been that if you want to run industrial-grade <a href=\"https:\/\/thenewstack.io\/what-is-python\/\" class=\"local-link\" rel=\"nofollow noopener\" target=\"_blank\">Python<\/a>, then run it on Kubernetes.<\/p>\n<p>\u201cWe just think that\u2019s dead wrong,\u201d said <a href=\"https:\/\/matthewrocklin.com\/\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">Matthew Rocklin<\/a>.<\/p>\n<p>In 2020, Rocklin co-founded <a href=\"https:\/\/coiled.io\/\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">Coiled.io<\/a> to offer an easier way to unlock the cloud\u2019s potential. \u201cThe answer is just \u2018Go use raw VMs [virtual machines]\u2019,\u201d Rocklin said <a href=\"https:\/\/talkpython.fm\/episodes\/show\/519\/data-science-cloud-lessons-at-scale#transcript-section\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">on the \u201cTalk Python\u201d podcast<\/a>. 
\u201cThey\u2019re actually pretty good, if you do a few things around them.\u201d (Like configuring the right software environments and appropriate logs.)<\/p>\n<p>In 2015, Rocklin created <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dask_(software)\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">Dask<\/a>, a Python library for parallel computing that scales data analysis and manipulation from a single laptop to clusters of many machines. And after years of contributing to open source Python projects (like Toolz, multipledispatch, and SymPy), Rocklin co-founded Coiled.io to make it easier to run Dask and other Python workloads on cloud VMs.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-22801994\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2025\/10\/88920bde-talkpython-300x225.png\" alt=\"\" width=\"300\" height=\"225\"\/><\/p>\n<p>He explained the company\u2019s mission last month <a href=\"https:\/\/www.youtube.com\/live\/omBibVGLzyo?si=Nmx6q00_j4A_i5ZH\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">in a podcast episode<\/a> on \u201cThe messy truth of cloud-scale Python.\u201d Podcast host <a href=\"https:\/\/www.linkedin.com\/in\/mkennedy\/\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">Michael Kennedy<\/a> agreed that much of today\u2019s cloud infrastructure seems focused on web and API developers. 
Even the tutorials for data scientists aren\u2019t emphasizing <a href=\"https:\/\/thenewstack.io\/docker-containers-that-could-be-essential-for-your-small-business\/\" class=\"local-link\" rel=\"nofollow noopener\" target=\"_blank\">Docker<\/a> and <a href=\"https:\/\/thenewstack.io\/introduction-to-linux-operating-system\/\" class=\"local-link\" rel=\"nofollow noopener\" target=\"_blank\">Linux skills<\/a>, Kennedy believes \u2014 although Rocklin sees another possible response. \u201cMaybe we shouldn\u2019t solve this by educating people.<\/p>\n<p>\u201cMaybe we should solve it by building better tooling.\u201d<\/p>\n<p>It\u2019s a fresh perspective straight from the heart of the Python community. And throughout the podcast, Rocklin made the case that data scientists have their own set of concerns.<\/p>\n<p>And that a VM-oriented solution like Coiled could be the right tool for the job.<\/p>\n<p>Why Docker and Kubernetes Aren\u2019t Ideal for Data Scientists<\/p>\n<p>Ask ChatGPT for some commands you can cut and paste to launch 100 virtual machines, he said, \u201cand it\u2019ll type at you for a couple of minutes! And it\u2019s not the kind of typing that most data scientist people who have just used Python for a couple of years can do.<\/p>\n<p>\u201cI was actually quite shocked at how hard this relatively commonplace thing was to do.\u201d<\/p>\n<p>Rocklin acknowledges Docker is a great tool, but not necessarily for data scientists, since it\u2019s \u201cvery much specialized to provide a really stable system that can run for decades.\u201d Data scientists, though, want \u201ca system that can change every five minutes. 
The choices that tools like Docker, Kubernetes or Terraform make are actually quite different than the choices you would make if you were building sort of middleware for this audience.<\/p>\n<p>\u201cIt\u2019s designed for cloud infrastructure engineers.\u201d (And while middleware exists, \u201cit\u2019s not designed for our use cases.\u201d)<\/p>\n<p>So, \u201cWe rolled our own.\u201d<\/p>\n<p>And during the podcast, he quickly spun up a 1,000-core EC2 cluster from a notebook computer \u2014 twice.<\/p>\n<p>A Simple Demo: Spinning up a Cluster With Python Decorators<\/p>\n<p>During that demo, podcast host Kennedy marveled at how much capability was packed into a few simple Python statements, such as these decorator arguments:<\/p>\n<p>vm_type=\"g5.xlarge\",<br \/>keepalive=\"20 minutes\",<br \/>region=\"us-west-2\",<\/p>\n<p>While they talked, Rocklin switched off the ARM hardware by typing a single character, turning the decorator line that set the ARM flag into a comment:<\/p>\n<p># arm=True,<\/p>\n<p>And then he began bringing up a new cluster.<\/p>\n<p>Python decorators let you extend a function\u2019s behavior without rewriting it. Here, those arguments configure Coiled\u2019s decorator (available after importing the Coiled library), which wraps an ordinary Python function so that it runs on cloud VMs with the requested settings. \u201cWhat we joke about internally is that our core competency is turning VMs on and off,\u201d Rocklin said. \u201cOnce you have that technology, writing APIs around it is pretty cheap.\u201d<\/p>\n<p>Rocklin also believes that if you put a Docker push cycle into the data science work cycle, \u201cIt gums everything up. People end up not doing it.\u201d So instead of relying on Docker, Coiled\u2019s VMs copy the user\u2019s local software environment.<\/p>\n<p>The end result of this demo? 
A thousand machines that look just like the user\u2019s original machine, \u201cjust more numerous or bigger or with GPUs, or whatever you like.\u201d<\/p>\n<p>The first 1,000-VM cluster cost $1.39, Rocklin said (adding that the second one \u201cis costing me 45 cents so far \u2026\u201d). \u201cThe cloud is both way cheaper and way more expensive than I realized going in, based on whether or not you\u2019re doing it correctly, or doing it incorrectly. There\u2019s like several orders of magnitude difference.\u201d<\/p>\n<p>Later, Rocklin puts a number on the cost of serverless platforms. \u201cServerless, Lambda and similar technologies typically have like a 4X to 5X premium on cost. They also have limitations like you can\u2019t get big machines, you can\u2019t get GPUs, your software environments have to be of a certain size.\u201d<\/p>\n<p>How To Avoid Unexpected Cloud Bills<\/p>\n<p>Also joining them on the podcast was Coiled staff software engineer <a href=\"https:\/\/www.linkedin.com\/in\/nat-tabris\/\" class=\"ext-link\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\" target=\"_blank\">Nat Tabris<\/a>, who points to another difficulty with the cloud: its lack of guardrails, especially for people who don\u2019t know where the risks are.<\/p>\n<p>Rocklin smiled, remembering his grad-student days on <a href=\"https:\/\/aws.amazon.com\/?utm_content=inline+mention\" class=\"ext-link\" target=\"_blank\" rel=\"external  nofollow noopener\" onclick=\"this.target=&#039;_blank&#039;;\">Amazon Web Services<\/a>\u2019 free tier, when he created some VMs and turned them off, \u201cand then three months later I get a bill for $400. And it wasn\u2019t the VMs, it was the attached storage to the VMs or some networking resource that had stuck around \u2014 that I had no concept of.\u201d<\/p>\n<p>Kennedy adds that there are \u201call sorts of little other services\u201d that can surprise you with fees (including databases and database storage). 
\u201cAnd so part of what we try to do,\u201d Tabris said, \u201cis put in defaults, put in controls so that you can\u2019t accidentally spend that much money.\u201d<\/p>\n<p>Ironically, the compute time itself \u201ctends to be a fairly predictable part of the cost.\u201d The surprisingly large bills come from \u201call of these other things that you don\u2019t even think about \u2014 like, \u2018If I flip this setting, now I\u2019m hitting this S3 API a lot, and it turns out you pay per API call.&#8217;\u201d<\/p>\n<p>Tabris remembers a customer using a 1,000-node cluster who\u2019d set up debug-level logging, which created \u201cvery chatty logs \u2026 I think it was like a $15,000 bill.\u201d (Although that story \u201chad a happy ending, because we talked to AWS and they ended up eating that cost for the customer.\u201d) Rocklin points out that\u2019s another good lesson for handling surprise bills: If you talk to AWS, they can give you money back.<\/p>\n<p>Coiled now also warns users when it detects chatty logs.<\/p>\n<p>So when Kennedy asks what workflow keeps his 2,000 machines from running all day unnecessarily, Rocklin points out that Coiled watches for that automatically, shutting down machines that aren\u2019t being used.<\/p>\n<p>The Freedom To Experiment<\/p>\n<p>But something happens when VMs are easy to create, said Rocklin: It gives \u201ca lot of ability for the user to start experimenting with hardware.\u201d (One user ran through every region in their cloud trying to find A100 GPU instances.) \u201cWe often see people playing with ARM versus Intel versus AMD, playing with every GPU type.\u201d<\/p>\n<p>And you can also experiment with regions. 
For example, if your data set is stored in one region, Tabris said, \u201cit makes an orders-of-magnitude difference how quickly you can download it if you are close to it, than if you are far from it.\u201d<\/p>\n<p>Tabris came from the web development world, but realized that for data scientists, \u201cit actually makes sense to try out different instance types to explore. \u2018What\u2019s this GPU do for me?&#8217;\u201d Different CPUs can also make a difference \u2014 even small changes like going from ARMv8 to ARMv7. \u201cSome of that actually really does make a difference for data science workloads, because it has to do with those wide instructions.\u201d<\/p>\n<p>Some instance types also come with faster memory (DDR5 instead of DDR4). \u201cDoes that make a difference for my workload? Is it going to save money?\u201d It may be hard to know in advance, but \u201cIt\u2019s really easy to just try.\u201d<\/p>\n<p>Rocklin later describes \u201cthe joy of this \u2026 It\u2019s that variety that\u2019s actually really a core part of the cloud,\u201d adding that it\u2019s something Coiled cares a great deal about.<\/p>\n<p>The Philosophy of Making Cloud Computing Playful<\/p>\n<p>Podcast host Kennedy appreciated the extra ease, since variety and experimentation are ultimately a key part of the data science ethos. \u201cWe\u2019re going to experiment, we\u2019re going to explore, we\u2019re going to play.\u201d<\/p>\n<p>And Rocklin agreed. \u201cI think a lot of why Python became popular is that it feels like play, often. We\u2019re given these libraries that are both easy to use and powerful. And that feels like play.\u201d<\/p>\n<p>In contrast, working with AWS\u2019s Boto library or writing Kubernetes YAML \u201cdoes not feel like play \u2026 But here today we got to play with making 2,000 VMs \u2014 half ARM, half Intel. Half on the U.S. east coast, half on the U.S. 
west coast \u2026 And now suddenly the cloud is like play.<\/p>\n<p>\u201cAnd you just do different things when things become playful. You behave differently. Folks have fun. And the cloud is a really fun tool to use. Once you get past all the pain.\u201d<\/p>\n<p>When asked for final thoughts, Rocklin said the cloud\u2019s great promise as a delightful and powerful tool for working with data isn\u2019t always delivered well. He urges data scientists not to settle.<\/p>\n<p>Kennedy acknowledged, \u201cIt\u2019s gotten really complex \u2014 but it doesn\u2019t have to be.\u201d And Tabris added that \u201cThis message of \u2018Things are supposed to be delightful\u2019 is important to us.\u201d<\/p>\n<p>Rocklin agrees that the cloud \u201ccan be a delightful experience \u2026 We should all be playing. If you don\u2019t want to use Coiled, that\u2019s fine. But there\u2019s other ways to do things. Go play.\u201d<\/p>\n<p>\t\t<a href=\"https:\/\/thenewstack.io\/author\/destiny\/\" class=\"author-more-link\" rel=\"nofollow noopener\" target=\"_blank\"><\/p>\n<p>\t\t\t\t\t<img decoding=\"async\" class=\"post-author-avatar\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2025\/10\/82081813-7zddypfe_400x400.jpg\"\/><\/p>\n<p>\n\t\t\t\t\t\t\tDavid Cassel is a proud resident of the San Francisco Bay Area, where he&#8217;s been covering technology news for more than two decades. 
Over the years his articles have appeared everywhere from CNN, MSNBC, and the Wall Street Journal Interactive&#8230;\t\t\t\t\t\t<\/p>\n<p>\t\t\t\t\t\tRead more from David Cassel\t\t\t\t\t\t<\/p>\n<p>\t\t<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"The cloud is incredibly useful \u2014 but what if you\u2019re a Python-loving data scientist? The prevailing advice has&hellip;\n","protected":false},"author":2,"featured_media":62448,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[353,85,46,125],"class_list":{"0":"post-62447","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-computing","9":"tag-il","10":"tag-israel","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/62447","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=62447"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/62447\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media\/62448"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=62447"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=62447"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=62447"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}