{"id":206094,"date":"2025-10-07T02:43:09","date_gmt":"2025-10-07T02:43:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/206094\/"},"modified":"2025-10-07T02:43:09","modified_gmt":"2025-10-07T02:43:09","slug":"aws-announces-expanded-support-for-custom-slurm-settings-in-aws-parallel-computing-service","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/206094\/","title":{"rendered":"AWS Announces Expanded Support for Custom Slurm Settings in AWS Parallel Computing Service"},"content":{"rendered":"<p>Oct. 6, 2025 \u2014 <a href=\"https:\/\/aws.amazon.com\/\" rel=\"nofollow noopener\" target=\"_blank\">AWS<\/a> today announced expanded support for custom Slurm settings in AWS Parallel Computing Service (PCS).<\/p>\n<p>In the following post, Brendan Bouffler, head of Developer Relations in HPC Engineering at AWS, describes how expanded parameter support brings more control, consistency, and policy flexibility to HPC workloads running in the cloud.<\/p>\n<p><a href=\"https:\/\/www.hpcwire.com\/wp-content\/uploads\/2020\/06\/shutterstock_aws-1.jpg\" rel=\"nofollow noopener\" target=\"_blank\"><img fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-89944 size-medium\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/shutterstock_aws-1-300x200.jpg\" alt=\"\" width=\"300\" height=\"200\"  \/><\/a>Credit: Shutterstock<\/p>\n<p>Today we\u2019re excited to announce expanded support for custom Slurm settings in <a href=\"https:\/\/aws.amazon.com\/pcs\/\" rel=\"nofollow noopener\" target=\"_blank\">AWS Parallel Computing Service<\/a> (PCS). With this launch, PCS now enables you to configure over 65 Slurm parameters. And for the first time, you can also apply custom settings to queue resources, giving you partition-specific control over scheduling behavior.<\/p>\n<p>This release responds directly to customer feedback. 
Many organizations running HPC workloads on PCS told us they needed more flexibility to enforce access policies, implement fair-share scheduling, or just optimize job lifecycles. Others were blocked from critical use cases because they couldn\u2019t set the parameters they rely on in their on-premises Slurm clusters.<\/p>\n<p>In this post, we\u2019ll show you how new capabilities remove those limitations, making it possible to align your cloud-based environment with the operational and research requirements you already know from traditional HPC.<\/p>\n<p>Why Custom Slurm Settings Matter<\/p>\n<p>At its core, Slurm is about policy. Schedulers decide which jobs run, where they run, and how resources are shared among competing users and projects. Fine-tuning Slurm settings can mean the difference between a cluster that feels sluggish and unfair, and one that consistently delivers results while keeping everyone happy.<\/p>\n<p>With this release, PCS exposes many more knobs and dials. At the\u00a0cluster level, you can now set parameters that tune fair-share and quality-of-service, enable license management, customize the job lifecycle, and manage preemption. At the\u00a0queue level, you can implement resource limits, access controls, and further configure fair-share and priority behavior. At the\u00a0compute node group level, you can tag nodes with features, reserve resources, and adjust utilization parameters.<\/p>\n<p>These controls unlock important HPC scenarios. For example, a university research center can track compute usage per department for monthly chargeback, while still enforcing fair-share policies across the entire cluster. An industrial R&amp;D lab can assign higher QoS to safety-critical simulations, ensuring they preempt background batch work. 
Machine learning teams can dedicate GPU partitions to certified users, preventing general workloads from misusing scarce accelerators.<\/p>\n<p>How It Works<\/p>\n<p>You can apply Custom Slurm Settings through the AWS Console, CLI, or SDKs during resource creation or later via update operations.<\/p>\n<p>In the console, you\u2019ll find a new\u00a0Additional scheduler settings\u00a0section on the create or edit page for clusters, queues, and compute node groups. Adding a parameter is as simple as choosing\u00a0Add new setting, selecting the parameter from the drop-down (which includes descriptions and the associated Slurm config file), and entering its value. To remove a parameter, choose\u00a0Remove\u00a0and update the resource. Figure 1 shows the current console view for setting these configurations.<\/p>\n<p><a href=\"https:\/\/www.hpcwire.com\/wp-content\/uploads\/2025\/10\/image-copy-scaled.png\" rel=\"nofollow noopener\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-184057 size-column\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/10\/image-copy-600x573.png\" alt=\"\" width=\"600\" height=\"573\"  \/><\/a>Figure 1 \u2013 You can edit Slurm custom settings through the console, CLI, or SDK. Here we show the AWS PCS console with various settings being changed. Whichever method you choose, settings are validated for correctness before being accepted.<\/p>\n<p>For those of you who prefer programmatic control, PCS now supports a\u00a0SlurmCustomSettings\u00a0field in create and update calls. This makes it easy to version, automate, and audit configuration changes as part of your DevOps pipeline.<\/p>\n<p>Examples in Action<\/p>\n<p>Let\u2019s look at a couple of concrete cases.<\/p>\n<p>Suppose you want every job in a given cluster to run a prolog script that sets up the environment. 
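<\/p>\n<p>As an illustrative sketch (not the post\u2019s original example), the payload for such an update can be built as a list of parameterName\/parameterValue pairs, the shape the SlurmCustomSettings field uses; the boto3 pcs client call shown in the comment is an assumption about the SDK, not something confirmed by this post.<\/p>\n

```python
# Illustrative sketch only: the boto3 "pcs" client call in the comment below is
# an assumption about the SDK's shape; the parameterName/parameterValue pairing
# mirrors the SlurmCustomSettings field described in this post.

def make_slurm_custom_settings(settings: dict) -> list:
    """Turn a plain dict of Slurm parameters into the PCS list-of-pairs shape."""
    return [
        {"parameterName": name, "parameterValue": value}
        for name, value in settings.items()
    ]

# Run a site prolog script before every job (the path is illustrative).
custom_settings = make_slurm_custom_settings(
    {"Prolog": "/shared/scripts/prolog.sh"}
)

# With boto3 installed and credentials configured, the update might look like:
#   import boto3
#   boto3.client("pcs").update_cluster(
#       clusterIdentifier="my-cluster",
#       slurmConfiguration={"slurmCustomSettings": custom_settings},
#   )
print(custom_settings)
```

\n<p>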
You can now do this by updating the cluster with a custom setting that points the Prolog parameter at your script.<\/p>\n<p>This ensures that before any job starts, your script executes, handling tasks such as loading environment modules, staging data, or checking license availability.<\/p>\n<p>Now imagine you want a specific queue to be the default for users who don\u2019t specify a partition. That\u2019s straightforward too: apply a custom setting of Default=YES to the queue.<\/p>\n<p>This helps direct jobs automatically to the right partition, reducing user error and ensuring workloads land on the appropriate resources.<\/p>\n<p>Finally, if you maintain a compute node group with GPUs and NVMe storage, you can tag those features explicitly with the Features parameter (for example, gpu and nvme).<\/p>\n<p>Built-in Validation<\/p>\n<p>Configuring Slurm is powerful, but anyone who has wrestled with\u00a0slurm.conf\u00a0knows how easy it is to introduce mistakes. PCS adds a layer of protection with synchronous validation of custom settings. Each parameter is checked for correct data types, allowed values, and context. For example, time values must match Slurm\u2019s expected format, and parameters that only make sense with accounting enabled are validated against the current cluster state.<\/p>\n<p>If you submit an invalid configuration, PCS responds with a clear\u00a0ValidationException\u00a0that lists the problematic fields and provides actionable error messages. This helps administrators avoid downtime and misconfiguration while still retaining the flexibility they need.<\/p>\n<p>While PCS helps prevent errors through validation, it\u2019s still possible to submit an invalid configuration. If an update operation fails, the resource may enter an\u00a0UPDATE_FAILED\u00a0state. 
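<\/p>\n<p>To make the time-format check concrete, here is an illustrative client-side sketch; it is not PCS\u2019s actual validator, which runs server-side, but it mirrors the time formats documented for slurm.conf (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds):<\/p>\n

```python
import re

# Illustrative pre-flight check, not PCS's actual validator. It covers the
# numeric Slurm time formats; special values such as INFINITE are omitted here.
_SLURM_TIME = re.compile(
    r"^\d+$"               # minutes
    r"|^\d+:\d+$"          # minutes:seconds
    r"|^\d+:\d+:\d+$"      # hours:minutes:seconds
    r"|^\d+-\d+$"          # days-hours
    r"|^\d+-\d+:\d+$"      # days-hours:minutes
    r"|^\d+-\d+:\d+:\d+$"  # days-hours:minutes:seconds
)

def is_valid_slurm_time(value: str) -> bool:
    """Does value look like a Slurm time specification (e.g. for MaxTime)?"""
    return _SLURM_TIME.match(value) is not None

# A malformed time value would be caught before any update call is submitted.
for candidate in ("60", "1:00:00", "2-00:30:00", "tomorrow"):
    print(candidate, is_valid_slurm_time(candidate))
```

\n<p>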
When an update fails this way, review your configuration settings and ensure all related resources are in an\u00a0ACTIVE\u00a0state before submitting a corrected update request.<\/p>\n<p>What You Can Configure<\/p>\n<p>The list of\u00a0<a href=\"http:\/\/docs.aws.amazon.com\/pcs\/latest\/userguide\/slurm-custom-settings.html\" rel=\"nofollow noopener\" target=\"_blank\">supported parameters is extensive<\/a>. Cluster-level options include accounting controls like\u00a0AccountingStorageEnforce\u00a0and\u00a0AccountingStorageTRES, health monitoring (HealthCheckProgram,\u00a0HealthCheckInterval), job lifecycle hooks (TaskProlog,\u00a0TaskEpilog), and priority weights for fair-share and QoS scheduling.<\/p>\n<p>Queue-level options give fine-grained partition control. You can set\u00a0DefaultTime\u00a0and\u00a0MaxTime\u00a0for different workloads, define\u00a0AllowAccounts\u00a0and\u00a0AllowQoS\u00a0access restrictions, configure\u00a0PreemptMode\u00a0and\u00a0PriorityTier, or enforce differentiated billing with\u00a0TRESBillingWeights.<\/p>\n<p>At the compute node group level, you can assign Features tags, reserve memory with\u00a0MemSpecLimit, or pin specialized CPUs with\u00a0CpuSpecList.<\/p>\n<p>Together, these controls let you re-create the nuanced policies that many HPC organizations depend on, but now with the elasticity and manageability of PCS.<\/p>\n<p>Real-World Use Cases<\/p>\n<p>Consider a national weather agency running daily forecasts. 
By assigning high QoS to the forecasting partition and enabling preemption, the agency ensures that urgent simulation jobs always displace less critical workloads, guaranteeing timely results.<\/p>\n<p>A pharmaceutical company running molecular dynamics simulations may use queue-level fair-share policies to ensure that no single research team monopolizes the cluster, while still allowing priority projects to move ahead.<\/p>\n<p>An academic HPC center could use this data to support its own chargeback mechanism, letting research groups reconcile their consumption against actual compute time.<\/p>\n<p>These are just a few ways the expanded parameter set can map cloud HPC environments to the realities of institutional policy and workload diversity.<\/p>\n<p>Getting Started<\/p>\n<p>To explore the new functionality, log in to the\u00a0<a href=\"https:\/\/console.aws.amazon.com\/pcs\/home?#\/clusters\" rel=\"nofollow noopener\" target=\"_blank\">AWS PCS console<\/a>\u00a0and open the\u00a0Additional scheduler settings\u00a0section when creating or editing clusters, queues, or node groups. For a programmatic approach, add\u00a0SlurmCustomSettings\u00a0to your CLI or SDK calls. Documentation for each parameter is available in the\u00a0<a href=\"http:\/\/slurm.schedmd.com\/slurm.conf.html\" rel=\"nofollow noopener\" target=\"_blank\">official Slurm manual<\/a>.<\/p>\n<p>The expanded support for custom Slurm settings in PCS represents a major step forward for cloud-based HPC. Customers can now bring the same sophistication and nuance from their on-premises clusters into AWS, while gaining the elasticity and operational advantages of a managed service. 
By exposing dozens of additional parameters and extending them to queues, PCS gives HPC administrators the fine-grained control they need to enforce policies, allocate resources intelligently, and deliver results faster.<\/p>\n<p>If you\u2019re completely new to AWS Parallel Computing Service, you can get started quickly \u2013 with virtually no learning curve \u2013 by using the <a href=\"https:\/\/us-east-2.console.aws.amazon.com\/pcs\/home?region=us-east-2#\/quick-launch\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">console quick launch<\/a>, or one of the\u00a0<a href=\"https:\/\/github.com\/aws-samples\/aws-hpc-recipes\/tree\/main\/recipes#arrow_right-pcs-aws-parallel-computing-service\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">one-click launch recipes for PCS<\/a>\u00a0from the\u00a0<a href=\"https:\/\/github.com\/aws-samples\/aws-hpc-recipes\/tree\/main\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">HPC Recipes Library<\/a>\u00a0on GitHub.<\/p>\n<p><a href=\"https:\/\/aws.amazon.com\/blogs\/hpc\/announcing-expanded-support-for-custom-slurm-settings-in-aws-parallel-computing-service\/\" rel=\"nofollow noopener\" target=\"_blank\">Source<\/a>: Brendan Bouffler, AWS<\/p>\n","protected":false},"excerpt":{"rendered":"Oct. 
6, 2025 \u2014 AWS today announced expanded support for custom Slurm settings in AWS Parallel Computing Service&hellip;\n","protected":false},"author":2,"featured_media":206095,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46],"tags":[191,74],"class_list":{"0":"post-206094","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-computing","9":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/206094","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=206094"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/206094\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/206095"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=206094"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=206094"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=206094"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}