Welcome to the Google Codelab for creating two federated Slurm clusters on Google Cloud Platform! By the end of this codelab you should have a solid understanding of how to provision and configure federated Slurm clusters, and how to submit jobs between them.
Google Cloud teamed up with SchedMD to release a set of tools that make it easier to launch the Slurm workload manager on Compute Engine, and to expand your existing cluster when you need extra resources. This integration was built by the experts at SchedMD in accordance with Slurm best practices.
If you're planning on using the Slurm on Google Cloud Platform integrations, or if you have any questions, please consider joining our Google Cloud & Slurm Community Discussion Group!
Basic architectural diagram of two federated Slurm Clusters in Google Cloud Platform.
Slurm is one of the leading workload managers for HPC clusters around the world. Slurm provides an open-source, fault-tolerant, and highly-scalable workload management and job scheduling system for small and large Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions:
1. It allocates exclusive or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
2. It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.
3. It arbitrates contention for resources by managing a queue of pending work.
Before proceeding with this codelab you must first complete the codelab "Deploy an Auto-Scaling HPC Cluster with Slurm". Do not delete the deployment or the project after completing the codelab.
Once you have completed that codelab you should have a Google Cloud Platform based cluster including a controller node, login node, and possibly some number of compute nodes. This cluster should be SSH-accessible, and the Slurm toolset should be working correctly.
Many users have existing clusters in their environment, either on-premises with user-managed hardware, or already running in Google Cloud Platform. In these cases many users want to supplement their existing resources with burstable cloud-based nodes, possibly in different regions, and with different hardware or software configurations. This use case is solved by using Slurm's federation capabilities alongside the new Slurm Auto-Scaling capabilities in Google Cloud Platform.
In order to test Slurm's federation capabilities in Google Cloud Platform we'll need to set up another Slurm cluster to federate to.
Please set up a new project and deploy a new Slurm cluster according to the instructions beginning in the "Setup" section of the "Deploy an Auto-Scaling HPC Cluster with Slurm" codelab. This second Slurm cluster must have a different slurm-network subnet range and Slurm cluster name than the first Slurm cluster's deployment.
In this lab we assume:
on-prem Cluster | 10.10.0.0/16 |
gcp Cluster | 10.20.0.0/16 |
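For reference, the only deployment properties that need to differ from the first cluster are the cluster name and the subnet CIDR. A minimal sketch of the second cluster's slurm-cluster.yaml is shown below; the property names (cluster_name, cidr) are assumptions based on the slurm-gcp Deployment Manager scripts used in the prerequisite codelab and may differ between releases.

# Illustrative excerpt only -- property names may vary by slurm-gcp release.
resources:
- name: slurm-cluster
  type: slurm.jinja
  properties:
    cluster_name : gcp            # must differ from the first cluster ("on-prem")
    cidr         : 10.20.0.0/16   # must not overlap the first cluster's 10.10.0.0/16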
Once you have a second project with a second "gcp" Slurm cluster deployed through deployment manager, we can set up the networking in preparation for federating the on-prem and gcp clusters.
We must first open ports using Firewall Rules on both projects.
First, open ports on the on-prem project so that the gcp cluster can communicate with the on-prem cluster's slurmctld (tcp:6817) and slurmdbd (tcp:6819); an equivalent gcloud command is shown after the table.
Name | slurm |
Network | slurm-network |
Priority | 1000 |
Direction of traffic | Ingress |
Action to match | Allow |
Targets | Specified target tags |
Target tags | controller |
Source Filter | IP ranges |
Source IP Ranges | 10.20.0.0/16 |
Second source filter | none |
Protocols and ports | tcp:6817,6819 |
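If you prefer the command line over the Cloud Console, the rule above can be created with a gcloud command roughly like the sketch below. The project ID placeholder is an assumption, and the same pattern applies to the remaining firewall rules in this section, with the ports, tags, and source ranges adjusted accordingly.

# Sketch: create the on-prem "slurm" ingress rule with gcloud instead of the Console.
# <on-prem-project-id> is a placeholder for your on-prem project's ID.
gcloud compute firewall-rules create slurm \
    --project <on-prem-project-id> \
    --network slurm-network \
    --direction INGRESS \
    --action ALLOW \
    --rules tcp:6817,tcp:6819 \
    --target-tags controller \
    --source-ranges 10.20.0.0/16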
We'll follow a similar process for the "gcp" project.
Open ports on the gcp project so that the on-prem cluster can communicate with the gcp cluster's slurmctld (tcp:6817).
Name | slurm |
Network | slurm-network |
Priority | 1000 |
Direction of traffic | Ingress |
Action to match | Allow |
Targets | Specified target tags |
Target tags | controller |
Source Filter | IP ranges |
Source IP Ranges | 10.10.0.0/16 |
Second source filter | none |
Protocols and ports | tcp:6817 |
Optionally, if you need to run cross-cluster srun jobs or cross-cluster interactive jobs, additional ports must be opened so that srun can communicate with the slurmd daemons on the gcp cluster, and so that those slurmd daemons can talk back to the login nodes on the gcp cluster. During these operations srun opens several ephemeral ports for communication, so when using a firewall it's recommended to restrict which ports srun may use by defining SrunPortRange=<port range> in both slurm.conf files. For example:
SrunPortRange=60001-63000
Then we need to configure our firewall rules.
Name | slurmd |
Network | slurm-network |
Priority | 1000 |
Direction of traffic | Ingress |
Action to match | Allow |
Targets | Specified target tags |
Target tags | compute |
Source Filter | IP ranges |
Source IP Ranges | 10.20.0.0/16 |
Second source filter | none |
Protocols and ports | tcp:6818 |
Name | srun |
Network | slurm-network |
Priority | 1000 |
Direction of traffic | Ingress |
Action to match | Allow |
Targets | Specified target tags |
Target tags | compute |
Source Filter | IP ranges |
Source IP Ranges | 10.20.0.0/16 |
Second source filter | none |
Protocols and ports | tcp:60001-63000 |
Next, let's set up a VPN to connect our two projects, following the steps in the public Google Cloud VPN documentation for each project.
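As a rough guide, the Classic VPN setup for the on-prem side looks something like the sketch below; the gateway names, region, and shared secret are placeholder assumptions, and the same commands should be mirrored in the gcp project with the address ranges swapped.

# Sketch: Classic VPN, on-prem side. Names, region, and the shared secret are placeholders.
gcloud compute target-vpn-gateways create on-prem-gw --network slurm-network --region us-central1
gcloud compute addresses create on-prem-vpn-ip --region us-central1
# Forward ESP and IKE (UDP 500/4500) traffic to the gateway
gcloud compute forwarding-rules create on-prem-gw-esp --region us-central1 \
    --ip-protocol ESP --address on-prem-vpn-ip --target-vpn-gateway on-prem-gw
gcloud compute forwarding-rules create on-prem-gw-udp500 --region us-central1 \
    --ip-protocol UDP --ports 500 --address on-prem-vpn-ip --target-vpn-gateway on-prem-gw
gcloud compute forwarding-rules create on-prem-gw-udp4500 --region us-central1 \
    --ip-protocol UDP --ports 4500 --address on-prem-vpn-ip --target-vpn-gateway on-prem-gw
# Tunnel to the gcp project's VPN gateway IP, plus a route for its subnet
gcloud compute vpn-tunnels create tunnel-to-gcp --region us-central1 \
    --peer-address <gcp-vpn-ip> --shared-secret <secret> --ike-version 2 \
    --target-vpn-gateway on-prem-gw \
    --local-traffic-selector 0.0.0.0/0 --remote-traffic-selector 0.0.0.0/0
gcloud compute routes create route-to-gcp --network slurm-network \
    --destination-range 10.20.0.0/16 \
    --next-hop-vpn-tunnel tunnel-to-gcp --next-hop-vpn-tunnel-region us-central1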
Confirm that the VPN shows as "Established" in the Google Cloud Console VPN page. If the VPN connection does not successfully establish, see the VPN Troubleshooting page.
In order to federate between the two clusters we need to configure Slurm cluster accounting on our gcp cluster to report to our on-prem Slurm controller.
For more information about Slurm accounting, see the Slurm Accounting page.
First, we must set both clusters' slurm.conf files to point to the on-prem controller as the AccountingStorageHost.
Log in to both clusters' "login1" nodes. On the on-prem cluster, execute the following command to retrieve the on-prem controller node's IP:
sudo -i sacctmgr show clusters format=cluster,controlhost,controlport
You should see the following output where 10.10.0.4 is the IP of our on-prem's controller:
   Cluster     ControlHost  ControlPort
---------- --------------- ------------
   on-prem       10.10.0.4         6817
Copy this IP address to your clipboard for use in our next step.
On the gcp cluster's login1 node, open the slurm.conf using your preferred text editor:
sudo vim /apps/slurm/current/etc/slurm.conf
Edit the AccountingStorageHost entry to match:
AccountingStorageHost=<IP of on-prem's controller instance>
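If you'd rather make the change non-interactively, a one-liner like the following works, assuming the AccountingStorageHost entry already exists in the file; 10.10.0.4 is the example on-prem controller IP from above, so substitute your own:

# Sketch: set AccountingStorageHost without opening an editor (substitute your controller IP).
sudo sed -i 's/^AccountingStorageHost=.*/AccountingStorageHost=10.10.0.4/' /apps/slurm/current/etc/slurm.conf
grep AccountingStorageHost /apps/slurm/current/etc/slurm.conf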
Now that we have the gcp cluster pointing to the on-prem cluster we can add the cluster to the on-prem's Slurm DB, and set up user accounts.
On the on-prem cluster's login1 node, execute the following command to add the gcp cluster to the on-prem:
sudo -i sacctmgr add cluster gcp
Next, run these commands, substituting your username, to configure an account for yourself on the gcp cluster:
sudo -i sacctmgr add account default cluster=gcp
sudo -i sacctmgr add user <user> account=default cluster=gcp
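Before moving on you can optionally confirm that the associations were created; this is just a quick sanity check using standard sacctmgr output fields:

# Optional check: list the associations just created for the gcp cluster
sudo -i sacctmgr show associations cluster=gcp format=cluster,account,user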
Finally, we must restart the Slurm controller daemon on both clusters. SSH into the controller node of both clusters. You may do this either by using the "SSH" button next to the controller node in Google Cloud Console, or by setting up SSH keys from the login1 node by taking advantage of the common /home folder:
ssh-keygen -q -f ~/.ssh/id_rsa -N ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh controller
Once logged into both controller nodes, execute the following command on both nodes to restart slurmctld:
sudo systemctl restart slurmctld
Within a few minutes, the two clusters will report in and will appear in the Slurm accounting. Execute the following command to list clusters:
sudo -i sacctmgr show clusters format=cluster,controlhost,controlport
The output should contain both clusters and report IPs for each:
   Cluster     ControlHost  ControlPort
---------- --------------- ------------
   on-prem       10.10.0.4         6817
       gcp       10.20.0.5         6817
If the IPs don't populate immediately please allow a few minutes for the clusters to report in. If you believe there's an issue, please review the Slurm Troubleshooting Guide.
Log out of the controller nodes, and back into the login1 nodes to continue the codelab.
Now that we have our Slurm cluster accounting configured, we can create the Slurm federation.
For more information about Slurm federation, see the Slurm Federation page.
We will use the name "cloudburst" for our federation. On the on-prem cluster's login1 node, execute the following command:
sudo -i sacctmgr add federation cloudburst clusters=on-prem,gcp
You've now created a Slurm federation capable of bursting to Google Cloud Platform! Let's view the configuration to verify it was created correctly:
sudo -i sacctmgr show federation
The output should appear as:
Federation    Cluster ID             Features     FedState
---------- ---------- -- -------------------- ------------
cloudburst        gcp  2                            ACTIVE
cloudburst    on-prem  1                            ACTIVE
Now that we have a federation set up between our on-prem and gcp clusters we can submit jobs that are federated (distributed) across clusters according to whichever cluster is capable of responding to the job request fastest.
To see the resources available in the federation, run sinfo with the --federation flag:
sinfo --federation
The output should appear as:
PARTITION CLUSTER  AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*    on-prem     up   infinite      8  idle~ compute[3-10]
debug*    gcp         up   infinite     18  idle~ compute[3-20]
debug*    on-prem     up   infinite      2   idle compute[1-2]
debug*    gcp         up   infinite      2   idle compute[1-2]
Let's also check the queues on both of our clusters using squeue:
squeue --federation
You should see the following output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
We can see that the queue is empty, and that all nodes are listed as "idle" or "idle~". The "idle" state signifies that the node is up and idle, ready to have jobs allocated to it. The "idle~" state signifies that the node is powered down and does not yet exist in the cloud, but that it can be created if necessary to meet demand.
Now that our cluster federation is ready, let's go ahead and submit a job.
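The hostname_batch script referenced below is the one you created in the prerequisite codelab. If you no longer have it, a minimal sketch that reproduces its behavior is shown here; the node count and partition name are assumptions chosen to match the outputs in this codelab:

#!/bin/bash
# Minimal sketch of hostname_batch: run `hostname` on 4 nodes of the debug partition.
#SBATCH --job-name=hostname_batch
#SBATCH --output=hostname_batch.out
#SBATCH --partition=debug
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
srun hostname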
First, we can submit a job to a specific cluster of our choosing using the -M flag in sbatch:
sbatch -M gcp hostname_batch
You can now run squeue to check the status of the job. The -M flag works in squeue and sinfo as well as sbatch. Let's run squeue with the --federation flag and a few output options:
squeue --federation -O jobid,username,timeused,cluster,numnodes,nodelist
JOBID      USER      TIME      CLUSTER   NODES  NODELIST
134217737  user      0:04      gcp       4      compute[1-4]
You'll notice that the gcp cluster was allocated this job, and there are four nodes allocated to the job.
Now let's check sinfo to verify that the gcp cluster is spinning up the necessary nodes to complete the job:
sinfo --federation
PARTITION CLUSTER  AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*    on-prem     up   infinite      8  idle~ compute[3-10]
debug*    gcp         up   infinite     16  idle~ compute[5-20]
debug*    on-prem     up   infinite      2   idle compute[1-2]
debug*    gcp         up   infinite      4   idle compute[1-4]
We can also allow Slurm's federation to allocate the jobs to whichever cluster is able to respond the fastest by simply submitting the job to the federation we're now part of:
sbatch hostname_batch
Once the job is submitted check squeue to see where the job was allocated:
squeue --federation -O jobid,username,timeused,cluster,numnodes,nodelist
JOBID      USER      TIME      CLUSTER   NODES  NODELIST
134217738  user      0:20      on-prem   4      compute[1-4]
Submit the job once or twice more to see where Slurm places the job. It may allocate jobs to the on-prem cluster the first or second time, but then it will begin distributing the jobs evenly.
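For example, a quick way to watch the scheduler spread work across the federation is to queue several jobs in a row and then check where they landed:

# Submit a few more federated jobs and see how they are spread across clusters
for i in 1 2 3; do sbatch hostname_batch; done
squeue --federation -O jobid,username,timeused,cluster,numnodes,nodelist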
Congratulations, you've created a federated Slurm cluster out of two independent Slurm clusters, and federated jobs between two auto-scaling clusters! You can use this technique in a variety of environments, including between an existing on-premises cluster and a Google Cloud auto-scaled cluster. Furthermore, you could place multiple clusters in a single federation, each tailored to a given workload and resource profile. You could then use Slurm accounting to assign users to cloud-specific partitions, or specify clusters on a per-job basis, all while transparently allowing users to continue submitting jobs through the same workflow they're used to, logging into the same Slurm cluster.
Try testing some more interesting code like a Prime Number Generator, the OSU MPI Benchmarks, or your own code! To learn more about how to customize the code for your usage, and how to run your workloads most affordably, contact the Google Cloud team today through Google Cloud's High Performance Computing Solutions website!
Congratulations, you've created two Slurm clusters on Google Cloud Platform and used its latest features to auto-scale your clusters and federate jobs to meet workload demand! You can use this model to run any variety of jobs, and it scales to hundreds of instances in minutes using just one command.
Are you building something cool using Slurm's new GCP-native functionality? Have questions? Have a feature suggestion? Reach out to the Google Cloud team today through Google Cloud's High Performance Computing Solutions website, or chat with us in the Google Cloud & Slurm Discussion Group!
Log out of the Slurm nodes:
exit
Let any auto-scaled nodes scale down before deleting the deployment. You can also delete these nodes manually using "gcloud compute instances delete computeN computeN+1 ...".
You can easily clean up the deployments once you're done by executing the following command from your Google Cloud Shell in each project, after logging out of login1:
gcloud deployment-manager deployments delete slurm
When prompted, type Y to continue. This operation can take some time, please be patient.
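Remember that there is one deployment per project. After deleting the first, switch to the other project and repeat; the project ID below is a placeholder:

# Repeat the deletion in the second project; <other-project-id> is a placeholder
gcloud config set project <other-project-id>
gcloud deployment-manager deployments delete slurm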
To finish cleaning up, we simply delete our projects.
If you need support using these integrations in testing or production environments please contact SchedMD directly using their contact page here: https://www.schedmd.com/contact.php
You may also use SchedMD's Troubleshooting guide here: https://slurm.schedmd.com/troubleshoot.html
Finally you may also post your question to the Google Cloud & Slurm Discussion Group found here: https://groups.google.com/forum/#!forum/google-cloud-slurm-discuss
Please submit feedback about this codelab using this link. Feedback takes less than 5 minutes to complete. Thank you!