Using Slurm to Submit Jobs to a Cluster¶
What is Slurm?¶
Slurm is a job scheduler that coordinates computational resources across a cluster by acting as the central authority for resource distribution. Each job receives dedicated access to the hardware allocated to it, so it does not interfere with other users' running jobs.
The scheduler serves as the system logic that manages the queue, prioritizing tasks based on predefined policies and hardware availability. This framework is used to evenly distribute resources among many individual users while maintaining the flexibility to aggregate those same resources for much larger computational tasks.
Submitting Jobs¶
There are two commands for submitting jobs via Slurm: sbatch and srun.
sbatch is used for submitting batch jobs, which are non-interactive. The sbatch command requires writing a job script to use in job submission. When invoked, sbatch creates a job allocation (resources such as nodes and processors) before running the commands specified in the job script.
srun is used for starting interactive sessions or job steps. An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application.
sbatch¶
To use sbatch, first write a job script, for example myjob.sh:
[username@em-viz ~]$ vi myjob.sh
#!/bin/bash
# Name of the job
#SBATCH --job-name=sbgrid_job
# Number of compute nodes
#SBATCH --nodes=1
# Number of tasks per node (one task, which gets one CPU core by default)
#SBATCH --ntasks-per-node=1
# Partition (queue) to submit to; use a GPU partition you have permission to use
#SBATCH --partition=partition_name
# Request two GPUs per node
#SBATCH --gres=gpu:2
# This is where your program runs:
sbgrid-program --input protein.pdb --out results/
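Once the script is saved, it is submitted with sbatch, which normally replies with a line of the form "Submitted batch job <id>". The following is a minimal sketch of submitting myjob.sh and looking the job up in the queue. The helper name `extract_job_id` is hypothetical, and the guard around sbatch is only there so that the snippet does nothing harmful on a machine without Slurm.

```shell
# Extract the job ID from sbatch's "Submitted batch job <id>" message.
# (extract_job_id is a hypothetical helper, not part of Slurm.)
extract_job_id() {
    # The job ID is the last whitespace-separated field of the message.
    echo "$1" | awk '{print $NF}'
}

if command -v sbatch >/dev/null 2>&1; then
    # Submit the job script written above.
    submit_msg=$(sbatch myjob.sh)
    job_id=$(extract_job_id "$submit_msg")
    echo "Submitted job ${job_id}"
    # Show only this job in the queue listing.
    squeue -j "${job_id}"
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

Keeping the job ID in a variable makes it easy to pass to later commands such as scancel or scontrol.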
srun¶
The following command runs sbgrid-program as a single task on one node with 4 CPU cores and 1 GPU:
srun -n 1 -N 1 -c 4 -p partition-name --gres=gpu:1 sbgrid-program
| flag | full name | function |
|---|---|---|
| -n 1 | --ntasks | Requests 1 task (process) to be launched |
| -N 1 | --nodes | Requests that the task stays on 1 physical machine |
| -c 4 | --cpus-per-task | Allocates 4 CPU cores to that single task |
| -p partition-name | --partition | Sends the job to a specific queue |
| --gres=gpu:1 | --gres | Generic resource (GRES) request; here it asks for 1 GPU card |
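The short flags in the table can be swapped for their long names without changing the request. As a rough sketch, the hypothetical helper `build_srun_cmd` below prints the long-option form of the command so it can be inspected before it is run; the partition name is a placeholder, as in the example above. The second call assumes an interactive shell is wanted and adds srun's `--pty` option, which attaches a pseudo-terminal to the task.

```shell
# Print the long-option form of the srun command shown above.
# Arguments: partition name, number of GPUs, then the command to run.
# (build_srun_cmd is a hypothetical helper, not part of Slurm.)
build_srun_cmd() {
    partition="$1"
    gpus="$2"
    shift 2
    echo "srun --ntasks=1 --nodes=1 --cpus-per-task=4 --partition=${partition} --gres=gpu:${gpus} $*"
}

# Same request as the short-flag example, spelled out.
build_srun_cmd partition-name 1 sbgrid-program

# Interactive shell on a compute node with the same resources.
build_srun_cmd partition-name 1 --pty bash
```

Printing the command first is a cautious choice: the line can be checked, then pasted into a terminal on the login node.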
Helpful commands for managing your job¶
| command | usage | function |
|---|---|---|
| sbatch | sbatch <job script> | Submit a batch job to the queue |
| squeue | squeue | Show status of Slurm batch jobs |
| scancel | scancel *JOBID* | Cancel job |
| sinfo | sinfo | Show information about partitions |
| scontrol | scontrol show job *JOBID* | Check the status of a running or pending job |
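These commands combine naturally in small scripts. The sketch below assumes that the output of `scontrol show job` contains a field of the form `JobState=RUNNING`, and pulls that value out with sed. The helper name `job_state` and the default job ID 12345 are hypothetical placeholders.

```shell
# Read `scontrol show job` output on stdin and print the JobState value.
# (job_state is a hypothetical helper, not part of Slurm.)
job_state() {
    sed -n 's/.*JobState=\([A-Z_]*\).*/\1/p'
}

# Job ID to inspect; 12345 is a placeholder used when no argument is given.
job_id="${1:-12345}"

if command -v scontrol >/dev/null 2>&1; then
    state=$(scontrol show job "${job_id}" | job_state)
    echo "Job ${job_id} is ${state:-unknown}"
    # To remove the job from the queue, one could then run:
    #   scancel "${job_id}"
fi
```

For a quick overview of all of your own jobs, `squeue -u "$USER"` is usually enough; the script form is mainly useful when a later step depends on the job's state.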