Using Slurm to Submit Jobs to a Cluster

What is Slurm?

Slurm is a job scheduler that coordinates computational resources across a cluster by acting as the central authority for resource distribution. This ensures every user receives dedicated access to hardware without interfering with other active processes.

The scheduler serves as the system logic that manages the queue, prioritizing tasks based on predefined policies and hardware availability. This framework is used to evenly distribute resources among many individual users while maintaining the flexibility to aggregate those same resources for much larger computational tasks.

Submitting Jobs


There are two commands for submitting jobs via Slurm: sbatch and srun.

sbatch is used for submitting batch jobs, which are non-interactive. The sbatch command requires writing a job script to use in job submission. When invoked, sbatch creates a job allocation (resources such as nodes and processors) before running the commands specified in the job script.

srun is used for starting interactive sessions or job steps. An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application.

sbatch


To use sbatch, first create a job script, here called myjob.sh:

[username@em-viz ~]$ vi myjob.sh

#!/bin/bash

# Name of the job
#SBATCH --job-name=sbgrid_job

# Number of compute nodes
#SBATCH --nodes=1

# Number of tasks per node, in this case one (each task gets one CPU core by default)
#SBATCH --ntasks-per-node=1

# Request the GPU partition/Queue you have permission to use
#SBATCH --partition=partition_name

# Request the GPU resources
#SBATCH --gres=gpu:2


# This is where your program runs:
sbgrid-program --input protein.pdb --out results/
All lines that begin with #SBATCH are directives to Slurm. The meaning of each directive in the sample script is explained in the comment line that precedes it.
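
Once the script is saved, it can be submitted from the shell and its job ID kept for later use. The following is a minimal sketch, assuming sbatch and squeue are on your PATH (typically on a login node). The helper name submit_job is hypothetical and used only for illustration. It relies on sbatch's --parsable option, which prints just the job ID instead of the usual "Submitted batch job" message.

```shell
#!/bin/bash

# submit_job is a hypothetical helper name, not part of Slurm.
# It submits a job script and prints the numeric job ID.
submit_job() {
    script="$1"
    # --parsable prints "jobid", or "jobid;cluster" on multi-cluster setups
    out=$(sbatch --parsable "$script") || return 1
    # Keep only the part before any ";" so that just the job ID remains
    echo "${out%%;*}"
}

# Example usage on a cluster:
#   jobid=$(submit_job myjob.sh)
#   squeue -j "$jobid"
```

Capturing the ID this way avoids parsing the human-readable submission message when the ID is needed for squeue, scontrol, or scancel.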

srun


srun -n 1 -N 1 -c 4 -p partition-name --gres=gpu:1 sbgrid-program

Flag               Full name        Function
-n 1               --ntasks         Requests 1 task (process) to be launched
-N 1               --nodes          Requests that the task stays on 1 physical machine
-c 4               --cpus-per-task  Allocates 4 CPU cores to that single task
-p partition-name  --partition      Sends the job to a specific partition (queue)
--gres=gpu:1       --gres           Generic resource (GRES) request; here it asks for 1 GPU
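
The same request can also be written with the long option names listed above. Adding srun's --pty option with a shell as the command gives a command line prompt on the compute node, as described for interactive jobs. This is a minimal sketch, assuming a partition you are allowed to use; the function name interactive_shell and the partition placeholder are hypothetical.

```shell
#!/bin/bash

# interactive_shell is a hypothetical helper name, not part of Slurm.
# Usage: interactive_shell <partition>
interactive_shell() {
    partition="$1"
    # Long-form equivalent of: srun -n 1 -N 1 -c 4 -p <partition> --gres=gpu:1
    # --pty attaches your terminal to the task, so bash gives you a prompt
    srun --ntasks=1 --nodes=1 --cpus-per-task=4 \
         --partition="$partition" --gres=gpu:1 \
         --pty bash
}

# Example usage on a cluster (replace partition-name with a real partition):
#   interactive_shell partition-name
```

Typing exit in the resulting shell ends the session and releases the allocated resources.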

Helpful commands for managing your job

Command   Usage                      Function
sbatch    sbatch <job script>        Submit a batch job to the queue
squeue    squeue                     Show the status of jobs in the Slurm queue
scancel   scancel <JOBID>            Cancel a job
sinfo     sinfo                      Show information about partitions
scontrol  scontrol show job <JOBID>  Show detailed information about a running or pending job
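
These commands combine naturally when following a job through the queue. The sketch below polls squeue until the job is no longer listed. It is one possible approach, not a built-in Slurm feature, and the function name wait_for_job and the 30-second default interval are assumptions made for illustration. It uses squeue's -h (no header), -j (job ID), and -o "%T" (print the job state) options.

```shell
#!/bin/bash

# wait_for_job is a hypothetical helper name, not part of Slurm.
# Usage: wait_for_job <jobid> [poll interval in seconds]
wait_for_job() {
    jobid="$1"
    interval="${2:-30}"
    while true; do
        # Prints the job state (e.g. PENDING or RUNNING);
        # empty once squeue no longer lists the job
        state=$(squeue -h -j "$jobid" -o "%T" 2>/dev/null) || state=""
        [ -z "$state" ] && break
        echo "job $jobid: $state"
        sleep "$interval"
    done
    echo "job $jobid is no longer listed by squeue"
}

# Example usage on a cluster:
#   wait_for_job 12345
#   scontrol show job 12345   # details, while Slurm still remembers the job
#   scancel 12345             # cancel the job instead of waiting
```

A modest polling interval keeps the load on the scheduler low; very frequent squeue calls are generally discouraged on shared clusters.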