SLURM

The tool we use to manage the submission, scheduling and management of jobs in HPC2021 and AI-Research is called SLURM. On a login node, a user writes a batch script and submits it to the queue manager, which schedules it for execution on the compute nodes. The submitted job then queues up until the requested system resources are allocated. The queue manager schedules a job to run in a queue (called a partition in SLURM) according to a predetermined site policy designed to balance competing user needs and to maximize efficient use of cluster resources.

Each job’s position in the queue is determined by the fairshare algorithm, which depends on a number of factors (e.g. job size, time requirement, job queuing time, resource usage in the previous month, etc.). The HPC system is set up to support large computation jobs. The maximum CPU and processing time limits are summarized in the tables below. Please note that the limits are subject to change without notice.


Partitions

A partition is a set of compute nodes grouped logically based on their hardware features. The tables below show the available partitions and their properties / features in the HPC2021 and AI-Research systems respectively.

For AI-Research System
Partition  Default / Max Job Duration  # of Nodes  Cores per Node  RAM(GB) per Node  RAM(GB) per Core  Features
debug      4 Days                      1           256             1024              4                 EPYC7742
For HPC2021 System
Partition        Default / Max Job Duration  # of Nodes  Cores per Node  RAM(GB) per Node  RAM(GB) per Core  Features
intel (default)  1 Day / 1 Week              84          32              192               6                 GOLD6226R
amd              1 Day / 1 Week              28          64              256               4                 EPYC7542
amd              1 Day / 1 Week              28          128             512               4                 EPYC7742
gpu              1 Day / 1 Week              4           32              384               12                4x V100
gpu              1 Day / 1 Week              3           32              384               12                8x V100
hugemem          1 Day / 1 Week              2           128             2048              16                EPYC7742 + 2TB RAM

Quality of Service (QoS)

Each QoS is assigned a set of limits applied to the job, dictating the resources and partitions that a job is entitled to request. The tables below show the available QoS levels in AI-Research and HPC2021 and their allowed partitions / resource limits.

For AI-Research System
QoS              Supported Partition(s)  Max Job Duration  Max Resources per Job
debug (default)  debug                   4 days
For HPC2021 System
QoS               Supported Partition(s)  Max Job Duration  Max Resources per Job
debug             intel, amd, gpu         30min             2 nodes, 2 GPUs
normal (default)  intel, amd              1 Week            1024 cores
long              intel, amd              2 Weeks           1 node
^ special         intel, amd              1 Day             2048 cores
^ gpu             gpu                     1 Week            1 node, 4 GPUs
^ hugemem         hugemem                 1 Week            1 node, 2TB RAM

^ Require special approval

Users are advised to specify a suitable QoS depending on the job’s requirement.

  • For jobs that support parallel computing across multiple nodes (e.g. via MPI), the “normal” QoS is desirable, as the job may request a large number of CPU cores (up to 1024).
  • For serial or multi-threaded (OpenMP) jobs that can only be executed on a single node and are expected to take a longer running time, the “long” QoS is preferable, as the job may request a node with a longer job duration (up to two weeks); see the sketch after this list.
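
For illustration, a minimal set of directives for a single-node job under the “long” QoS might look like this (the core count and walltime shown are illustrative, not requirements):

#SBATCH --partition=intel
#SBATCH --qos=long
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32      # all cores of one 32-core intel node
#SBATCH --time=14-00:00:00      # up to the two-week limit of the "long" QoS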

Job script

To execute a program on the cluster system, a user has to write a batch script and submit it to the SLURM job scheduler. Samples of general SLURM scripts are located in each user’s hpc2021 home directory under ~/slurm-samples, and the user guide for individual software can be referenced.

Sample job script

The example job script (script.cmd) below requests the following resources and specifies the actual programs/commands to be executed.

  1. Name the job “pilot_study” for easy reference.
  2. Request notifications to be emailed to tmchan@hku.hk when the job starts, ends or fails.
  3. Request the “amd” partition (i.e. general compute nodes with AMD CPUs).
  4. Request the “normal” QoS.
  5. Request an allocation of 64 CPU cores, all from a single compute node.
  6. Request 10GB of physical RAM.
  7. Request a job execution time of 3 days and 10 hours (the job will be terminated by SLURM after the specified amount of time, whether or not it has finished).
  8. Write the standard output and standard error to the files “pilot_study_2021.out” and “pilot_study_2021.err” respectively, under the folder where the job is submitted. The paths support the use of replacement symbols.
#!/bin/bash
#SBATCH --job-name=pilot_study        # 1. Job name
#SBATCH --mail-type=BEGIN,END,FAIL    # 2. Send email upon events (Options: NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=tmchan@hku.hk     #    Email address to receive notification
#SBATCH --partition=amd               # 3. Request a partition
#SBATCH --qos=normal                  # 4. Request a QoS
#SBATCH --ntasks=64                   # 5. Request total number of tasks (MPI workers)
#SBATCH --nodes=1                     #    Request number of node(s)
#SBATCH --mem=10G                     # 6. Request total amount of RAM
#SBATCH --time=3-10:00:00             # 7. Job execution duration limit day-hour:min:sec
#SBATCH --output=%x_%j.out            # 8. Standard output log as $job_name_$job_id.out
#SBATCH --error=%x_%j.err             #    Standard error log as $job_name_$job_id.err
 
# print the start time
date
command1 ...
command2 ...
command3 ...
# print the end time
date
Running Serial / Single Threaded Jobs using a CPU on a node

Serial or single-core jobs are those that can only make use of one CPU on a node. The SLURM batch script below requests a single CPU on a node with the default amount of RAM (i.e. 3GB) for 30 minutes in the default partition (i.e. “intel”).

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
command1...
Running Multi-threaded Jobs using multiple CPU cores on a node

For jobs that can leverage multiple CPU cores on a node by creating multiple threads within a process (e.g. OpenMP), the SLURM batch script below may be used. It requests an allocation of one task with 8 CPU cores on a single node and 6GB RAM per core (a total of 6GB x 8 = 48GB RAM on the node) for 1 hour in the default partition (i.e. “intel”) and default QoS (i.e. “normal”).

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=6G
#SBATCH --time=01:00:00

# For jobs supporting OpenMP, assign the value of the requested CPU cores to the OMP_NUM_THREADS variable
# that would be automatically passed to your command supporting OpenMP
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
command1 ...

# For jobs not supporting OpenMP, supply the value of the requested CPU cores as command-line argument to the command
command2 -t ${SLURM_CPUS_PER_TASK} ...
Running MPI jobs using multiple nodes

Message Passing Interface (MPI) is a standardized and portable message-passing standard designed to allow execution of programs using CPUs on multiple nodes, where CPUs across nodes communicate over the network. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of users writing portable message-passing programs in C, C++ and Fortran. Intel MPI and Open MPI are available on the HPC2021 system, and SLURM jobs may make use of either MPI implementation.

❗Requesting multiple nodes and/or loading any MPI modules does not necessarily make your code faster; your code must be MPI-aware to use MPI. Even though running a non-MPI code with mpirun might possibly succeed, you will most likely have every core assigned to your job running the exact same computation, duplicating each other’s work and wasting resources.

The version of the MPI commands you run must match the version of the MPI library used in compiling your code, or your job is likely to fail. And the version of the MPI daemons started on all the nodes for your job must also match. For example, an MPI program compiled with Intel MPI compilers should be executed using Intel MPI runtime instead of Open MPI runtime.
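
For example, a minimal sketch of keeping the compile-time and run-time MPI environments consistent (the Intel MPI compiler wrapper mpiicc is assumed to be provided by the impi module):

# Compile and run within the same Intel MPI environment
module load impi/2021.4
mpiicc -o program_mpi program_mpi.c   # compile with the Intel MPI compiler wrapper
mpirun ./program_mpi                  # run with the matching Intel MPI runtime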

The SLURM batch script below requests an allocation of 64 tasks (MPI processes), each using a single core, spread over two nodes, with 3GB RAM per core, for 1 hour in the default partition (i.e. “intel”) and default QoS (i.e. “normal”).

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3G
#SBATCH --time=01:00:00

cd ${SLURM_SUBMIT_DIR}
# Load the environment for Intel MPI
module load impi/2021.4

# run the program supporting MPI with the "mpirun" command
# The -n option is not required since mpirun automatically determines the number of processes from the SLURM settings
mpirun ./program_mpi

This example makes use of all the cores on two 32-core nodes in the “intel” partition. If the same number of tasks (i.e. 64) is requested from the “amd” partition, you should set “--nodes=1” so that all 64 cores are allocated from a single AMD (64-core or 128-core) node. Otherwise, SLURM will assign the 64 CPUs across 2 compute nodes, which would introduce unnecessary inter-node communication overhead.
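
For instance, a minimal variant of the script above targeted at the “amd” partition could keep the task count but pack all tasks onto one node (same program and MPI module assumed):

#!/bin/bash
#SBATCH --partition=amd
#SBATCH --nodes=1
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3G
#SBATCH --time=01:00:00

cd ${SLURM_SUBMIT_DIR}
module load impi/2021.4
mpirun ./program_mpi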

 

Running hybrid OpenMP/MPI jobs using multiple nodes

For jobs that support both OpenMP and MPI, a SLURM batch script may specify the number of MPI tasks to run and the number of CPU cores that each task should use. The SLURM batch script below requests an allocation of 2 nodes and 64 CPU cores in total for 1 hour in the default partition (i.e. “intel”) and default QoS (i.e. “normal”). Each compute node runs 2 MPI tasks, where each MPI task uses 16 CPU cores and each core uses 3GB RAM. This makes use of all the cores on two 32-core nodes in the “intel” partition.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=3G
#SBATCH --time=01:00:00

cd ${SLURM_SUBMIT_DIR}
# Load the environment for Intel MPI
module load impi/2021.4

# assign the value of the requested CPU cores per task to the OMP_NUM_THREADS variable
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# run the program supporting MPI with the "mpirun" command.
# The -n option is not required since mpirun automatically determines the number of processes from the SLURM settings
mpirun ./program_mpi-omp

Sample MPI and hybrid MPI/OpenMP codes and the corresponding SLURM scripts are available in the user home directory under ~/slurm-samples/demo-MPI/.

 

Running jobs using GPU

The SLURM batch script below requests 8 CPU cores and 2 GPU cards from one compute node in the “gpu” partition using the “gpu” QoS.
❗Your code must be GPU-aware to benefit from nodes with GPUs; otherwise a partition without GPUs should be used.

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:2

# Load the environment module for Nvidia CUDA
module load cuda
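# Run your GPU-enabled program (the command below is a placeholder)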
gpu_program gpu=y ...
Running jobs requiring large amount of RAM

The SLURM batch script below requests 64 CPU cores (by default, one CPU core per task) on a single node with AMD EPYC 7742 CPUs in the “amd” partition, and a total of 300GB RAM. This would use 64 cores on a 128-core AMD node. If “--nodes=1” is not specified, the 64 cores may be assigned from separate compute nodes, which may result in a performance drop due to (MPI) inter-node communication overhead, or in unused CPUs if the program does not support multi-node parallelization.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=64
#SBATCH --partition=amd
#SBATCH --mem=300G
#SBATCH --constraint="CPU_SKU:7742"

When the requested resources are not available and/or the limits in the QoS are exceeded, a submitted job is put into the pending state. Once a pending job becomes eligible to execute on the cluster, SLURM allocates the requested resources to the job for the duration of the requested wall time, and any commands placed after the last SLURM directive (#SBATCH) in the script file are executed.


Job Directives

A SLURM script includes a list of SLURM job directives at the top of the file, where each line starts with #SBATCH followed by option-value pairs that tell the job scheduler the resources the job requests.

Long Option        Short Option  Default Value            Description
--job-name         -J            file name of job script  User-defined name to identify a job
--partition        -p            intel                    Partition where the job is to be executed
--time             -t            24:00:00                 Limit on the maximum execution time (walltime) of the job, in D-HH:MM:SS. For example, -t 1- is one day, -t 6:00:00 is 6 hours
--nodes            -N                                     Total number of node(s)
--ntasks           -n            1                        Number of tasks (MPI workers)
--ntasks-per-node                                         Number of tasks per node
--cpus-per-task    -c            1                        Number of CPUs required per task
--mem                                                     Amount of memory allocated per node. Different units can be specified using the suffix [K|M|G|T]
--mem-per-cpu                    3G                       Amount of memory allocated per CPU core (for multi-core jobs). Different units can be specified using the suffix [K|M|G|T]
--constraint       -C                                     Request nodes with specific features. Multiple constraints may be specified with AND, OR, matching OR. For example, --constraint="CPU_MNF:AMD", --constraint="CPU_MNF:INTEL&CPU_GEN:CLX"
--exclude          -x                                     Explicitly exclude certain nodes from the resources granted to the job. For example, --exclude=SPG-2-[1-3], --exclude=SPG-2-1,SPG-2-2,SPG-2-3

More SLURM directives are described in the official SLURM sbatch documentation.


Job Submission

To submit a batch job for later execution, run the sbatch command followed by the path to the script file on a login node.

$ sbatch script.cmd
Submitted batch job 2021

Upon successful submission of a job, a unique job ID (i.e. 2021 in this example) is assigned by SLURM, which may be referred to for job management such as job status checking and cancellation.
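
For example, the job ID returned above can be used directly with the job management commands described later in this guide:

$ sq -j 2021          # check the status of job 2021
$ scancel 2021        # cancel job 2021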

By default, SLURM directs both standard output and standard error during job execution to a single file, named slurm-%j.out for standalone jobs and slurm-%A_%a.out for array jobs. The default paths may be overridden with --output=%x_%j.out and --error=%x_%j.err for the standard output and standard error paths respectively. Users may compose a path using any combination of replacement symbols (% followed by a letter).

Replacement symbols

Replacement Symbol  Description
%A                  Job array’s master job allocation number
%a                  Job array ID (index) number
%J                  JobID.stepid of the running job (e.g. “128.0”)
%j                  JobID of the running job
%x                  Job name
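
For example, for a hypothetical array job, the log paths could combine several replacement symbols:

#SBATCH --output=%x_%A_%a.out   # e.g. pilot_study_2021_3.out for array task 3 of master job 2021
#SBATCH --error=%x_%A_%a.err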

Besides putting SLURM directives inside a script file, they may be supplied to job submission commands like sbatch, srun and salloc as command-line arguments, which will take precedence over any specified values inside a script file.

e.g. request 1 compute node with 32 cores per task and 4GB RAM, where any values specified for -N, -c and --mem inside script.cmd are ignored.

$ sbatch -N1 -c32 --mem=4G script.cmd

Job Management

After a job is submitted to SLURM, users may check its status with the commands sq or showq, as described below.

Show any running/pending jobs
$ sq
JOBID PARTITION NAME   ST USER     QOS   NODES CPUS TRES_PER_NODE TIME_LIMIT TIME_LEFT NODELIST(REASON) 
123   intel     test1  R  hku_user normal 1     32   N/A          4-00:00:00 3-21:21:20 GPI-1-19 
124   gpu       para_g R  hku_user gpu    1     8   gpu:2         4-00:00:00 3-21:29:39 SPG-1-1
Show specific job, sq -j <JobID>
$ sq -j 123456
Show jobs in a specific partition, sq -p <partition>
$ sq -p intel
Show running job
$ sq -t R
Show pending job
$ sq -t PD
Job information provided
  • JOBID: Job ID
  • PARTITION: Partition
  • NAME: Job name given
  • ST (status):
Status  Description
R       Running
PD      Pending (queuing)
CD      Completed (exit code 0 — without error)
F       Failure (exit code non-zero)
DL      Failure (job terminated on deadline)
  • NODES: Number of nodes requested
  • CPUs: Number of CPUs requested
  • TRES_PER_NODE: Resources
  • TIME_LIMIT: Requested wall time
  • TIME_LEFT: Remaining wall time
  • NODELIST: List of the nodes which the job is using
  • NODELIST(REASON): Shows the reason that explains the current job status
Reason                 Description
Priority               The job is waiting for higher-priority job(s) to complete
Dependency             The job is waiting for a dependent job to complete
Resources              The job is waiting for resources to become available
InvalidQoS             The job’s QoS is invalid. Cancel the job and resubmit it with the correct QoS
QOSGrpMaxJobsLimit     The maximum number of jobs allowed for your job’s QoS is already in use
PartitionCpuLimit      All CPUs in your job’s specified partition are in use
PartitionMaxJobsLimit  The maximum number of jobs for your job’s specified partition has been reached
$ showq
SUMMARY OF JOBS FOR USER: <hku_user> 
ACTIVE JOBS-------------------- 
JOBID     JOBNAME   USERNAME     STATE   CORE   NODE QUEUE         REMAINING STARTTIME 
=================================================================================================== 
10721     hpl       hku_user     Running 64     2   intel           2:06:56 Mon Aug  9 17:50:21 
WAITING JOBS------------------------ 
JOBID     JOBNAME   USERNAME     STATE   CORE HOST QUEUE           WCLIMIT QUEUETIME
=================================================================================================== 
Total Jobs: 1     Active Jobs: 1     Idle Jobs: 0     Blocked Jobs: 0

Show resource usage

After a job is submitted to SLURM, users may check the CPU/RAM/GPU usage of their current jobs (updated every minute) with the command showjob, as described below.

$ showjob -h
Usage: showjob [OPTIONS]
                -x    show finished job(s) in last x day
                      default is to show running/pending job(s)

                -p    comma separated list of partitions to view
                      default is any partitions

                -j    comma separated list of jobs IDs
                      default is any job

                -s    filter jobs with specific state
                          CA  CANCELLED
                          CD  COMPLETED
                          CF  CONFIGURING
                          CG  COMPLETING
                          DL  DEADLINE
                          F   FAILED
                          OOM OUT_OF_MEMORY
                          PD  PENDING
                          R   RUNNING
                          ST  STOPPED
                          S   SUSPENDED
                          TO  TIMEOUT
                      default is any state

                -w    display jobs on any of these nodes
                      default is any node

In the example below, job #20220 was using roughly the same amount of CPU hours and RAM on each of the 3 allocated nodes, meaning that the job achieved fairly good parallel processing. Otherwise, only one or a few of the allocated nodes would be working while the other nodes sat idle. On the other hand, only ~14% of the requested RAM was utilized, so the peak RAM usage may be taken as a reference value when requesting memory in subsequent submissions of similar jobs (see the sketch after the sample output below).

$ showjob
 Job ID: 20220                                     Sat Jan 1 01:00:00 HKT 2022
╒═════════════════╤════════════════════════════════════════╤══════════════════╕
│ User: tmchan    │ Name: sim-2p                           │ State: RUNNING   │
│  QoS: normal    │ Partition: intel                       │ Priority: 17076  │
╞═════════════════╪════════════════════╤═══════════════════╧══════════════════╡
│     Resource    │           Requests │ Usage                                │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│         Node    │                  3 │ GPI-1-[4,7],GPI-4-10                 │
│          CPU    │                 96 │ 99.38%                               │
│          RAM    │          281.25 GB │ 14.23%                               │
│    Wall time    │         4-00:00:00 │ 1-16:26:22                           │
│          GPU    │                N/A │ N/A                                  │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│ Per Node Usage  │   CPU hour Usage   │             RAM Usage (GB)           │
│                 │      Up to Now     │         Now            Peak          │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│     GPI-1-4     │     1285.983       │      15.103 GB       18.998 GB       │
│     GPI-1-7     │     1286.141       │      12.429 GB       16.328 GB       │
│     GPI-4-10    │     1285.903       │      12.490 GB       16.385 GB       │
└─────────────────┴────────────────────┴──────────────────────────────────────┘
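
Based on the peak per-node RAM usage shown above (under 19GB on every node), a subsequent submission of a similar job might request a smaller per-node memory limit with some headroom, for example (the value is illustrative):

#SBATCH --mem=24G    # per-node memory request sized from the observed peak usage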

For jobs that requested GPUs, the GPU IDs and the GPU/RAM usage are shown as below. As not all applications can scale their workload across multiple GPUs, looking at the GPU usage gives insight into the actual GPU utilization. In this example, only GPU 0 was working at 100% while the other GPUs (GPU 1-3) were idle, which might suggest that revising the job resource requests or tuning the application parameters is warranted (see the sketch after the sample output below). It is advisable to check the usage from time to time, as resource usage may fluctuate over the course of job execution.

$ showjob
Job ID: 20221                                      Sat Jan 1 01:01:00 HKT 2022
╒═════════════════╤════════════════════════════════════════╤══════════════════╕
│ User: tmchan    │ Name: gpu-sim                          │ State: RUNNING   │
│  QoS: gpu       │ Partition: gpu                         │ Priority: 15441  │
╞═════════════════╪════════════════════╤═══════════════════╧══════════════════╡
│     Resource    │           Requests │ Usage                                │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│         Node    │                  1 │ SPG-1-4                              │
│          CPU    │                 32 │ 12.47%                               │
│          RAM    │           93.75 GB │ 6.97%                                │
│    Wall time    │         7-00:00:00 │ 4-06:52:53                           │
├─────────────────┼────────────────────┼──────────────────────┬───────┬───────┤
│          GPU    │         4(IDX:0-3) │ GPU Card             │  GPU% │  RAM  │
│                 ├────────────────────┼──────────────────────┴───────┴───────┤
│                 │              GPU-0 │Tesla-V100-SXM2-32GB     100 %  30 GB │
│                 │              GPU-1 │Tesla-V100-SXM2-32GB       0 %      0 │
│                 │              GPU-2 │Tesla-V100-SXM2-32GB       0 %      0 │
│                 │              GPU-3 │Tesla-V100-SXM2-32GB       0 %      0 │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│ Per Node Usage  │    CPU hour Usage  │             RAM Usage (GB)           │
│                 │      Up to Now     │         Now            Peak          │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│     SPG-1-4     │      410.602       │       6.537 GB        7.614 GB       │
└─────────────────┴────────────────────┴──────────────────────────────────────┘
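
If, as in this example, the application can only make use of one GPU, a revised submission might simply request a single GPU card (an illustrative sketch based on the GPU job script shown earlier):

#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:1
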
Show job usage for all running jobs
$ showjob
Show job usage for a job with job ID “12345”
$ showjob -j 12345
Show job usage for jobs in “intel” partition
$ showjob -p intel

After a job has finished (e.g. COMPLETED/TIMEOUT/FAILED/OUT_OF_MEMORY), users may check its CPU/RAM usage with the command showjob -x <day>, as described below. The job state (see the state codes above) indicates whether a job finished normally.

Show finished jobs today

Job “1234” completed normally, as its state was “COMPLETED”. However, attention should be paid to job “1235”: it was aborted when the requested wall time of 1 day was exhausted, and its state became “TIMEOUT”.

$ showjob -x 0
Job ID: 1234                                      Mon Jan 10 16:12:00 HKT 2022
╒═════════════════╤═══════════════════════════════════╤═══════════════════════╕
│ User: tmchan    │ Name: sim                         │ State: COMPLETED      │
│  QoS: normal    │ Partition: intel                  │ Exit code: 0          │
├─────────────────┼───────────────────────────────────┴───────────────────────┤
│     Start time: │ 2022-01-06 14:05:43                                       │
│       End time: │ 2022-01-10 04:17:23                                       │
│      Wall time: │          3-14:11:40                                       │
╞═════════════════╪════════════════════╤══════════════════════════════════════╡
│     Resource    │           Requests │ Usage                     Efficiency │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│         Node    │                  3 │ GPI-1-2,GPI-2-14,GPI-3-17            │
│          CPU    │                 96 │ 95.673                       99.659% │
│          RAM    │             281 GB │ 2.325 GB                      0.827% │
└─────────────────┴────────────────────┴──────────────────────────────────────┘

 Job ID: 1235                                     Mon Jan 10 16:12:01 HKT 2022
╒═════════════════╤═══════════════════════════════════╤═══════════════════════╕
│ User: tmchan    │ Name: sim                         │ State: TIMEOUT        │
│  QoS: normal    │ Partition: intel                  │ Exit code: 0          │
├─────────────────┼───────────────────────────────────┴───────────────────────┤
│     Start time: │ 2022-01-09 15:07:56                                       │
│       End time: │ 2022-01-10 15:08:01                                       │
│      Wall time: │          1-00:00:05                                       │
╞═════════════════╪════════════════════╤══════════════════════════════════════╡
│     Resource    │           Requests │ Usage                     Efficiency │
├─────────────────┼────────────────────┼──────────────────────────────────────┤
│         Node    │                  1 │ GPI-2-1                              │
│          CPU    │                 32 │ 31.871                       99.598% │
│          RAM    │              94 GB │ 9.231 GB                      9.820% │
└─────────────────┴────────────────────┴──────────────────────────────────────┘

Show a finished job today with job ID 12345
$ showjob -x 0 -j 12345
Show finished job(s) today in partition ‘gpu’
$ showjob -x 0 -p gpu
Show finished job(s) with state “TIMEOUT” today
$ showjob -x 0 -s TIMEOUT
Show finished job(s) in the past 7 days
$ showjob -x 7
Show finished job(s) in the past 7 days and in partition ‘gpu’
$ showjob -x 7 -p gpu

List detailed information for a job (for troubleshooting)

$ scontrol show job <JobID>

Checking the resource utilization of a running job

Command: ta <JOB_ID>

$ ta 216
JOBID: 216
================================ GPA-1-20 ===================================
top - 16:41:18 up 149 days, 11:54,  0 users,  load average: 20.05, 19.80, 19.73
Tasks: 608 total,   2 running, 606 sleeping,   0 stopped,   0 zombie
Cpu(s): 79.0%us,  1.9%sy,  0.0%ni, 16.0%id,  3.1%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  99077612k total,  10895060k used, 88182552k free,   84436k buffers
Swap: 122878968k total,    19552k used, 122859416k free,  7575444k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
29144 h0xxxxxx  20   0 97.5g  84m 6200 R 1995.2  1.7  4982:46 l502.exe          
 2667 h0xxxxxx  20   0 15932 1500 1248 S  2.0  0.0   0:00.00 top               
 2622 h0xxxxxx  20   0 98.8m 1284 1076 S  0.0  0.0   0:00.00 sshd    
 2623 h0xxxxxx  20   0  105m  896  696 S  0.0  0.0   0:00.00 g09                
 2668 h0xxxxxx  20   0  100m  836 1168 S  0.0  0.0   0:00.00 226.hpc2015               
29800 h0xxxxxx  20   0  105m 1172  836 R  0.0  0.0   0:00.00 bash                
29801 h0xxxxxx  20   0  100m  848  728 S  0.0  0.0   0:00.00 grep               
29802 h0xxxxxx  20   0 98.6m  604  512 S  0.0  0.0   0:00.00 head

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       416G   59G  395G   1% /tmp

You can see the CPU utilization under the CPU stats. This example shows the process l502.exe running in parallel on the 20-core system with 1995.2% CPU utilization (2000% utilization would mean all 20 cores of GPA-1-20 are fully used). It also provides information such as memory usage (10895060k used, i.e. roughly 10GB), the runtime of the processes and the local /tmp disk usage (59GB used).

Delete / cancel a job

$ scancel <JobID>

Delete / cancel all jobs for a user

$ scancel -u <Username>

Update attributes of submitted jobs

Update the walltime request of a queuing job (a job which is pending and has not yet started to run) to 1 hour. Once a job is running, its requested walltime can only be updated to a shorter value.

$ scontrol update jobid=<JobID> TimeLimit=01:00:00

Check Partition/Node Usage

Users can use the command plist to check the status of partitions and nodes.

$ plist
PARTITION NODES NODES(A/I/O/T) S:C:T MEMORY   TIMELIMIT  AVAIL_FEATURES            NODELIST 
intel*    84    57/25/2/84     2:16:1 192000   4-00:00:00 CPU_MNF:INTEL,CPU_SKU:622 GPI-1-[1-20],GPI-2-[1-64] 
amd       28    16/12/0/28     2:64:1 512000   4-00:00:00 CPU_MNF:AMD,CPU_SKU:7742, GPA-2-[1-28] 
amd       28    16/12/0/28     2:32:1 256000   4-00:00:00 CPU_MNF:AMD,CPU_SKU:7542, GPA-1-[1-28] 
gpu       7     6/1/0/7        2:16:1 384000   7-00:00:00 CPU_MNF:INTEL,CPU_SKU:622 SPG-1-[1-4],SPG-2-[1-3] 
hugemem   2     1/1/0/2        2:64:1 2048000  7-00:00:00 CPU_MNF:AMD,CPU_SKU:7742, SPH-1-[1-2]

where

  • NODES(A/I/O/T) shows the count of nodes of state “allocated/idle/other/total”
  • S:C:T shows count of sockets (S), cores (C) per socket and threads (T) per core on the nodes
  • AVAIL_FEATURES gives the node features which can be used with the “--constraint” directive (see the example below)
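
The features listed under AVAIL_FEATURES can be used directly in a job script. For example (feature names taken from the sample output above):

#SBATCH --partition=amd
#SBATCH --constraint="CPU_MNF:AMD&CPU_SKU:7742"   # request the 128-core EPYC 7742 nodes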

Check GPU Node Usage

Users can use the command gpu_avail to check the status of GPU nodes.

$ gpu_avail
╒═════════╤════════════════════╤════════════════════╕
│ Compute │    TRES per node   │      Available     │
│  node   │  CPU  RAM(GB)  GPU │  CPU  RAM(GB)  GPU │
├─────────┼────────────────────┼────────────────────┤
│ SPG-1-1 │   32     384    4  │   22     224    3  │
│ SPG-1-2 │   32     384    4  │    0     290    0  │
│ SPG-1-3 │   32     384    4  │    7     180    1  │
│ SPG-1-4 │   32     384    4  │    0     290    0  │
│ SPG-2-1 │   32     384    8  │   24     361    4  │
│ SPG-2-2 │   32     384    8  │    1     293    2  │
│ SPG-2-3 │   32     384    8  │    1     246    3  │
└─────────┴────────────────────┴────────────────────┘