jobstats
Jobstats [1] is a free and open-source job monitoring platform for CPU and GPU clusters that use the Slurm workload manager. Running the jobstats command with a job ID produces a job report:
$ jobstats job.id
================================================================================
Slurm Job Statistics
================================================================================
Job ID: job.id
User/Account: user.name/pi.group
Job Name: job.name
State: RUNNING
Nodes: 1
CPU Cores: 1
CPU Memory: 64GB (64GB per CPU-core)
GPUs: 1
QOS/Partition: gpu/l40s
Cluster: hpc2021
Start Time: Sat Jan 11, 2025 at 1:11 PM
Run Time: 01:44:12 (in progress)
Time Limit: 1-00:00:00
Overall Utilization
================================================================================
CPU utilization [|||||||||||||||||||||||||||||||||||||||||||||||99%] <- Exact value
CPU memory usage [| 2%] <- Peak value
GPU utilization [|||||||||||||||||||||||||||||||||||||||||| 84%] <- Average value
GPU memory usage [|||||||| 17%] <- Peak value
Detailed Utilization
================================================================================
CPU utilization per node (CPU time used/run time)
SPG-4-4: 01:42:43/01:44:12 (efficiency=98.6%)
CPU memory usage per node - used/allocated
SPG-4-4: 1.2GB/64GB (1.2GB/64GB per core of 1)
GPU utilization per node
SPG-4-4 (GPU 7): 84.5%
GPU memory usage per node - maximum used/total
SPG-4-4 (GPU 7): 7.5GB/45GB (16.6%)
Notes
================================================================================
* This job only used 2% of the 64GB of total allocated CPU memory. For
future jobs, please allocate less memory by using a Slurm directive such
as --mem-per-cpu=2G or --mem=2G. This will reduce your queue times and
make the resources available to other users. For more info:
https://researchcomputing.princeton.edu/support/knowledge-base/memory
* Have a nice day!
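The memory note in the report can be acted on in the job's Slurm batch script. Below is a minimal sketch of such a script, matching the shape of the job shown above (1 node, 1 CPU core, 1 GPU, 1-day limit) but with the memory request lowered as the note suggests; the script name and the job's actual commands are placeholders, and the partition/QOS values are taken from the report's QOS/Partition line:

```shell
#!/bin/bash
#SBATCH --job-name=job.name       # placeholder, as in the report
#SBATCH --nodes=1                 # same shape as the report: 1 node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1         # 1 CPU core
#SBATCH --gres=gpu:1              # 1 GPU
#SBATCH --partition=l40s          # partition shown in the report
#SBATCH --qos=gpu                 # QOS shown in the report
#SBATCH --mem-per-cpu=2G          # down from 64G: the job peaked at 1.2GB
#SBATCH --time=1-00:00:00         # same 1-day time limit

# ... the job's actual commands go here ...
```

Requesting only the memory a job actually needs (with some headroom above the observed peak) shortens queue times and leaves the rest of the node's memory available to other users' jobs.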