AI-Research is an advanced computing platform equipped with state-of-the-art GPU accelerators to support sophisticated research across a wide range of disciplines.
This guide consists of the following major parts:
- Hardware specification
- System Login
- Accessing GPU
- SLURM scheduler which controls access to GPUs
- How to use containers to run software with enroot or Singularity
- Other local commands
(Note: The default per-user disk quota is 50GB and the default group quota is 5TB. However, the group quota is currently capped at 1TB due to technical difficulties in connecting to the Lustre storage.)
Hardware Specification
The system is an NVIDIA DGX A100 machine which consists of:
- Dual AMD EPYC 7742 CPUs, 2.25GHz (base), 3.4GHz (max boost)
(128 cores total; 256 threads due to the Simultaneous Multithreading (SMT) feature)
- 1TB DDR4 RAM
- Eight NVIDIA A100 SXM4 GPUs with 40GB HBM2 memory each
- NVSwitch
- 14TB NVMe SSD local storage
- 200Gb/s HDR InfiniBand
- Ubuntu 20.04 LTS
Further hardware details are available in its official datasheet.
System Login
The AI-Research system, ai-research.hku.hk, can be accessed via Secure Shell (SSH) from any device on the HKU campus network (either physically connected to the campus network, or using the SSID “HKU” while on campus). If you need off-campus access, please use the HKUVPN2FA service.
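For example, assuming your username is jdoe (replace it with your own HKU account), you can log in with:
$ ssh jdoe@ai-research.hku.hk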
The following SSH features are enabled:
- SFTP (for file transfer; see the example after this list)
- X Tunneling (for graphics software — not fully supported)
- Dynamic Port Tunneling
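For file transfer you can use any SFTP/SCP client; a minimal command-line example from your local machine (the username jdoe and the file name are placeholders):
$ scp results.tar.gz jdoe@ai-research.hku.hk:~/
$ sftp jdoe@ai-research.hku.hk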
Accessing GPU
Here we list some common pitfalls that new users may come across:
`nvidia-smi` shows nothing?
$ nvidia-smi
Failed to initialize NVML: Unknown Error
Users who log on to the node DO NOT have GPU access immediately (you will need to use the SLURM scheduler to request GPUs). Once you are inside an interactive session, or your submitted job script is running, you will have access to the allocated GPUs.
Do not assume no one is using the GPUs.
There may be other users using the GPUs (either interactively, or through their submitted jobs). Check the immediate availability of GPUs with the gpu_avail command. If you have requested an interactive job and there are not enough free GPUs to fulfill your request, you will be put into the waiting queue (and the session will thus appear to hang). To achieve efficient and fair use of resources, please refrain from requesting more GPUs than your job is able to use.
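A quick way to check both GPU availability and the state of your own request (squeue is a standard SLURM command; gpu_avail is the local command described under Other Local Commands):
$ gpu_avail
$ squeue -u $USER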
Why can't I use Docker?
Our scheduler (SLURM) is currently not compatible with Docker. Since GPUs are allocated by SLURM to ensure a fair share of resources, Docker access is not available to general users. Please use enroot or Singularity to pull images from Docker repositories instead.
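For instance, a public image from Docker Hub can be pulled with either tool; the Ubuntu image below is only an illustration (see the Container section for details):
$ enroot import docker://ubuntu
$ singularity build ubuntu.simg docker://ubuntu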
SLURM scheduler
Access to the GPU cards is not granted upon system login; resources are controlled and scheduled via the SLURM scheduler. To gain access to GPUs, you have to submit a SLURM job. Jobs can be interactive (you type commands during the session as if you had logged in directly) or batch (you prepare a job script containing all the commands to be executed for your tasks and submit it to run; you cannot interact with it afterwards), depending on what you want to do.
Interactive Job
An interactive job allows you to use GPUs interactively (provided, of course, that GPUs are immediately available). To submit an interactive job with 1 GPU card for a maximum runtime of 5 minutes, use:
$ srun --pty --gres gpu:1 --time 5 /bin/bash
By default, memory is allocated at a rate of 64GB per GPU. If your program requires more than that, you should explicitly request more memory (for example, 100GB for 1 GPU). Note that in SLURM, memory is specified in units of MB:
$ srun --pty --gres gpu:1 --mem=100000 --time 5 /bin/bash
Please note that we currently limit the maximum memory to 400GB per job.
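Putting the options together, a larger interactive request might look like this (2 GPUs, 200GB of memory, 60 minutes; the numbers are only an illustration and must stay within the limits above):
$ srun --pty --gres gpu:2 --mem=200000 --time 60 /bin/bash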
Before submitting an interactive job, you may check the number of available GPUs with the gpu_avail command (see Other Local Commands below). The printout of the command should be self-explanatory. If you use a terminal multiplexer (screen/tmux), start it outside (i.e. before) your srun command.
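For example, with tmux you would create the session first and then run srun inside it (the session name gpuwork is arbitrary):
$ tmux new -s gpuwork
$ srun --pty --gres gpu:1 --time 5 /bin/bash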
Currently SLURM only controls GPU and memory. Although users may still use CPU, memory and disk resources without having an active GPU job, users are advised to use them only for:
- Transferring files from/to the node
- Preparing container images (see later sections) before SLURM job submission
- Submitting jobs to SLURM
Batch Job
If you are comfortable with shell scripting, you may put your commands into a text file and submit it to the system. Below is a text file showing the basics of a SLURM job script which uses an enroot container (as described in later sections). The script downloads a container image, starts it (and runs nvidia-smi to prove it can reach the GPUs), then exits immediately:
#!/bin/bash
#
#SBATCH --get-user-env
#SBATCH --job-name=slurm_demo ## Job name
#SBATCH --partition=debug ## Job Queue
#SBATCH --output=slurm_demo.o%j ## File that STDOUT will be written to (%j:Job ID, %t:Task ID)
#SBATCH --error=slurm_demo.e%j ## File that STDERR will be written to
# Uncomment these two lines if you want e-mail to be sent
##SBATCH --mail-type=END,FAIL ## Email notification type: BEGIN,END,FAIL,ALL
##SBATCH --mail-user=user@hku.hk ## Email that notifications will be sent to
#SBATCH --nodes=1 ## Number of compute node(s)
#SBATCH --ntasks-per-node=1 ## Number of process(es) per compute node
#SBATCH --time=00:05:00 ## Runtime in D-HH:MM/HH:MM:SS/MM:SS
##SBATCH --mem=2000 ## Total memory over all of the cores (in MB)
##SBATCH --mem-per-cpu=100 ## Memory per CPU core (in MB)
#SBATCH --gres=gpu:1
echo "Submission Directory : " $SLURM_SUBMIT_DIR
echo "Submission Host : " $SLURM_SUBMIT_HOST
echo "Job User : " $SLURM_JOB_USER
echo "Job ID : " $SLURM_JOB_ID
echo "Job Name : " $SLURM_JOB_NAME
echo "Queue : " $SLURM_JOB_PARTITION
echo "Node(s) allocated : " $SLURM_JOB_NODELIST
echo "Number of Node(s) : " $SLURM_NNODES
echo "Number of CPU Task(s): " $SLURM_NTASKS
echo "Number of Process(s) : " $SLURM_NPROCS
echo "Task(s) per Node : " $SLURM_TASKS_PER_NODE
echo "CPU(s) per Task : " $SLURM_CPUS_PER_TASK
echo "Task ID : " $SLURM_ARRAY_TASK_ID
echo ===========================================================
echo "Job Start Time is `date "+%Y/%m/%d -- %H:%M:%S"`"
cd $WORK
OUTFILE=${SLURM_JOB_NAME}.${SLURM_JOB_ID}
nvidia-smi
enroot import docker://nvcr.io#nvidia/cuda:11.0-devel
enroot create nvidia+cuda+11.0-devel.sqsh
(
cat <<END
nvidia-smi
END
) | enroot start nvidia+cuda+11.0-devel /bin/bash > ${OUTFILE}
rm nvidia+cuda+11.0-devel.sqsh
enroot remove nvidia+cuda+11.0-devel
enroot list
mv ${OUTFILE} ${SLURM_SUBMIT_DIR}
echo "Job Finish Time is `date "+%Y/%m/%d -- %H:%M:%S"`"
exit 0
If you save this file with the name script.slurm, then to submit it, type the following into the shell:
$ sbatch script.slurm
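After submission, you can monitor the job and cancel it if necessary with the standard SLURM commands (12345 is a placeholder for the job ID printed by sbatch):
$ squeue -u $USER
$ scancel 12345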
Container
Currently two Docker alternatives, enroot and Singularity, are installed on the system. Both support Docker images and are discussed in detail below:
enroot
Configuration For NVIDIA GPU Cloud (Optional)
Some containers on NVIDIA GPU Cloud require authentication. Once you get your API token (check this on how to get your own token), you may store it in your environment via:
$ cat > ~/.local/share/enroot/.credentials <<END
machine nvcr.io login \$oauthtoken password MmdhYOUR_NGC_TOKEN_0aXFn_DONT_COPY_THIS_lM2Y0NjMtZGFhZi00YWRlLTk0ODYtMDNiN2U3YzBiOWE5
END
After that, you may add the following line to your ~/.bashrc, which will take effect at your next login:
$ export ENROOT_CONFIG_PATH=~/.local/share/enroot
To take effect immediately, you should run: source ~/.bashrc
Importing Images
To import the image which you would normally pull (via Docker) using docker pull nvcr.io/nvidia/cuda:11.0-devel, you should use:
$ enroot import docker://nvcr.io#nvidia/cuda:11.0-devel
After import, enroot will create a squash file named after the image, e.g. nvidia+cuda+11.0-devel.sqsh in the example above. You may add -o filename.sqsh after the “import” keyword in order to save it under another file name.
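For example, to save the image under an arbitrary file name of your choice:
$ enroot import -o cuda11-devel.sqsh docker://nvcr.io#nvidia/cuda:11.0-devel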
Creating Container from Imported Image
To create a container with a squash file, you should use:
$ enroot create nvidia+cuda+11.0-devel.sqsh
By default, the command will extract the squash file into ~/.local/share/enroot/containername, where containername is generated from your squash file name. You may add -n containername after the “create” keyword in order to give the container a name of your choice.
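For example, to name the container mycuda (an arbitrary name):
$ enroot create -n mycuda nvidia+cuda+11.0-devel.sqsh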
Listing Containers
To get a list of enroot containers under your folder, you should use:
$ enroot list
Running Container
(For enroot, the container only starts if you have a valid GPU allocation, i.e. you are inside a SLURM job with GPUs.)
To run the container with the name nvidia+cuda+11.0-devel and starting a bash shell, you should use:
$ enroot start nvidia+cuda+11.0-devel /bin/bash
Similarly you may start containers with other names and run other programs. You should add -w (write) if you would like to change files inside the container.
(Variants such as batch and exec are also supported by enroot.)
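For example, to get a writable shell in the container above (the --root option, where available in your enroot version, additionally remaps you to root so that you can install packages):
$ enroot start -w nvidia+cuda+11.0-devel /bin/bash
$ enroot start -w --root nvidia+cuda+11.0-devel /bin/bash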
Deleting Container
To delete a container called containername, you may use:
$ enroot remove containername
You will be asked to confirm deletion of the folder containing the root filesystem. You have to answer “y” or “N” (typing “yes” will do nothing).
Notes on Accessing Files in a container
A user may mount other folders into the container to make them accessible inside. For example, to mount your current working directory into the container as /mnt inside:
$ enroot start --mount .:mnt nvidia+cuda+11.0-devel /bin/bash
Users may also interact with the folder holding the container's files, even when the container is not running. The root filesystem of the container is at:
~/.local/share/enroot/containername
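For instance, you can inspect the container's files directly with ordinary shell commands (container name taken from the earlier example):
$ ls ~/.local/share/enroot/nvidia+cuda+11.0-devel/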
Further Information
Further information on enroot's usage can be found at https://github.com/NVIDIA/enroot/blob/master/doc/usage.md
Singularity
Configuration For NVIDIA GPU Cloud (Optional)
Some containers on NVIDIA GPU Cloud require authentication. Once you get your API token (check this on how to get your own token), you may add the following lines to your ~/.bashrc, which will take effect at your next login:
$ export SINGULARITY_DOCKER_USERNAME="\$oauthtoken"
$ export SINGULARITY_DOCKER_PASSWORD="MmdhYOUR_NGC_TOKEN_0aXFn_DONT_COPY_THIS_lM2Y0NjMtZGFhZi00YWRlLTk0ODYtMDNiN2U3YzBiOWE5"
To take effect immediately, you should run: source ~/.bashrc
Importing Images and Creating Container
To import the image which you would normally pull (via Docker) using docker pull nvcr.io/nvidia/cuda:11.0-devel, you should use:
$ singularity build cuda11.simg docker://nvcr.io/nvidia/cuda:11.0-devel
The command will create a “simg” file (cuda11.simg), which is the container. You may create containers with different names by creating multiple “simg” files.
Running Container
To run the container simg file, you should use:
$ singularity shell --nv cuda11.simg
You will get a shell whose prompt starts with Singularity>, which behaves like a normal shell at the same path where you ran the singularity command.
(Variants such as exec and run are also supported by Singularity.)
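For example, exec runs a single command non-interactively; here it simply verifies that the GPUs are visible:
$ singularity exec --nv cuda11.simg nvidia-smi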
Deleting Container
Simply deleting the simg file is enough:
$ rm cuda11.simg
Further Information
Further information on usage of Singularity is at https://sylabs.io/guides/3.6/user-guide/cli/singularity.html
Other Local Commands
There are several local commands for users’ convenience when using the system:
gpu_avail
This is the local command for checking immediate GPU availability. Just type “gpu_avail” on the shell and you will see something like this:
2 out of 8 GPUs are allocated:
 GPU 0  GPU 1 [GPU 2][GPU 3] GPU 4  GPU 5  GPU 6  GPU 7
Legend: [ALLOC] AVAIL
The coloured output should be self-explanatory; it allows users to check how many GPUs are immediately available before they submit interactive jobs.
gpu_smi
This is the local command for checking GPU resource usage of a user's own running jobs. Normally, if you have an interactive session running your compute loads, you would like to confirm the occupancy of your GPUs. However, simply running “nvidia-smi” at another command prompt would be barred from accessing such information (as that separate command prompt is not part of any job).
Just type “gpu_smi” on the shell and you will get something like this if you have a running job, for each of them:
Running Job ID: 5033 [ GPU2 GPU3 ]
# gpu   pwr  gtemp  mtemp   sm   mem   enc   dec   mclk   pclk
# Idx     W      C      C    %     %     %     %    MHz    MHz
    0    50     23     23    0     0     0     0   1215    210
    1    51     22     22    0     0     0     0   1215    210
# gpu   pid  type    sm   mem   enc   dec   command
# Idx     #   C/G     %     %     %     %   name
    0     -     -     -     -     -     -   -
    1     -     -     -     -     -     -   -
For each of the user's running jobs, the system lists the job number along with the physical GPUs the job is using, followed by a per-job printout of GPU usage in abridged “dmon” and “pmon” form. Note that the GPU IDs always start from “0” in the listing; these IDs do not correspond to the physical GPUs but to the GPUs visible to the user's job. This allows users to find out whether their jobs are actually using the GPUs.