Account and Privilege
How can I reset the password for my HPC account? [HPC2021, AI-Research]
- You can reset your HPC account password by changing your HKU Portal PIN; the corresponding HPC2021 account password will then be reset to match the HKU Portal PIN.
How can I apply for special resources? [HPC2021]
- If your program/application shows good efficiency and scalability, you may request more computing resources per job to run intensively parallel workloads. Likewise, if your program/application requires special computing resources (e.g. GPUs or large memory), you may request them. To apply for additional computing resources on the Research Computing facilities, fill in form CF162 (for a PI group) or CF162f (for an individual account); your application will then be considered accordingly.
File Handling
How can I view and edit scripts on an HPC system? [HPC201v5, HPC2021, AI-Research]
- You can use any command-line text editor, such as vi, emacs, nano or pico, on any login node to view and edit plain-text files such as source code and scripts. More details can be found in the Linux and HPC guide.
How can I recover recently altered/deleted files in my home folder? [HPC2021]
Why does my file look fine on Microsoft Windows but malformed on the HPC system? [HPC2021, AI-Research]
- Windows and Linux use different end-of-line (EOL) characters: Windows ends each line with a carriage return plus a line feed (CRLF), while Linux uses only a line feed (LF). A file created on Windows may therefore look malformed or corrupted on Linux. You can convert the file to a Linux-compatible format with the command:
dos2unix <filename>
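If dos2unix is not available, the same conversion can be done with standard tools. The sketch below assumes GNU grep and GNU sed on a Linux login node. The helper names has_crlf and to_unix are hypothetical, not system commands. It detects a carriage return (\r) and strips it from the end of every line. You can also spot CRLF files by eye: `file <filename>` reports "with CRLF line terminators", and `cat -A <filename>` shows each line ending as `^M$`.

```bash
#!/bin/bash
# Sketch: detect and strip Windows (CRLF) line endings without dos2unix.
# has_crlf and to_unix are hypothetical helper names.

has_crlf() {
  # Succeeds if the file contains at least one carriage return (\r)
  grep -q $'\r' "$1"
}

to_unix() {
  # Delete a trailing \r from every line, editing the file in place (GNU sed -i)
  sed -i $'s/\r$//' "$1"
}

demo=$(mktemp)                                   # scratch file for the demonstration
printf 'echo hello\r\necho world\r\n' > "$demo"  # simulate a script saved on Windows
if has_crlf "$demo"; then
  to_unix "$demo"
fi
```

To fix a real file, run `to_unix myscript.sh` (a hypothetical file name), or simply use `dos2unix myscript.sh` where it is installed.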
SLURM Job Scheduler
Are there equivalent PBS commands / variables in SLURM? [HPC2021, AI-Research]
- Below are some equivalent or similar commands and variables in the PBS and SLURM schedulers for quick reference.

Scheduler commands
PBS | SLURM | Description
qsub $job_script | sbatch $job_script | Submit a job with the job script $job_script
qdel $job_id | scancel $job_id | Delete the job with job ID $job_id
qstat -u $login | squeue --user $login | List a user’s running and pending job(s)
showq | showq | Show the list of job statuses
qsub -I | srun --pty bash | Request an interactive job (not working yet)
qstat -f $job_id, or qstat -xf $job_id | scontrol show job $job_id | Show job details (in PBS Pro, -x also shows completed jobs)
qstat -q | sinfo, or sinfo -s | Show partition and queue configuration
qstat -Q | scontrol show partition | Show partition details

Scheduler directives
PBS | SLURM | Description
#PBS | #SBATCH | Scheduler directive
-q $queue_name | -p $queue_name | Specify the queue/partition name
-l nodes=2:ppn=16, or -l select=2:ncpus=16 | -N 2 -n 32 | Specify the total number of nodes with -N and the total number of tasks with -n (nodes × ppn, or select × ncpus, in PBS)
-l mem=32gb | --mem=32gb (serial job), or --mem-per-cpu=1gb (parallel job) | Specify the amount of physical memory
-l walltime=h:mm:ss | -t minutes, or -t days-hh:mm:ss | Specify the maximum wall time
-o $file_path | -o $file_path | STDOUT output path
-e $file_path | -e $file_path | STDERR output path
-t start_index-end_index | --array start_index-end_index | Declare a job array

Job / Scheduler environment variables
PBS | SLURM | Description
$PBS_JOBID | $SLURM_JOBID | Job ID
$PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Working directory from which a job is submitted
$PBS_O_HOST | $SLURM_SUBMIT_HOST | Hostname from which a job is submitted
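Putting the three tables together, the sketch below shows a typical PBS job translated into a SLURM job script. The job name, the partition name my_partition and the program ./my_program are hypothetical placeholders. Check `sinfo -s` for the partition names that actually exist. The fallback to the current directory only matters when the script is run outside SLURM, for example for testing.

```bash
#!/bin/bash
#SBATCH --job-name=myjob         # hypothetical job name
#SBATCH -p my_partition          # PBS: -q $queue_name (hypothetical partition)
#SBATCH -N 2                     # PBS: -l nodes=2:ppn=16 -> 2 nodes ...
#SBATCH -n 32                    # ... and 2 x 16 = 32 tasks in total
#SBATCH --mem-per-cpu=1G         # PBS: -l mem=...; per-CPU memory for a parallel job
#SBATCH -t 0-02:00:00            # PBS: -l walltime=2:00:00
#SBATCH -o myjob-%j.out          # PBS: -o $file_path; %j expands to the job ID
#SBATCH -e myjob-%j.err          # PBS: -e $file_path

# PBS: cd $PBS_O_WORKDIR
workdir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "$workdir" || exit 1

# PBS: $PBS_JOBID and $PBS_O_HOST
echo "Job ${SLURM_JOBID:-none} submitted from ${SLURM_SUBMIT_HOST:-unknown}, running in $workdir"

# Launch the parallel program on all 32 tasks (hypothetical executable):
# srun ./my_program
```

Submit the script with `sbatch myjob.sh` (a hypothetical file name) and follow it with `squeue --user $login`, as listed in the commands table.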
How can I estimate the queuing time of my jobs? [HPC2021, AI-Research]
- Backfill scheduling is used to make better use of available resources: a lower-priority job may start early by “filling in” an idle reserved job slot, as long as doing so does not delay the start of another job. The scheduler bases this decision on each job’s requested time limit, so it is critical to estimate the time your job needs as accurately as possible.
Before submitting a job, you can check when it is estimated to start:
$ sbatch --test-only myscript.sh
- For a job that has already been submitted, you can check its estimated start time:
$ squeue --start -j <jobid>
- Do not underestimate the time, because a job is terminated when it reaches its time limit. However, excessive overestimation can make it appear that subsequent jobs won’t start for a long time. A good rule of thumb, when possible, is to request about 10-15% more time than you think is required.
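The 10-15% rule of thumb is simple arithmetic. The sketch below applies it and prints the result in the `-t days-hh:mm:ss` format accepted by SLURM. The function name pad_walltime is a hypothetical helper, and the estimate is assumed to be a whole number of minutes. To base the estimate on a similar job that has already finished, `sacct -j <jobid> --format=JobID,Elapsed,Timelimit` reports its actual elapsed time.

```bash
#!/bin/bash
# Hypothetical helper: pad an estimated run time (whole minutes) by a percentage
# (default 15) and print it in SLURM's days-hh:mm:ss format for "#SBATCH -t".
pad_walltime() {
  local est_min=$1 pct=${2:-15}
  # Round up: ceil(est_min * (100 + pct) / 100) using integer arithmetic
  local total=$(( (est_min * (100 + pct) + 99) / 100 ))
  local d=$(( total / 1440 )) h=$(( total % 1440 / 60 )) m=$(( total % 60 ))
  printf '%d-%02d:%02d:00\n' "$d" "$h" "$m"
}

pad_walltime 120       # 120 min + 15% = 138 min, prints 0-02:18:00
```

For example, an estimate of 120 minutes becomes `#SBATCH -t 0-02:18:00`.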
Why was my SLURM job terminated with a message “killed by the cgroup out-of-memory handler”? [HPC2021, AI-Research]
- A job may fail if the amount of memory it requested turns out to be insufficient at run time. In that case, Slurm reports an error like the one below:
slurmstepd: error: Detected 1 oom-kill event(s). Some of your processes may have been killed by the cgroup out-of-memory handler.
This means Slurm detected that the job reached its requested memory limit, and the job was killed.
- These errors can be fixed in two ways.
- 1. Request More Memory
- Increase the value of the “--mem-per-cpu” or “--mem” option, for example:
#SBATCH --mem-per-cpu=6G
See the user guide for details.
- 2. Use Less Memory
- Check the command-line options available for your program and see whether settings such as a smaller number of threads reduce its memory consumption.
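Both fixes can be combined in one job script, sketched below. It assumes a multithreaded program that honours the OpenMP variable OMP_NUM_THREADS, and ./my_program is a hypothetical placeholder. Other programs may use their own thread option instead. With `--mem-per-cpu`, the memory limit scales with the number of allocated CPUs. To see how much memory a finished job actually used, `sacct -j <jobid> --format=JobID,State,ReqMem,MaxRSS` reports the peak resident memory (MaxRSS) of each job step.

```bash
#!/bin/bash
#SBATCH -n 1                     # one task (process)
#SBATCH --cpus-per-task=4        # four CPU cores for its threads
#SBATCH --mem-per-cpu=6G         # 6G per allocated CPU -> 4 x 6G = 24G for the job

# Run no more threads than the CPUs allocated; for many threaded programs,
# fewer threads also means a smaller memory footprint (assumption: OpenMP program).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $OMP_NUM_THREADS thread(s)"

# ./my_program   # hypothetical executable; uncomment in a real job
```

If the job still runs out of memory, either raise `--mem-per-cpu` or lower `--cpus-per-task` together with the thread count, depending on whether the program needs the extra threads.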
Access to Graphical Interface
How can I remotely access a server’s graphical desktop interface? [HPC2021]
- Virtual Network Computing (VNC) is a graphical desktop-sharing system that allows a user to remotely control another computer’s graphical desktop interface (e.g. GNOME and KDE) over a network. It is particularly useful for working with applications that are only available with a graphical user interface.
- Requirements:
- 1. A VNC client (e.g. RealVNC or TightVNC)
- 2. An SSH client (e.g. Windows 10’s Command Prompt, PuTTY or MobaXterm; or the terminal on macOS or Linux)
- 3. A user account on an HPC cluster system, to log in to one of the servers with VNC connection support listed below
HPC system | Server(s)
HPC2021 | hpc2021-io1.hku.hk, hpc2021-io2.hku.hk
- As a VNC session is unencrypted once it is established, for security the connection must be tunneled through an encrypted SSH connection as described below.
- Connect to a VNC session
- Make sure your local device is connected to the HKU campus network, or to HKUVPN2FA when off campus.
- Log in to one of the servers above (hpc2021-io1.hku.hk in this example) using an SSH client.
- Start the VNC server on the remote machine with the command “vncserver”. The first time you run vncserver, you will be prompted to set a password for VNC connections. Choose a strong password with at least 8 characters (this password is independent of your SSH login password).
$ vncserver
You will require a password to access your desktops.
Password: ********
Verify: ********
New 'hpc2021-io1.hku.hk:1 (username)' desktop is hpc2021-io1.hku.hk:1
vncserver chooses the next available display number, so the number will vary from session to session.
Take note of the number following the colon (:). Here it means your VNC session is on display 1 of server hpc2021-io1.hku.hk, which listens on port 5900 + 1 = 5901. This port (5901) is the one you will forward through an SSH tunnel.
- 1. In a terminal on your local device (such as Windows’ Command Prompt or the macOS/Linux terminal), run the following to open an SSH session to the server running the VNC server:
$ ssh -L 127.0.0.1:5901:hpc2021-io1:5901 username@hpc2021-io1.hku.hk
This creates an SSH tunnel that forwards port 5901 on your local machine to port 5901 on hpc2021-io1 (i.e. the port the VNC session is listening on). Keep this SSH session open for as long as you need the forwarding.
- 2. Start a VNC viewer on your local device, type “localhost:5901” in the address box and press Enter.
- 3. You may safely ignore the warning about unencrypted session traffic (the traffic is in fact encrypted by the SSH tunnel) and click “Continue”.
- 4. Type your VNC password and click “OK”. The remote Linux X Window desktop will appear on your local device.
- Terminate a VNC session
You should terminate a VNC session after use. In a terminal on the server where the VNC session is running, type:
vncserver -kill :[display #]
For example:
$ vncserver -kill :1
Killing Xvnc process ID 12345
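Because vncserver may pick a different display number each time, the port to forward changes too. A VNC display :N listens on port 5900 + N. The sketch below turns that rule into the tunnel command used in the connection steps. The function vnc_tunnel_cmd is a hypothetical helper. Following the example in the steps, it uses the server's short hostname (the part before the first dot) as the tunnel destination.

```bash
#!/bin/bash
# Hypothetical helper: print the SSH tunnel command for a VNC display.
# A VNC display :N listens on TCP port 5900 + N.
vnc_tunnel_cmd() {
  local user=$1 server=$2 display=$3
  local port=$(( 5900 + display ))
  local short=${server%%.*}      # e.g. hpc2021-io1.hku.hk -> hpc2021-io1
  echo "ssh -L 127.0.0.1:${port}:${short}:${port} ${user}@${server}"
}

vnc_tunnel_cmd username hpc2021-io1.hku.hk 1
# prints: ssh -L 127.0.0.1:5901:hpc2021-io1:5901 username@hpc2021-io1.hku.hk
```

With display :3 on hpc2021-io2.hku.hk, for instance, the helper forwards port 5903, and the VNC viewer should then connect to localhost:5903.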
How can I run graphical applications on an HPC cluster? [HPC2021]
- Visualization is an integral part of scientific computing and data analysis workflows. The list of visualization software supported by the HPC systems is available at HPC Software. To use an application’s graphical interface, establish an SSH connection with X11 forwarding enabled to one of the following nodes; the display is then transmitted from the remote server to your local device’s desktop.
HPC system | Server(s) for graphical remote connection
HPC2021 | hpc2021-io1.hku.hk, hpc2021-io2.hku.hk
- X11 forwarding usage
- Windows:
- 1. Download and install MobaXterm. (N.B. Windows’ Command Prompt does not support X11 forwarding.)
- 2. Open MobaXterm and create a new SSH session:
- Remote host: hostname of server to be connected (e.g. hpc2021-io1.hku.hk)
- Port: 22
- 3. Make sure the “X11-Forwarding” checkbox is checked in “Advanced SSH Settings”.
- 4. Select “OK” to start the session
- macOS:
Recent versions of macOS no longer include built-in X11 support, but it is available from the third-party XQuartz project.
- 1. Download and install XQuartz, then log out and log back in so that the required environment variables are set.
- 2. Open a terminal and connect with the command:
ssh -X [username]@[hostname]
- Linux:
Open a terminal and connect with the command:
ssh -X [username]@[hostname]
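Once connected, you can confirm from the remote shell that X11 forwarding is active. When forwarding works, the SSH server sets the DISPLAY variable in the remote session, typically to a value like localhost:10.0. The sketch below checks for it. The function name x11_status is a hypothetical helper. For a visual check, you can also launch a small X client such as xclock, if it is installed on the server.

```bash
#!/bin/bash
# Sketch: check on the remote server whether X11 forwarding is active.
# ssh -X sets DISPLAY in the remote shell; if it is unset, forwarding is off.
x11_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding active (DISPLAY=$DISPLAY)"
  else
    echo "DISPLAY is not set; reconnect with ssh -X"
  fi
}

x11_status
```

On Windows with MobaXterm, run the same check in the remote session after enabling the “X11-Forwarding” option.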
Alternative method: use VNC via SSH tunneling, as described in the previous question.