Account and Privilege
How can I reset the password for my HPC account? [HPC2015, HPC2021, AI-Research]
- You may reset your HPC account password by changing your HKU Portal PIN. The corresponding HPC2021 account password will then be reset to match the HKU Portal PIN.
How can I apply for approval for special resources? [HPC2015, HPC2021]
- If your program/application shows good efficiency and scalability, you may request more computing resources per job to run intensively parallel workloads. If your program/application requires special computing resources (e.g. GPU or huge memory), you may also request them. Please fill in CF162 (for a PI group) or CF162f (for an individual account) to apply for additional computing resources on the Research Computing facilities; the application will be considered accordingly.
How can I view and edit scripts in an HPC system? [HPC2015, HPC2021, AI-Research]
- You may use any command-line text editor, such as vi, emacs, nano or pico, on any login node to view and edit plain-text files such as source code and scripts. More details can be found in the Linux and HPC guide.
How can I recover recently altered/deleted files in my home folder? [HPC2021]
- Accessing snapshots
- Home folders (/home/*) in HPC2021 are served by ZFS, which supports a handy feature called “snapshot”. Snapshots allow a user to quickly roll back to a previous version of a file or even retrieve recently removed ones. A daily snapshot is taken between 3 and 4 a.m. local time and a maximum of 7 snapshots are retained, so you may retrieve files that were altered up to about 7 days ago.
- To get a list of snapshots, run:
$ ls -al /home/username/.zfs/snapshot/
- Note that the ~/.zfs folder is special: it is not displayed even by the command ls -al, which normally lists all folders including hidden ones. The time at which a snapshot was taken is shown in the snapshot name. A snapshot keeps the contents of the home folder at that point in time and is always read-only, regardless of what the Unix permission bits say. For example, the snapshot of a file taken on 22 March 2021 can be listed with the command below:
$ ls -al /home/username/.zfs/snapshot/zfs-auto-snap_daily-2021-03-22-0319/my_file
- You may use a text editor/viewer to show the content of a read-only file by giving its full path. To get a file back from a snapshot, however, copy it to a path outside the snapshot folder. Note the use of the -a flag, which also copies the file attributes:
$ cp -a /home/username/.zfs/snapshot/zfs-auto-snap_daily-2021-03-22-0319/my_file /home/username/my_file
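The recovery steps above can be wrapped in a small shell function. This is a minimal sketch, not part of the HPC2021 tooling: the function name restore_latest and the SNAPDIR variable are hypothetical, and it assumes that snapshot names sort chronologically (as the daily names shown above do) and contain no spaces.

```shell
# Hypothetical helper: copy a file out of the newest snapshot that contains it.
# Usage: restore_latest <path relative to home> <destination>
# SNAPDIR defaults to the snapshot folder of the home directory (assumption).
restore_latest() {
    relpath=$1
    dest=$2
    snapdir=${SNAPDIR:-$HOME/.zfs/snapshot}
    # Snapshot names embed the timestamp, so a reverse name sort lists newest first.
    for snap in $(ls -1 "$snapdir" | sort -r); do
        if [ -e "$snapdir/$snap/$relpath" ]; then
            # -a also copies the file attributes, as in the manual command above.
            cp -a "$snapdir/$snap/$relpath" "$dest"
            echo "$snap"
            return 0
        fi
    done
    echo "no snapshot contains $relpath" >&2
    return 1
}
```

For example, restore_latest my_file ~/my_file.recovered would copy the most recent snapshot copy of my_file to a new name and print the snapshot it came from, leaving the current file untouched.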
Why does my file look fine on Microsoft Windows but malformed in HPC? [HPC2015, HPC2021, AI-Research]
- Microsoft Windows and Linux use different end-of-line (EOL) characters, so a file created on Windows may look malformed or corrupted on Linux. You may convert the file to a Linux-compatible format with a line-ending conversion command.
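One way to do such a conversion with standard tools is sketched below. It assumes the only difference is the carriage return (CR) that Windows places before each line feed. The function name crlf_to_lf is hypothetical, and the sketch deletes every CR in the file, so it is not suitable for binary files.

```shell
# Hypothetical helper: write a copy of a text file with all carriage returns removed.
# Usage: crlf_to_lf <input_file> <output_file>
crlf_to_lf() {
    # tr -d '\r' deletes every CR byte, turning CRLF line endings into LF.
    tr -d '\r' < "$1" > "$2"
}
```

Writing to a separate output file keeps the original intact until the converted copy has been checked.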
SLURM Job Scheduler
Are there equivalent PBS commands / variables in SLURM? [HPC2021, AI-Research]
- Below are some of the equivalent or similar commands/variables between PBS and SLURM scheduler for your quick reference.
| PBS | SLURM | Description |
| --- | --- | --- |
| | | Submit a job with the job_script |
| | | Delete a job with job id = $job_id |
| qstat -u $login | squeue --user $login | List a user’s running and pending job(s) |
| | | Show list of job status |
| | srun --pty bash | Request an interactive job (not working yet; the master node may need to resolve compute node host names) |
| qstat -f $job_id or qstat -xf $job_id | scontrol show job $job_id | Show job details (-x in PBS Pro shows completed jobs) |
| | | Show partition and queue configuration |
| | scontrol show partition | Show partition details |
| PBS | SLURM | Description |
| --- | --- | --- |
| | | Specify queue/partition name |
| | -N 2 -n 32 | Specify total number of nodes with -N and total number of tasks (nodes x ppn, or select x ncpus, in PBS) with -n |
| | --mem=32gb (serial job) or | Specify physical memory amount |
| | | Specify maximum wall time |
| | | STDOUT output path |
| | | STDERR output path |
| | | Declare job array |
Job / Scheduler environment variables
| PBS | SLURM | Description |
| --- | --- | --- |
| | | Working directory where a job is submitted |
| | | Hostname where a job is submitted |
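As an illustration of the SLURM side of this mapping, here is a minimal job script sketch. The long option names and the SLURM_SUBMIT_DIR and SLURM_SUBMIT_HOST variables are standard Slurm ones rather than entries recovered from the tables above. The partition name, resource amounts and file names are placeholder assumptions, and the fallbacks after ":-" exist only so that the script also runs outside Slurm.

```shell
#!/bin/bash
# Partition (queue) name; "my_partition" is a hypothetical placeholder.
#SBATCH --partition=my_partition
# Long forms of -N 2 -n 32: two nodes, 32 tasks in total.
#SBATCH --nodes=2
#SBATCH --ntasks=32
# Memory per node (illustrative value).
#SBATCH --mem=32G
# Maximum wall time as HH:MM:SS.
#SBATCH --time=01:00:00
# STDOUT and STDERR paths; %j expands to the job ID.
#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err

# Slurm sets these in the job environment; fall back to local values elsewhere.
submit_dir=${SLURM_SUBMIT_DIR:-$PWD}
submit_host=${SLURM_SUBMIT_HOST:-$(uname -n)}

cd "$submit_dir" || exit 1
echo "Job submitted from $submit_host in $submit_dir"
```

Such a script would be submitted with sbatch and cancelled with scancel followed by the job ID.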
How can I estimate the queuing time of my jobs? [HPC2021, AI-Research]
- Backfill scheduling is used to make better use of available resources by “filling in” reserved job slots, such that the jobs do not delay the start of another job. For this reason, it is critical to estimate the time required for your job as accurately as possible.
Prior to submitting a job, you can check when it is estimated to start:
$ sbatch --test-only myscript.sh
- For a job that has already been submitted, you can check its expected start time:
$ squeue --start -j <jobid>
- While you should not underestimate, excessive overestimation can make it appear that subsequent jobs won’t start for a long time. A good rule of thumb, when possible, is to request about 10-15% more time than you think is required.
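The rule of thumb above can be turned into a wall-time value with a little shell arithmetic. This is a sketch only: the function name padded_walltime is hypothetical, the estimate is taken in whole minutes, and the default margin of 15% is simply the upper end of the range mentioned above.

```shell
# Hypothetical helper: print a wall time (HH:MM:SS) equal to the estimate plus a margin.
# Usage: padded_walltime <estimated_minutes> [margin_percent]
padded_walltime() {
    est=$1
    margin=${2:-15}
    # Integer arithmetic, rounded up to the next whole minute.
    total=$(( (est * (100 + margin) + 99) / 100 ))
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}
```

For a job expected to take two hours, padded_walltime 120 prints 02:18:00, which could then be passed to sbatch through its --time option.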
Why was my SLURM job terminated with a message “killed by the cgroup out-of-memory handler”? [HPC2021, AI-Research]
- A job may fail because the amount of system memory requested is insufficient at runtime, in which case Slurm may report an error like the one below:
slurmstepd: error: Detected 1 oom-kill event(s). Some of your processes may have been killed by the cgroup out-of-memory handler.
This means Slurm detected the job hitting the maximum requested memory and then the job was killed.
- These errors can be fixed in two ways.
- 1. Request More Memory
- Adjust the value of the “--mem-per-cpu” or “--mem” option. See the user guide for details.
- 2. Use Less Memory
- You may inspect the command-line arguments available with your program and check whether measures such as reducing the number of threads would make it consume less memory.
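The two memory options and a quick check for the error message can be sketched as follows. The values are illustrative only and the function name was_oom_killed is hypothetical. In Slurm, --mem sets memory per node and --mem-per-cpu sets memory per allocated CPU.

```shell
# In the job script, raise one of these (values are illustrative only):
#   #SBATCH --mem=64G           (memory per node)
#   #SBATCH --mem-per-cpu=4G    (memory per allocated CPU)

# Hypothetical helper: succeed if a job output file contains the OOM message.
# Usage: was_oom_killed <job_output_file>
was_oom_killed() {
    grep -q 'oom-kill event' "$1"
}
```

If job accounting is enabled on the cluster (an assumption), sacct -j <jobid> --format=JobID,ReqMem,MaxRSS shows the requested memory next to the peak usage, which helps when choosing a new value.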
Access to Graphical Interface
How can I remotely access a server’s graphical desktop interface? [HPC2015, HPC2021]
- Virtual Network Computing (VNC) is a graphical desktop-sharing system that allows a user to remotely control another computer’s graphical desktop interface (e.g. GNOME and KDE) over a network. It is particularly useful for working with multiple applications that are only available with a graphical user interface.
- 1. A VNC client (e.g. RealVNC and TightVNC)
- 2. An SSH client (e.g. Windows 10’s Command Prompt / PuTTY / MobaXterm, macOS or Linux’s terminal)
- 3. A user account on any HPC cluster system to log in to a server with VNC connection support, as listed below
| HPC system | Server |
| --- | --- |
| HPC2015 | hpc2015-file.hku.hk |
| HPC2021 | hpc2021-io1.hku.hk |
- As a VNC session is unencrypted once it is established, for the sake of security the connection has to be tunneled through an encrypted SSH connection as described below.
- Connect to a VNC session
- Make sure your local device is connected to HKU campus network or HKUVPN2FA for off campus connection.
- Log in to a server above (hpc2021-io1.hku.hk in this example) using an SSH client.
- Start the VNC server on the remote machine with the command “vncserver”. On first execution of vncserver, you will be prompted to set a password for VNC connections. It is recommended to choose a strong password with at least 8 characters (this password is independent of the password for SSH login to the system).
$ vncserver
You will require a password to access your desktops.
Password: ********
Verify: ********
New 'hpc2021-io1.hku.hk:1 (username)' desktop is hpc2021-io1.hku.hk:1
The vncserver will choose the next available display number and the number will vary from session to session.
Take note of the number following the colon (:). Here it means your VNC session is on display 1 of server hpc2021-io1.hku.hk, which is listening on port 5900 + 1 = 5901. This port (5901) is the one that you will forward in an SSH tunnel.
- 1. In a terminal on a local device (such as Windows’ Command Prompt or macOS/Linux’s terminal), run the following to open an SSH session to the server running the VNC server.
$ ssh -L 127.0.0.1:5901:hpc2021-io1:5901 username@hpc2021-io1.hku.hk
This creates an SSH tunnel that forwards your local machine’s port 5901 to hpc2021-io1’s port 5901 (i.e. the port that the VNC session is listening on). This SSH session has to be kept running to keep the forwarding active.
- 2. Start a VNC viewer on a local device, type “localhost:5901” in the box and press ENTER key.
- 3. You may safely ignore the prompt about unencrypted session traffic (it is actually encrypted by the SSH tunnel) and click “Continue”.
- 4. Type your VNC password and click “OK”. You will see the Linux X Window desktop on your local device.
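Because the display number varies from session to session, the port arithmetic and the matching tunnel command can be sketched as two small shell functions. The function names vnc_port and vnc_tunnel_cmd are hypothetical. The second function only prints the command so that it can be checked before it is run.

```shell
# Map a VNC display number to its TCP port (5900 + display number).
# Usage: vnc_port <display_number>
vnc_port() {
    echo $(( 5900 + $1 ))
}

# Print (not run) the SSH tunnel command for a given display, server and login name.
# Usage: vnc_tunnel_cmd <display_number> <server> <username>
vnc_tunnel_cmd() {
    display=$1
    host=$2
    login=$3
    port=$(vnc_port "$display")
    echo "ssh -L 127.0.0.1:$port:$host:$port $login@$host"
}
```

For a session on display 2, vnc_port 2 prints 5902, and the VNC viewer would then be pointed at localhost:5902.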
- Terminate a VNC session
You should terminate a VNC session after use. In a terminal on the server where the VNC session is running, type the command vncserver -kill :[display #]:
$ vncserver -kill :1
Killing Xvnc process ID 12345
How can I run graphical applications on an HPC cluster? [HPC2015, HPC2021]
- Visualization is an integral part of scientific computing and data analysis workflows. The list of visualization software supported by the HPC systems is available at HPC Software. To use an application’s graphical interface, you need to establish an SSH connection with X11 forwarding enabled to one of the following nodes, which will then transmit the display from the remote server to your local device’s desktop.
| HPC system | Server(s) for graphical remote connection |
| --- | --- |
| HPC2015 | hpc2015-file.hku.hk |
| HPC2021 | hpc2021-io1.hku.hk |
- X11 forwarding usage
- 1. Download and install MobaXterm from this page. (N.B. Windows’ Command Prompt does not support X11 forwarding)
- 2. Open MobaXterm and create a new SSH session:
- Remote host: hostname of server to be connected (e.g. hpc2021-io1.hku.hk)
- Port: 22
- 3. Ensure “X11-Forwarding” checkbox is checked in “Advanced SSH Settings”
- 4. Select “OK” to start the session
Built-in support for X11 is no longer included in recent versions of macOS, but third-party libraries are available from the XQuartz project.
- 1. Download and install XQuartz. Log out and log back in to reset some variables.
- 2. Open a terminal and connect with the command:
ssh -X [username]@[hostname]
Open a terminal and connect with command:
ssh -X [username]@[hostname]
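Once connected, a quick way to see whether X11 forwarding took effect is to check the DISPLAY variable, which the SSH server sets for forwarded sessions. The sketch below uses a hypothetical function name, and the check is only meaningful when run inside the SSH session on the remote server.

```shell
# Hypothetical helper: succeed if the DISPLAY variable is set and non-empty.
x11_forwarding_active() {
    [ -n "${DISPLAY:-}" ]
}
```

For example, x11_forwarding_active && echo "X11 forwarding looks active" prints the message only when DISPLAY is set. If it prints nothing, reconnect with ssh -X and make sure an X server (such as XQuartz or MobaXterm) is running locally.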
Alternative Method: Using VNC via ssh tunneling