

Account and Privilege

How can I reset the password for my HPC account? [HPC2015, HPC2021, AI-Research]

  • You may reset your HPC account password by changing your HKU Portal PIN; the corresponding HPC2021 account password will then be reset to match the HKU Portal PIN.

How can I apply for approval to use special resources? [HPC2015, HPC2021]

  • If your program/application shows good efficiency and scalability, you may request more computational resources per job to run intensively parallel workloads. If your program/application requires special computing resources (e.g. GPUs or large memory), you may also request special resources. Please fill in CF162 (for a group account) or CF162f (for an individual account) to apply for additional computing resources on the Research Computing facilities; your application will then be considered accordingly.

File Handling

How can I view and edit scripts on an HPC system? [HPC2015, HPC2021, AI-Research]

  • You may use any command-line text editor, such as vi, emacs, nano or pico, on any login node to view and edit plain-text files such as source code and scripts. More details can be found in the Linux and HPC guide.

How can I recover recently altered/deleted files in my home folder? [HPC2021]

  • Accessing snapshots
  • Home folders (/home/*) in HPC2021 are served by ZFS, which supports a handy feature called “snapshots”. Snapshots allow a user to quickly roll back to a previous version of a file or even retrieve recently removed files. A snapshot is taken daily between 3 and 4 am local time, and at most 7 snapshots are retained, so you can retrieve files as they were up to about 7 days ago.
  • To get a list of snapshots, run:
  • $ ls -al /home/username/.zfs/snapshot/
  • Note that the ~/.zfs folder is special: it is not listed even by ls -al, which normally displays all folders including hidden ones. The time at which a snapshot was taken is shown in the snapshot name. Each snapshot keeps the contents of the home folder at that point in time and is always read-only, regardless of what the Unix permission bits say. For example, the copy of my_file in the snapshot taken on 22 March 2021 can be found with the command below:
  • $ ls -al /home/username/.zfs/snapshot/zfs-auto-snap_daily-2021-03-22-0319/my_file
  • You may open a file inside a snapshot with a text editor or viewer using its full path, but the file is read-only. To get a file back from a snapshot, copy it to a path outside the snapshot folder. Note the use of the -a flag, which also preserves the file attributes:
  • $ cp -a /home/username/.zfs/snapshot/zfs-auto-snap_daily-2021-03-22-0319/my_file /home/username/my_file
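  • If you are not sure which day's snapshot still holds the version you need, a small helper can list every snapshot copy of a file and restore the newest one. This is a minimal sketch, not a site-provided tool: the function names, the SNAPSHOT_ROOT override (used only so the sketch can be tried outside ZFS) and the destination path are assumptions for illustration. It relies on snapshot names embedding the date, as in the example above, so that alphabetical order is chronological.

```shell
#!/bin/bash
# Hypothetical helpers for browsing ZFS home-folder snapshots.
# SNAPSHOT_ROOT (an assumption of this sketch) defaults to your own snapshot folder.

# Print the path of every snapshot copy of a file, oldest first.
# $1: path of the file relative to your home folder, e.g. my_file or project/run.sh
list_snapshot_versions() {
  local rel=$1
  local root=${SNAPSHOT_ROOT:-$HOME/.zfs/snapshot}
  local snap
  for snap in "$root"/*/; do
    # Snapshot names embed the date, so glob (alphabetical) order is chronological.
    if [ -e "$snap$rel" ]; then
      printf '%s\n' "$snap$rel"
    fi
  done
}

# Copy the newest snapshot copy of a file to a destination outside the
# snapshot folder, preserving its attributes with cp -a.
# $1: relative path as above, $2: destination path
restore_latest() {
  local rel=$1 dest=$2 latest
  latest=$(list_snapshot_versions "$rel" | tail -n 1)
  if [ -z "$latest" ]; then
    echo "no snapshot copy of $rel found" >&2
    return 1
  fi
  cp -a "$latest" "$dest"
}

# Example usage:
#   list_snapshot_versions my_file
#   restore_latest my_file /home/username/my_file.restored
```

    Restoring to a new name rather than over the original lets you compare the two versions before deciding which one to keep.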

Why does my file look fine on Microsoft Windows but malformed on the HPC systems? [HPC2015, HPC2021, AI-Research]

  • Microsoft Windows and Linux use different end-of-line (EOL) characters: Windows ends each line with a carriage return and a line feed (CRLF), whereas Linux uses a line feed (LF) only. As a result, a file created on Windows may look malformed or corrupted on the HPC systems. You may convert the file to a Linux-compatible format with the command dos2unix <filename>.
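  • To check whether a file really has Windows line endings before converting it, you can run file <filename>, which reports “with CRLF line terminators” for such files, or use a small shell check. The two functions below are hypothetical helpers sketched for illustration; crlf_to_lf assumes GNU sed (as found on Linux) and is only meant as a fallback where dos2unix is unavailable. It removes a carriage return only where it ends a line.

```shell
#!/bin/bash
# Succeed (exit status 0) if the file has at least one line ending in a
# carriage return, i.e. a Windows-style CRLF line ending.
has_crlf() {
  grep -q $'\r$' "$1"
}

# Convert CRLF line endings to LF in place (GNU sed syntax for -i).
crlf_to_lf() {
  sed -i $'s/\r$//' "$1"
}

# Example: check and convert a job script before submitting it.
#   has_crlf myscript.sh && crlf_to_lf myscript.sh
```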

SLURM Job Scheduler

Are there equivalent PBS commands / variables in SLURM? [HPC2021, AI-Research]

  • Below are some equivalent or similar commands/variables in the PBS and SLURM schedulers for your quick reference.
  • Scheduler commands
  • PBS | SLURM | Description
    qsub $job_script | sbatch $job_script | Submit a job with the job script $job_script
    qdel $job_id | scancel $job_id | Delete the job with job ID $job_id
    qstat -u $login | squeue --user $login | List a user’s running and pending job(s)
    showq | showq | Show the list of jobs and their status
    qsub -I | srun --pty bash | Request an interactive job (not working yet; the master node may need to resolve compute node host names)
    qstat -f $job_id, or qstat -xf $job_id | scontrol show job $job_id | Show job details (-x in PBS Pro also shows completed jobs)
    qstat -q | sinfo, or sinfo -s | Show partition and queue configuration
    qstat -Q | scontrol show partition | Show partition details
  • Scheduler directives
  • PBS | SLURM | Description
    #PBS | #SBATCH | Scheduler directive prefix
    -q $queue_name | -p $queue_name | Specify the queue/partition name
    -l nodes=2:ppn=16, or -l select=2:ncpus=16 | -N 2 -n 32 | Specify the total number of nodes with -N and the total number of tasks with -n (nodes x ppn, or select x ncpus in PBS)
    -l mem=32gb | --mem=32G (serial job), or --mem-per-cpu=1G (parallel job) | Specify the amount of physical memory
    -l walltime=h:mm:ss | -t minutes, or -t days-hh:mm:ss | Specify the maximum wall time
    -o $file_path | -o $file_path | STDOUT output path
    -e $file_path | -e $file_path | STDERR output path
    -t start_index-end_index | --array start_index-end_index | Declare a job array
  • Job / scheduler environment variables
  • PBS | SLURM | Description
    $PBS_JOBID | $SLURM_JOBID | Job ID
    $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Working directory from which the job was submitted
    $PBS_O_HOST | $SLURM_SUBMIT_HOST | Hostname from which the job was submitted
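  • To show how the directives and variables in the tables above fit together, here is a sketch of a SLURM job script for a 2-node, 32-task job, with the PBS lines it replaces kept as comments. The resource values, the output file names and ./my_program are placeholders assumed for illustration. sbatch already starts a job in the directory it was submitted from, so the cd line simply mirrors the usual cd $PBS_O_WORKDIR habit.

```shell
#!/bin/bash
# PBS: #PBS -l nodes=2:ppn=16   (or #PBS -l select=2:ncpus=16)
#SBATCH -N 2
#SBATCH -n 32
# PBS: #PBS -l walltime=1:00:00
#SBATCH -t 0-01:00:00
# PBS: #PBS -l mem=32gb  (32 tasks x 1G per CPU gives the same 32G in total)
#SBATCH --mem-per-cpu=1G
# PBS: #PBS -o job.out and #PBS -e job.err
#SBATCH -o job.out
#SBATCH -e job.err

# PBS: cd $PBS_O_WORKDIR. sbatch already starts the job in the submission
# directory, so this only makes it explicit (and falls back to the current
# directory when the script is run outside SLURM).
cd "${SLURM_SUBMIT_DIR:-$PWD}" || exit 1
echo "Job ${SLURM_JOBID:-unknown} submitted from ${SLURM_SUBMIT_HOST:-unknown}"

# Start one copy of your program per task; ./my_program is a placeholder,
# so the line is left commented out in this sketch.
# srun ./my_program
```

    Submit it with sbatch $job_script as in the first table; SLURM reads the #SBATCH lines only until the first executable command.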

How can I estimate the queuing time of my jobs? [HPC2021, AI-Research]

  • SLURM uses backfill scheduling to make better use of available resources: lower-priority jobs are allowed to start early and “fill in” idle job slots, as long as they do not delay the expected start of higher-priority jobs. Because the scheduler relies on each job’s requested time limit to decide whether it fits into such a gap, it is critical to estimate the time required for your job as accurately as possible.
    Before submitting a job, you can check when it is estimated to start:
  • $ sbatch --test-only myscript.sh
  • For a job that has already been submitted, you can check its status:
  • $ squeue --start -j <jobid>
  • You should not underestimate the time (a job that reaches its time limit is terminated), but excessive overestimation can make it appear that subsequent jobs won’t start for a long time, and makes it harder for your own job to be backfilled. A good rule of thumb, when possible, is to request about 10-15% more time than you think is required.
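  • The 10-15% rule of thumb can be turned into a concrete time limit. The helper below is a hypothetical sketch, not a site-provided tool: it adds a margin (15% by default) to an estimated run time in minutes, rounds up to a whole minute, and prints the result in SLURM’s days-hh:mm:ss format.

```shell
#!/bin/bash
# pad_walltime MINUTES [PERCENT]
# Add a safety margin (default 15%) to an estimated run time in minutes and
# print it as days-hh:mm:ss, rounding up to the next whole minute.
pad_walltime() {
  local est=$1 pct=${2:-15}
  local total=$(( (est * (100 + pct) + 99) / 100 ))
  local days=$(( total / 1440 ))
  local hours=$(( (total % 1440) / 60 ))
  local mins=$(( total % 60 ))
  printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

# Example: a job expected to take 10 hours (600 minutes)
#   pad_walltime 600      # prints 0-11:30:00
```

    Options given on the sbatch command line take precedence over #SBATCH directives, so the padded value can be tried directly, e.g. $ sbatch --test-only --time="$(pad_walltime 600)" myscript.sh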

Why was my SLURM job terminated with a message “killed by the cgroup out-of-memory handler”? [HPC2021, AI-Research]

  • A job may fail if the amount of memory it requested turns out to be insufficient at run time. In that case Slurm may report an error like the one below:
  • slurmstepd: error: Detected 1 oom-kill event(s).
    Some of your processes may have been killed by the cgroup out-of-memory handler.

    This means Slurm detected that the job reached its requested memory limit and killed it.

  • These errors can be fixed in two ways.
  • 1. Request More Memory
  • Increase the value of the “--mem-per-cpu” or “--mem” option, for example:
    #SBATCH --mem-per-cpu=6G

    See the user guide for details.

  • 2. Use Less Memory
  • Check the command-line options available for your program to see whether it can run with less memory, for example by reducing the number of threads.
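  • Before choosing between the two fixes, it helps to know how much memory the job actually used: once a job has finished, sacct can report the requested memory (ReqMem) and the peak resident memory of each job step (MaxRSS). For programs parallelised with OpenMP, one common way to limit the number of threads is to set OMP_NUM_THREADS to the CPUs SLURM allocated per task. This is a sketch under assumptions: whether fewer threads lowers memory use depends on your program, and falling back to 1 thread when SLURM_CPUS_PER_TASK is unset (it is only set when --cpus-per-task is requested) is a choice made here for illustration.

```shell
#!/bin/bash
# After the job has ended, compare requested memory with the peak actually used:
#   sacct -j <jobid> --format=JobID,JobName,ReqMem,MaxRSS,State

# Inside a job script, before starting an OpenMP program: match the thread
# count to the CPUs requested per task with #SBATCH --cpus-per-task.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with OMP_NUM_THREADS=$OMP_NUM_THREADS"
```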

Access to Graphical Interface

How can I remotely access a server’s graphical desktop interface? [HPC2015, HPC2021]

  • Virtual Network Computing (VNC) is a graphical desktop-sharing system that allows a user to remotely control another computer’s graphical desktop interface (e.g. GNOME or KDE) over a network. It is particularly useful for working with applications that are available only with a graphical user interface.
  • Requirements:
  • 1. A VNC client (e.g. RealVNC or TightVNC)
  • 2. An SSH client (e.g. Windows 10’s Command Prompt, PuTTY or MobaXterm, or the macOS/Linux terminal)
  • 3. A user account on one of the HPC systems, to log in to a server that supports VNC connections (listed below)
  • HPC system | Server
    HPC2015 | hpc2015-file.hku.hk
    HPC2021 | hpc2021-io1.hku.hk, hpc2021-io2.hku.hk
  • A VNC session is unencrypted once it is established, so for security the connection must be tunneled through an encrypted SSH connection, as described below.
  • Connect to a VNC session
  • Make sure your local device is connected to the HKU campus network, or to HKUVPN2FA when off campus.
  • Log in to one of the servers above (hpc2021-io1.hku.hk in this example) using an SSH client.
  • Start a VNC server on the remote machine with the command “vncserver”. The first time you run vncserver, you will be prompted to set a password for VNC connections. It is recommended to choose a strong password of at least 8 characters (this password is independent of the password you use for SSH login to the system).
    $ vncserver
    You will require a password to access your desktops.
    Password: ********
    Verify: ********
    New 'hpc2021-io1.hku.hk:1 (username)' desktop is hpc2021-io1.hku.hk:1

    The vncserver will choose the next available display number and the number will vary from session to session.

    Take note of the number following the colon (:). In this example, your VNC session is on display 1 of server hpc2021-io1.hku.hk, which listens on port 5900 + 1 = 5901. This port (5901) is the one you will forward through an SSH tunnel.

    • 1. In a terminal on your local device (such as Windows’ Command Prompt or the macOS/Linux terminal), run the following to open an SSH session to the server running the VNC server:
    • $ ssh -L 127.0.0.1:5901:hpc2021-io1:5901 username@hpc2021-io1.hku.hk

      This creates an SSH tunnel that forwards your local machine’s port 5901 to port 5901 on hpc2021-io1 (i.e. the port the VNC session is listening on). Keep this SSH session open; the forwarding stops when it is closed.

    • 2. Start a VNC viewer on your local device, enter “localhost:5901” in the address box and press Enter.
    • 3. You may safely ignore the warning about unencrypted session traffic (the traffic is actually encrypted by the SSH tunnel) and click “Continue”.
    • 4. Enter your VNC password and click “OK”. The remote Linux desktop will appear on your local device.
  • Terminate a VNC session
    You should terminate a VNC session after use. In a terminal on the server where the VNC session is running, run vncserver -kill :[display #], for example:

    $ vncserver -kill :1
    Killing Xvnc process ID 12345
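  • The port arithmetic and tunnel command from the connection steps can be generated from the display number. The function below is a hypothetical convenience, not part of the VNC or SSH tools; it assumes the server’s full hostname is the short name followed by .hku.hk, as for the servers listed above, and defaults the user name to your local $USER.

```shell
#!/bin/bash
# vnc_tunnel_cmd SERVER DISPLAY [USER]
# Print the SSH command that forwards local port 5900+DISPLAY to the same
# port on SERVER, where vncserver listens for that display.
vnc_tunnel_cmd() {
  local server=$1 display=$2 user=${3:-$USER}
  local port=$(( 5900 + display ))
  printf 'ssh -L 127.0.0.1:%d:%s:%d %s@%s.hku.hk\n' \
    "$port" "$server" "$port" "$user" "$server"
}

# Example: display :1 on hpc2021-io1
#   vnc_tunnel_cmd hpc2021-io1 1 username
```

    For display :1 on hpc2021-io1 it prints the same ssh -L command used in the connection steps; the VNC viewer then connects to localhost followed by the printed port number.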

How can I run graphical applications on an HPC cluster? [HPC2015, HPC2021]

  • Visualization is an integral part of scientific computing and data analysis workflows. The list of visualization software supported by the HPC systems is available at HPC Software. To use an application’s graphical interface, establish an SSH connection with X11 forwarding enabled to one of the following nodes; the display is then transmitted from the remote server to your local device’s desktop.
  • HPC system | Server(s) for graphical remote connection
    HPC2015 | hpc2015-file.hku.hk
    HPC2021 | hpc2021-io1.hku.hk, hpc2021-io2.hku.hk
  • X11 forwarding usage
  • Windows:
  • 1. Download and install MobaXterm. (N.B. Windows’ Command Prompt does not support X11 forwarding.)
  • 2. Open MobaXterm and create a new SSH session:
  • Remote host: hostname of server to be connected (e.g. hpc2021-io1.hku.hk)
  • Port: 22
  • 3. Ensure “X11-Forwarding” checkbox is checked in “Advanced SSH Settings”
  • 4. Select “OK” to start the session
  • macOS:
    Recent versions of macOS no longer include built-in X11 support, but it is available from the third-party XQuartz project.
    • 1. Download and install XQuartz, then log out and log back in to reset some environment variables.
    • 2. Open a terminal and connect with the command: ssh -X [username]@[hostname]
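  • After connecting (with MobaXterm or ssh -X), you can confirm from the remote shell that forwarding is active: SSH normally sets the DISPLAY variable only when X11 forwarding has been set up for the session. The check below is a minimal sketch with a hypothetical function name. Once it passes, launching any small X client installed on the server (for example xclock, if available) should open a window on your local desktop.

```shell
#!/bin/bash
# Succeed if the current shell has a DISPLAY to send X11 windows to.
x11_forwarding_active() {
  [ -n "${DISPLAY:-}" ]
}

if x11_forwarding_active; then
  echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
else
  echo "DISPLAY is not set: reconnect with ssh -X, or enable X11-Forwarding in MobaXterm" >&2
fi
```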