HPC2021 System

The HPC2021 system runs CentOS 8 and uses the SLURM workload manager for job scheduling. The cluster is composed of 140 general purpose (GP) compute nodes, 9 special purpose (SP) compute nodes and 3 frontend nodes interconnected with 100Gb/s InfiniBand. Intensive I/O is supported by a Lustre parallel file system (Cray ClusterStor), while home directories are served by an NFS file system with global access. All inter-node communication (MPI/Lustre) travels over a low-latency Mellanox High Data Rate (HDR) InfiniBand network.

Users can access the cluster via the frontend nodes only:

  1. hpc2021.hku.hk, which is reserved for program modification, compilation and job queue submission/manipulation;

  2. hpc2021-io1.hku.hk and hpc2021-io2.hku.hk, which are reserved for file transfer, file management and data analysis/visualization (see the connection sketch after this list).
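
The hedged Python sketch below illustrates this division of roles from a user's workstation, driving ssh and scp through the standard library; the username jdoe, the job script my_job.slurm and the destination directory are placeholders rather than site defaults.

    import subprocess

    # Compile code and submit jobs through the general-purpose frontend.
    # "jdoe" and "my_job.slurm" are placeholders for illustration only.
    subprocess.run(["ssh", "jdoe@hpc2021.hku.hk", "sbatch my_job.slurm"], check=True)

    # Transfer large data sets through the I/O frontends instead.
    subprocess.run(["scp", "dataset.tar.gz", "jdoe@hpc2021-io1.hku.hk:data/"], check=True)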

System Diagram

HPC2021 System Diagram

Login nodes

HPC2021 provides 3 login nodes for users to interact with the system.

Host name | Configuration
hpc2021.hku.hk | 2 x Intel Xeon Gold 6226R (16 Core) CPU; 96GB RAM; 2 x 240GB SSD; Dual 25GbE (Campus Network)
hpc2021-io1.hku.hk | 2 x Intel Xeon Gold 6226R (16 Core) CPU; 192GB RAM; 2 x 240GB SSD; NVIDIA RTX6000 GPU (24GB GDDR6 ECC memory); Dual 25GbE (Campus Network)
hpc2021-io2.hku.hk | 2 x Intel Xeon Gold 6226R (16 Core) CPU; 192GB RAM; 2 x 240GB SSD; NVIDIA RTX6000 GPU (24GB GDDR6 ECC memory); Dual 25GbE (Campus Network)

Compute nodes

HPC2021 consists of 4 types of compute nodes, accessible via the job scheduler, with a total of 149 nodes, 8,544 physical CPU cores and 40TB of system memory. A sketch of a matching job request follows the table below.

Node Type | Configuration | Quantity
General Purpose – Intel | Dual Intel Xeon Gold 6226R (16 Core) CPU; 192GB RAM; 480GB SSD | 84
General Purpose – AMD | Dual AMD EPYC 7542 (32 Core) CPU; 256GB RAM; 480GB SSD | 28
General Purpose – AMD | Dual AMD EPYC 7742 (64 Core) CPU; 512GB RAM; 480GB SSD | 28
Special Purpose – Large RAM | Dual AMD EPYC 7742 (64 Core) CPU; 2TB RAM; 2 x 12TB SAS Hard Drive; 1.92TB SSD | 2
Special Purpose – GPU | Dual Intel Xeon 6226R (16 Core) CPU; 384GB RAM; 4 x NVIDIA Tesla V100 32GB SXM2 GPU; 6 x 2TB SATA Hard Drive; 960GB NVMe SSD | 4
Special Purpose – GPU | Dual Intel Xeon 6226R (16 Core) CPU; 384GB RAM; 8 x NVIDIA Tesla V100 32GB SXM2 GPU; 6 x 2TB SATA Hard Drive; 960GB NVMe SSD | 3
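
As a hedged illustration of how these node types map onto a job request, the Python sketch below composes a SLURM batch script sized to a dual AMD EPYC 7542 node and submits it with sbatch; the partition name "amd" is an assumption for illustration, so check sinfo for the partitions actually configured.

    import subprocess
    import textwrap

    # Request one dual AMD EPYC 7542 node (2 x 32 cores). The partition name
    # "amd" is assumed for illustration; it may differ on the real system.
    job_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=example
        #SBATCH --partition=amd
        #SBATCH --nodes=1
        #SBATCH --ntasks-per-node=64
        #SBATCH --time=01:00:00
        srun ./my_application
        """)

    # sbatch accepts a job script on standard input.
    subprocess.run(["sbatch"], input=job_script, text=True, check=True)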

Data Storage

Two types of network storage are accessible from any node in HPC2021.

  1. A Lustre parallel file system provides high-bandwidth file services for parallel workloads.
  2. A network file system (NFS) backed by ZFS offers a snapshot feature that protects against accidental or malicious loss of data.

Storage Type | Configuration | Usable Capacity
Lustre parallel file system | Cray ClusterStor L300 | 1.2PB
Network file system | ZFS with snapshots | 830TB

In addition, node-local drives such as NVMe SSDs provide very high I/O performance for the most demanding workloads.
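
The following sketch shows how a job might stage data from the shared file systems onto node-local scratch to exploit that performance; the Lustre path and the scratch location below are assumptions for illustration, not the cluster's published directories.

    import os
    import shutil

    # Assumed paths for illustration only; use the directories published for HPC2021.
    lustre_input = "/lustre1/myproject/input.dat"      # shared parallel file system
    local_scratch = os.environ.get("TMPDIR", "/tmp")   # assumed node-local SSD/NVMe scratch

    # Stage the input onto the fast local drive before the I/O-heavy phase.
    staged_input = shutil.copy(lustre_input, local_scratch)

    # ... run the I/O-intensive part of the job against staged_input ...

    # Copy results back to shared storage before the job ends,
    # because node-local scratch is not persistent.
    shutil.copy(os.path.join(local_scratch, "output.dat"), "/lustre1/myproject/")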

System Interconnect

  • Connection to the campus network: 2 x 25GbE
  • Cluster interconnect: 200Gb/s HDR InfiniBand in a two-tier fat-tree topology, providing full bandwidth within each rack and 3:1 blocking bandwidth across racks (see the MPI sketch below)
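
The MPI sketch below is a minimal example of traffic that rides this fabric, assuming mpi4py is available; the launch command (for example srun across two nodes) is illustrative only.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank contributes its rank number; when ranks are placed on different
    # nodes, this reduction crosses the HDR InfiniBand fabric.
    total = comm.allreduce(rank, op=MPI.SUM)

    if rank == 0:
        print(f"{size} ranks, sum of ranks = {total}")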

Software Environment

  • Operating System: CentOS 8 (x86-64)
  • HPC software stack: OpenHPC
  • Job Scheduler: SLURM (see the query sketch below)
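
As a minimal sketch of interacting with this stack, the Python snippet below queries SLURM through its standard command-line tools (sinfo and squeue); the output format options shown are generic rather than site-specific.

    import getpass
    import subprocess

    # List partitions, their availability, node counts and node states.
    sinfo = subprocess.run(["sinfo", "-o", "%P %a %D %t"],
                           capture_output=True, text=True, check=True)
    print(sinfo.stdout)

    # List jobs belonging to the current user.
    squeue = subprocess.run(["squeue", "-u", getpass.getuser()],
                            capture_output=True, text=True, check=True)
    print(squeue.stdout)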