HPC2021 System

The HPC2021 system runs Rocky Linux 8 and uses the SLURM workload manager for job scheduling. The cluster comprises 150 general purpose (GP) compute nodes, 11 special purpose (SP) compute nodes, 18 condo nodes and 3 frontend nodes, interconnected by a low-latency Mellanox High Data Rate (HDR) InfiniBand network that carries all inter-node communication (MPI/Lustre). Condo nodes are compute nodes hosted in HPC2021 but owned by an individual PI, department or faculty; all users can access them via the job scheduler, while the owners have the highest priority. Intensive I/O is supported by a Lustre parallel file system (Cray ClusterStor), while home directories are served by a globally accessible NFS file system.
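
As a minimal sketch of how a job uses this layout, the C program below has each MPI rank report the compute node it was placed on; traffic between ranks on different nodes travels over the HDR InfiniBand fabric. The compiler wrapper name (mpicc) and the launch method (an sbatch script or srun under SLURM) are typical OpenHPC conventions assumed here, not details taken from this page.

    /* hello_mpi.c - minimal MPI sketch; each rank reports its host node. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, name_len;
        char node_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks */
        MPI_Get_processor_name(node_name, &name_len);

        /* With ranks spread across compute nodes, inter-rank traffic goes
         * over the low-latency InfiniBand interconnect described above. */
        printf("Rank %d of %d running on %s\n", rank, size, node_name);

        MPI_Finalize();
        return 0;
    }

Compiled with, for example, mpicc hello_mpi.c -o hello_mpi and submitted through SLURM, it prints one line per allocated rank.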

Users can access the cluster via the frontend nodes only:

  1. hpc2021.hku.hk, which is reserved for program modification, compilation and job queue submission/manipulation;

  2. hpc2021-io1.hku.hk and hpc2021-io2.hku.hk, which are reserved for file transfer, file management and data analysis/visualization.

System Diagram

HPC2021 System Diagram

Login nodes

HPC2021 provides 3 login nodes for users to interact with the system.

Host name Configuration
hpc2021.hku.hk 2 x Intel Xeon Gold 6226R (16 Core) CPU; 96GB RAM; 2 x 240GB SSD; Dual 25GbE (Campus Network)
hpc2021-io1.hku.hk 2 x Intel Xeon Gold 6226R (16 Core) CPU; 192GB RAM; 2 x 240GB SSD; NVIDIA RTX6000 GPU (24GB GDDR6 ECC memory); Dual 25GbE (Campus Network)
hpc2021-io2.hku.hk 2 x Intel Xeon Gold 6226R (16 Core) CPU; 192GB RAM; 2 x 240GB SSD; NVIDIA RTX6000 GPU (24GB GDDR6 ECC memory); Dual 25GbE (Campus Network)


Compute nodes

HPC2021 consists of four types of compute nodes accessible via the job scheduler, with a total of 179 nodes, 13,280 physical CPU cores and 65TB of system memory.


Node Type Configuration Quantity
General Purpose – Intel Dual Intel Xeon Gold 6226R (16 Core) CPU; 192GB RAM; 480GB SSD 84
General Purpose – AMD Dual AMD EPYC 7542 (32 Core) CPU; 256GB RAM; 480GB SSD 28
General Purpose – AMD Dual AMD EPYC 7742 (64 Core) CPU; 512GB RAM; 480GB SSD 28
General Purpose – AMD Dual AMD EPYC 9654 (96 Core) CPU; 768GB RAM; 480GB SSD 10
Special Purpose – Large RAM Dual AMD EPYC 7742 (64 Core) CPU; 2TB RAM; 2 x 12TB SAS Hard Drive; 1.92TB SSD 2
Special Purpose – GPU Dual Intel Xeon 6226R (16 Core) CPU; 384GB RAM; 4 x NVIDIA Tesla V100 32GB SXM2 GPU; 6 x 2TB SATA Hard Drive; 960GB NVMe SSD 4
Special Purpose – GPU Dual Intel Xeon 6226R (16 Core) CPU; 384GB RAM; 8 x NVIDIA Tesla V100 32GB SXM2 GPU; 6 x 2TB SATA Hard Drive; 960GB NVMe SSD 3
Special Purpose – GPU Dual Intel Xeon 6438M (32 Core) CPU; 512GB RAM; 8 x NVIDIA Tesla L40S 48GB PCIe GPU; 3.84TB SSD; 7.68TB NVMe SSD 2
Condominium Model – AMD Dual AMD EPYC 7742 (64 Core) CPU; 512GB RAM; 480GB SSD 8
Condominium Model – AMD Dual AMD EPYC 9654 (96 Core) CPU; 768GB RAM; 480GB SSD 8
Condominium Model – GPU Dual AMD EPYC 7543 (32 Core) CPU; 1024GB RAM; 10 x NVIDIA Tesla L40 48GB PCIe GPU; 800GB SSD 2
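
For the GPU nodes above, the short C sketch below shows how an application can discover the devices allocated to it, using the CUDA runtime API. It assumes the CUDA toolkit is available on the node (compiled, for example, with nvcc gpu_query.c -o gpu_query); what it reports depends on the GPUs the scheduler actually assigned to the job.

    /* gpu_query.c - enumerate the GPUs visible inside a job allocation. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
            return 1;
        }

        /* On a 4 x V100 node, a job granted all GPUs reports 4 devices;
         * the scheduler's GPU allocation controls what is actually visible. */
        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("GPU %d: %s, %.1f GB memory\n", i, prop.name,
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
        return 0;
    }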

Data Storage

Three types of network storage are accessible from any node in HPC2021.

  1. A Lustre parallel file system provides high-bandwidth file services for parallel workloads (see the MPI-IO sketch at the end of this section).
  2. A network file system backed by ZFS offers a snapshot feature that protects against accidental or malicious loss of data.
  3. A network file system backed by Pure Storage offers the same snapshot-based protection.

Storage Type Configuration Usable Capacity
Lustre parallel file system Cray ClusterStor L300 1.2PB
Network file system ZFS with snapshots 830TB
Network file system Pure Storage with snapshots 500TB

In addition, local drives such as NVMe SSDs deliver very high I/O performance for the most demanding workloads.
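
The Lustre scratch space is intended for exactly this kind of parallel I/O. As a rough sketch (the file path below is a placeholder, not an actual HPC2021 path), the C program uses MPI-IO so that every rank writes its own disjoint block of one shared file in a single collective call:

    /* lustre_write.c - collective MPI-IO write, sketched for a Lustre scratch area. */
    #include <mpi.h>
    #include <stdio.h>

    #define BLOCK 1024   /* integers written per rank */

    int main(int argc, char **argv)
    {
        int rank, size, i;
        int buf[BLOCK];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < BLOCK; ++i)
            buf[i] = rank;   /* each rank fills its block with its own id */

        /* All ranks open the same file and write non-overlapping regions;
         * the collective call lets MPI-IO and Lustre aggregate the traffic. */
        MPI_File_open(MPI_COMM_WORLD, "/path/to/lustre/scratch/demo.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK * sizeof(int),
                              buf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }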

System Interconnect

  • Connection to Campus network: 25GbE x 2
  • Cluster interconnect: 200Gb/s HDR InfiniBand in a two-tier fat-tree topology, providing full bandwidth within a rack and a 3:1 blocking ratio across racks

Software Environment

  • Operating System: Rocky Linux 8 (x86-64)
  • HPC software stack: OpenHPC
  • Job Scheduler: SLURM