The HPC2015 system uses CentOS and ROCKS and is managed with the Torque/Maui scheduler. The cluster comprises 104 general purpose (GP) compute nodes, 9 special purpose (SP) compute nodes and 2 frontend nodes connected by an FDR InfiniBand interconnect. Users can access the cluster via the two frontend nodes only: hpc2015.hku.hk, which is reserved for program modification, compilation and job queue submission/manipulation; and hpc2015-file.hku.hk, which is reserved for file transfer, file management and data analysis/visualization. Data-intensive I/O is supported by the Lustre parallel file system (Intel EE for Lustre), while home directories are served by an NFS file system with global access. All inter-node communication (MPI/Lustre) goes through an FDR Mellanox InfiniBand network. The configuration and features of the nodes, interconnect and I/O systems are described below.
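
Since the two frontend nodes have distinct roles, bulk data transfers should go to hpc2015-file.hku.hk rather than the job-submission frontend. As a minimal sketch, the snippet below drives such a transfer from Python by calling the standard scp client; the file name and the username h0xxxxxx are placeholders, and any ordinary SSH/SCP client used directly works just as well.

```python
# Minimal sketch: upload a result archive to the file-transfer frontend
# (hpc2015-file.hku.hk). The username "h0xxxxxx" and the file name are
# placeholders -- substitute your own.
import subprocess

local_file = "results.tar.gz"                   # placeholder local file
remote = "h0xxxxxx@hpc2015-file.hku.hk:~/"      # home directory on the cluster

# scp runs over SSH; heavy transfers belong on hpc2015-file.hku.hk,
# not on the compile/job-submission frontend hpc2015.hku.hk.
subprocess.run(["scp", local_file, remote], check=True)
```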

[Figure: hpc2015 cluster]

Frontend Nodes

The frontend nodes are two HP ProLiant DL380 Gen9 servers. Each frontend node consists of:

  • A 10-core Intel E5-2660v3 2.6GHz (Haswell) processor
  • 32GB memory (2x 16GB DDR4-2133MHz ECC DIMM)
  • Two 500GB 7.2k SAS hard disks
  • Dual port InfiniBand FDR adaptor
  • Dual port 10Gb Ethernet adaptor
  • Four integrated 1Gb Ethernet ports

Management Node

An HP ProLiant DL380 Gen9 server, which consists of:

  • A 10-core Intel E5-2660v3 2.6GHz (Haswell) processor
  • 16GB memory (16GB DDR4-2133MHz ECC DIMM)
  • Two 500GB 7.2k SATA hard disks
  • Ten 4TB 7.2k SAS hard disks
  • Dual port InfiniBand FDR adaptor
  • Four integrated 1Gb Ethernet ports

General Purpose Compute Nodes

There are 104 HP ProLiant DL160 Gen9 compute servers. Each GP node consists of:

  • Two 10-core Intel E5-2660v3 2.6GHz (Haswell) processors
  • 96GB memory (6x 16GB DDR4-2133MHz ECC DIMM)
  • Two 1TB 7.2k SATA hard disks
  • Dual port InfiniBand FDR10 adaptor
  • Dual integrated 1Gb Ethernet ports

Special Purpose Compute Nodes

  • GPU nodes – 4 HP ProLiant SL250s Gen8 servers. Each node consists of:
    • Two 10-core Intel E5-2670v2 2.5GHz (Ivy Bridge EP) processors
    • 96GB memory (6x 16GB DDR3-1600MHz ECC DIMM)
    • Two Nvidia Tesla K20X GPUs (2688 CUDA cores and 6GB of GDDR5 on-board ECC memory each)
    • Two 1TB SAS 7.2k hard disks
    • Dual port InfiniBand FDR10 adaptor
    • Dual integrated 1Gb Ethernet ports
  • MIC nodes – 2 HP ProLiant SL250s Gen8 servers. Each node consists of:
    • Two 10-core Intel E5-2670v2 2.5GHz (Ivy Bridge EP) processors
    • 96GB memory (6x 16GB DDR3-1600MHz ECC DIMM)
    • Two Intel Xeon Phi 7120P coprocessors (61 cores @ 1.238GHz, 16GB memory with a peak bandwidth of 352GB/s each)
    • Two 1TB SAS 7.2k hard disks
    • Dual port InfiniBand FDR10 adaptor
    • Dual integrated 1Gb Ethernet ports
  • Large memory nodes – 3 HP ProLiant DL560 Gen8 servers. Each node consists of:
    • Four 10-core Intel E5-4640v2 2.2GHz (Ivy Bridge EP) processors
    • 512GB memory (32x 16GB DDR3-1600MHz ECC DIMM)
    • Two 1TB SAS 7.2k hard disks
    • Dual port InfiniBand FDR10 adaptor
    • Four integrated 1Gb Ethernet ports

Interconnect

The HPC2015 cluster is interconnected with 40Gbps FDR10 InfiniBand in a 2-tiered fat-tree topology, providing full bandwidth within each rack and 2:1 oversubscribed bandwidth across racks.

[Figure: hpc2015 interconnect topology]

File Storage

The cluster's file systems are designed to serve different purposes. The design is more elegant than that of the older HPC clusters in ITS, thanks to the additional Lustre file system. Users should choose the most suitable storage for each kind of task.

Home Directory Storage: $HOME

The home directory is the initial working directory upon login. Each user has 20GB of primary storage in his/her home directory (/home/$user), which is used for everyday data storage. The home directory is exported from an NFS file system and is mounted as shared storage on all cluster nodes. It is the main location for storing source code, executables and the results of data processing and analysis. Home directories are backed up on a daily basis. Users can use the home directory for light disk I/O operations, such as program editing or compilation. However, I/O-intensive or large data analysis jobs should not run in the home directory; other high-performance storage ($WORK or /tmp) is more suitable for such large, I/O-intensive tasks.

Quota limit:      20 GB
Accessible from:  Frontend nodes and all compute nodes
Performance:      Moderate; not appropriate for I/O-intensive jobs or large numbers of jobs
Backup:           Daily
Purge policy:     Not purged
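
With only 20 GB of home quota, it is worth checking how much space a directory tree occupies before it fills up. The following is a minimal, standard-library sketch for doing so; it sums apparent file sizes, which may differ slightly from the NFS server's own quota accounting, and the 20 GB figure is simply the quota stated above.

```python
# Minimal sketch: report how much of the 20 GB home quota a directory uses.
import os

QUOTA_BYTES = 20 * 1024**3          # 20 GB home-directory quota

def tree_size(path):
    """Sum apparent file sizes under `path`, skipping symlinks."""
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

home = os.path.expanduser("~")
used = tree_size(home)
print(f"{home}: {used / 1024**3:.2f} GB used of {QUOTA_BYTES / 1024**3:.0f} GB quota")
```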

High-performance Working Directory Storage: $WORK

HPC2015 has storage built specifically for high-performance temporary use. Each user has a quota limit of 100GB in his/her working directory (/data/$user). The working directory is short-term, large shared storage managed by the Lustre parallel file system, which provides excellent performance for HPC environments. This file system is fast for large-file I/O but is not efficient for small files (<1 MB). Files are not backed up and will be removed once they have not been accessed for 60 days. Users should change to this directory in batch scripts and run jobs in this file system. I/O-intensive or large data jobs that exceed the 20GB home directory storage can be processed in this working directory. Be reminded to move any data you wish to retain elsewhere (e.g. the home directory), as the system removes these files automatically after 60 days.

Quota limit:      100 GB
Accessible from:  Frontend nodes and all compute nodes
Performance:      High; appropriate for I/O-intensive and data-intensive jobs, but not efficient for small files (<1 MB)
Backup:           Not backed up
Purge policy:     Automatically purged after 60 days without access
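
As noted above, batch jobs should change into the working directory and do their heavy I/O there, then copy anything worth keeping back to the backed-up home directory before the 60-day purge. The sketch below illustrates the pattern in Python, assuming the working directory is /data/$USER as described above (a WORK environment variable is used if one happens to be defined); the job and file names are placeholders.

```python
# Minimal sketch: do large-file I/O under the Lustre working directory,
# then copy the result back to $HOME, since files here are purged after
# 60 days without access. Paths and names are illustrative placeholders.
import os
import shutil

user = os.environ["USER"]
work = os.environ.get("WORK", os.path.join("/data", user))   # Lustre scratch
job_dir = os.path.join(work, "myjob")
os.makedirs(job_dir, exist_ok=True)

out_path = os.path.join(job_dir, "output.dat")
with open(out_path, "wb") as f:
    # Lustre favours large sequential writes over many small (<1 MB) files.
    chunk = b"\0" * (8 * 1024 * 1024)        # 8 MB per write
    for _ in range(16):                      # ~128 MB in this sketch
        f.write(chunk)

# Keep what you need: the home directory is backed up daily, $WORK is not.
shutil.copy2(out_path, os.path.expanduser("~/output.dat"))
```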

Local High-performance Temporary Storage (per node): /tmp

Each compute node contains local, directly attached temporary storage at /tmp, which is used for temporary files written and removed during job execution. If your computation runs on only one node, this local temporary storage can give very high I/O performance for temporary files while the job is running. The /tmp storage, however, is local to each node and not accessible from other nodes in the cluster, so data that has to be shared between compute nodes during a job is not suitable for this local /tmp file system. The compute node cleans up files in /tmp immediately after the job terminates.

Size limit:       1.5TB per compute node
Accessible from:  The attached compute node only
Performance:      High; appropriate for I/O-intensive jobs (local I/O)
Backup:           Not backed up
Purge policy:     Purged immediately after the job terminates
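
For single-node jobs, a pattern like the following keeps intermediate files on the fast node-local /tmp and copies only the final result to shared storage before the job ends. It is a sketch only: the destination under /data/$USER follows the working-directory layout described earlier, and the file names are placeholders.

```python
# Minimal sketch: use node-local /tmp for intermediate files, then move the
# final result to shared storage. /tmp is visible only on this node and is
# cleaned up when the job terminates, so nothing left here survives the job.
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory(dir="/tmp") as scratch:
    tmp_file = os.path.join(scratch, "intermediate.dat")
    with open(tmp_file, "wb") as f:
        f.write(os.urandom(4 * 1024 * 1024))     # 4 MB of throwaway data

    # ... heavy local I/O on tmp_file would happen here ...

    # Copy anything worth keeping to shared storage (e.g. the $WORK area).
    dest = os.path.join("/data", os.environ["USER"], "kept.dat")
    shutil.copy2(tmp_file, dest)
# The temporary directory is removed on exit; the node purges /tmp anyway
# after the job ends.
```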