General

System overview

The HPCPOWER2 system consists of 24 nodes. Each node contains two 64-bit Intel quad-core Xeon processors running at 3GHz and 8GB of RAM. Each processor has 12MB of cache.

The hpcpower2 uses Scientific Linux 5.2 as its operating system. Reading this user guide and using the system assume familiarity with the Linux/Unix software environment. To become familiar with Linux/UNIX, please study the UNIX user’s guide on the ITS web page.

The hpcpower2 uses the Torque Resource Manager software to distribute the computational workload across the processors. Torque, similar to OpenPBS, is a batch job scheduling application that provides the facility for building, submitting and processing batch jobs on the system.

Jobs are submitted to the system by creating a PBS job command file that specifies certain attributes of the job, such as how long the job is expected to run and, in the case of parallel programs, how many processors are needed, and so forth. PBS then schedules when the job is to start running on the cluster (based in part on those attributes), runs and monitors the job at the scheduled time, and returns any output to the user once the job completes.

Logging into the system

You can log in to hpcpower2.hku.hk through the HKU campus network using SSH. SSH is not bundled with MS Windows, so you may need to download an SSH client such as PuTTY. Please visit SSH and Secure File Transfer for more details.
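
For example, from a machine with a command-line SSH client installed (replace h0xxxxxx with your own user name):

    ssh h0xxxxxx@hpcpower2.hku.hk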

After you log in, the system places you on the master node, which acts as the control console for interactive work such as source code editing, compilation, program testing and submitting jobs through the Torque Resource Manager. When you log on to the master node you should be in your home directory (either /home1/$LOGNAME or /home2/$LOGNAME), which is also accessible from the batch nodes.

Editing the program

You can use the command vi, emacs or pico to edit programs. Please refer to the UNIX user’s guide for details.

Important notice for Microsoft Windows users: do not use a standard Microsoft Windows editor such as Notepad to edit files that will be used on Linux or other Unix systems. The two systems use different control-character sequences to mark the end of line (EOL). If you are using the system from a Microsoft Windows desktop machine, please SSH to the master node and edit the program directly using pico.
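
If a file has already been edited on Windows and carries DOS end-of-line characters, one common remedy, assuming the dos2unix utility is available on the master node, is to convert the file in place:

    dos2unix program.f90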

Configuring your account

Every user account is pre-configured with the necessary environment. You can use all software on the system without modifying system files such as .rhosts, .bashrc or .bash_profile.

You can copy the most up-to-date system files from the directory /etc/skel in case your copies are deleted by accident.
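
For example, a default .bashrc can be restored to your home directory as follows (note that this overwrites your current copy):

    cp /etc/skel/.bashrc ~/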

When a job is submitted to the cluster through Torque, a new login to your account is initiated and any initialization commands in your startup files (.bashrc, .bash_profile, etc.) are executed. Because the job runs in batch mode, do not put interactive commands (such as tset or stty) or commands that generate output in your startup files. If these precautions are not taken, error messages will be written to the batch job's error file and your program may fail to run.
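
A minimal sketch of how to keep such commands out of batch runs, assuming a Bash login shell, is to guard them with an interactive-shell test in your .bashrc so that Torque's non-interactive logins skip them:

# run these only in interactive shells, not in Torque batch logins
if [ -n "$PS1" ]; then
    stty erase ^H
    echo "Welcome, $LOGNAME"
fi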

Transferring data files

(UPDATE 19 Aug 2013: Please use hpcpower2b.hku.hk for file transfers between HPCPOWER2 and external computers. File transfer between hpcpower2.hku.hk and external machines will be disabled on 1 Sep 2013 or soon after)

As with logging in to the system, you can use SCP to connect to hpcpower2b.hku.hk within the HKU campus network to transfer files between HPCPOWER2 and external computers. SCP is not bundled with MS Windows, so you may need to download an SCP client such as WinSCP. You may visit SSH and Secure File Transfer for more details.
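
For example, with a command-line SCP client (the file names below are only placeholders):

    # upload a data file from your local machine to your home directory
    scp mydata.dat h0xxxxxx@hpcpower2b.hku.hk:
    # download a result file back to the current local directory
    scp h0xxxxxx@hpcpower2b.hku.hk:results.out .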

Program Compilation and Testing

  1. Compilers
    The PGI Cluster Development Kit suite of compilers is installed. This includes compilers for Fortran 77 (pgf77), Fortran 90 (pgf90), High Performance Fortran (pghpf), C (pgcc) and C++ (pgCC). For more details on using the PGI compilers on the hpcpower system, please visit PGI Compiler.

    Besides the PGI compilers, other compilers are also supported together with parallel libraries (MPICH/MPICH2/Open MPI). Please visit HPC Software List for more details. A brief compile example is given after this list.

  2. Test Serial program
    In order to test a serial program, you can use this command:

    ./program.exe

    The “./” tells the system to run the program in the current directory.

  3. Test MPI program
    Use this command to test an MPI program on the master node:

    mpiexec -n 8 ./program.exe

    You can change “-n 8” to “-n XXX”, where XXX is the number of processes to use for testing.
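
As mentioned in point 1, the following is a minimal compile sketch; the file names are placeholders and the exact MPI wrapper name (mpicc below) depends on which MPI library (MPICH/MPICH2/Open MPI) you use:

    # serial Fortran 90 and C programs built with the PGI compilers
    pgf90 -O2 -o program.exe program.f90
    pgcc  -O2 -o program.exe program.c

    # an MPI C program, assuming an MPI compiler wrapper is on your PATH
    mpicc -O2 -o program.exe program.c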


Resource Management System

Torque Resource Manager

The Torque Resource Manager is the replacement for the OpenPBS resource management system and handles the management and monitoring of the computational workload on the hpcpower2. Users submit “jobs” to the resource management system, where they are queued until the system is ready to run them. Torque selects which jobs to run, when to run them and on which nodes, according to a predetermined site policy meant to balance competing user needs and to maximize efficient use of the cluster resources.

To use Torque, you create a batch job command file which you submit to the Torque server to run on the system. A batch job file is simply a shell script containing the set of commands you want to run on the batch nodes. It also contains directives which specify the characteristics (attributes), and resource requirements (e.g. number of nodes and maximum runtime) that your job needs. Once you create your PBS job file, you can reuse it if you wish or modify it for subsequent runs.

Since the system is set up for development of large computation jobs, the following maximum number of CPUs and maximum processing time are allowed for each batch job:

  • Maximum number of cores for each batch job = 24 (i.e. 3 compute nodes of 8 cores each)
  • Maximum processing time for each batch job = 24 hours (wall clock time)

Furthermore, job scheduling is set up so that higher priority is given to parallel jobs requiring a larger number of processors.

To provide a fair-share environment for all users, the system is configured so that each user can place no more than 4 jobs on the job queue, and no more than 2 of them may run at the same time.

PBS Job command file

To submit a job to run on the MPI environment, a PBS job command file must be created. The job command file is a shell script that contains PBS directives which are preceded by #PBS.

The following is an example of a PBS command file to run a parallel job that requires 2 nodes with 8 cores each. You should only need to change the items indicated in red. This file is also available on the system as /etc/skel/pbs-mpiv2.cmd.

#!/bin/sh
### Job name
#PBS -N test-mpiv2

### Declare job non-rerunable
#PBS -r n

### Queue name (parallel or oneday)
### Parallel jobs will be favoured by the system.
### Queue parallel: Walltime can be  00:00:01 to 10:00:00
### Queue oneday  : Walltime can be  00:10:01 to 24:00:00
#PBS -q parallel 

### Wall time required. This example is 2 hours 
#PBS -l walltime=02:00:00 

### Number of nodes 
### The following means 1 node and 1 core. 
### Clearly, this is for serial job 
###PBS -l nodes=1:ppn=1 

### The following means 2 nodes required. Processor Per Node=8, 
### i.e., total 16 CPUs needed to be allocated. 
### ppn (Processor Per Node) can be either 1 or 2 or 4 or 8. 
#PBS -l nodes=2:ppn=8 

### The following stuff will be executed in the first allocated node. 
###Please don't modify it. 
echo $PBS_JOBID : `wc -l < $PBS_NODEFILE` CPUs allocated: `cat $PBS_NODEFILE`
cd $PBS_O_WORKDIR 

### Define number of processors 
NPROCS=`wc -l < $PBS_NODEFILE` 
PID=`echo ${PBS_JOBID} | sed "s/.hpcpower2.hku.hk//"` 
MACHFILE=/tmp/machine.j$PID 
cat $PBS_NODEFILE | /usr/bin/uniq | /bin/awk '{print $1":8"}' > $MACHFILE
echo =========================================================== 
echo "Job Start Time is `date "+%Y/%m/%d -- %H:%M:%S"`" 

### Run the parallel MPI executable "a.out" 
time mpiexec -n ${NPROCS} -f ${MACHFILE} ./a.out > ${PBS_JOBNAME}.${PID} 
echo "Job Finish Time is `date "+%Y/%m/%d -- %H:%M:%S"`" 
rm -f ${MACHFILE} 

After the PBS directives in the command file, the shell executes a change-directory command to $PBS_O_WORKDIR, a PBS variable indicating the directory from which the PBS job was submitted and nominally where the program executable is located. Other shell commands can be executed as well. In the mpiexec line, the executable itself is invoked.

If you are running an MPI program, the command “mpiexec -n ${NPROCS} -f ${MACHFILE} ./programfile” should be used. It is necessary to tell MPI how many processes to start and where the machine file is located.

The redirection to ${PBS_JOBNAME}.${PID} sends the standard output of the program to a text file JobName.JobID. You can inspect this file from time to time to check the progress of the program.
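
For comparison, a minimal sketch of a job command file for a serial program, following the same conventions as above (the job name, queue, walltime and executable name here are only placeholders):

#!/bin/sh
### Job name
#PBS -N test-serial
### Declare job non-rerunable
#PBS -r n
### Queue oneday: walltime can be 00:10:01 to 24:00:00
#PBS -q oneday
#PBS -l walltime=12:00:00
### One node, one core for a serial job
#PBS -l nodes=1:ppn=1

cd $PBS_O_WORKDIR
./a.out > ${PBS_JOBNAME}.out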

Submitting a Job

To submit the job, use the qsub command:

[h0xxxxxx@hpcpower2 test]$ qsub pbs-mpiv2.cmd
216.hpcpower2.hku.hk

Upon successful submission of a job, PBS returns a job identifier of the form JobID.hpcpower2.hku.hk where JobID is an integer number assigned by PBS to that job. You’ll need the job identifier for any actions involving the job, such as checking job status or deleting the job.

While the job is executing, it stores the program output in the file JobName.xxxx, where xxxx is the job identifier. At the end of the job, the files JobName.oxxxx and JobName.exxxx are also copied to the working directory; they contain the standard output and error that were not explicitly redirected in the job command file.
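
For example, to follow the output of the job submitted above while it runs (assuming the job name test-mpiv2 and job ID 216 from the example, so that the output file is test-mpiv2.216):

[h0xxxxxx@hpcpower2 test]$ tail -f test-mpiv2.216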

Manipulating a Job

There are several commands for manipulating jobs:

  1. List all your jobs status
    [h0xxxxxx@hpcpower2 test]$ qa
    hpcpower2.hku.hk:
                                                         Req'd      Elap
    Job ID     Username Queue    Jobname    SessID NDS   Time     S Time
    ---------- -------- -------- ---------- ------ ----- -------- - --------
    216.hpcpow h0xxxxxx parallel C2Irfc     26530      1 02:00:00 R 01:33:30
    226.hpcpow h0xxxxxx oneweek  MIrSi50g   6859       1 128:00:0 R 00:20:03

    Job information provided

    Username : Job owner
    NDS : Number of nodes requested
    Req’d Time : Requested amount of wallclock time
    Elap Time : Elapsed time in the current job state
    S : Job state (E-Exit; R-Running; Q-Queuing)
  2. List all nodes
    [h0xxxxxx@hpcpower2 test]$ pa
    j01
    j02
    j03
    j04
    j05
    j06
    j07
    j08
    j09
    j10
    j11
    j12
    j13
    j14
        jobs = 0/216, 1/216, 2/216, 3/216, 4/216, 5/216, 6/216, 7/216
    k01
    k02
    k03
    k04
    k05
    k06
    k07
        jobs = 0/226, 1/226, 2/226, 3/226, 4/226, 5/226, 6/226, 7/226
    k08
    k09
    k10
    k11
    k12

  3. List running node(s) of a job
    Command : qstat -n <JOB_ID> or qa -n <JOB_ID>

    [h0xxxxxx@hpcpower2 test]$ qstat -n 216
    hpcpower2.hku.hk:
                                                         Req'd      Elap
    Job ID     Username Queue    Jobname    SessID NDS   Time     S Time
    ---------- -------- -------- ---------- ------ ----- -------- - --------
    216.hpcpow h0xxxxxx parallel C2Irfc     26530      1 02:00:00 R 01:38:12
       j14/7+j14/6+j14/5+j14/4+j14/3+j14/2+j14/1+j14/0
  4. Check CPU utilization of the node
    Command : ta <JOB_ID>

    [h0xxxxxx@hpcpower2 test]$ ta 216
    JOBID: 216
    ===================================j14===================================
    top - 12:10:11 up 39 days, 22:35,  0 users,  load average: 7.99, 7.98, 7.99
    Tasks: 154 total,   2 running, 152 sleeping,   0 stopped,   0 zombie
    Cpu(s): 64.3%us,  0.3%sy,  0.0%ni, 35.2%id,  0.1%wa,  0.0%hi,  0.1%si,  0.0%st
    Mem:   8178736k total,  7027124k used,  1151612k free,    77800k buffers
    Swap: 25165816k total,    11268k used, 25154548k free,  4598436k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
    29107 h0xxxxxx  25   0 3911m 2.1g 7024 R  799 26.4   1:40.95 l1002.exel         
     3130 rpc       18   0  8028  140  136 S    0  0.0   0:00.00 portmap            
     3248 dbus      15   0 21248  192  188 S    0  0.0   0:00.00 dbus-daemon        
     3584 ntp       15   0 19132 4828 3732 S    0  0.1   0:05.31 ntpd               
     3674 haldaemo  15   0 30832 1324  836 S    0  0.0   0:13.27 hald               
     3682 haldaemo  17   0 12284  512  508 S    0  0.0   0:00.00 hald-addon-acpi    
    28183 h0xxxxxx  18   0 65932 1320 1076 S    0  0.0   0:00.00 bash               
    28203 h0xxxxxx  18   0  8168  388  308 S    0  0.0   0:00.00 pbs_demux          
    28240 h0xxxxxx  25   0 63844 1148  964 S    0  0.0   0:00.00 216.hpcpower2

    You can see the CPU utilization under the %CPU column. This example shows the process l1002.exel running in parallel on the 8-core node with 799% CPU utilization (800% utilization means all 8 cores of j14 are fully used). It also provides information such as memory usage and runtime of the processes.

  5. Delete a job
    Command : qdel <JOB_ID>

    [h0xxxxxx@hpcpower2 test]$ qdel 216