HTCondor Job command file
The following shows the simplest HTCondor job command file, assuming your program is abc.exe. If you are using MATLAB or Java, you will find links to templates at the end of this page.
Basic Structure
Here is a minimal HTCondor submission file that takes the program abc.exe and runs it.
    Universe = vanilla
    Executable = abc.exe
    Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )
    request_cpus = 1
    Error = err.$(Cluster).$(Process)
    Output = out.$(Cluster).$(Process)
    Log = log.$(Cluster).$(Process)
    queue
Three files are produced automatically when the job is submitted. They are prefixed with:
- out: output file, which holds the content of the stdout generated by the process.
- err: error file, which holds the content of the stderr generated by the process.
- log: log file, which records the events of the job itself throughout its life cycle.
The suffix is the cluster ID and process ID, which we will discuss below.
Automatic Variables
$(Cluster): Cluster ID. A new ID is given to each condor_submit call.
$(Process): Job ID. A new ID is given to each job generated by the job submission script, counting from 0 within the cluster.
HTCondor often displays a job as two parts separated by a dot (.), like 123.0, where 123 is the cluster ID and 0 is the job ID.
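As a small illustration of how these variables expand, the sketch below assumes that condor_submit assigned cluster ID 123 (a made-up number) and that the file queues three jobs. The comments list the file names that would result. The Requirements line is left out for brevity.

```
# Sketch only: cluster ID 123 is an assumed value; the real ID is assigned by condor_submit.
Universe = vanilla
Executable = abc.exe
request_cpus = 1
Error = err.$(Cluster).$(Process)
Output = out.$(Cluster).$(Process)
Log = log.$(Cluster).$(Process)
# Three jobs are created: 123.0, 123.1 and 123.2
#   job 123.0 -> out.123.0, err.123.0, log.123.0
#   job 123.1 -> out.123.1, err.123.1, log.123.1
#   job 123.2 -> out.123.2, err.123.2, log.123.2
queue 3
```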
Command Line Arguments
Command line arguments to be passed to the executable (either .exe or .bat) can be specified using the Arguments expression. For example, the following sends the job ID as the command line argument:
    Universe = vanilla
    Executable = abc.exe
    Arguments = $(Process)
    Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )
    request_cpus = 1
    Error = err.$(Cluster).$(Process)
    Output = out.$(Cluster).$(Process)
    Log = log.$(Cluster).$(Process)
    queue
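Several arguments can be given on the same line, separated by spaces. The fragment below is a sketch which assumes that abc.exe accepts a job index followed by the total number of jobs; that calling convention is hypothetical and not something this page defines.

```
# Hypothetical convention: abc.exe <job index> <total number of jobs>
Arguments = $(Process) 100
```

Under that assumption, the job with ID 0 would be started as if you had typed abc.exe 0 100 at the command prompt.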
Input/Output Files
Any files that the executable needs besides itself have to be sent using the input file mechanism. Adding the following lines sends 7-Zip and a data archive to the remote machine, so that the job can extract the archive and use the data within.
    should_transfer_files = YES
    transfer_input_files = 7z.dll, 7-zip.dll, 7z.exe, data.7z
If your executable generates output files, make sure they are generated in the same folder as the executable itself. (When in doubt, make sure your files are in the folder pointed to by the environment variable %CD% or %TEMP%.) Then you can specify the path of the output files (wildcards cannot be used). Empty placeholder files with the same names as your output files are generated when you submit the jobs. The actual content is copied over after a machine has run the code. If a specified output file does not exist on the target machine when the job completes, HTCondor will generate an error and hold the job.
    transfer_output_files = abc.data
    when_to_transfer_output = ON_EXIT_OR_EVICT
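Putting the transfer settings together, a complete submission file might look like the sketch below. It assumes a hypothetical wrapper script run.bat that extracts data.7z with 7z.exe, runs abc.exe, and leaves abc.data in the working folder. Neither the wrapper nor its behaviour comes from this page.

```
Universe = vanilla
# Hypothetical wrapper script: extracts data.7z, then runs abc.exe
Executable = run.bat
Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )
request_cpus = 1
should_transfer_files = YES
# abc.exe is listed here because the wrapper, not abc.exe, is the Executable
transfer_input_files = abc.exe, 7z.dll, 7-zip.dll, 7z.exe, data.7z
# The job is held if abc.data is missing when it completes
transfer_output_files = abc.data
when_to_transfer_output = ON_EXIT_OR_EVICT
Error = err.$(Cluster).$(Process)
Output = out.$(Cluster).$(Process)
Log = log.$(Cluster).$(Process)
queue
```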
Requirements Strings
One may set the Requirements string to force the job to land on a certain type of machine.
Type | Subtype | Requirements String | Remarks |
Normal Slots (which run jobs after Learning Commons closure) | Any Windows | Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) ) | |
Normal Slots (which run jobs after Learning Commons closure) | Windows 10 | Requirements = ( Target.OpSysAndVer == "WINDOWS1000" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) ) | ~160 PCs |
Debug/24-hour slots (for debug/develop use) | Windows 10 | Requirements = ( Target.OpSysAndVer == "WINDOWS1000" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Machine,0,7),"HTCW10C") =?= 0 ) ) | 10 VMs |
Local Universe (i.e., htclogin.hku.hk; abusive use is prohibited) | CentOS 6 htclogin | Requirements = ( Target.OpSysAndVer == "CentOS6" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Machine,0,8),"htclogin") =?= 0 ) ) | 1 VM |
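For instance, a single test run can be sent to the debug slots before a large batch is submitted to the normal slots. The sketch below reuses the Windows 10 debug-slot Requirements string from the table and otherwise follows the basic structure shown earlier.

```
Universe = vanilla
Executable = abc.exe
# Debug/24-hour slots, Windows 10: machine names are expected to start with HTCW10C
Requirements = ( Target.OpSysAndVer == "WINDOWS1000" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Machine,0,7),"HTCW10C") =?= 0 ) )
request_cpus = 1
Error = err.$(Cluster).$(Process)
Output = out.$(Cluster).$(Process)
Log = log.$(Cluster).$(Process)
# One job only, as a trial run
queue
```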
Other Tricks
The queue keyword can appear multiple times in a submission script. Each time it appears, the values of the variables at that point are used to generate jobs. For example, the following is a valid job description:
    Universe = vanilla
    Executable = abc.exe
    Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )
    request_cpus = 1
    Error = err.$(Cluster).$(Process)
    Output = out.$(Cluster).$(Process)
    Log = log.$(Cluster).$(Process)
    Arguments = abc
    queue
    Arguments = def
    queue
    Arguments = ghi
    queue
Another option is to generate multiple jobs with the same set of parameters (useful if you have randomization within your code, or your code is a randomized algorithm). For example, we could run the program abc.exe as 100 different jobs:
    Universe = vanilla
    Executable = abc.exe
    Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )
    request_cpus = 1
    Error = err.$(Cluster).$(Process)
    Output = out.$(Cluster).$(Process)
    Log = log.$(Cluster).$(Process)
    queue 100
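When the 100 jobs should differ only in their random seed, $(Process) can be passed as an argument so that every job receives a distinct number. This is a sketch which assumes that abc.exe treats its first command line argument as a seed; this page does not say that it does.

```
Universe = vanilla
Executable = abc.exe
# Assumption: abc.exe uses its first argument as the random seed
Arguments = $(Process)
Requirements = ( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )
request_cpus = 1
Error = err.$(Cluster).$(Process)
Output = out.$(Cluster).$(Process)
Log = log.$(Cluster).$(Process)
# The 100 jobs receive the arguments 0, 1, ..., 99
queue 100
```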
Further Reading
http://www.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.CondorSubmitFile
User Applications
- MATLAB jobs
- Java jobs
- Other Packages
- We are working on some other packages. They will be added here once available.