HTCondor commands
The following commands help you track and manage your jobs. In the examples, 123 stands for your job’s cluster ID and 123.0 for the full job ID (cluster ID plus the job’s number within that cluster).
condor_q: To list the jobs in the queue
- List all jobs in the queue:
condor_q
- Status codes: I = Idle, R = Running, C = Completed, X = Cancelled, H = Held (to handle held jobs, see the Examples below)
- Check whether job 123.0 can run. If it is held, the command gives the reason:
condor_q -analyze 123.0
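The one-letter status codes correspond to the numeric JobStatus attribute of each job (1 = Idle, 2 = Running, 3 = Removed, 4 = Completed, 5 = Held), which can be used to filter the queue. The sketch below is only an illustration: jobs_with_status is a hypothetical helper name, not part of HTCondor, and it assumes condor_q is on your PATH.

```shell
# Hypothetical helper: list queued jobs that are in a given state.
# Maps the one-letter code shown by condor_q to the numeric JobStatus value.
jobs_with_status() {
    case "$1" in
        I) code=1 ;;   # Idle
        R) code=2 ;;   # Running
        X) code=3 ;;   # Cancelled (removed)
        C) code=4 ;;   # Completed
        H) code=5 ;;   # Held
        *) echo "unknown status code: $1" >&2; return 2 ;;
    esac
    # Restrict the listing to jobs whose JobStatus matches.
    condor_q -constraint "JobStatus == $code"
}

# Usage (requires HTCondor):
#   jobs_with_status H
```

For example, jobs_with_status H should show only the held jobs, which is a convenient starting point before running condor_q -analyze on one of them.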
condor_rm: To cancel jobs
- Cancel job 123.0:
condor_rm 123.0
- Cancel all jobs in cluster 123:
condor_rm 123
condor_status: To show the list of all PCs and whether they are available
- List all available nodes:
condor_status
- List all available 24-hour nodes:
condor_status -constraint "strcmp(substr(Machine,0,3),\"HTC\")=?=0"
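The constraint in the last command keeps machines whose name begins with HTC: substr(Machine,0,3) takes the first three characters of the machine name, and strcmp evaluates to 0 when they equal "HTC". As a rough sketch, the same expression can be generated for any prefix. The function name machine_prefix_constraint is made up for this illustration.

```shell
# Hypothetical helper: build a ClassAd constraint that matches machines
# whose name starts with the given prefix.
machine_prefix_constraint() {
    prefix=$1
    # substr(Machine,0,N) takes the first N characters of the machine name;
    # strcmp(...) evaluates to 0 when that piece equals the prefix.
    printf 'strcmp(substr(Machine,0,%d),"%s")=?=0\n' "${#prefix}" "$prefix"
}

# Usage (requires HTCondor):
#   condor_status -constraint "$(machine_prefix_constraint HTC)"
```

Generating the expression this way avoids hand-escaping the inner double quotes and keeps the substring length in step with the prefix.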
condor_history: To show the history of run/cancelled jobs
- List all history:
condor_history
- List history of your own jobs:
condor_history -y hku_portal_ID
- Check the wall clock time of the job at the node:
condor_history 123.0 -format "%f\n" RemoteWallClockTime
- Check the last node which processed the job:
condor_history 123.0 -format "%s\n" LastRemoteHost
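RemoteWallClockTime is reported in seconds, which is hard to read for long jobs. A minimal sketch for turning it into hours, minutes and seconds follows. format_wallclock is a hypothetical helper name, and the example assumes awk is available.

```shell
# Hypothetical helper: convert a duration in seconds (e.g. "3725.0")
# to HH:MM:SS.
format_wallclock() {
    awk -v s="$1" 'BEGIN {
        s = int(s)   # drop the fractional part
        printf "%02d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60
    }'
}

# Usage:
#   format_wallclock 3725.0     # prints 01:02:05
# With HTCondor:
#   format_wallclock "$(condor_history 123.0 -format "%f\n" RemoteWallClockTime)"
```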
condor_hold: To hold jobs manually
- Hold job 123.0:
condor_hold 123.0
- Hold all jobs in cluster 123:
condor_hold 123
condor_release: To release jobs manually
- Release your job:
condor_release 123.0
- Release all jobs in cluster 123:
condor_release 123
condor_qedit: To edit a submitted job
- To reset the requirement string to the one we recommend:
condor_qedit 123.0 Requirements "( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"
Examples
- You list all your jobs:
condor_q -y hku_portal_ID
- You find that one of your jobs is held (say, job 123.0 has status “H” in the condor_q listing) and you want to investigate:
condor_q -analyze 123.0
  - If the error is “file not found”, put the file at the indicated path and release the job:
  condor_release 123.0
  - If the error is related to the Requirements string, you may want to reset it to the default one on our HTCondor system:
  condor_qedit 123.0 Requirements "( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"
  - If you decide to cancel the job altogether:
  condor_rm 123.0
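The held-job steps above could be wrapped in a small shell function, sketched here under the assumption that you have already decided whether to release or cancel the job. resolve_held_job is a hypothetical name. It only chains the commands shown in this section and does not fix the underlying problem (missing file, Requirements string) for you.

```shell
# Hypothetical helper: inspect a held job, then release or cancel it.
# Usage: resolve_held_job JOB_ID release|remove
resolve_held_job() {
    job_id=$1
    action=$2
    # Reject unknown actions before touching the queue.
    case "$action" in
        release|remove) ;;
        *) echo "usage: resolve_held_job JOB_ID release|remove" >&2; return 2 ;;
    esac
    # Show why the job is not running.
    condor_q -analyze "$job_id"
    if [ "$action" = release ]; then
        condor_release "$job_id"   # put the job back in the queue
    else
        condor_rm "$job_id"        # cancel the job altogether
    fi
}

# Usage (requires HTCondor):
#   resolve_held_job 123.0 release
```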
Further Reading
http://www.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.CondorUsefulCommands
