HTCondor commands
Some commands are useful for keeping track of your job’s progress. In the following examples, 123 and 123.0 stand for your job’s cluster ID and its full job ID (cluster ID plus job number), respectively.
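As a small illustration of the ID convention, the following sketch splits a job ID into its cluster part and its job number using plain shell parameter expansion. The variable names are made up for this example, and the value 123.0 is the placeholder ID used throughout this page.

```shell
# Split a job ID of the form <cluster>.<job> into its two parts.
# job_id is the placeholder value used on this page, not a real job.
job_id="123.0"

cluster_id="${job_id%%.*}"   # text before the first dot
proc_id="${job_id#*.}"       # text after the first dot

echo "cluster=${cluster_id} job=${proc_id}"
```

Commands given only the cluster part (for example condor_rm 123) act on every job in that cluster, while the full ID targets a single job.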
condor_q
— To generate a list of unmatched jobs
- List all jobs in the queue:
condor_q
- Status codes: I = Idle, R = Running, C = Completed, X = Cancelled, H = Held (to handle held jobs, see the Examples below)
- Check whether job 123.0 can run. If it is held, the command reports the reason:
condor_q -analyze 123.0
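When reading condor_q listings in a script, it can help to translate the one-letter status codes into words. The helper below is a minimal sketch; status_name is a hypothetical function name, and the mapping is simply the one listed above.

```shell
# Map a condor_q status letter to its meaning, using the codes listed above.
# status_name is a hypothetical helper, not part of HTCondor.
status_name() {
    case "$1" in
        I) echo "Idle" ;;
        R) echo "Running" ;;
        C) echo "Completed" ;;
        X) echo "Cancelled" ;;
        H) echo "Held" ;;
        *) echo "Unknown" ;;
    esac
}

# Print the meaning of each code in turn.
for code in I R C X H; do
    echo "$code: $(status_name "$code")"
done
```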
condor_rm
— To cancel jobs
- Cancel job 123.0:
condor_rm 123.0
- Cancel all jobs in cluster 123:
condor_rm 123
condor_status
— To show the list of all PCs and whether they are available
- List all available nodes:
condor_status
- List all available 24-hour nodes:
condor_status -constraint "strcmp(substr(Machine,0,3),\"HTC\")=?=0"
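The constraint above selects machines whose name starts with "HTC": substr(Machine,0,3) takes the first three characters of the machine name, strcmp returns 0 when they equal "HTC", and =?= tests for exact identity with 0. The sketch below builds the same expression for an arbitrary prefix. prefix_constraint is a hypothetical helper name, and the query itself runs only where condor_status is installed.

```shell
# Build a condor_status constraint that matches machine names starting
# with the given prefix. prefix_constraint is a hypothetical helper.
prefix_constraint() {
    prefix="$1"
    # ${#prefix} is the length of the prefix, used as the substr length.
    printf 'strcmp(substr(Machine,0,%d),"%s")=?=0\n' "${#prefix}" "$prefix"
}

constraint="$(prefix_constraint HTC)"
echo "$constraint"

# Run the query only if HTCondor is available on this machine.
if command -v condor_status >/dev/null 2>&1; then
    condor_status -constraint "$constraint"
fi
```

Building the string with printf inside single quotes avoids the backslash-escaped quotes needed when the expression is typed directly in double quotes.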
condor_history
— To show the history of run/cancelled jobs
- List all history:
condor_history
- List history of your own jobs:
condor_history -y hku_portal_ID
- Check the wall clock time of the job at the node:
condor_history 123.0 -format "%f\n" RemoteWallClockTime
- Check the last node which processed the job:
condor_history 123.0 -format "%s\n" LastRemoteHost
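RemoteWallClockTime is reported in seconds, so the "%f\n" format prints a value such as 3725.000000. The sketch below converts such a value to hours, minutes and seconds. fmt_wallclock is a hypothetical helper name and 3725.000000 is a made-up sample value; the real query runs only where condor_history is installed.

```shell
# Convert a wall clock time in seconds (as printed with "%f\n")
# to HH:MM:SS. fmt_wallclock is a hypothetical helper.
fmt_wallclock() {
    secs="${1%.*}"   # drop the fractional part, e.g. 3725.000000 -> 3725
    printf '%02d:%02d:%02d\n' \
        $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Made-up sample value for illustration.
fmt_wallclock 3725.000000   # prints 01:02:05

# Query a real job only if HTCondor is available on this machine.
if command -v condor_history >/dev/null 2>&1; then
    fmt_wallclock "$(condor_history 123.0 -format "%f\n" RemoteWallClockTime)"
fi
```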
condor_hold
— To hold jobs manually
- Hold job 123.0:
condor_hold 123.0
- Hold all jobs in cluster 123:
condor_hold 123
condor_release
— To release jobs manually
- Release your job:
condor_release 123.0
- Release all jobs in cluster 123:
condor_release 123
condor_qedit
— To edit job submission
- To reset the requirement string to the one we recommend:
condor_qedit 123.0 Requirements "( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"
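The backslash-escaped quotes in the command above are easy to mistype. One possible alternative, sketched here, is to store the same expression once in a single-quoted shell variable, where the inner double quotes need no escaping. The variable and function names are assumptions made for this example; the expression itself is copied unchanged from the command above.

```shell
# The recommended Requirements expression, written inside single quotes
# so the inner double quotes need no backslashes.
requirements='( Target.OpSys == "WINDOWS" && ( Target.Arch == "INTEL" || Target.Arch == "X86_64" ) && ( strcmp(substr(Target.Name,6,1),"N") =?= 0 ) )'

# The backslash-escaped form used in the command above, for comparison.
escaped="( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"

# Both spellings produce exactly the same string.
[ "$requirements" = "$escaped" ] && echo "identical"

# Hypothetical helper: reset the Requirements of one job, e.g.
#   reset_requirements 123.0
reset_requirements() {
    condor_qedit "$1" Requirements "$requirements"
}
```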
Examples
- List all your jobs:
condor_q -y hku_portal_ID
- Suppose one of your jobs is held (say, job 123.0 has status “H” in the condor_q listing) and you want to investigate:
condor_q -analyze 123.0
- If the error is “file not found”, put the file at the indicated path and release the job:
condor_release 123.0
- If the error is related to the Requirements string, you may want to reset it to the default for our HTCondor system:
condor_qedit 123.0 Requirements "( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"
- If you decide to cancel the job altogether:
condor_rm 123.0
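The decision steps in the examples above can be sketched as a small helper that prints a suggested next command for a held job. suggest_action is a hypothetical function, and the matched phrases are the ones quoted in the examples; actual HTCondor messages may be worded differently, so treat this as an illustration rather than a reliable classifier. It only prints a suggestion and never changes a job.

```shell
# Suggest the next step for a held job, following the examples above.
# suggest_action is a hypothetical helper; it prints advice only.
# Usage: suggest_action <job_id> <reason text>
suggest_action() {
    job_id="$1"
    reason="$2"
    case "$reason" in
        *"file not found"*)
            echo "put the file at the indicated path, then: condor_release $job_id" ;;
        *Requirements*)
            echo "reset the Requirements string with: condor_qedit $job_id Requirements \"...\"" ;;
        *)
            echo "investigate further with: condor_q -analyze $job_id" ;;
    esac
}

# Made-up reason texts for illustration.
suggest_action 123.0 "Error: file not found"
suggest_action 123.0 "Requirements expression evaluates to false"
```

If none of the suggestions helps, the job can still be cancelled with condor_rm as described above.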
Further Reading
http://www.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.CondorUsefulCommands