HTCondor commands

The following commands are useful for keeping track of your job’s progress. In the examples below, 123 stands for your job’s cluster ID, and 123.0 stands for the full job ID (cluster ID plus the job number within that cluster).

condor_q: To list the jobs in the queue

  • List all jobs in the queue: condor_q
    • Status codes: I = Idle, R = Running, C = Completed, X = Cancelled, H = Held (for handling held jobs, see the Examples below)
  • Check whether job 123.0 can run. If it is held, the command reports the reason: condor_q -analyze 123.0
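
When scripting around condor_q, the status is available as an integer in the JobStatus job attribute rather than as a letter. The sketch below maps those integers to the status names listed above. The function name job_status_name is made up for illustration and is not part of HTCondor.

```shell
# Hypothetical helper: translate HTCondor's numeric JobStatus attribute
# into the status names used in the condor_q listing.
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Cancelled" ;;   # listed as X; HTCondor itself calls this "Removed"
        4) echo "Completed" ;;
        5) echo "Held" ;;
        *) echo "Unknown ($1)" ;;
    esac
}

# Example (needs a live HTCondor pool, so it is left commented out):
#   job_status_name "$(condor_q 123.0 -format '%d' JobStatus)"
```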

condor_rm: To cancel jobs

  • Cancel job 123.0: condor_rm 123.0
  • Cancel all jobs in cluster 123: condor_rm 123

condor_status: To list all PCs and show whether they are available

  • List all nodes and their current state: condor_status
  • List all available 24-hour nodes: condor_status -constraint "strcmp(substr(Machine,0,3),\"HTC\")=?=0"
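
The constraint above compares the first three characters of the machine name with "HTC"; strcmp returns 0 when the two strings are equal, and the comparison is case-sensitive. Since the nested quoting is easy to get wrong, here is a minimal sketch that builds the same expression for any prefix. The function name machine_prefix_constraint is an assumption made for this example.

```shell
# Hypothetical helper: build a condor_status constraint that matches
# machines whose name starts with the given prefix.
machine_prefix_constraint() {
    prefix=$1
    # substr(Machine,0,N) takes the first N characters of the machine name;
    # strcmp(...) evaluates to 0 when they equal the prefix exactly.
    printf 'strcmp(substr(Machine,0,%d),"%s")=?=0\n' "${#prefix}" "$prefix"
}

# Example (needs a live HTCondor pool, so it is left commented out):
#   condor_status -constraint "$(machine_prefix_constraint HTC)"
```

Letting the shell substitute the function output avoids the backslash-escaped quotes, because the inner double quotes are passed through literally.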

condor_history: To show the history of jobs that have run or been cancelled

  • List all history: condor_history
  • List history of your own jobs: condor_history -y hku_portal_ID
  • Check the wall-clock time (in seconds) that the job spent on the node: condor_history 123.0 -format "%f\n" RemoteWallClockTime
  • Check the last node which processed the job: condor_history 123.0 -format "%s\n" LastRemoteHost
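
RemoteWallClockTime is reported in seconds, which is hard to read for long jobs. As a small sketch, the helper below converts such a value to hours, minutes and seconds. The name format_wallclock is hypothetical, and the fractional part of the value is simply dropped.

```shell
# Hypothetical helper: convert a RemoteWallClockTime value (seconds,
# possibly with a fractional part such as 3725.0) to H:MM:SS.
format_wallclock() {
    secs=${1%.*}    # drop the fractional part, if any
    printf '%d:%02d:%02d\n' \
        $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Example (needs a live HTCondor pool, so it is left commented out):
#   format_wallclock "$(condor_history 123.0 -format '%f' RemoteWallClockTime)"
```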

condor_hold: To hold jobs manually

  • Hold job 123.0: condor_hold 123.0
  • Hold all jobs in cluster 123: condor_hold 123

condor_release: To release jobs manually

  • Release job 123.0: condor_release 123.0
  • Release all jobs in cluster 123: condor_release 123

condor_qedit: To edit the attributes of a submitted job

  • Reset the Requirements string to the one we recommend: condor_qedit 123.0 Requirements "( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"

Examples

  1. List all your jobs: condor_q -y hku_portal_ID
  2. Suppose one of your jobs is held (say, job 123.0 has status “H” in the condor_q listing) and you want to investigate: condor_q -analyze 123.0
    1. If the error is “file not found”, put the file at the indicated path and release the job: condor_release 123.0
    2. If the error is related to the Requirements string, you may want to reset it to the default one on our HTCondor system: condor_qedit 123.0 Requirements "( Target.OpSys == \"WINDOWS\" && ( Target.Arch == \"INTEL\" || Target.Arch == \"X86_64\" ) && ( strcmp(substr(Target.Name,6,1),\"N\") =?= 0 ) )"
    3. If you decide to cancel the job altogether: condor_rm 123.0
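
The decision steps above can be sketched as a small helper that takes a job ID and the hold reason text and prints the follow-up command to consider. This is an illustration only: the function name suggest_held_job_action and the phrases it matches are assumptions, and real hold reasons are worded in many different ways. It prints a command instead of running it, so nothing is changed in the queue.

```shell
# Hypothetical helper: given a job ID and the hold reason reported for it,
# print the follow-up command suggested by the steps above.
# The matched phrases are guesses; adjust them to the messages you see.
suggest_held_job_action() {
    job_id=$1
    reason=$2
    case "$reason" in
        *"not found"*|*"No such file"*)
            # Put the missing file in place first, then release the job.
            echo "condor_release $job_id" ;;
        *Requirements*|*requirements*)
            # Reset the Requirements string to the recommended one.
            echo "condor_qedit $job_id Requirements <recommended string>" ;;
        *)
            # Unrecognised reason: take a closer look before deciding.
            echo "condor_q -analyze $job_id" ;;
    esac
}
```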

Further Reading

http://www.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.CondorUsefulCommands