TORQUE hands-on
Prepared by Vangelis Koukis, Computing Systems Laboratory, ICCS-NTUA.

Introduction

This hands-on will guide you through basic examples of using the TORQUE scheduler. You will submit batch and interactive jobs, examine their environment, and destroy them.

Access

To complete this hands-on lab you need to be logged in to a TORQUE submit node, where the TORQUE client utilities are installed. We will use the Grid UI, part of TUC's Grid node, for this purpose, as it has been set up as a TORQUE submit node for the local compute cluster. Use ssh to log into the UI. If you use Windows and don't have an ssh client, you can download PuTTY, one of the most popular free clients.

If you don't meet this requirement, please contact the trainers for assistance.

Download the examples

Log into the UI and download the tutorial application files:

   
wget http://cslab.ece.ntua.gr/tuc/torque.tar
 
Unpack the file in your home directory:

tar xvf torque.tar

and change to the directory created:

cd torque

The package contains the following:

  • simple.pbs
  • task.sh

Log into the UI in a second window, so you can issue commands in one window and observe the effect they have on the job running in the other.

Submit a simple interactive job

Submit a simple interactive job, and examine its environment, especially $PBS_ENVIRONMENT and the contents of $PBS_NODEFILE:
[vkoukis@ui ~]$ qsub -I -q tuc
qsub: waiting for job 1081.ce01.grid.tuc.gr to start
qsub: job 1081.ce01.grid.tuc.gr ready

[vkoukis@wn034 ~]$ echo $PBS_
$PBS_ENVIRONMENT  $PBS_JOBNAME      $PBS_NODENUM      $PBS_O_LANG       $PBS_O_PATH       $PBS_O_WORKDIR    $PBS_TASKNUM
$PBS_JOBCOOKIE    $PBS_MOMPORT      $PBS_O_HOME       $PBS_O_LOGNAME    $PBS_O_QUEUE      $PBS_QUEUE        $PBS_VERSION
$PBS_JOBID        $PBS_NODEFILE     $PBS_O_HOST       $PBS_O_MAIL       $PBS_O_SHELL      $PBS_SERVER       $PBS_VNODENUM
[vkoukis@wn034 ~]$ echo $PBS_JOBID
1081.ce01.grid.tuc.gr
[vkoukis@wn034 ~]$ echo $PBS_ENVIRONMENT
PBS_INTERACTIVE 
[vkoukis@wn034 ~]$ cat $PBS_NODEFILE
wn034.grid.tuc.gr
[vkoukis@wn034 ~]$ echo $PBS_NODEFILE
/var/spool/pbs/aux//1081.ce01.grid.tuc.gr
Using ls and cd, verify that the Worker Node shares the same home directory structure as the UI.

Log out of the job by pressing Ctrl-D or typing exit at the prompt. Repeat the submission, this time providing a proper job name with the -N argument, as in the example below. Also repeat the submission requesting a larger number of nodes and notice how the contents of $PBS_NODEFILE change. Don't request more than two nodes, otherwise your job may be blocked for a very long time before it gets a chance to run.
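For example, a repeated submission might look like this (the job name and node count are just illustrative):
[vkoukis@ui ~]$ qsub -I -q tuc -N myjob -l nodes=2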

In the second window, use qstat with arguments -a and -f to examine the execution queue.
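For example, reusing the job ID from your own submission:
[vkoukis@ui ~]$ qstat -a
[vkoukis@ui ~]$ qstat -f 1081.ce01.grid.tuc.gr
The -a listing shows all jobs in an alternative, wider format, while -f prints the full set of attributes of a single job.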

Kill the currently running interactive job

In the second window, run qdel on the currently running job. Notice what happens. How long does it take for the job to really die?

[vkoukis@ui ~]$ qdel 1081

...

qsub: job 1081.ce01.grid.tuc.gr completed
[vkoukis@ui ~]$

Alter a currently running job

Run qalter on a currently running job and change its maximum wallclock time to something ridiculously small, such as 20 seconds. Observe what happens once TORQUE detects that the job has exceeded its limit and can no longer continue running:
[vkoukis@ui ~]$ qalter -l walltime=00:00:20 jobID_here
Also try using qsig on a running job, with a signal such as SIGKILL:
[vkoukis@ui ~]$ qsig -s SIGKILL jobID_here
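qsig accepts other signal names and numbers as well; you could first try a gentler signal, for instance:
[vkoukis@ui ~]$ qsig -s SIGTERM jobID_here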

Submit a simple script

Make the file simple.pbs executable, then show its contents and run it directly at the command line. Notice the PBS-specific comments (#PBS directives) and how the shell ignores them when the script is run directly. Then submit it as a batch job (no -I argument this time) using qsub. Remember to use the -k oe argument to keep both the standard output and standard error files.
[vkoukis@ui ~]$ chmod +x ./simple.pbs
[vkoukis@ui ~]$ cat simple.pbs
[vkoukis@ui ~]$ qsub -k oe -q tuc ./simple.pbs
Use cat on the standard output and error files to examine the output of the batch job.
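For reference, a minimal PBS script of this kind might look like the sketch below; the actual simple.pbs shipped in the tarball may differ, and the resource values are only examples:

#!/bin/bash
#PBS -N simple
#PBS -l nodes=1,walltime=00:05:00

# Lines starting with #PBS are directives for qsub; a plain shell treats them as comments.
echo "Job $PBS_JOBID running on $(hostname -f) in environment $PBS_ENVIRONMENT"
echo "Allocated nodes:"
cat $PBS_NODEFILE

With -k oe, the output and error files are kept in your home directory, typically named after the job (e.g. simple.o1081 and simple.e1081), so they are directly visible from the UI since the home directories are shared. Notice also that $PBS_ENVIRONMENT now reports PBS_BATCH instead of PBS_INTERACTIVE.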

Examine different node requirement specifications

Experiment with different node requirement specifications in qsub, preferably in an interactive job, and notice how the contents of $PBS_NODEFILE change every time:
[vkoukis@ui ~]$ qsub -I -q tuc -l nodes=1+2+3+4
qsub: waiting for job 1090.ce01.grid.tuc.gr to start
qsub: job 1090.ce01.grid.tuc.gr ready
...
cat $PBS_NODEFILE
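Some other specifications worth trying are sketched below; whether a given ppn value can be satisfied depends on how many processors each worker node has:

qsub -I -q tuc -l nodes=2
qsub -I -q tuc -l nodes=2:ppn=2
qsub -I -q tuc -l nodes=1:ppn=2+1

With a ppn (processors per node) specification, each allocated node appears in $PBS_NODEFILE once per requested processor.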

Challenge: Launch a shell script on multiple nodes

Start an interactive job with more than one core, then use pbsdsh to launch the small task.sh script on all of the allocated nodes. Notice the values of $PBS_TASKNUM, $PBS_NODENUM and $PBS_VNODENUM being printed:
[vkoukis@ui torque-hands-on]$ qsub -I -l nodes=4 -q tuc
qsub: waiting for job 1386.ce01.grid.tuc.gr to start
qsub: job 1386.ce01.grid.tuc.gr ready

[vkoukis@wn034 ~]$ cd torque-hands-on/
[vkoukis@wn034 torque-hands-on]$ pbsdsh `pwd`/task.sh
Thu May 7 22:54:28 EEST 2009 Task running on wn034.grid.tuc.gr: PBS_TASKNUM=6, PBS_NODENUM=0, PBS_VNODENUM=0.
Thu May 7 22:54:28 EEST 2009 Bye
Thu May 7 22:54:29 EEST 2009 Task running on wn032.grid.tuc.gr: PBS_TASKNUM=8, PBS_NODENUM=2, PBS_VNODENUM=2.
Thu May 7 22:54:29 EEST 2009 Task running on wn031.grid.tuc.gr: PBS_TASKNUM=9, PBS_NODENUM=3, PBS_VNODENUM=3.
Thu May 7 22:54:29 EEST 2009 Bye
Thu May 7 22:54:29 EEST 2009 Bye
Thu May 7 22:54:29 EEST 2009 Task running on wn033.grid.tuc.gr: PBS_TASKNUM=7, PBS_NODENUM=1, PBS_VNODENUM=1.
Thu May 7 22:54:29 EEST 2009 Bye
[vkoukis@wn034 torque-hands-on]$
Notice how you must provide pbsdsh with the absolute path to the command, constructed here with `pwd`.
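For reference, a script like the provided task.sh might look roughly like the sketch below (the actual file in the tarball may differ):

#!/bin/bash
# Report which node and task slot this copy of the script runs in.
echo "$(date) Task running on $(hostname -f): PBS_TASKNUM=$PBS_TASKNUM, PBS_NODENUM=$PBS_NODENUM, PBS_VNODENUM=$PBS_VNODENUM."
sleep 1
echo "$(date) Bye"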

Challenge 2: Launch an array of jobs

Modify simple.pbs so that it prints out the value of $PBS_ARRAYID. Then submit an array of jobs for a variety of indexes, using the -t argument of qsub. Make sure you include the -k oe argument to keep the output and error files of every job array member. Notice the value printed by each job based on its array index.
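For example, you might append a line like the following to simple.pbs and then submit five array members; the index range is only an example:

echo "This is array member $PBS_ARRAYID of job $PBS_JOBID"

[vkoukis@ui ~]$ qsub -k oe -q tuc -t 1-5 ./simple.pbs

Each member then produces its own pair of output and error files, each reporting a different $PBS_ARRAYID.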