Submit a Single Batch Job
Hardin and seldon (the interactive computers to which you connect using SSH) are not the computational workhorses of the cluster. The main computing power of the cluster lies in the additional 32 processors which are available to programs submitted to the batch queue. These instructions will show you how your programs can use that power. The job queue is like a valet:
Up to 8 of your jobs can be executing at a time, when resources are available. Each job will have exclusive use of one processor and up to 512MB of memory unless you specify otherwise.

Design your jobs to be rerun. Do not change the files your job will be using before the job finishes. Batch jobs may be rerun for a variety of reasons: priority decisions, node failure, or administrative maintenance.

Jobs in an execution queue will be preempted by the scheduler if more than 4 of your jobs are running at one time and another job becomes more deserving of execution, as determined by a fairshare algorithm. A preempted job is either suspended (temporarily stopped from execution) or terminated (ended and put back into the input queue). Of the jobs eligible to be preempted, the job that has used the least CPU time will be selected. Preempted jobs gain priority over time, and they will be put into execution again, possibly preempting others' jobs.

You can improve the throughput of your jobs, and possibly avoid preemption, by specifying a limit on the maximum amount of CPU time your job will use when you submit it. The scheduler will factor this limit into the fairshare decision-making process. If you do not specify a CPU time limit, the scheduler will assume your job will run for one month.
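Because a job may be stopped and started again from the beginning, it helps if a rerun can tell finished work from half-finished work. The following is only a sketch of one way to do that, not a site-recommended method: the `run_step` helper, the temporary directory, and the file names are all hypothetical. Each step writes to a temporary name and renames the file only on success, and a rerun skips any step whose output already exists.

```shell
#!/bin/bash
# Sketch of a rerunnable job body. The directory, the file names, and the
# run_step helper are hypothetical; substitute your own program calls.

workdir=$(mktemp -d)          # stand-in for your job's working directory

# run_step OUTPUT COMMAND...: run COMMAND and keep its output in OUTPUT only
# if it finishes successfully. A finished OUTPUT is never recomputed.
run_step() {
    local out=$1
    shift
    if [ -e "$out" ]; then
        echo "skipping $(basename "$out"): already done"
        return 0
    fi
    # Write to a temporary name first, so a preempted run never leaves a
    # half-written file that looks complete.
    if "$@" > "$out.partial"; then
        mv "$out.partial" "$out"
    else
        rm -f "$out.partial"
        return 1
    fi
}

run_step "$workdir/stage1.out" echo "stage 1 results"
run_step "$workdir/stage2.out" echo "stage 2 results"

# Simulate a rerun after preemption: both steps are skipped.
run_step "$workdir/stage1.out" echo "stage 1 results"
run_step "$workdir/stage2.out" echo "stage 2 results"
```

In a real job the `echo` commands would be replaced by the program invocations for each stage.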
A shell script is a set of instructions that the cluster node needs to find and run your program. It's a simple text file (usually with a .txt extension). You can create it with the nano editor. To create a shell script called myprog.txt, type:

    [abc123@seldon abc123]$ nano myprog.txt

When you are finished typing, press <Ctrl>-X to exit the nano editor and Y to save the changes.
To run one program you need a five-line shell script like this:

    #!/bin/bash
    #PBS -j oe

    cd ~/myprograms
    matlab -r 'myprog'

The first line tells the system to run the script with the bash shell. The second line tells PBS to join the job's standard output and standard error into a single output file.

An optional line following the second line tells PBS the maximum amount of CPU time the job will take. CPU time is specified in hours:minutes:seconds, so one hour would be written as "1:00:00". The following PBS directive uses the -l option to limit a job to three hours of CPU time. Place it immediately after line 2 of your shell script if you want to use it:

    #PBS -l cput=3:00:00

The third line of the example script is blank. It makes the script more readable. The fourth line changes to the directory that holds your program. The last line tells MATLAB to start in batch mode and run myprog.
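The script layout and the optional CPU time limit can be put together as follows. This is a minimal sketch rather than a required procedure: `seconds_to_cput` and `write_job_script` are illustrative helper names, and the script is written to a temporary file so the example can be tried anywhere. The directory and MATLAB command are the ones from the example script.

```shell
#!/bin/bash
# Sketch: build the example job script with a CPU time limit.
# seconds_to_cput and write_job_script are illustrative helper names.

# Convert a number of seconds to the hours:minutes:seconds form used by cput.
seconds_to_cput() {
    local s=$1
    printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# write_job_script FILE SECONDS: write the example script to FILE with a
# cput limit of SECONDS placed right after line 2.
write_job_script() {
    local file=$1 limit
    limit=$(seconds_to_cput "$2")
    cat > "$file" <<EOF
#!/bin/bash
#PBS -j oe
#PBS -l cput=$limit

cd ~/myprograms
matlab -r 'myprog'
EOF
}

script=$(mktemp)
write_job_script "$script" 10800    # 10800 seconds = three hours
cat "$script"
```

With the limit line added, the script has six lines instead of five; the blank line and the two commands simply move down by one.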
    matlab -nodisplay -r 'commands'

Runs MATLAB commands or your own M-files from the working directory. Separate multiple commands with commas or semicolons (;). Do not include the pathname or a file extension (.m) to run an M-file. Put quotes around your list of commands. You can pass parameters to your M-file using this syntax, for example:

    matlab -nodisplay -r 'myprog(3.8, 0.2, 2.5)'

If you are submitting multiple jobs which execute the same MATLAB program with different parameters, you need a way to distinguish the output files. You can do this simply by printing the parameter values at the beginning of your MATLAB code. You can also redirect MATLAB output into a log file with a name that contains the parameters. This command

    matlab -r 'myprog(3.8, 0.2, 2.5)' > myprog_3.8_0.2_2.5.log

will save MATLAB output in a file called myprog_3.8_0.2_2.5.log.
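An alternative MATLAB command is:

    matlab -nodisplay < myprog.m

The left arrow feeds the contents of myprog.m to MATLAB as its input. You can also redirect the output to a special log file by adding a right arrow followed by the name of the log file:

    matlab -nodisplay < myprog.m > filename.log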
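When the same MATLAB program is run with several parameter sets, generating one shell script per set keeps the log file names consistent. The sketch below assumes a hypothetical helper `make_log_name`, invented parameter values (only the first set appears in the text above), and a temporary output directory; it writes the scripts but does not submit them.

```shell
#!/bin/bash
# Sketch: one job script per parameter set, each logging to its own file.
# The helper name and all but the first parameter set are made up.

# Turn "3.8, 0.2, 2.5" into "myprog_3.8_0.2_2.5.log".
make_log_name() {
    echo "myprog_$(echo "$1" | sed 's/, */_/g').log"
}

outdir=$(mktemp -d)           # stand-in for the directory holding your scripts
n=0
for params in "3.8, 0.2, 2.5" "4.0, 0.2, 2.5" "4.2, 0.1, 3.0"; do
    n=$((n + 1))
    log=$(make_log_name "$params")
    # $params and $log are filled in when the script file is written.
    cat > "$outdir/job$n.txt" <<EOF
#!/bin/bash
#PBS -j oe

cd ~/myprograms
matlab -nodisplay -r 'myprog($params)' > $log
EOF
    chmod u+x "$outdir/job$n.txt"
    echo "job$n.txt -> $log"
done
```

Each generated file is an ordinary shell script of the kind described earlier, so it can be tested and submitted in the same way.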
    stata -b do myprog.do

Stata will write its output to myprog.log in the working directory.

    sas myprog.sas

SAS will write a log file named myprog.log and an output file with results named myprog.lst.

    R CMD BATCH myprog.R myprog.log

or

    R --no-save < myprog.R > myprog.log

R will write a log file myprog.log.

    Splus BATCH input_file output_file

Splus reads the program from input_file and writes the results to output_file.

    gauss -b finance.e > finance.lst

GAUSS writes its results to standard output, which in this case is redirected to the file named finance.lst.

    oxl finance.ox > finance.lst

or

    oxl finance.oxo > finance.lst

Ox writes its results to standard output, which in this case is redirected to the file named finance.lst.

    Singular -t -q < adjoint.sing > adjoint.lst

Singular writes its results to standard output, which in this case is redirected to the file named adjoint.lst.
For other programs, see Statistical Software Manuals
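If you work with several of these packages, the batch command lines above can be collected in one place. The following is a sketch only: `batch_command` is an illustrative helper, and the mapping from file extension to package is an assumption based on the example file names on this page. It prints the command line rather than running it.

```shell
#!/bin/bash
# Sketch: pick the batch command for a program file from its extension.
# batch_command is an illustrative helper; it only prints the command line.

batch_command() {
    local file=$1
    local base=${file%.*}        # file name without its extension
    case $file in
        *.m)   echo "matlab -nodisplay -r '$base'" ;;
        *.do)  echo "stata -b do $file" ;;
        *.sas) echo "sas $file" ;;
        *.R)   echo "R CMD BATCH $file $base.log" ;;
        *.e)   echo "gauss -b $file > $base.lst" ;;
        *.ox)  echo "oxl $file > $base.lst" ;;
        *)     echo "no batch command known for $file" >&2; return 1 ;;
    esac
}

batch_command myprog.do
batch_command myprog.R
batch_command finance.e
```

The printed line is what you would place as the last line of your shell script.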
You should make your shell script executable and test it before you submit it to the queue. At the prompt, type in:

    chmod u+x myprog.txt

You can change permissions on multiple shell scripts located in the same directory at once:

    chmod u+x *.txt

Test your script by running it (you can abort it by typing <Ctrl>-C):

    ./myprog.txt

Remember to clean up any unwanted files your script may have created when you tested it.
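Before running a long test, a couple of quick checks can catch simple mistakes. This is a minimal sketch, assuming a hypothetical helper named `check_script`; it relies on `bash -n`, which parses a script for syntax errors without executing any of its commands. It does not replace an actual test run.

```shell
#!/bin/bash
# Sketch: quick checks before submitting a script. check_script is an
# illustrative helper, not part of PBS.

check_script() {
    local file=$1
    [ -f "$file" ] || { echo "$file: not found"; return 1; }
    [ -x "$file" ] || { echo "$file: not executable, run chmod u+x $file"; return 1; }
    # bash -n parses the script without running any of its commands.
    bash -n "$file" 2>/dev/null || { echo "$file: syntax error"; return 1; }
    echo "$file: ok"
}

demo=$(mktemp)                # stand-in for myprog.txt
printf '#!/bin/bash\n#PBS -j oe\n\necho hello\n' > "$demo"
check_script "$demo" || true  # reports "not executable" until chmod is run
chmod u+x "$demo"
check_script "$demo"
```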
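The qsub command submits your shell script to the batch queue:

    qsub -m abe -N jobname myprog.txt

The letters following -m tell PBS when to send you email: a if the job is aborted, b when it begins execution, and e when it ends. The -N option gives the job a name. If the -N option is omitted, PBS names the job after the script file. The last parameter (do not omit it) is the name of your shell script file.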
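If you submit jobs often, the command line can be assembled by a small helper. The sketch below is an assumption-laden illustration: `build_qsub` is a made-up name, and it only prints the qsub command instead of running it, so it can be tried on a machine without PBS.

```shell
#!/bin/bash
# Sketch: assemble the qsub command line for a script. build_qsub is an
# illustrative helper; it prints the command instead of running it.

build_qsub() {
    local script=$1
    local name=${2:-}            # optional job name for -N
    local cmd="qsub -m abe"
    if [ -n "$name" ]; then
        cmd="$cmd -N $name"
    fi
    echo "$cmd $script"          # the script file name always comes last
}

build_qsub myprog.txt jobname
build_qsub myprog.txt
```

On the cluster you would run the printed command at the prompt.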
To check the status of your jobs, type in

    [abc123@seldon abc123]$ qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    5002.seldon      Job52            abc123           55:15:0  R A
    5031.seldon      simul007         def456           18:45:4  R A
    ...
    5068.seldon      m331             xyz987           0        Q A

The S column shows the state of each job: R for running, Q for queued. The Time Use column shows the CPU time the job has used so far.
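When the queue is long, a short awk filter can summarize the listing for one user. This is a sketch under the assumption that the columns appear in the order shown in the listing above (job id, name, user, time, state, queue); `count_jobs` is a hypothetical helper, and the sample text stands in for real qstat output.

```shell
#!/bin/bash
# Sketch: summarize qstat output for one user. The sample text is the
# listing from this page; on the cluster you would pipe qstat in instead.

# count_jobs USER STATE: read qstat output on stdin and print how many of
# USER's jobs are in STATE (R or Q). The first two lines are the header.
count_jobs() {
    awk -v user="$1" -v state="$2" \
        'NR > 2 && $3 == user && $5 == state { n++ } END { print n + 0 }'
}

sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5002.seldon      Job52            abc123           55:15:0  R A
5031.seldon      simul007         def456           18:45:4  R A
5068.seldon      m331             xyz987           0        Q A'

echo "abc123 running: $(echo "$sample" | count_jobs abc123 R)"
echo "xyz987 queued: $(echo "$sample" | count_jobs xyz987 Q)"
```

On the cluster the equivalent call would be `qstat | count_jobs abc123 R`.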
pbstat condenses and interprets the job status information reported by PBS:

    [abc123@seldon abc123]$ pbstat
    ------------------------------------------------------
    PBS Job ID number     : 7364.seldon
    Job owner             : abc123@seldon.it.northwestern.edu
    Job name              : stage1.run
    Job started on        : Tue Oct 23 08:16:08 2006
    Job status            : Running
    Mail Points           : a
    PBS queue and server  : A on seldon
    Job is running on     : node21:mem=524228kb;ncpus=1
    # of CPUs being used  : 1
    CPU utilization       : 98% (ideal max is 100%)
    Elapsed walltime      : 08:13:35 (max is 672:00:00)
    Elapsed CPU time      : 08:13:00 (max is 672:00:00)
    Memory usage          : 105.5 MB
    VMemory usage         : 818.1 MB
If you need to remove your job from the queue before it starts, or if you want to terminate an already running batch job, type in:

    qdel job_id

job_id is the number listed in the first column of qstat output.
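To remove several queued jobs at once, the job ids can be pulled out of the qstat listing. The sketch below only prints the qdel commands it would run: `queued_job_ids` is a hypothetical helper, the column order is assumed to match the qstat listing on this page, and the last row of the sample is invented for illustration.

```shell
#!/bin/bash
# Sketch: list the qdel commands that would remove one user's queued jobs.
# queued_job_ids is an illustrative helper; nothing is deleted here.

# Read qstat output on stdin; print the first column for USER's jobs in state Q.
queued_job_ids() {
    awk -v user="$1" 'NR > 2 && $3 == user && $5 == "Q" { print $1 }'
}

# Sample listing; the 5069 row is made up for this example.
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5002.seldon      Job52            abc123           55:15:0  R A
5068.seldon      m331             xyz987           0        Q A
5069.seldon      m332             xyz987           0        Q A'

for id in $(echo "$sample" | queued_job_ids xyz987); do
    echo "qdel $id"       # drop the echo to actually delete the job
done
```

Printing the commands first gives you a chance to review the list before anything is removed.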
When a job is finished you will see a new file in the directory from which you typed in the qsub command. You may also see a similar file with the ... You should use the ...

Only text output is automatically saved in the log file. If your program produces graphs you need to add instructions to your program to save those graphs to disk in a file. In MATLAB this can be done with the print command.