How to allocate job resources

This article explains how to allocate resources for batch computing on LOTUS.

In a shared-resource environment it is essential to specify precisely how a batch job should run. Allocating resources such as the queue, the memory requirement, the job duration and the number of cores is therefore required, and is done by adding specific options to the job submission command bsub, as detailed below.

LOTUS queues

All jobs wait in queues until they are scheduled and dispatched to hosts. The short-serial queue is the default and should be used for all serial jobs, unless the job has a memory requirement of over 512 GB, in which case the high-mem queue should be used. To submit a job to a given queue, specify the queue name with -q:

$ bsub -q short-serial < myjob

To view the available queues, run the following command:

$ bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
test             40  Open:Active       -    -    -    -     0     0     0     0
cpom-comet       35  Open:Active     128    -    -    -  1664  1536   128     0
rsg-general      35  Open:Active     482    -    -    -     6     0     6     0
rsgnrt           35  Open:Active      30    -    -    -    18     0    18     0
copy             30  Open:Active       -    -    -    -     0     0     0     0
sst_cci          30  Closed:Inact     96    -    -    -     0     0     0     0
ingest           30  Open:Active       -    -    -    -     1     0     1     0
short-serial     30  Open:Active    3000 2000    -    - 49163 46166  2997     0
high-mem         30  Open:Active      96   48    -    -     0     0     0     0
par-single       25  Open:Active     512  256    -    -    20     0    20     0
par-multi        20  Open:Active     512  256    -    -   404   320    84     0
long-serial      10  Open:Active     512  256    -    -    31     0    31     0

Queues other than the five public queues (short-serial, long-serial, par-single, par-multi and high-mem) should be ignored, as they implement different job scheduling and control policies. A queue can use all server hosts in the cluster, or a configured subset of them.

Note: a queue's STATUS must be Open:Active (i.e. Open and Active) for jobs to be dispatched from it.

Job duration 

-W 00:30 sets the runtime limit of your job to the predicted time in hours and minutes (here, 30 minutes). If you do not specify a runtime with -W, the default maximum of 1 hour applies.

Each queue has a maximum allowed job duration (see Table 1). Any job exceeding this limit will be aborted automatically, even if a longer duration was specified with -W.
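The queue and runtime options can also be embedded in the job script itself as #BSUB directives, which bsub reads exactly as if the same flags were given on the command line. A minimal sketch (the script body and output file name are illustrative, not part of LOTUS documentation):

```shell
#!/bin/bash
# Minimal job script sketch: queue and runtime set via #BSUB directives.
#BSUB -q short-serial      # submit to the default serial queue
#BSUB -W 00:30             # runtime limit: 30 minutes
#BSUB -o %J.out            # write stdout to <jobid>.out

status="running"           # placeholder for the real work of the job
echo "job $status"
```

Submit it with bsub < myjob.sh, where myjob.sh is the name of the script above.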

Specifying memory requirements

Any job requiring more than 4 GB of RAM (the memory per core of the lowest-specification host type in LOTUS) must specify the memory needed with the -R flag:

$ bsub -R "rusage[mem=XXX]"

where XXX is the required memory in MB.

Any job using extra memory that was submitted without this flag may be killed by the service administrators if it is found to be adversely affecting the performance of other users' jobs.

Memory limit control

Memory limit control is enforced on jobs submitted to the short-serial and long-serial queues. Jobs with a memory requirement greater than 8 GB must also specify a memory limit; otherwise the default limit of 8 GB applies and the job will be terminated if it exceeds 8 GB.

Note that in the following:

$ bsub -R "rusage[mem=XXX]" -M YYY

XXX is the requested memory in MB and YYY is the memory limit in MB.

In summary:

If...                                          Then...
bsub -R "rusage[mem=XXX]"                      the default memory limit of 8000 MB (8 GB) is enforced
  e.g. bsub -R "rusage[mem=10000]"             this job will be killed when it exceeds 8000 MB (8 GB)
bsub -R "rusage[mem=XXX]" -M YYY
  if YYY < maxlimit (64000 MB, 64 GB)          YYY is enforced
  if YYY > maxlimit (64000 MB, 64 GB)          the maxlimit of 64 GB is enforced
  e.g. bsub -R "rusage[mem=15000]" -M 15000    this job will be killed if it exceeds 15000 MB (15 GB)

Read the bsub manual page for more information about the -R and -M options, including other select keywords.

Selecting high-memory hosts

The second phase of LOTUS compute, added in spring/summer 2014, enables high-memory nodes to be selected using the bsub -R and -M options, for example:

$ bsub -R "select[maxmem > 128000]"

This selects machines with more than 128000 MB of physical RAM (units are always MB), but does not guarantee how much memory is allocated to the job. To target a host with enough free memory, add a resource usage term:

$ bsub -R "select[maxmem > 128000] rusage[mem=150000]"

A job selecting a high-memory host and requesting 64 GB of memory or more should be submitted to the high-mem or par-single queue:

$ bsub -R "select[maxmem > 128000] rusage[mem=150000]" -q high-mem

Twelve such high-memory (512 GB) hosts are currently available.

Exclusive host use

Adding the -x option to the bsub command puts the host running your job into exclusive execution mode, so the host is not shared with other jobs. This is recommended only for very large memory jobs or parallel MPI jobs.

$ bsub -x < myscript

Spanning multiple hosts for additional memory per process

The span resource restricts the number of processes run per host. For example, to run only one process per host:

$ bsub -R "span[ptile=1]" < myscript

Number of cores

LSF can allocate more than one core to a job and automatically tracks the job's status while a parallel job is running. When submitting a parallel job that requires multiple cores, you can specify the exact number of cores to use.

To submit a parallel job, use -n <number of cores> to specify the number of cores/processors the job requires. For example:

$ bsub -n 4 myjob

The job myjob is submitted as a parallel job and is started when four cores are available.
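Putting the options together, a parallel job script might look like the following sketch (the queue choice, core count, and the program name ./myprog are illustrative assumptions):

```shell
#!/bin/bash
# Sketch of a parallel job script; ./myprog is a hypothetical executable.
#BSUB -q par-multi         # queue for multi-host parallel jobs
#BSUB -n 8                 # request 8 cores
#BSUB -W 01:00             # 1-hour runtime limit
#BSUB -o %J.out            # stdout file; %J expands to the job ID

ncores=8                   # keep in step with the -n request above
echo "parallel job on $ncores cores"
# mpirun -np "$ncores" ./myprog   # the real parallel work would go here
```

The #BSUB directives are read by bsub when the script is submitted with bsub < script; to bash itself they are ordinary comments.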

The /work/scratch-nopw and /work/scratch-pw directories

The /work/scratch-nopw area (size 90 TB) is on new flash-based storage, which should give significant performance benefits, particularly for operations involving many small files. Please create your own subdirectory:

$ mkdir /work/scratch-nopw/newuser

The /work/scratch-pw area (size 1 PB) is the largest temporary area. This temporary filespace is shared across the whole LOTUS cluster, allowing parallel and MPI-IO jobs to access the same files over the course of their execution. It uses the Panasas high-speed parallel file system. Please create your own subdirectory:

$ mkdir /work/scratch-pw/newuser

Note: configure your software to use /work/scratch-pw only if you need shared file writes with MPI-IO; otherwise use /work/scratch-nopw.

In contrast, the  /tmp directories are all local directories, one per host. These can be used to store small temporary data files for fast access by the local process. Please make sure that your jobs delete any files in /tmp when they complete. Note also that large volumes of data cannot be stored on the local /tmp disk. Please use the /work/scratch-pw directory or group workspaces for large data volumes, but be sure to remove data as soon as possible afterward. 

Data in these directories is temporary and may be arbitrarily removed at any point once your job has finished running. Do not use them to store important output for any significant length of time. Any important data should be written to a group workspace so that you do not lose it.
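A common pattern for keeping scratch areas clean is sketched below: the job creates its own unique scratch directory, copies any results it needs to keep to permanent storage, and removes the scratch directory before exiting. SCRATCH_BASE defaults to /tmp here so the sketch runs anywhere; on LOTUS it would be your subdirectory under /work/scratch-nopw or /work/scratch-pw.

```shell
#!/bin/bash
# Sketch: per-job scratch directory with cleanup (paths are assumptions).
SCRATCH_BASE="${SCRATCH_BASE:-/tmp}"

workdir=$(mktemp -d "$SCRATCH_BASE/job.XXXXXX")   # unique scratch directory
trap 'rm -rf "$workdir"' EXIT                     # clean up even on failure

echo "intermediate result" > "$workdir/result.txt"   # the job writes here

# Keep anything important somewhere permanent before the job ends,
# e.g. a group workspace; the current directory stands in for that here:
cp "$workdir/result.txt" "./result.txt"
```

The trap ensures the scratch directory is removed however the script exits, so temporary data does not accumulate.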
