How to estimate job resources
This article provides guidance on estimating the resources that your job requires. Accurate estimates of your compute requirements matter because they (i) ensure that your job gets the resources it needs and (ii) make overall LOTUS usage more efficient. This article covers:
- Job duration
- Memory requirements
- First time runs
- Memory usage and swap-space
Job duration
To establish the duration of a job, for use with the bsub -W option, the following approaches can be used. Run a single test job and inspect its job information to establish the duration before submitting a large batch of similar jobs. Note that if your job exceeds its run limit (1 hour by default), it will be terminated, so it is important to obtain a realistic measurement to use as a prediction for further jobs.
1. Review output of an example job
If you have run an example job with LOTUS then you can review the job information to estimate the job duration:
$ bjobs -l <job_id>
If bjobs is not available, you might be able to access this information with:
$ bhist <job_id>
$ bjobs -l 4077555
[ ... ]
RUNLIMIT
 480.0 min of host138.jc.rl.ac.uk
Mon Jun 13 14:31:17: Started on <host138.jc.rl.ac.uk>, Execution Home </home/users/fatima>, Execution CWD </work/scratch/fatima/mycript>;
Mon Jun 13 18:05:44: Done successfully. The CPU time used is 600.5 seconds.
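The elapsed (wall-clock) time of the example job is the difference between its "Started" and "Done successfully" timestamps, which is not the same as the CPU time reported on the last line. The following is a minimal Python sketch of that arithmetic, not an official LOTUS tool. The function names, the 20% safety margin and the assumed year are illustrative choices; the sample text is abbreviated from the output above, and month names are assumed to be printed in English.

```python
import math
import re
from datetime import datetime

# Matches e.g. "Mon Jun 13 14:31:17" and captures month, day and clock time.
_STAMP = r"[A-Z][a-z]{2} ([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})"


def _event_time(text, marker, year):
    """Return the datetime of the first history line whose message starts with marker."""
    match = re.search(_STAMP + r":\s+" + re.escape(marker), text)
    if match is None:
        raise ValueError(f"no '{marker}' event found")
    month, day, clock = match.groups()
    # The output shown carries no year, so an assumed one is supplied for parsing.
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def wall_clock_seconds(text, year=2000):
    """Elapsed seconds between the 'Started' and 'Done successfully' events.

    The default year is arbitrary (a leap year, so 29 Feb parses); a job that
    runs across New Year is not handled by this sketch.
    """
    started = _event_time(text, "Started", year)
    done = _event_time(text, "Done successfully", year)
    return (done - started).total_seconds()


def suggest_run_limit(seconds, margin=0.2):
    """Format seconds plus a safety margin as HH:MM, rounded up to a whole minute."""
    minutes = math.ceil(seconds * (1 + margin) / 60)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


sample = (
    "Mon Jun 13 14:31:17: Started on <host138.jc.rl.ac.uk>;\n"
    "Mon Jun 13 18:05:44: Done successfully. "
    "The CPU time used is 600.5 seconds.\n"
)
elapsed = wall_clock_seconds(sample)
print(elapsed, suggest_run_limit(elapsed))
```

For the example job the elapsed time is about 3 hours 34 minutes, far more than the 600.5 seconds of CPU time, so a value passed to bsub -W should be based on the elapsed time plus some margin.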
Memory requirements
One way of identifying the memory requirements of a job is to look at how much memory similar jobs have required. Job output logs include a resource usage summary, which reports the maximum memory used during the job run. For example, the following output is taken from a job output log file:
Resource usage summary:
    CPU time : 4.06 sec.
    Max Memory : 43.74 MB
    Average Memory : 43.74 MB
    Total Requested Memory : -
    Delta Memory : -
    Max Swap : 2807 MB
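When many similar jobs have been run, reading the "Max Memory" value from each log by hand becomes tedious. The sketch below, written in Python, extracts that value from the text of a job output log and adds some headroom. The function names and the 25% headroom are assumptions made for illustration, and the pattern only handles values reported in MB, as in the summary above.

```python
import math
import re


def max_memory_mb(log_text):
    """Return the 'Max Memory' value in MB from a job output log, or None if absent."""
    match = re.search(r"Max Memory\s*:\s*([\d.]+)\s*MB", log_text)
    return float(match.group(1)) if match else None


def memory_request_mb(peak_mb, headroom=0.25):
    """Round the peak plus headroom up to a whole number of MB."""
    return math.ceil(peak_mb * (1 + headroom))


log = """Resource usage summary:
    CPU time : 4.06 sec.
    Max Memory : 43.74 MB
    Average Memory : 43.74 MB
    Max Swap : 2807 MB
"""
peak = max_memory_mb(log)
print(peak, memory_request_mb(peak))
```

A summary that reports "-" instead of a number yields None, so callers should check for that before computing a request.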
While a job is running, you can see how much memory it is using by running the command bjobs -l <job_id> and looking at the "Resource usage collected" section:
Mon Jun 13 18:34:27: Resource usage collected.
    The CPU time used is 150 seconds.
    MEM: 315 Mbytes; SWAP: 847 Mbytes
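A single reading taken while the job runs shows current usage, not the peak, so it can help to take several readings and keep the largest. The Python sketch below parses the MEM and SWAP figures from text in the format shown above. The function names are illustrative, the second and third readings are made-up values for the example, and the pattern assumes values reported in Mbytes.

```python
import re

_USAGE = re.compile(r"MEM:\s*(\d+)\s*Mbytes;\s*SWAP:\s*(\d+)\s*Mbytes")


def usage_snapshot(bjobs_text):
    """Return (mem_mb, swap_mb) from the last usage report in the text, or None."""
    matches = _USAGE.findall(bjobs_text)
    if not matches:
        return None
    mem, swap = matches[-1]
    return int(mem), int(swap)


def peak_memory_mb(snapshots):
    """Largest MEM value seen across a list of (mem_mb, swap_mb) readings."""
    return max(mem for mem, _swap in snapshots)


first = usage_snapshot(
    "Mon Jun 13 18:34:27: Resource usage collected. "
    "The CPU time used is 150 seconds. MEM: 315 Mbytes; SWAP: 847 Mbytes"
)
# The two later readings are hypothetical, added only to illustrate the peak.
readings = [first, (402, 850), (388, 850)]
print(first, peak_memory_mb(readings))
```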
First time runs
It can be difficult to estimate memory usage in advance, because job types differ and usage depends on the application, the problem size and the data type used. When running a job for the first time it is still essential to obtain an approximate figure. To do so, run the job in one of the following ways:
- Run the job using a high memory host with a short duration time
- Run the job in an exclusive mode by using
- Run the job with a conservative memory resource request and memory limit.
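As an illustration of the last option, the Python sketch below assembles a bsub command line with a run limit, a memory reservation and a memory limit. It uses the standard LSF options -W, -R "rusage[mem=...]", -M and -o, but the values, the script name and the function name are hypothetical. The units that LSF applies to the memory values depend on site configuration, so the assumption here that both are in MB should be checked against the LOTUS documentation before use.

```python
import shlex


def first_run_command(script, run_limit="00:30", mem_mb=4000):
    """Build a bsub command line for a cautious first run.

    Assumption: the site interprets the rusage and -M values in MB.
    """
    args = [
        "bsub",
        "-W", run_limit,                 # run limit as [hours:]minutes
        "-R", f"rusage[mem={mem_mb}]",   # memory to reserve for the job
        "-M", str(mem_mb),               # memory limit for the job
        "-o", "%J.out",                  # output log named after the job ID
        script,
    ]
    # shlex.join quotes arguments such as the rusage string for the shell.
    return shlex.join(args)


# "./myscript.sh" is a placeholder name for the job script.
print(first_run_command("./myscript.sh"))
```

After the test run finishes, the resource usage summary in the output log shows whether the request was too generous or too tight.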
Memory usage and swap space
As well as physical memory, each host on LOTUS has a certain amount of virtual memory, or swap space, available. Swap space is an area of the hard disk drive that the operating system uses as additional memory; in normal circumstances only a small amount of it is used. When a job requires more memory than is physically available, it starts using swap space. Because swap space is on the hard disk drive, it is significantly slower than real memory. Generally, a job that starts swapping slows to the point where it no longer makes any progress, and it is best to terminate the job before the node becomes unstable and crashes.
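On a Linux host, the amount of swap currently in use can be read from /proc/meminfo as the difference between SwapTotal and SwapFree. The Python sketch below shows that calculation on a sample of the file's format; the function name and the sample numbers are illustrative. It reports swap use for the whole node rather than for one job, and it assumes you can run it on the host in question.

```python
import re


def swap_used_mb(meminfo_text):
    """Swap in use, in MB, computed from /proc/meminfo-style text."""
    fields = dict(re.findall(r"^(\w+):\s+(\d+) kB", meminfo_text, flags=re.M))
    return (int(fields["SwapTotal"]) - int(fields["SwapFree"])) / 1024


# Sample values are made up; on a real host, read the text with
# open("/proc/meminfo").read() instead.
sample_meminfo = (
    "MemTotal:       16384000 kB\n"
    "SwapTotal:       8388608 kB\n"
    "SwapFree:        8126464 kB\n"
)
print(swap_used_mb(sample_meminfo))
```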