High Performance Computing

The SGE job scheduler

The Sun Grid Engine (SGE) job scheduler is the only permitted way to run compute jobs on the Nottingham clusters. SGE manages the usage of nodes, allows jobs to be queued when all nodes are in use, and is highly configurable. On a multi-user system SGE avoids conflicts in resource usage and is vital for maintaining high job throughput. A brief guide to using SGE is given here. More experienced users may wish to consult the full SGE User Guide (SGE6-User.pdf), located under /opt/streamline/SGE6 on the login nodes.

The scheduler allocates priorities automatically, based on job size (larger jobs get higher priority so that serial jobs don't 'hog' the system) and on the amount of resource a user has consumed recently, reducing the priority of those who have used a lot. In addition, the system is configured with the aim of giving a fair share of resources to all types of job while maintaining high utilisation and throughput. Users who have particular concerns about meeting deadlines, or who are having difficulty getting jobs to run in a timely manner, are encouraged to contact HPC support to discuss their requirements.

To get users started, the /software/EXAMPLE directory, available from all login nodes, contains example code to compile, scripts to submit jobs, and README files with appropriate instructions. This should be copied to your own /work area as required.
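For example, assuming your personal work area is /work/<username> (the exact path may differ), the examples could be copied and one of the supplied submission scripts run along the following lines; the script name below is illustrative, so check the README files for the actual names:

    # Copy the examples into your own work area
    cp -r /software/EXAMPLE /work/$USER/EXAMPLE
    cd /work/$USER/EXAMPLE
    # Follow the instructions in the README, then submit one of the supplied scripts
    qsub example_job.sh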

Please note: there is a default limit of 2 days on the length of time jobs are allowed to run on the system. You may select a longer time limit for your jobs, up to a maximum of 14 days; jobs which reach the 14-day limit will be terminated by the system. You are strongly advised to select a time limit for your job, using the "-l h_rt" resource option, as described in Commands, queues and resources.
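As an illustration, a minimal serial job script requesting a 48-hour run-time limit might look like the sketch below. The "-l h_rt" option is the one described above; the script and program names are placeholders, and the directives are embedded in the script using the standard SGE "#$" prefix:

    #!/bin/bash
    # Request a hard run-time limit of 48 hours (HH:MM:SS);
    # the job will be terminated if it is still running after this time.
    #$ -l h_rt=48:00:00
    # Run the job from the directory it was submitted from.
    #$ -cwd

    ./my_program        # placeholder for your own executable

The script would then be submitted with, for example, "qsub myjob.sh".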

Calculations which take longer than 14 days should be split into sub-tasks, submitted as separate jobs (one possible approach is sketched below). If you think this limit will impose particular difficulties on your use of the system, please contact HPC support.
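One way to arrange the sub-tasks is to chain them with SGE job dependencies so that each part starts only after the previous one has finished. The script names and job ID below are placeholders; -hold_jid is a standard SGE submission option, but check the SGE User Guide for details:

    # Submit the first sub-task and note the job ID that qsub reports (e.g. 12345)
    qsub part1.sh
    # Submit the next sub-task, held until job 12345 has completed
    qsub -hold_jid 12345 part2.sh

Each sub-task must, of course, save enough state (for example, restart or checkpoint files) for the next one to continue from where it left off.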