Sure, this is totally possible. SGE queues are independent of one another, so you can assign whatever nodes you would like to each queue, letting them overlap however you wish.
To create a queue, type qconf -aq: this will open your default editor (usually vim). Type the name of the queue as the qname, add the hosts you would like to assign in the hostlist, and for slots, add a comma-delimited list of entries of the format [hostname=numslots]. Typically the number of slots is the number of cores in the host, but you can under- or over-subscribe if you prefer. If you want the queues to overlap, just add the same hosts to multiple queues.
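As a sketch, the relevant fields in the qconf -aq editor session might look like the following (the queue name, hostnames, and slot counts here are illustrative, not taken from any particular cluster):

```
qname                 my.q
hostlist              node001 node002
slots                 1,[node001=4],[node002=4]
```

The leading 1 before the bracketed entries is the default slot count applied to any host without an explicit override.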
Note, however, that by default the overlapping queues are not aware of each others' usage. They will both cheerfully assign jobs to the same node and expect them to run.
The most common way to prevent this is to make nodes job-exclusive, so only one job may run at a time. (This is the default in other schedulers, such as PBS.) SGE makes this a little complicated: it involves creating a virtual "resource" which can only be consumed once per node. To do this, type qconf -mc to manage consumable resources. This will open an editor listing consumable resources; add a new one called "exclusive", like so:
#name shortcut type relop requestable consumable default urgency
#-----------------------------------------------------------------------------------------
exclusive excl BOOL EXCL YES YES 1 1000
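Once the complex exists, it has to be attached to each execution host and then requested at submit time. A sketch, where node001 and myjob.sh are placeholder names:

```shell
# Attach the consumable to a host. This opens an editor; add the line:
#   complex_values        exclusive=true
qconf -me node001

# Request exclusive access to a node when submitting:
qsub -l exclusive=true myjob.sh
```

With the EXCL relational operator, a job that requests exclusive=true gets the whole host to itself, and other jobs requesting it are kept off that host until it finishes.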
For more information, see the grid engine wiki.
You can also configure what are called subordinate queues. Here, you set one queue up so that it automatically takes precedence over the other once a certain number of slots per node are in use. To set this up, run qconf -mq queue1 and, under subordinate_list, specify queue2=N. Then, whenever the number of slots used on a node in queue1 reaches N, jobs in queue2 on that node are suspended until the queue1 jobs complete.
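For example, the line in the qconf -mq queue1 editor session might read as follows (the queue name queue2 and the threshold of 2 are illustrative):

```
subordinate_list      queue2=2
```

Omitting the =N part suspends the subordinate queue as soon as any slot in queue1 is occupied on that host.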
The solution I found is to make a new parallel environment that has the $pe_slots allocation rule (see man sge_pe). I set the number of slots available to that parallel environment to the maximum, since $pe_slots restricts a job's slot usage to a single node. Since starcluster sets up the slots at cluster bootup time, this seems to do the trick nicely. You also need to add the new parallel environment to the queue. So, to make this dead simple:
qconf -ap by_node
and here are the contents after I edited the file:
pe_name by_node
slots 9999999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
Also modify the queue (called all.q by starcluster) to add this new parallel environment to the list.
qconf -mq all.q
and change this line:
pe_list make orte
to this:
pe_list make orte by_node
I was concerned that jobs spawned from a given job would be limited to a single node, but this doesn't seem to be the case. I have a cluster with two nodes, and two slots each.
I made a test file that looks like this:
#!/bin/bash
qsub -b y -pe by_node 2 -cwd sleep 100
sleep 100
and executed it like this:
qsub -V -pe by_node 2 test.sh
After a little while, qstat shows both jobs running on different nodes:
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
25 0.55500 test root r 10/17/2012 21:42:57 all.q@master 2
26 0.55500 sleep root r 10/17/2012 21:43:12 all.q@node001 2
I also tested submitting 3 jobs at once requesting the same number of slots on a single node, and only two run at a time, one per node. So this seems to be properly set up!
Best Answer
SGE is weird with this, and I haven't found a good way to do this in the general case. One thing that you can do, if you know the memory size of the node you want, is to qsub while reserving an amount of memory almost equal to the full capacity of the node. This will ensure it grabs a system with nothing else running on it.
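As a sketch, on nodes with, say, 64 GB of RAM (the figure, job script name, and the mem_free resource are assumptions; run qconf -sc to see which memory complexes your site actually defines):

```shell
# Request nearly all of the node's memory, so the scheduler can only
# place the job on a host with nothing substantial already running:
qsub -l mem_free=60G myjob.sh
```

This is a workaround rather than true exclusivity: small jobs can still land alongside yours, and it wastes memory accounting if your job doesn't really need that much.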