Set up SGE to Fill Each Node Completely Rather than Distribute Jobs

Originally posted on Stack Overflow by mistake… See PS at bottom for response from that post.

I've searched for this for a while but cannot find the answer. The problem is this: assume I have an SGE setup with two 12-CPU machines. I have two 1-CPU jobs to submit to the grid, but other users will often want to submit 12-CPU jobs. These are shared-memory jobs that cannot be split across multiple machines. Sometimes when I submit my two jobs, they each go to a separate machine, leaving each machine with 11/12 CPUs free. This prevents others from running 12-CPU jobs while I'm working.
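To make the scenario concrete, here is a toy sketch of the two placement behaviours (spreading jobs onto the emptiest host versus packing them onto the fullest host that still fits). It is only an illustration under simplifying assumptions: it is not how SGE's scheduler is implemented, and `place_jobs` and `can_run` are hypothetical names made up for this example.

```python
# Toy model of the scenario above. This is NOT SGE's scheduler; the function
# names and the placement rules are simplified assumptions for illustration.

def place_jobs(free_slots, job_sizes, pack):
    """Place each job on a single host; return the free slots left per host."""
    free = list(free_slots)
    for size in job_sizes:
        # Hosts that can hold the whole job (jobs cannot span machines).
        candidates = [i for i, f in enumerate(free) if f >= size]
        if not candidates:
            continue  # the job would simply stay queued
        if pack:
            # Packing: choose the fullest host that still fits the job.
            host = min(candidates, key=lambda i: free[i])
        else:
            # Spreading: choose the emptiest host.
            host = max(candidates, key=lambda i: free[i])
        free[host] -= size
    return free


def can_run(free, size):
    """True if some single host has enough free slots for the job."""
    return any(f >= size for f in free)


# Two 12-CPU machines, two 1-CPU jobs.
spread = place_jobs([12, 12], [1, 1], pack=False)
packed = place_jobs([12, 12], [1, 1], pack=True)

print(spread, can_run(spread, 12))  # [11, 11] False
print(packed, can_run(packed, 12))  # [10, 12] True
```

With spreading, neither machine can accept a 12-CPU job afterwards; with packing, one machine stays completely free. The packing behaviour is what I am trying to get out of the grid.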

Is there a way around this? I know that you can use the fill-up rules to control a single qsub (so a 12-CPU qsub can be kept on one machine, split between several, etc.), but is there a comparable setting that forces separate qsubs onto the same machine? I also know I can explicitly request a particular machine (I think it is -l hostname=machinename, or something similar), but I'd prefer a more robust setup than this.

Any help is appreciated. Thanks!

PS: On the Stack Overflow post, one response came in before the thread was closed, suggesting the parallel environment setting allocation_rule=$fill_up. Unless I've done something wrong in trying it, I don't think this solves the problem. From my testing, setting the rule to $fill_up means that the CPUs requested WITHIN a single qsub are placed on the same grid machine if possible, but CPUs from DIFFERENT qsubs still go to the low-load machine (or whatever the grid chooses), which might be an empty machine. To test this, I submitted a few single-CPU jobs, waited ~5 min, then submitted a few more. The first group sometimes ended up on the same machine (I'm guessing because machine load isn't updated in real time, so they were all sent to the same low-load machine?), but the second group did not consistently go to the same machine as the first.
