Ubuntu – Trying to install Sun Grid Engine on Ubuntu 10.04 – can’t connect additional execution hosts


I'm using Ubuntu 10.04 and trying to install Sun Grid Engine from the Ubuntu repository. It works on a single machine: I can submit jobs, etc. But I can't get it working with any other machine. I added another execution host and installed gridengine-client, gridengine-common and gridengine-exec on it, but it somehow can't communicate with the master. I even turned off all firewalls to make sure they aren't causing the problem.

When I run qstat -f on the master node, I get:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
standard@neuron1               BIP   0/0/2          0.04     lx26-amd64    
---------------------------------------------------------------------------------
standard@neuron2               BIP   0/0/2          -NA-     -NA-          au
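In this output the states column for neuron2 shows "au". According to the qstat man page, "u" (unknown) means the corresponding execution daemon cannot be contacted, and "a" flags a load-threshold alarm. A minimal Python sketch that scans qstat -f text for queue instances in the unknown state might look like this; the function name find_unknown_queues is hypothetical, and it assumes the plain column layout shown above (queuename, qtype, resv/used/tot., load_avg, arch, optional states).

```python
from typing import List


def find_unknown_queues(qstat_output: str) -> List[str]:
    """Return queue instances whose states column contains 'u' (unknown).

    Assumes the plain `qstat -f` layout: queuename, qtype, resv/used/tot.,
    load_avg, arch, and an optional states column.
    """
    unknown = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, dashed separators, and the header row.
        if not fields or line.startswith("-") or fields[0] == "queuename":
            continue
        # Queue-instance rows look like "queue@host"; job rows do not.
        if "@" not in fields[0]:
            continue
        # The states column is absent when the queue instance has no flags set.
        states = fields[5] if len(fields) > 5 else ""
        if "u" in states:
            unknown.append(fields[0])
    return unknown


sample = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
standard@neuron1               BIP   0/0/2          0.04     lx26-amd64
---------------------------------------------------------------------------------
standard@neuron2               BIP   0/0/2          -NA-     -NA-          au
"""

print(find_unknown_queues(sample))  # ['standard@neuron2']
```

Splitting on whitespace rather than fixed column offsets keeps the sketch tolerant of different column widths, at the cost of assuming no field contains spaces.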

When I restart the daemon on the neuron2 node, I get:

error: can't find connection
error: can't get configuration from qmaster -- backgrounding

When I run qstat -f from the n2 (neuron2) node, I get:

error: commlib error: access denied (server host resolves destination host "n1" as "neuron1")
error: unable to contact qmaster using port 6444 on host "n1"

The master machine has two hostnames (n1 and neuron1), and the first error seems to be related to that, although it would be strange if that alone caused this kind of problem. Running telnet n1 6444 does connect.
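A successful telnet n1 6444 only shows that the qmaster's TCP port is reachable. The "access denied" message appears to come back from the qmaster side after a connection has been made (it reports how the server host resolves the name), so a working telnet does not rule out a hostname problem. A rough Python equivalent of the telnet check is sketched below; port_is_open is a hypothetical helper name, and 6444 is simply the qmaster port taken from the error message.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    Like `telnet host port`, this only proves TCP reachability; it says
    nothing about whether qmaster will accept this host at the commlib level.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from neuron2, port_is_open("n1", 6444) would be expected to return True in the situation described, which is exactly why this check alone cannot explain the commlib error.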

Does anybody know what is going on here? Am I missing something?

Best Answer

OK, the problem was indeed the duplicate hostnames. When I removed one of them, it started working. I will dig into it and try to find out why it behaves that way.
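The answer does not say where the duplicate name was defined. Assuming it was an /etc/hosts entry, note that in that file the first name after an address is the canonical hostname and any further names are aliases, so a line such as "10.0.0.1 neuron1 n1" makes n1 resolve to neuron1, which matches the commlib message in the question. The sketch below, with hypothetical function names and made-up sample addresses, lists addresses that carry more than one name.

```python
from typing import Dict, List


def names_by_address(hosts_text: str) -> Dict[str, List[str]]:
    """Map each address in /etc/hosts-style text to its names.

    The first name on a line is the canonical hostname; the rest are aliases.
    Names for the same address on multiple lines are accumulated in order.
    """
    result: Dict[str, List[str]] = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        result.setdefault(address, []).extend(names)
    return result


def multi_name_addresses(hosts_text: str) -> Dict[str, List[str]]:
    """Return only the addresses that carry more than one hostname."""
    return {addr: names for addr, names in names_by_address(hosts_text).items()
            if len(names) > 1}


# Hypothetical hosts file; the addresses are made up for illustration.
sample = """\
127.0.0.1   localhost
10.0.0.1    neuron1 n1   # master: two names for one address
10.0.0.2    neuron2
"""

print(multi_name_addresses(sample))  # {'10.0.0.1': ['neuron1', 'n1']}
```

On a real system, loopback entries often list several names deliberately, so the output is a list of candidates to inspect rather than lines to delete automatically.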
