VirtualBox performance degradation when running multiple VMs in parallel

performance virtualbox

At Travis CI (http://travis-ci.org) we use VirtualBox VMs (through Vagrant) for
running tests for the Ruby community.

On our worker servers we have up to N parallel processes running N test suites
in N VMs, i.e. one worker process runs one test suite in one VM at a time, but
N of them are running concurrently.

Now, as soon as many workers are performing builds in parallel, the performance
of each build degrades significantly compared to running the very same build in
a single worker with nothing else running in parallel.

Here is an example:

This "build matrix" consists of 20 individual builds:

http://staging.travis-ci.org/#!/svenfuchs/rails/builds/1906

At the time this was run there were 10 workers running, so this build matrix
started out with 10 individual builds being executed in 10 workers (and VMs) in
parallel. The following build is one of them, and it took ~2 hours to complete:

[see the last link in the list on the page above; I can only post 2 URLs]

The very same build takes only ~20 minutes when no other builds are being
executed in parallel. Here's an example of that:

http://staging.travis-ci.org/#!/svenfuchs/rails/builds/1927

This performance degradation is obviously something we need to sort out, but
we're not sure where to look.

The test suite basically executes Ruby processes which might shell out and spawn
several other Ruby processes, each executing unit tests on the codebase. Some of
them hit databases such as MySQL, SQLite3 and PostgreSQL, but we also notice the
same sort of degradation with tests that do not hit any database at all.

The worker server that hosts these processes and VMs looks like this:

  • Linux 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux
  • 12x (Hexacore) Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
  • 12 GB Memory

Each VM:

  • Linux lucid32 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 21:21:01
    UTC 2011 i686 GNU/Linux
  • 1x Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
  • 1 GB Memory

Any hints on how to sort this out or maybe just better identify the root problem
would be highly appreciated.

Thanks!

Best Answer

You don't mention what the underlying disk is like on these servers, but this type of performance issue is nearly always IO-related. What do the IO stats look like when you're running multiple builds versus just one?
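
If you don't have `iostat` or similar to hand, one rough way to compare the two situations is to sample the aggregate `cpu` line of `/proc/stat` on the host and work out what share of CPU time was spent waiting on IO. The following is only a minimal sketch under that assumption; the function names and the 5 second interval are made up for illustration, and it says nothing about which disk or which VM is responsible.

```python
import time


def parse_cpu_times(stat_text):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal [guest ...]
            # Guest time is already counted in user time, so only sum the first 8.
            total = sum(values[:8])
            iowait = values[4] if len(values) > 4 else 0
            return iowait, total
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    io0, total0 = parse_cpu_times(before_text)
    io1, total1 = parse_cpu_times(after_text)
    delta = total1 - total0
    if delta <= 0:
        return 0.0
    return 100.0 * (io1 - io0) / delta


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two snapshots of /proc/stat `interval` seconds apart (hypothetical helper)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return iowait_percent(before, after)
```

Calling `sample_iowait()` on the host a few times while a single build is running, and again while ten are running, should give comparable numbers. If the figure climbs sharply in the parallel case, that points at the disk rather than CPU or memory.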

Additionally, you'd get much better performance out of your hardware by using something like Xen or VMware ESXi as opposed to VirtualBox.