You might well be better off creating it as a full-stack install, including the OS and all dependencies, on an OS that you know it works well under. Then let the end user decide whether they want to install it standalone or as a VM guest on KVM, VMware, Xen, or whatever. The only thing a KVM image would gain you over this is some amount of hardware agnosticism, but there are still potential pitfalls there, as you've pointed out, and the site would have to be willing and able to maintain a KVM host in addition to your application and its OS.
EDIT: To respond to your reformulated question, I think you've still got to leave this stuff to the sites -- VM guests, by design, cannot control how networking is set up on the host. You can certainly provide documentation and instructions for doing this in the simplest or most effective way, but that may vary from site to site. No matter what you do, if you distribute this as a VM image or appliance, the site will have to set up and maintain a VM host that is specifically configured to work on their network.
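As one concrete piece of such documentation, a common approach is a host-side bridge so that guests appear directly on the site's LAN. A minimal sketch for a Debian-style host follows -- the interface names are placeholders, and the site's actual network setup may well differ:

```
# /etc/network/interfaces on the VM host: enslave eth0 to bridge br0,
# then attach each guest's virtual NIC to br0
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
```

With this in place, guests bridged to br0 get addresses from the site's own DHCP server, which is exactly the part that has to match each site's network.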
I will give a very rough idea/explanation.
In the OP's situation, besides measuring within the VM, the host should be looked at too.
In this case, we can assume the following are correct:
- In all the tests, the host's disk I/O bandwidth is not maxed out, since the VM's ("monitoring") I/O increases with more CPUs allocated to it. If host I/O were already maxed out, there would be no I/O performance gain.
"bla"
is not the limiting factor As "monitoring"
I/O performance improved without changes to "bla"
- CPU is the main factor in the performance gain (in the OP's case), since I/O is not the bottleneck and the OP did not mention any memory size changes. But why? Or how?
Additional factor:
- Writes take more time than reads. This is the same for the VM and for the host. To put it in extremely simple terms: the VM waits for the host to finish reads and writes.
What happens when more CPUs are assigned to "monitoring"?
When "monitoring" is allocated more CPUs, it gains more processing power, but it also gains more processing time for I/O.
This has nothing to do with rsync, as it is a single-threaded program.
It is the I/O layer utilizing the increased CPU power, or more precisely, the increased processing time.
If a CPU monitoring program (e.g. top) is used inside "monitoring" during the test, it will show that not one but all of the CPUs' usage goes up, and so does %wa. %wa is the time spent waiting on I/O.
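For a quick look at the same figure without top, iowait can also be read from /proc/stat inside the guest. This is a Linux-only sketch; field 6 of the aggregate "cpu" line is cumulative iowait time in clock ticks:

```shell
# Print the iowait percentage from the aggregate "cpu" line of
# /proc/stat. $6 is cumulative iowait; the sum of all fields is
# total CPU time, both in clock ticks since boot.
awk '/^cpu /{
    total = 0
    for (i = 2; i <= NF; i++) total += $i
    printf "%.1f%% of CPU time spent in iowait since boot\n", 100 * $6 / total
}' /proc/stat
```

Note that this is a since-boot average, whereas the %wa shown by top is an instantaneous rate sampled between refreshes.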
This performance increase will only happen when your host I/O is not maxed out.
I cannot find the CPU scheduling details on the KVM site, but there is a blog mentioning that KVM uses CFS and cgroups; the following is the quote:
Within KVM, each vcpu is mapped to a Linux process which in turn utilises hardware assistance to create the necessary 'smoke and mirrors' for virtualisation. As such, a vcpu is just another process to the CFS and also importantly to cgroups which, as a resource manager, allows Linux to manage allocation of resources - typically proportionally in order to set constraint allocations. cgroups also apply to Memory, network and I/O. Groups of processes can be made part of a scheduling group to apply resource allocation requirements to hierarchical groups of processes.
In a nutshell, more CPUs = more CPU time = more I/O time slots in a given period of time.
Best Answer
libvirt only provides the autostart function. If this is important for you, I would set up an init script that starts the VMs in a certain order. The algorithm would be: virsh start VM1; wait for the service to come up (check with ping/SNMP/telnet to the relevant ports); virsh start VM2; and so on.
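Such an init script might be sketched as follows. The VM names, guest addresses, and ports are placeholders, and the wait loop assumes the service listens on a TCP port, probed with bash's /dev/tcp pseudo-device:

```shell
#!/bin/bash
# Sketch: start VMs in order, waiting for each guest's service to
# accept TCP connections before starting the next one.

# Poll host:port once per second, up to $3 attempts (default 60).
wait_for_port() {
    local host=$1 port=$2 tries=${3:-60}
    local i
    for ((i = 0; i < tries; i++)); do
        # bash's /dev/tcp attempts a TCP connect when opened
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

start_vms_in_order() {
    virsh start VM1
    wait_for_port 192.168.122.10 22 || return 1   # e.g. sshd on VM1
    virsh start VM2
    wait_for_port 192.168.122.11 80 || return 1   # e.g. httpd on VM2
}
```

The explicit port probe matters because virsh start returns as soon as the domain is booting, not when the services inside it are actually ready.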