If you are getting 280 sessions on a VM with two vCPUs and 500+ on the same physical box using all 4 cores, then you are already close to bare-metal performance with that VM config. Reconfigure the VM to use 4 vCPUs and performance should come close to the physical figure, provided nothing else on the ESX host is consuming significant resources. However, for a 4-vCPU VM running on a 4-core, single-CPU host, the hypervisor overhead will take up some fraction of the overall capacity, and that might be quite significant. The ESX hypervisor will only schedule your VM when it can schedule all 4 vCPUs concurrently, so anything else running (hypervisor, Service Console, other VMs) will cause all 4 vCPUs to stall on a setup like this. If you can run your application across two dual-vCPU VMs, you may find that it scales better even with the added overhead of an additional guest OS: the scheduling problem is a lot easier for the hypervisor, as only two cores stall when other tasks need CPU time.
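As a rough sanity check on the "close to bare metal" claim, the arithmetic can be sketched as below. It assumes sessions scale linearly with core count, which is a simplification and not something measured here; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: throughput scales linearly with the number of vCPUs/cores.
sessions_2vcpu=280                      # figure quoted for the 2-vCPU VM
per_vcpu=$((sessions_2vcpu / 2))        # sessions handled per vCPU
projected_4vcpu=$((per_vcpu * 4))       # naive projection for 4 vCPUs
echo "per vCPU: $per_vcpu, projected for 4 vCPUs: $projected_4vcpu"
```

Under that assumption the 2-vCPU result projects to about 560 sessions on 4 vCPUs, which is in the same range as the 500+ seen on the physical box.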
Each VMware vCPU equates to a single core on a multi-core system; that is how VMware carves up processor resources. The Yorkfield is a Core 2 Quad and definitely does not support Hyper-Threading (HT): it has 4 physical cores and no HT logical cores.
CPU-Z running in a VM will only report the number of vCPUs that are presented to the VM, although it will identify the underlying CPU correctly. Depending on the ESX version and how the VM is configured, it can present those as a dual-core single processor or as two separate processors, but that has no impact on performance. It is simply a presentation choice that is used to facilitate certain licensing situations.
Edited to provide more accurate and current data:
The strict co-scheduling point made above was a bit of a red herring. ESX has been using a relaxed co-scheduling mechanism since ESX V3 that allows some leeway (clock drift between vCPU cores) and that has improved with subsequent versions.
It is still generally true that a VM that presents as many vCPUs as there are physical cores will have more trouble being scheduled under load than VMs with fewer vCPUs, but it is not as dramatic as it comes across in my original answer. A very detailed explanation of how it actually works can be found in this VMware white paper.
On a big-endian system (Solaris on SPARC):
$ echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c6
0
On a little-endian system (Linux on x86):
$ echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c6
1
The solution above is clever and works great for Linux/x86 and Solaris/SPARC.
I needed a shell-only (no Perl) solution that also worked on AIX/Power and HP-UX/Itanium. Unfortunately the last two don't play nice: AIX reports "6" and HP-UX gives an empty line.
Using your solution, I was able to craft something that worked on all these Unix systems:
$ echo I | tr -d '[:space:]' | od -to2 | head -n1 | awk '{print $2}' | cut -c6
Regarding the Python solution someone posted, it does not work in Jython because the JVM treats everything as big-endian. If anyone can get it to work in Jython, please post!
Also, I found this, which explains the endianness of various platforms. Some hardware can operate in either mode depending on what the O/S selects: http://labs.hoffmanlabs.com/node/544
If you're going to use awk, this line can be simplified to:
echo -n I | od -to2 | awk '{ print substr($2,6,1); exit}'
On small Linux boxes that don't have 'od' (say, OpenWrt), try 'hexdump' instead:
echo -n I | hexdump -o | awk '{ print substr($2,6,1); exit}'
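The one-liners above can be folded into a single function. This is only a sketch: `detect_endian` is a made-up name, and it swaps `echo -n` for `printf`, since `echo -n` is not portable (some systems print the `-n` literally, which changes the bytes `od` sees).

```shell
#!/bin/sh
# Sketch of a byte-order check using the same "I" trick as above.
# detect_endian is a hypothetical helper name, not from the original posts.
detect_endian() {
    # "I" is octal 111; read as one 2-byte word it prints as 000111 on
    # little-endian hosts and 044400 on big-endian ones, so the sixth
    # digit of the word is 1 or 0 respectively.
    if command -v od >/dev/null 2>&1; then
        digit=$(printf I | od -to2 | awk '{ print substr($2,6,1); exit }')
    else
        # Fallback for boxes without od, assuming hexdump is available.
        digit=$(printf I | hexdump -o | awk '{ print substr($2,6,1); exit }')
    fi
    case "$digit" in
        1) echo little ;;
        0) echo big ;;
        *) echo unknown ;;
    esac
}

detect_endian
```

The `unknown` branch is there so that unexpected `od` output, like the "6" seen on AIX, is reported rather than misread as a byte order.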
Best Answer
The WMI Win32_Processor class gives basic info about installed processors.