How does cache work on hyperthreaded Intel Xeon processors?

cache, hyperthreading, intel

I am running some experiments on a research database on EC2 using a c1.xlarge instance. As far as I can tell, the c1.xlarge uses 8 hyper-threaded virtual CPUs. Amazon also says that this instance uses a physical processor from the "Intel Xeon Family".

Again, the instance has 8 CPUs, and the database runs 8 threads: 4 workers and 4 others (lock manager, communicator, 2 sequencers). We therefore have a one-to-one thread-to-CPU correspondence. The experiment being run is on checkpointing: we create a 9th thread to take a checkpoint and evaluate its effect on throughput. The results are predictable when this 9th thread is assigned to one of the CPUs running one of the 4 workers or the lock manager: the throughput drops. However, when we place this 9th thread on one of the remaining CPUs, we expect to see no effect, since the threads on those CPUs are not bottlenecks of the system and the checkpointer should not affect the worker threads. Nevertheless, I am seeing throughput drop there as well, and I am searching for an explanation.
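For concreteness, assigning a thread to a CPU as described above can be sketched as follows. This is an illustrative Python version, not the code used in the experiment; it assumes Linux, and the helper name `run_pinned` is made up for the example.

```python
import os
import threading


def run_pinned(cpu, work):
    """Run work() in a new thread restricted to one logical CPU (Linux only)."""
    result = {}

    def body():
        # For the Linux affinity call, pid 0 means "the calling thread",
        # so only this thread is restricted, not the whole process.
        os.sched_setaffinity(0, {cpu})
        result["affinity"] = os.sched_getaffinity(0)
        result["value"] = work()

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return result


if __name__ == "__main__":
    # Pick a CPU this process is actually allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    print(run_pinned(allowed[-1], lambda: "checkpoint done"))
```

Note that the affinity applies to the vCPU the guest sees; it says nothing about which physical core or cache ends up backing that vCPU.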

I have begun to suspect that the checkpointing thread, even when placed on a non-worker CPU, is somehow invalidating the caches of the worker threads. I don't have a good understanding of how caching works on this particular EC2 instance or on Intel Xeon processors, so I am looking for an explanation of how this caching works, in particular in a hyper-threaded system. Do both logical CPUs on a hyper-threaded core share cache? Do threads share cache across processors?

I was able to find an Intel manual which says that the Intel Xeon processor 3000 and 5000 series use a "smart second level cache that enables data sharing between two cores to reduce memory traffic". Is this what the instance might be using, and if so, does that mean all 8 vCPUs share cache?

Best Answer

Since EC2 puts a virtualization layer between your server OS and the hardware, there's not even a guarantee that all 8 threads from your vCPUs are executing on the same physical CPU. Trying to measure things like cache hits or access patterns from inside the guest is an exercise in futility. You don't have visibility into the actual hardware.

A vCPU doesn't represent a physical CPU core, and 8 vCPUs don't represent a single physical CPU with 8 logical cores. Certainly, the CPU scheduler for most hypervisors will try to schedule threads from the same guest to execute on the same core in a multi-processor system, but there is no guarantee.
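If you want to see what cache topology the guest kernel believes it has, a minimal sketch is to read the Linux sysfs cache entries. This assumes a Linux guest that exposes `/sys/devices/system/cpu/cpuN/cache/`; the function names are made up for the example, and whatever it prints is only the hypervisor's presentation, which may not match the physical hardware.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_field(directory, name):
    """Return the stripped contents of one sysfs attribute file."""
    with open(os.path.join(directory, name)) as f:
        return f.read().strip()


def cache_sharing(cpu, root="/sys/devices/system/cpu"):
    """Map 'L<level> <type>' to the CPUs the guest reports as sharing that cache."""
    shared = {}
    pattern = os.path.join(root, f"cpu{cpu}", "cache", "index*")
    for index_dir in sorted(glob.glob(pattern)):
        try:
            level = read_field(index_dir, "level")
            ctype = read_field(index_dir, "type")
            cpus = read_field(index_dir, "shared_cpu_list")
        except OSError:
            continue  # attribute not exposed in this environment
        shared[f"L{level} {ctype}"] = parse_cpu_list(cpus)
    return shared


if __name__ == "__main__":
    for name, cpus in cache_sharing(0).items():
        print(name, "shared with CPUs", cpus)
```

An empty result simply means the guest exposes no cache information for that CPU, which is itself common on virtualized instances.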
