February 11, 2014 at 12:00 am #7245
I stumbled upon vSMP and I have questions about setting up a testbed for the vSMP Foundation Free version.
From the discussion in one topic about vSMP and InfiniBand, I infer that the vSMP virtual machine presents a NUMA SMP architecture to the guest O/S (Linux in my case) and that vSMP uses a dedicated InfiniBand interface to implement the NUMA memory model. NUMA-aware O/S features would then permit allocating physical memory to a process on the same node it is executing on.
I have 8 nodes I can experiment with. Each has two quad-core CPUs. The free version is limited to 4 processors per node and 4 processors per VM. Are these cores or CPUs?
If cores, can these be assigned one core per node so that each core can work primarily on nearby memory? I also assume that at least four of the nodes would be memory-only nodes in an eight-node VM using the free version. What kind of performance hit should we expect compared to a configuration without memory-only nodes when using a NUMA-aware heap allocator?
Thank you.
February 11, 2014 at 9:12 am #7652
Your first paragraph is generally correct (at a simplified, 30,000 ft level). vSMP Foundation aggregates several x86 nodes into a single virtual system, running a single Linux OS, which sees the underlying virtual machine as a NUMA shared-memory system. (I say this is simplified because we actually do much more than mere ccNUMA, but that’s a discussion better suited to a whiteboard.)
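For anyone who wants to see what NUMA layout the guest Linux actually reports, here is a minimal sketch that reads the standard sysfs node directory. The function names `parse_cpulist` and `numa_topology` are made up for this example. On stock Linux a node that has memory but no CPUs shows an empty `cpulist`; whether vSMP presents memory-only nodes that way is an assumption not confirmed in this thread.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # An empty cpulist means the node has no CPUs.
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root="/sys/devices/system/node"):
    """Map each NUMA node id to the CPU ids the kernel lists for it."""
    topology = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as handle:
            topology[node_id] = parse_cpulist(handle.read())
    return topology


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_topology().items()):
        label = "no CPUs listed" if not cpus else f"CPUs {cpus}"
        print(f"node {node_id}: {label}")
```

Running it inside the guest prints one line per NUMA node; on a machine without the sysfs directory it prints nothing.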
We try to avoid the term CPU as it may confuse the reader. The terminology we prefer to use is: processor == socket == physical_silicon_package. If we do use CPU, we mean it in the sense of a logical processing unit, e.g. a core (or an SMT thread, where relevant). You have 8 dual-socket nodes, each socket carrying 4 cores. You did not write how much RAM you have on each node, so for the sake of example I will assume 96 GB RAM per node. If you apply vSMP Foundation Free to this 8-node cluster, the result would be a system with 8 cores and more than 700 GB RAM.
The available cores cannot be drawn from across all nodes, as the vSMP Foundation Free license model allows use of compute and IO resources from only one node, while RAM is aggregated across the nodes (up to 8). In other words, all but one node are “memory-only” nodes. (During installation you designate which of the 8 nodes supplies the compute and IO.)
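The arithmetic behind those numbers can be sketched as below. This is only an illustration: the `Cluster` class and `free_edition_resources` function are hypothetical names, the 96 GB per node is the assumption made above, and the RAM figure is the raw sum across nodes, so the amount actually usable may be lower.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int = 8              # nodes available for the testbed
    sockets_per_node: int = 2   # dual-socket nodes
    cores_per_socket: int = 4   # quad-core processors
    ram_gb_per_node: int = 96   # assumed for the example, not a measured value


def free_edition_resources(cluster):
    """Return (cores, raw_ram_gb, memory_only_nodes) under the free license model
    as described in this thread: compute and IO from one node, RAM from all."""
    # Only the single designated node contributes cores.
    cores = cluster.sockets_per_node * cluster.cores_per_socket
    # RAM is aggregated across every node; this is the raw sum.
    raw_ram_gb = cluster.nodes * cluster.ram_gb_per_node
    # Every node except the designated one contributes memory only.
    memory_only_nodes = cluster.nodes - 1
    return cores, raw_ram_gb, memory_only_nodes


if __name__ == "__main__":
    cores, ram_gb, memory_only = free_edition_resources(Cluster())
    print(f"{cores} cores, {ram_gb} GB raw RAM, {memory_only} memory-only nodes")
```

With the defaults this gives 8 cores, 768 GB of raw RAM and 7 memory-only nodes, which lines up with the "8 cores and more than 700 GB" figure.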
In terms of performance, you should actually expect better performance compared with a similarly configured hard-wired SMP system. For example, a workload requiring 300 GB RAM is expected to run at similar or better speed than the same workload on a 4-socket system with same-generation cores and 512 GB physically installed, when using the same number of cores. We have a track record of outperforming HW-only SMPs when it comes to large-memory workloads, and you can see many examples on http://scalemp.mywebdev.a2hosted.com/performance.
Of course, when scaling both compute and RAM across multiple nodes (System Expansion, as opposed to Memory Expansion), actual performance will vary, mostly depending on how the application is implemented. You can see some nice multi-threaded examples at the URL above.
Hope this helps.