The 7 nodes are identical, and there is only one IB fabric, with one subnet manager.
If you notice, there are LIDs assigned to all the ports on the shared leaf switch. The 3 that don’t come up are on the same leaf switch as the 4 that do come up.
The 8th node (primary) is newer and sits on a separate but connected switch in the same fabric. It has the same HCA, same amount of memory, and same motherboard, but newer processors (Broadwell vs. Haswell).
I would consider moving the primary to another Haswell node, but that would probably entail reissuing the license. I would very much rather not do that.
I am considering rewriting the USB sticks of the three misbehaving nodes.
The primary node is up and running right now with 5/8 nodes.
I still need to figure out how to get into F5 setup to tell the system to expose the QDR card in the primary node for OFED use, so I can mount GPFS.
I would much prefer a system that is serial-console friendly.