Benzi Galili

Hi Dave,

Sorry about the color palette. I’ll ask for that to be reproduced, and if we see the same unreadable palette, I’ll request a change to it as an enhancement request.

As for the problem at hand: the output you provided suggests that 5 of the nodes share one IB switch, while the other nodes are on a different IB switch (and those switches are interconnected).
1. Could it be that those nodes are on physically different switches? If so, could each of those switches be running a different subnet manager?
2. Are those nodes identical from a hardware perspective, or are they in a different rack/on a different switch because they are a different generation? If they differ, could that be the issue? If they are all the same hardware, could those nodes have different BIOS settings (e.g., CPU virtualization support enabled on some and disabled on others)?

I would recommend the following:
1. Power off all nodes.
2. Power on only the nodes that are ‘ok’ and let them connect to the primary. Leave the others powered off.
3. When the nodes finish aggregation, you should be prompted to “hit ESC to continue with X out of 8 nodes” (or you can wait for that prompt to time out). The system should then be able to come up. If it does, we know the SW can operate on that hardware model with your environment/settings, and we should then focus on understanding what is different about the other nodes (the most probable suspect being the different fabric).