Hi, I just bought an H2312JFQJR quad-node system with eight E5-2670 processors for scientific computing. The systems were initially very easy to set up, but now when I run a simulation across all 4 nodes (all 64 cores), the following things happen:
- The systems slow down considerably. In /proc/cpuinfo the clock frequency drops to only 1200 MHz; with three nodes loaded, it stays at 2600 MHz.
- Each server's amber warning light turns on.
- None of this happens if I load only 3 of the 4 nodes.
- My kern.log shows many messages like this:
Oct 23 19:36:51 localhost kernel: [ 2506.830104] CPU27: Core power limit notification (total events = 196)
Oct 23 19:36:51 localhost kernel: [ 2506.830112] CPU15: Package power limit notification (total events = 196)
Oct 23 19:36:51 localhost kernel: [ 2506.830122] CPU11: Package power limit notification (total events = 196)
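To see how widespread the throttling is, the kernel messages and the clock readings can be tallied with a short script. This is a minimal sketch, not a diagnostic tool: the function names are hypothetical, the regex assumes the message format shown in the log excerpt above, and the `cpu MHz` parsing assumes the usual `cpu MHz : <value>` lines in /proc/cpuinfo.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: [ 2506.830104] CPU27: Core power limit notification (total events = 196)"
POWER_LIMIT_RE = re.compile(
    r"CPU(?P<cpu>\d+): (?P<kind>Core|Package) power limit notification "
    r"\(total events = (?P<events>\d+)\)"
)


def count_power_limit_events(lines):
    """Return the most recent 'total events' value per (cpu, kind) found in the lines."""
    latest = {}
    for line in lines:
        m = POWER_LIMIT_RE.search(line)
        if m:
            key = (int(m.group("cpu")), m.group("kind"))
            latest[key] = int(m.group("events"))
    return latest


def summarize_kinds(events):
    """Count how many CPUs reported Core vs Package power limit events."""
    return Counter(kind for (_cpu, kind) in events)


def cpu_mhz_values(cpuinfo_text):
    """Extract every per-CPU 'cpu MHz' value from the text of /proc/cpuinfo."""
    return [
        float(v)
        for v in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    ]
```

Usage (assuming a Debian/Ubuntu-style log location, which may differ on other distributions): pass `open("/var/log/kern.log")` to `count_power_limit_events`, and `open("/proc/cpuinfo").read()` to `cpu_mhz_values`, then compare `min()`/`max()` of the clock values while all four nodes are loaded versus three.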
With the whole system plugged into a load meter, I see it pulling about 1000W right before this happens, and then its power draw suddenly drops dramatically (to about 500W). The chassis is powered by two 1000W power supplies, which are supposed to be redundant and shared across the four nodes. Questions:
1) What's going on? The system is supposed to be compatible with these CPUs.
2) The TDP of the E5-2670 is 115W, and 115W x 8 is 920W, leaving a scant 80W for everything else, assuming the power supplies are truly redundant. This can't be right.
3) Would it help to switch to 208V input power? I remember seeing somewhere (perhaps on the PSUs themselves) that their capacity is 1200W at 200V input.
4) Are there BIOS settings I should look out for?
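The budget arithmetic in questions 2 and 3 can be written out as a quick check. This sketch assumes 1+1 redundancy, meaning a single PSU must be able to carry the whole chassis, so only one supply's rating counts; it also uses TDP as a stand-in for actual CPU draw. Note that a wall meter reads input power, which includes PSU conversion losses, so it is not directly comparable to a PSU's output rating.

```python
# Rough headroom check using the figures from the question.
# Assumption: with redundant (1+1) supplies, only ONE PSU's rating is usable.
CPU_TDP_W = 115   # E5-2670 TDP, per the question
CPU_COUNT = 8     # 4 nodes x 2 sockets


def headroom_w(psu_rating_w, cpu_tdp_w=CPU_TDP_W, cpu_count=CPU_COUNT):
    """Watts left for memory, disks, fans, and boards after all CPUs hit TDP."""
    return psu_rating_w - cpu_tdp_w * cpu_count


for label, rating in [("1000W rating", 1000), ("1200W rating at ~200V", 1200)]:
    print(f"{label}: {headroom_w(rating)} W left for non-CPU components")
```

With these inputs the headroom is 80W at the 1000W rating and 280W at the 1200W rating, which is the difference question 3 is asking about.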