Tue Apr 2 16:05:57 UTC 2019

Sorry, my explanation was perhaps not clear enough.

The issue is with the number of cluster-nodes, not the number of MPI-ranks. 
Of course, 128 MPI-ranks are fine. For instance, I tried 160x48x2, 
108x12x8, and some more configurations (read as [Nodes x 
RanksPerNode x OmpThreads]). As a side note, 108x12x8 gives a total 
rank-count of the kind typically preferred by CP2K (108x12 == 36x36, i.e. a 
square number). Back to my problem: I found that 256 cluster-nodes work fine, but 
none of the configurations between 128 and 256 cluster-nodes do. For me, 
this translates into an economic disadvantage for the end user, given that fewer 
than 256 nodes can do the job.
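
To make the [Nodes x RanksPerNode x OmpThreads] arithmetic concrete, here is a minimal sketch that computes the total rank-count for the two configurations named above and checks whether it is a square number. The helper names are only for illustration and are not part of CP2K.

```python
import math

def total_ranks(nodes, ranks_per_node):
    """Total MPI-ranks for a [Nodes x RanksPerNode x OmpThreads] layout."""
    return nodes * ranks_per_node

def is_square(n):
    """True if n is a perfect square (e.g. 1296 == 36 * 36)."""
    root = math.isqrt(n)
    return root * root == n

# The two configurations mentioned above: (Nodes, RanksPerNode, OmpThreads)
configs = [(160, 48, 2), (108, 12, 8)]

for nodes, ranks_per_node, omp_threads in configs:
    ranks = total_ranks(nodes, ranks_per_node)
    print(f"{nodes}x{ranks_per_node}x{omp_threads}: "
          f"{ranks} ranks, square={is_square(ranks)}")
```

This only checks the rank-count; it says nothing about why node counts between 128 and 256 fail.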

