 Hi everyone,

I have a single-node server with two Xeon Gold 6154 CPUs @ 3.00GHz, 18 cores 
each: 36 cores and 72 threads in total.

I have tested several NPROC_REP values in a series of vibrational analysis 
runs started with the following command:
mpirun -np 72 --bind-to hwthread  cp2k.psmp -i cp2k.inp -o cp2k.out

I report a brief table summary of the timings in the attached file: 

The timings refer to the same two wavefunction optimization steps, taken as 
an example, for the elementary cell of a MOF with N = 37 atoms.
Each calculation started from the same cp2k-RESTART-1.wfn.

The required number of SCF wavefunction optimizations is 223 (6N + 1).

From the table, in principle, the best choice seems to be NPROC_REP=1.
Unfortunately, with NPROC_REP=1 each of the 72 replicas will perform 4 SCF 
wavefunction optimizations, for a total of 288 calculations.
When replicas finish their assigned calculations, their threads cannot be 
redistributed to the replicas that are still running.
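
The arithmetic behind the 288 figure can be sketched as follows. This is only a minimal illustration, assuming the 72 MPI ranks are split evenly into 72 / NPROC_REP replicas and that every replica runs the same number of rounds; the function name `replica_layout` and the "surplus" column are hypothetical and are not CP2K output.

```python
import math

TOTAL_RANKS = 72            # from "mpirun -np 72"
N_ATOMS = 37                # atoms in the elementary cell
N_CALCS = 6 * N_ATOMS + 1   # 223 required SCF optimizations

def replica_layout(nproc_rep, total_ranks=TOTAL_RANKS, n_calcs=N_CALCS):
    """Return (replicas, rounds, slots, surplus) for a given NPROC_REP.

    Assumption: ranks are divided evenly into replicas and each replica
    runs ceil(n_calcs / replicas) calculations one after another.
    """
    replicas = total_ranks // nproc_rep
    rounds = math.ceil(n_calcs / replicas)
    slots = replicas * rounds       # calculation slots actually occupied
    surplus = slots - n_calcs       # slots beyond the 223 that are required
    return replicas, rounds, slots, surplus

if __name__ == "__main__":
    print("NPROC_REP replicas rounds slots surplus")
    for nproc_rep in (72, 36, 24, 18, 12, 9, 8, 6, 4, 3, 2, 1):
        print(nproc_rep, *replica_layout(nproc_rep))
```

For NPROC_REP=1 this gives 72 replicas, 4 rounds and 288 slots, i.e. 65 slots more than the 223 calculations that are needed.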

So these aspects also seem to need to be taken into account when 
choosing a suitable value of NPROC_REP.

Please, let me know if you have any comments and thanks for your kind 

#                           TIMINGS / s
# NPROC_REP   memory (MB)   OT CG step 3   OT LS step 4
  72          13757.3         7.6            3.7
  36          17051.3        24.9           11.9
  24          18455.0        31.4           14.6
  18          19976.3        36.6           17.3
  12          22914.0        59.7           28.1
   9          25733.4        72.6           34.4
   8          26830.3        78.0           37.1
   6          31564.3        93.8           42.6
   4          40061.4       141.6           68.1
   3          49098.9       181.6           86.8
   2          64692.3       256.3          121.5
   1          87551.7       473.2          227.0
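
As a rough way to weigh the per-step timings against the number of rounds each replica has to run, the table can be combined with the 223 required calculations. This is a sketch under strong assumptions: the sum of the two OT steps is used as a stand-in for the cost of one full SCF optimization, every calculation is assumed to cost the same, and replicas are assumed to work in lock-step rounds. The names `TIMINGS` and `estimate` are hypothetical.

```python
import math

TOTAL_RANKS = 72   # from "mpirun -np 72"
N_CALCS = 223      # 6N + 1 with N = 37

# NPROC_REP -> (OT CG step 3, OT LS step 4) in seconds, copied from the table
TIMINGS = {
    72: (7.6, 3.7),    36: (24.9, 11.9),  24: (31.4, 14.6),
    18: (36.6, 17.3),  12: (59.7, 28.1),   9: (72.6, 34.4),
     8: (78.0, 37.1),   6: (93.8, 42.6),   4: (141.6, 68.1),
     3: (181.6, 86.8),  2: (256.3, 121.5), 1: (473.2, 227.0),
}

def estimate(nproc_rep):
    """Return (seconds per calculation, wall-time proxy) for one NPROC_REP.

    Assumption: the two tabulated steps are a proxy for one calculation.
    """
    replicas = TOTAL_RANKS // nproc_rep
    rounds = math.ceil(N_CALCS / replicas)
    per_round = sum(TIMINGS[nproc_rep])   # proxy cost of one calculation
    throughput = per_round / replicas     # ignores incomplete last rounds
    wall_proxy = rounds * per_round       # counts every round in full
    return throughput, wall_proxy

if __name__ == "__main__":
    print("NPROC_REP  s/calc  wall proxy (s)")
    for nproc_rep in TIMINGS:
        throughput, wall_proxy = estimate(nproc_rep)
        print(f"{nproc_rep:9d}  {throughput:6.2f}  {wall_proxy:10.1f}")
```

Under these assumptions the pure throughput figure is lowest for NPROC_REP=1, while the round-based proxy is lowest for NPROC_REP=72, which illustrates why the number of rounds matters alongside the raw timings. Memory use is not part of this estimate.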

