<div dir="ltr">Good morning!<br>
<br>
Our group is running BOMD simulations on one unit cell of the ZIF-8 crystal (276 atoms, cubic cell of side 16.9856 Å). Our setup is PBE with the GTH-TZV2P basis and GTH pseudopotentials (700 Ry cutoff). With this setup we see little or no benefit from using more than one core of a quad-core processor; on the other hand, the code scales very well when we increase the number of separate processors (sockets).<br>
<br>
Example of non-scaling: on a machine with four processors (sockets), each with four cores (Xeon X7350), running 4 MPI processes (one per socket) takes 55 seconds per OT-DIIS step, running 8 MPI processes (two per socket) takes 41 seconds, and running 16 MPI processes (all cores) takes 40 seconds. We have observed the same non-scaling behavior across the cores of a single processor on an Intel i5-2550K. Moreover, the behavior is the same whether we use threads only or mix MPI and threads (OpenMP).<br>
<br>
Example of scaling: the same system with hybrid functionals scales more than linearly with the number of cores: 16 MPI processes run more than 4 times faster than 4 MPI processes. (Even though the time required for one MD step becomes too long for such calculations to be practical!)<br>
<br>
Considerations: on this basis, it seems that our PBE simulations are not limited by the computation itself. The limiting factor seems to be the cache: increasing the number of sockets increases the total cache, and with it the speed of the simulations.<br>
<br>
Questions: do you agree with this analysis? Is there a way to improve the speed of our PBE computations (for example, some option that reduces the amount of data transferred between RAM and cache)?<br>
<br>
Thanks a lot for your attention,<br>
best regards,<br>
Marco and Andrea</div>
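<div dir="ltr">To put numbers on the PBE timings quoted above, here is a minimal sketch that turns them into speedup and parallel efficiency. It assumes the 4-MPI run (one process per socket) is the baseline; the function name <code>scaling_table</code> is only illustrative and is not part of any package.</div>

```python
# Timings quoted in the post: seconds per OT-DIIS step for the PBE run
# on the 4-socket x 4-core Xeon X7350 machine (MPI processes -> seconds).
timings = {4: 55.0, 8: 41.0, 16: 40.0}


def scaling_table(timings):
    """Return (ranks, seconds, speedup, efficiency) rows.

    Assumption: the smallest rank count is taken as the baseline, so
    speedup and efficiency are relative to it, not to a serial run.
    """
    base_ranks = min(timings)
    base_time = timings[base_ranks]
    rows = []
    for ranks in sorted(timings):
        speedup = base_time / timings[ranks]
        # Ideal speedup would equal the increase in rank count.
        efficiency = speedup / (ranks / base_ranks)
        rows.append((ranks, timings[ranks], speedup, efficiency))
    return rows


for ranks, seconds, speedup, efficiency in scaling_table(timings):
    print(f"{ranks:2d} MPI: {seconds:5.1f} s  "
          f"speedup {speedup:4.2f}x  efficiency {efficiency:4.0%}")
```

<div dir="ltr">Relative to the 4-MPI run, this gives a speedup of roughly 1.34x with 8 processes and 1.38x with 16, i.e. a parallel efficiency of about 67% and 34%, which is the flattening within each socket described above.</div>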