[CP2K:3210] Re: FIST bug?

Teodoro Laino teodor... at gmail.com
Wed Apr 20 16:31:29 UTC 2011


Hi Matt,

thanks. The fixes are in CVS. In summary: although the problem showed up only with PME, it was caused by the porting of the RS_GRID code (done a couple of years ago) into that part of the code.
Cheers
Teo


On Apr 20, 2011, at 1:37 PM, Matt W wrote:

> OK, change lines 164 and 167 of pme.F to
> 
>    IF ( rden % desc % parallel .AND. rden % desc % distributed ) THEN
> 
> and
> 
>    IF ( PRESENT(shell_particle_set) .AND. rden % desc % parallel .AND. rden % desc % distributed ) THEN
> 
> respectively. It was using an outdated and risky check to see if the
> RS grids were distributed.
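> 
> For illustration only, a minimal sketch of what the corrected guard is meant to do (the call inside is a hypothetical placeholder, not the actual pme.F code):
> 
>    ! Take the distributed-grid branch only when the realspace grid is both
>    ! split across MPI ranks (parallel) AND stored in non-replicated form
>    ! (distributed); a parallel but replicated grid must skip this path.
>    IF ( rden % desc % parallel .AND. rden % desc % distributed ) THEN
>       CALL sum_distributed_density(rden)   ! hypothetical placeholder
>    END IF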
> 
> Cheers,
> 
> Matt
> 
> On Apr 20, 12:01 pm, Matt W <mattwa... at gmail.com> wrote:
>> Hi Guys,
>> 
>> I've just had a very quick look and I think it is a PME bug. Not SPME,
>> not QS. In particular, the QS RS grids are pretty well tested.
>> 
>> If I change the EWALD section to
>> 
>>       &EWALD
>>         EWALD_TYPE spme
>>         ALPHA .44
>>         NS_MAX 25
>>         GMAX 64 64 64
>>       &END EWALD
>> 
>> then there is no problem with 8, 16, or 32 procs. All give the same energy:
>> 
>> out_spme_8.out:  ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.): -0.000148559882818
>> out_spme_16.out: ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.): -0.000148559882818
>> out_spme_32.out: ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.): -0.000148559882818
>> 
>> I believe that the PME uses a second smaller grid for some purposes -
>> I would guess that there is some inconsistency in the treatment of the
>> larger and smaller grids.  I'll try and have a proper look but people
>> who know the PME code might spot it more quickly.
>> 
>> In summary: there is a bug with PME and at least some distributed
>> grids. But no evidence for problems with other methods.
>> 
>> Cheers,
>> 
>> Matt
>> 
>> (previous mail to the group got bounced so reposting via google gui)
>> 
>> On Apr 20, 6:00 am, Teodoro Laino <teodor... at gmail.com> wrote:
>> 
>>> Hi Noam,
>> 
>>> I can confirm your bug report. The issue is related to the relatively new distributed RS_GRID scheme (a couple of years old...).
>>> This means the same issue could also be present in QS jobs (they share the same infrastructure).
>> 
>>> Matt, who worked on this stuff, is in CC. The issue can be reproduced with this simple input file:
>> 
>>> &FORCE_EVAL
>>>   METHOD FIST
>>>   &MM
>>>     &FORCEFIELD
>>>       parm_file_name water.pot  ! (just use the one in tests/Fist/sample_pot/)
>>>       parmtype CHM
>>>       &CHARGE
>>>         ATOM OT
>>>         CHARGE -0.8476
>>>       &END CHARGE
>>>       &CHARGE
>>>         ATOM HT
>>>         CHARGE 0.4238
>>>       &END CHARGE
>>>     &END FORCEFIELD
>>>     &POISSON
>>>       &EWALD
>>>         EWALD_TYPE pme
>>>         ALPHA .44
>>>         NS_MAX 25
>>>       &END EWALD
>>>     &END POISSON
>>>   &END MM
>>>   &SUBSYS
>>>     &CELL
>>>       ABC 24.955 24.955 24.955
>>>     &END CELL
>>>     &COORD
>>> OT   -0.757  -5.616  -7.101    MOL1
>>> HT   -1.206  -5.714  -6.262    MOL1
>>> HT    0.024  -5.102  -6.896    MOL1
>>> OT  -11.317  -2.629  -9.689    MOL2
>>> HT  -11.021  -3.080 -10.480    MOL2
>>> HT  -10.511  -2.355  -9.252    MOL2
>>>     &END
>>>   &END SUBSYS
>>> &END FORCE_EVAL
>>> &GLOBAL
>>>   PROJECT water_3_dist
>>>   RUN_TYPE ENERGY_FORCE
>>> &END GLOBAL
>> 
>>> If the poisson section is substituted with this one:
>> 
>>>     &POISSON
>>>       &EWALD
>>>         EWALD_TYPE pme
>>>         ALPHA .44
>>>         NS_MAX 25
>>>         &RS_GRID
>>>          DISTRIBUTION_TYPE REPLICATED
>>>         &END
>>>       &END EWALD
>>>     &END POISSON
>> 
>>> where the distribution type is set to REPLICATED, the bug disappears. So it is triggered by a certain combination of grid points and by the distributed type (which is automatically activated for 16 procs; with 2..8 procs the grids are replicated instead).
>>> All modules which use RS_GRID (FIST is just one) may be affected by this bug, including QS.
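>> 
>>> As an illustration (the parallel executable name depends on your build; cp2k.popt is just the usual one), the two regimes can be compared by running the same input at the two rank counts and looking at the FIST total energy:
>> 
>>>   mpirun -np 8  cp2k.popt water_3_dist.inp > out.8
>>>   mpirun -np 16 cp2k.popt water_3_dist.inp > out.16
>>>   grep 'FORCE_EVAL ( FIST )' out.8 out.16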
>> 
>>> Regards,
>>> Teo
>> 
>>> On Apr 19, 2011, at 3:37 PM, Noam Bernstein wrote:
>> 
>>>> We were playing around with the regtests, specifically
>>>>   cp2k/tests/QMMM/QS/regtest-3/water_3_dist.inp
>>>> On the latest version (cvs update today 19 April), no patches, I get
>>>> different results running on n_proc=2..8 and n_proc=16.  The
>>>> difference seems to happen when RS_GRID goes from fully replicated
>>>> to distributed.  I've attached the input files and two sample output
>>>> files (I changed from QMMM to FIST).  The first output quantity that differs
>>>> is the
>>>>    ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.):
>>>> line; the energies are off by 8 mRy.  Can someone replicate this issue (I suppose
>>>> I could have compiler or MPI issues, although we think we see this error
>>>> on several somewhat different Linux platforms)?  If so, any ideas
>>>> as to the source of the problem?
>> 
>>>>     thanks,
>>>>     Noam
>> 
>>>> <out.2><out.16><water_3_dist.inp>



