[CP2K:3210] Re: FIST bug?
Teodoro Laino
teodor... at gmail.com
Wed Apr 20 16:31:29 UTC 2011
Hi Matt,
Thanks. The fixes are now in CVS. In summary: although the bug was specific to PME, it was introduced when the RS_GRID code was ported into that part of the code (a couple of years ago).
Cheers
Teo
On Apr 20, 2011, at 1:37 PM, Matt W wrote:
> OK, change lines 164 and 167 of pme.F to
>
> IF ( rden % desc % parallel .AND. rden % desc % distributed ) THEN
>
> and
>
> IF ( PRESENT(shell_particle_set) .AND. rden % desc % parallel .AND. rden % desc % distributed ) THEN
>
> respectively. The code was using an outdated and risky check to decide
> whether the RS grids were distributed.
>
> Cheers,
>
> Matt
>
> On Apr 20, 12:01 pm, Matt W <mattwa... at gmail.com> wrote:
>> Hi Guys,
>>
>> I've just had a very quick look and I think it is a PME bug: not SPME,
>> not QS. In particular, the QS RS grids are quite well tested.
>>
>> If I change the EWALD section to
>>
>> &EWALD
>> EWALD_TYPE spme
>> ALPHA .44
>> NS_MAX 25
>> GMAX 64 64 64
>> &END EWALD
>>
>> then there is no problem with 8, 16 or 32 procs. All give the same energy:
>>
>> out_spme_8.out:  ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.): -0.000148559882818
>> out_spme_16.out: ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.): -0.000148559882818
>> out_spme_32.out: ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.): -0.000148559882818
>>
>> I believe PME uses a second, smaller grid for some purposes; I would
>> guess that there is some inconsistency in the treatment of the larger
>> and smaller grids. I'll try to have a proper look, but people who know
>> the PME code might spot it more quickly.
>>
>> In summary: there is a bug with PME and at least some distributed
>> grids, but no evidence of problems with other methods.
>>
>> Cheers,
>>
>> Matt
>>
>> (my previous mail to the group bounced, so I'm reposting via the Google web interface)
>>
>> On Apr 20, 6:00 am, Teodoro Laino <teodor... at gmail.com> wrote:
>>
>>> Hi Noam,
>>
>>> I can confirm your bug report. The issue is related to the relatively new distributed RS_GRID (a couple of years old).
>>> This means that the same issue could also be present in QS jobs (they share the same infrastructure).
>>
>>> Matt, who worked on this code, is in CC. The issue can be reproduced with this simple input file:
>>
>>> &FORCE_EVAL
>>> METHOD FIST
>>> &MM
>>> &FORCEFIELD
>>> parm_file_name water.pot ! (just use the one in tests/Fist/sample_pot/)
>>> parmtype CHM
>>> &CHARGE
>>> ATOM OT
>>> CHARGE -0.8476
>>> &END CHARGE
>>> &CHARGE
>>> ATOM HT
>>> CHARGE 0.4238
>>> &END CHARGE
>>> &END FORCEFIELD
>>> &POISSON
>>> &EWALD
>>> EWALD_TYPE pme
>>> ALPHA .44
>>> NS_MAX 25
>>> &END EWALD
>>> &END POISSON
>>> &END MM
>>> &SUBSYS
>>> &CELL
>>> ABC 24.955 24.955 24.955
>>> &END CELL
>>> &COORD
>>> OT -0.757 -5.616 -7.101 MOL1
>>> HT -1.206 -5.714 -6.262 MOL1
>>> HT 0.024 -5.102 -6.896 MOL1
>>> OT -11.317 -2.629 -9.689 MOL2
>>> HT -11.021 -3.080 -10.480 MOL2
>>> HT -10.511 -2.355 -9.252 MOL2
>>> &END
>>> &END SUBSYS
>>> &END FORCE_EVAL
>>> &GLOBAL
>>> PROJECT water_3_dist
>>> RUN_TYPE ENERGY_FORCE
>>> &END GLOBAL
>>
>>> If the poisson section is substituted with this one:
>>
>>> &POISSON
>>> &EWALD
>>> EWALD_TYPE pme
>>> ALPHA .44
>>> NS_MAX 25
>>> &RS_GRID
>>> DISTRIBUTION_TYPE REPLICATED
>>> &END
>>> &END EWALD
>>> &END POISSON
>>
>>> with DISTRIBUTION_TYPE set to REPLICATED, the bug disappears. So it is triggered by a certain combination of grid points together with the distributed type (which is activated automatically for 16 procs, whereas 2..8 procs get the replicated type).
>>> All modules that use RS_GRID (FIST is just one) may be affected by this bug, including QS.
>>
>>> Regards,
>>> Teo
>>
>>> On Apr 19, 2011, at 3:37 PM, Noam Bernstein wrote:
>>
>>>> We were playing around with the regtests, specifically
>>>> cp2k/tests/QMMM/QS/regtest-3/water_3_dist.inp
>>>> On the latest version (cvs update today 19 April), no patches, I get
>>>> different results running on n_proc=2..8 and n_proc=16. The
>>>> difference seems to happen when RS_GRID goes from fully replicated
>>>> to distributed. I've attached the input files and two sample output
>>>> files (I changed from QMMM to FIST). The first output quantity that
>>>> differs is the
>>>> ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.):
>>>> line; the energies are off by 8 mRy. Can someone reproduce this issue? (I
>>>> suppose I could have compiler or MPI issues, although we think we see this
>>>> error on several somewhat different Linux platforms.) If so, any ideas
>>>> as to the source of the problem?
>>
>>>> thanks,
>>>> Noam
>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "cp2k" group.
>>>> To post to this group, send email to cp... at googlegroups.com.
>>>> To unsubscribe from this group, send email to cp2k+uns... at googlegroups.com.
>>>> For more options, visit this group at http://groups.google.com/group/cp2k?hl=en.
>>
>>>> <out.2><out.16><water_3_dist.inp>
>
>
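The consistency check used throughout this thread (run the same input on different numbers of processes and compare the `ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.):` line) can be scripted. Below is a minimal Python sketch. It assumes only that each output file contains that line in the form quoted above. The helper names `total_energy` and `energies_agree`, the script name, and the default tolerance of 1e-10 a.u. are hypothetical choices for illustration, not part of CP2K.

```python
import re
import sys
from pathlib import Path

# Hypothetical pattern for the FIST total-energy line quoted in this thread, e.g.
#   ENERGY| Total FORCE_EVAL ( FIST ) energy (a.u.):   -0.000148559882818
# \s+ tolerates varying whitespace (an assumption about the exact output layout).
ENERGY_RE = re.compile(
    r"ENERGY\|\s+Total FORCE_EVAL\s+\(\s*FIST\s*\)\s+energy\s+\(a\.u\.\):\s+"
    r"(-?\d+\.\d+(?:[EeDd][+-]?\d+)?)"
)


def total_energy(text: str) -> float:
    """Return the last FIST total energy (a.u.) found in a CP2K output text."""
    matches = ENERGY_RE.findall(text)
    if not matches:
        raise ValueError("no 'Total FORCE_EVAL ( FIST ) energy' line found")
    # Fortran may print a D exponent; Python's float() only accepts E.
    return float(matches[-1].replace("D", "E").replace("d", "e"))


def energies_agree(energies: dict[str, float], tol: float = 1e-10) -> bool:
    """True if all energies lie within tol (a.u.) of each other.

    The default tol is an assumed threshold, not a CP2K setting.
    """
    values = list(energies.values())
    if len(values) < 2:
        return True
    return max(values) - min(values) <= tol


def main(paths: list[str]) -> bool:
    """Print each file's energy and report whether they all agree."""
    energies = {p: total_energy(Path(p).read_text()) for p in paths}
    for name, energy in energies.items():
        print(f"{name}: {energy:.15f}")
    ok = energies_agree(energies)
    print("consistent" if ok else "MISMATCH")
    return ok


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_energies.py OUT_FILE [OUT_FILE ...]")
    else:
        main(sys.argv[1:])
```

For example, `python check_energies.py out_spme_8.out out_spme_16.out out_spme_32.out` would be expected to report the SPME runs as consistent, while a replicated/distributed PME pair affected by this bug (8 mRy, i.e. 4 mHa, apart) would be flagged as a mismatch.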
More information about the CP2K-user mailing list