[CP2K-user] [CP2K:21691] Implementation Suggestion: job not running due to some SLURM parameters
Michela Cavalieri
bnzmichela at gmail.com
Thu Jul 24 16:50:11 UTC 2025
I am guessing that the workaround is TRACE_MASTER, as you suggested in
case the output became messy. I would still like an explanation for why
nothing was printed. Should this be added to the CP2K manual?
Thank you kindly for your time,
Michela
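
For jobs that end without writing anything to the working directory, the Slurm accounting record is often the quickest place to look. Below is a minimal sketch, assuming that job accounting (sacct) is enabled on the cluster. The helper name flag_killed_steps and the list of states it matches are illustrative choices, not part of Slurm or CP2K.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or CP2K): reads pipe-separated
# "JobID|State|..." records on stdin and prints the job steps whose state
# suggests that the scheduler, rather than CP2K itself, ended the run.
flag_killed_steps() {
  awk -F'|' '$2 ~ /^(OUT_OF_MEMORY|TIMEOUT|CANCELLED|NODE_FAIL)/ { print $1 ": " $2 }'
}

# Query the accounting database only when a job ID is given and sacct exists.
# -n drops the header line, -P prints pipe-separated fields.
if [ "$#" -ge 1 ] && command -v sacct >/dev/null 2>&1; then
  sacct -j "$1" -n -P \
    --format=JobID,State,ExitCode,Elapsed,ReqMem,MaxRSS | flag_killed_steps
fi
```

In the full sacct output, the ExitCode column has the form exit:signal, so a value ending in :15 would point to the process having received SIGTERM. Comparing MaxRSS against ReqMem shows how close a step came to its memory limit.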
On Thursday, July 24, 2025 at 12:47:59 PM UTC-4 Michela Cavalieri wrote:
> Hello Dr. Krack,
>
> I found that the issue was the TRACE T keyword that I recently added under
> &GLOBAL (see earlier in this thread).
>
> I did not think of that at all because it was suggested to me here. I took
> it out and things are running smoothly.
>
> My guess as to what happened is that the TRACE output is massive (see the
> attached file, which I got by printing the output on my compute node).
> Perhaps the job timed out before computing any MD steps (*I allocated 42
> hours*).
>
> I printed the attached file from the compute node in less than 10 min of
> runtime (as I mentioned before, nothing was printed to my working directory
> even after submitting multiple times). I will keep running this on our HPC
> compute node and see if it outputs any errors that could give us a better
> idea of why nothing was printed.
>
> I am not sure why it still would not print the output, or why it left me
> without any restart files at all! That is the complete opposite of what I
> was trying to achieve with TRACE T!
>
> I would like to raise this as an issue, unless you can perhaps correct
> something that I did wrong.
>
> Thank you,
>
> Michela
>
> On Tuesday, July 22, 2025 at 4:42:57 AM UTC-4 Krack Matthias wrote:
>
>> Hi Michela
>>
>>
>>
>> Although the downloadable CP2K containers (apptainers) may work on many
>> systems, this is not the case for all cluster systems given the large
>> variety of cluster and slurm configurations.
>>
>>
>>
>> Could you successfully run the apptainer self-test? E.g. with
>>
>> apptainer run -B /projects:/projects \
>>   /shared/container_repository/cp2k/cp2k_2024.1_openmpi_generic_psmp.sif \
>>   run_tests
>>
>>
>>
>> If that test does not work, the container is not suited for your cluster
>> system and you should try to build a CP2K binary from scratch using the
>> appropriate compiler and MPI modules installed on your cluster system. I
>> recommend asking one of the sysadmins for assistance with that task.
>>
>>
>>
>> Best
>>
>>
>>
>> Matthias
>>
>>
>>
>> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf of
>> Michela Benazzi <bnzmi... at gmail.com>
>> *Date: *Monday, 21 July 2025 at 19:56
>> *To: *cp2k <cp... at googlegroups.com>
>> *Subject: *Re: [CP2K:21677] Implementation Suggestion: job not running
>> due to some SLURM parameters
>>
>> Hello Dr. Krack and CP2K community,
>>
>>
>>
>> I hope you are doing well! I have two additional questions after my jobs
>> failed without a trace again, with SLURM exit code 15, which supposedly
>> points to the software/app being used.
>>
>>
>>
>> My improvements: I have added TRACE T under &GLOBAL and have fixed my
>> bash script settings (I was not requesting enough time and memory). I am
>> seeing an improvement there.
>>
>>
>>
>> I wish I could troubleshoot, but because my jobs failed immediately upon
>> starting without leaving a trace, there are no .err files to refer to. Can I
>> please get some assistance? I am not attaching any more input files because
>> multiple jobs have failed, so I do not think that is the issue. Please find
>> my Slurm script for reference below my signature.
>>
>>
>>
>> Thank you,
>>
>>
>>
>> Michela
>>
>>
>>
>> #!/bin/bash
>> #SBATCH -p short
>> #SBATCH --mem=36G ## memory per node
>> #SBATCH --job-name=en1
>> #SBATCH --ntasks=8
>> #SBATCH --cpus-per-task=7
>> #SBATCH -N 1 ## containers can only run on one node, and c*n = 56 or 128 (intel nodes top out at 56 cores, zen2 at 128)
>> #SBATCH --constraint=ib
>> #SBATCH --time=42:00:00
>> #SBATCH -o %j.out
>> #SBATCH -e %j.err
>>
>> # Set up the number of OpenMP threads:
>> export OMP_NUM_THREADS=7 ## should be the same as cpus-per-task
>>
>> # The setup_new file contains all the instructions for setting up the correct environment before the user can compile and/or run CP2K:
>>
>> export PATH=$PATH:/shared/centos7/cp2k/cp2k-6.1.0/data
>> dir=/home/benazzi.m/BHT/
>> inputFile=$dir/120h_bulk-1.restart
>>
>> # Run CP2K inside the container with 8 MPI ranks (matches --ntasks):
>> apptainer run -B /projects:/projects \
>>   /shared/container_repository/cp2k/cp2k_2024.1_openmpi_generic_psmp.sif \
>>   mpirun -n 8 --oversubscribe cp2k.psmp -i "$inputFile"
>>
>> On Wednesday, July 16, 2025 at 4:30:02 PM UTC-4 Michela Benazzi wrote:
>>
>> Thank you Dr. Krack,
>>
>>
>>
>> I will give it a try!
>>
>> Michela
>>
>>
>>
>> On Wednesday, July 16, 2025 at 12:01:08 PM UTC-4 Krack Matthias wrote:
>>
>> Hi Michela
>>
>>
>>
>> You can try to activate the TRACE input key
>> <https://manual.cp2k.org/trunk/CP2K_INPUT/GLOBAL.html#CP2K_INPUT.GLOBAL.TRACE> in
>> the &GLOBAL section (or TRACE_MASTER if the output becomes messy because of
>> the large number of MPI ranks).
>>
>>
>>
>> HTH
>>
>>
>>
>> Matthias
>>
>>
>>
>>
>>
>> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf of
>> Michela Benazzi <bnzmi... at gmail.com>
>> *Date: *Wednesday, 16 July 2025 at 17:31
>> *To: *cp2k <cp... at googlegroups.com>
>> *Subject: *[CP2K:21665] Implementation Suggestion: job not running due
>> to some SLURM parameters
>>
>> Good morning,
>>
>>
>>
>> I have recently noticed that I was not allocating nearly enough memory
>> for my jobs (25 GB allocated, while 32 GB were actually being used).
>>
>>
>>
>> Those jobs failed without any trace: no output, restart, log, or error
>> files at all.
>>
>>
>>
>> Is there any way (or any way to implement it if that does not exist yet)
>> to add a feature where:
>>
>>
>>
>> 1. The job outputs information until the memory capacity is exceeded
>>
>> 2. There is an error message explaining the cause in the .out or .err
>> files
>>
>>
>>
>> Thank you!
>>
>> Michela
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "cp2k" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to cp2k+uns... at googlegroups.com.
>> To view this discussion visit
>> https://groups.google.com/d/msgid/cp2k/d4c2d7a3-fb48-4c39-b185-6ebfa692aa35n%40googlegroups.com
>> <https://groups.google.com/d/msgid/cp2k/d4c2d7a3-fb48-4c39-b185-6ebfa692aa35n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>