[CP2K-user] [CP2K:21691] Implementation Suggestion: job not running due to some SLURM parameters

Michela Cavalieri bnzmichela at gmail.com
Thu Jul 24 16:50:11 UTC 2025


I am guessing that the workaround is to use TRACE_MASTER, as you suggested in
case the output became messy. I would still like an explanation of why nothing
was printed at all. Should this be added to the CP2K manual?

Thank you kindly for your time,

Michela

On Thursday, July 24, 2025 at 12:47:59 PM UTC-4 Michela Cavalieri wrote:

> Hello Dr. Krack,
>
> I found that the issue was the TRACE T keyword that I had recently added
> under &GLOBAL (see the earlier messages in this thread).
>
> I had not considered it at all, because it was suggested to me here. I took
> it out, and things are running smoothly.
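If a run is resumed from a restart file, it is worth checking that the file itself no longer enables TRACE. CP2K restart files are written as complete inputs, including the &GLOBAL section, so a TRACE T set in the original input is likely to carry over. The function below is a minimal sketch for switching it off. `disable_trace` is a hypothetical name. It assumes GNU sed, because the `I` (case-insensitive) flag is a GNU extension. It only handles a plain `TRACE` line with no trailing comment.

```shell
# Hypothetical helper: set a plain TRACE keyword in a CP2K input/restart file to F.
# Assumes GNU sed (the "I" flag for case-insensitive matching is a GNU extension).
# TRACE_MASTER and other TRACE_* keywords are left untouched, because the
# patterns require whitespace (or end of line) right after "TRACE".
disable_trace() {
  local file="$1"
  # Edit in place, keeping the original as "$file.bak".
  sed -E -i.bak \
    -e 's/^([[:space:]]*TRACE)[[:space:]]+(T|TRUE|\.TRUE\.|ON|YES)[[:space:]]*$/\1 F/I' \
    -e 's/^([[:space:]]*TRACE)[[:space:]]*$/\1 F/I' \
    "$file"
}
```

For example, `disable_trace 120h_bulk-1.restart` before resubmitting. The unmodified file stays available as `120h_bulk-1.restart.bak`.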
>
> My guess is that the TRACE output is massive; see the attached file, which I
> captured from my compute node output. Perhaps the job timed out before
> computing any MD steps (*I allocated 42 hours*).
>
> I captured the attached file from the compute node within less than 10 min
> of runtime (as I mentioned before, nothing was written to my working
> directory, even after submitting multiple times). I will keep running this
> on our HPC compute node and see whether it outputs any errors that could
> give us a better idea of why nothing was printed.
>
> I am not sure why it still would not print the output, or why it left me
> without any restart files at all! That is the exact opposite of what I was
> trying to achieve with TRACE T!
>
> I would like to raise this as an issue, unless you can point out something
> that I did wrong.
>
> Thank you,
>
> Michela
>
> On Tuesday, July 22, 2025 at 4:42:57 AM UTC-4 Krack Matthias wrote:
>
>> Hi Michela
>>
>> Although the downloadable CP2K containers (Apptainer images) may work on
>> many systems, they do not work on every cluster, given the large variety
>> of cluster and SLURM configurations.
>>
>> Could you successfully run the Apptainer self-test? For example, with:
>>
>>   apptainer run -B /projects:/projects \
>>     /shared/container_repository/cp2k/cp2k_2024.1_openmpi_generic_psmp.sif \
>>     run_tests
>>
>> If that test does not work, the container is not suited for your cluster
>> system, and you should try to build a CP2K binary from scratch using the
>> appropriate compiler and MPI modules installed on your cluster. I
>> recommend asking one of the sysadmins for assistance with that task.
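One way to make the outcome of the self-test unambiguous is to keep its full output in a log file and report the exit status. The sketch below uses a hypothetical helper name (`run_check`). It assumes that `run_tests` in the container exits with a non-zero status when tests fail, which is not confirmed in this thread.

```shell
# Hypothetical helper: run a command, keep its full output in a log file,
# and report PASS/FAIL based on the command's exit status.
run_check() {
  local log="$1"
  shift
  if "$@" >"$log" 2>&1; then
    echo "PASS: $*"
  else
    local rc=$?                 # exit status of the command in the if-test
    echo "FAIL (exit $rc): $*"
    echo "last lines of $log:"
    tail -n 5 "$log"
    return "$rc"
  fi
}

# Example (assumes run_tests exits non-zero when tests fail):
# run_check selftest.log apptainer run -B /projects:/projects \
#   /shared/container_repository/cp2k/cp2k_2024.1_openmpi_generic_psmp.sif run_tests
```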
>>
>> Best
>>
>> Matthias
>>
>> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf of 
>> Michela Benazzi <bnzmi... at gmail.com>
>> *Date: *Monday, 21 July 2025 at 19:56
>> *To: *cp2k <cp... at googlegroups.com>
>> *Subject: *Re: [CP2K:21677] Implementation Suggestion: job not running 
>> due to some SLURM parameters
>>
>> Hello Dr. Krack and CP2K community,
>>
>> I hope you are doing well! I have two additional questions after my jobs
>> again failed without a trace: SLURM exit code 15, which supposedly points
>> to the software/application being used.
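Where the "exit code 15" was read from matters. SLURM's `sacct` reports ExitCode as `<exit status>:<signal>`. `15:0` means the program itself exited with status 15. `0:15` means the job was terminated by signal 15 (SIGTERM), which SLURM sends, for example, when a job reaches its time limit. A shell that loses a child to signal N typically reports exit status 128+N, which is 143 for SIGTERM. The function below is a hypothetical helper (`decode_exitcode`) that sketches this interpretation, using bash's `kill -l` to name the signal.

```shell
# Hypothetical helper: interpret the ExitCode field reported by sacct,
# which has the form "<exit status>:<signal>".
decode_exitcode() {
  local code="$1" status signal
  status="${code%%:*}"
  if [ "$status" = "$code" ]; then
    signal=0                    # no colon: only an exit status was given
  else
    signal="${code#*:}"
  fi
  if [ "$signal" -ne 0 ]; then
    echo "terminated by signal $signal (SIG$(kill -l "$signal"))"
  elif [ "$status" -gt 128 ]; then
    # Shells report a child killed by signal N as exit status 128+N.
    local sig=$(( status - 128 ))
    echo "exit status $status, likely signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit status $status"
  fi
}

# Usage: look up the job first, then decode the ExitCode column, e.g.
#   sacct -j <jobid> --format=JobID,State,ExitCode,Elapsed
#   decode_exitcode 0:15
```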
>>
>> My improvements: I have added TRACE T under &GLOBAL, and have fixed my 
>> bash script settings (I was not requesting enough time and memory). I am 
>> seeing a positive improvement there.
>>
>> I wish I could troubleshoot, but because my jobs fail immediately upon
>> starting without leaving a trace, there are no .err files to refer to. Can
>> I please get some assistance? I am not attaching any more input files:
>> multiple jobs have failed, so I do not think the inputs are the issue. My
>> SLURM script is below my signature for reference.
>>
>> Thank you,
>>
>> Michela
>>
>> #!/bin/bash
>> #SBATCH -p short
>> #SBATCH --mem=36G ## memory per node (not per task)
>> #SBATCH --job-name=en1
>> #SBATCH --ntasks=8
>> #SBATCH --cpus-per-task=7
>> ## Containers can only run on one node, and c*n = 56 or 128
>> ## (intel nodes top out at 56 cores, zen2 at 128)
>> #SBATCH -N 1
>> #SBATCH --constraint=ib
>> #SBATCH --time=42:00:00
>> #SBATCH -o %j.out
>> #SBATCH -e %j.err
>>
>> # Set up the number of OpenMP threads:
>> export OMP_NUM_THREADS=7 ## should be the same as cpus-per-task
>>
>> # The setup_new file contains all the instructions for setting up the
>> # correct environment before the user can compile and/or run CP2K:
>>
>> export PATH=$PATH:/shared/centos7/cp2k/cp2k-6.1.0/data
>> dir=/home/benazzi.m/BHT
>> inputFile=$dir/120h_bulk-1.restart
>>
>> apptainer run -B /projects:/projects \
>>   /shared/container_repository/cp2k/cp2k_2024.1_openmpi_generic_psmp.sif \
>>   mpirun -n 8 --oversubscribe cp2k.psmp -i "$inputFile"
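Both %j.out and %j.err can stay empty when the program dies early. A small prologue that writes to both streams before CP2K starts at least confirms that the job script ran on the node. The sketch below also derives OMP_NUM_THREADS from SLURM_CPUS_PER_TASK, which SLURM sets when --cpus-per-task is given, instead of hard-coding 7. It also refuses to continue if ntasks * cpus-per-task exceeds the node's core count. The function name `job_prologue` is hypothetical, and the default of 56 cores is an assumption taken from the comments in the script above.

```shell
# Hypothetical prologue for the job script above.
job_prologue() {
  local cores_per_node="${1:-56}"        # assumption: 56 cores (intel nodes)
  local ntasks="${SLURM_NTASKS:-1}"      # set by SLURM when --ntasks is given
  local cpt="${SLURM_CPUS_PER_TASK:-1}"  # set by SLURM when --cpus-per-task is given
  local job="${SLURM_JOB_ID:-unknown}"

  # Match the OpenMP thread count to the CPUs allocated per MPI task.
  export OMP_NUM_THREADS="$cpt"

  # Write to stdout (%j.out) and stderr (%j.err) right away.
  echo "job $job on $(uname -n) started $(date)"
  echo "job $job: stderr is connected" >&2
  echo "ntasks=$ntasks cpus-per-task=$cpt OMP_NUM_THREADS=$OMP_NUM_THREADS"

  # Refuse to continue if the requested layout exceeds the node's cores.
  local total=$(( ntasks * cpt ))
  if [ "$total" -gt "$cores_per_node" ]; then
    echo "ERROR: $ntasks tasks x $cpt CPUs = $total > $cores_per_node cores" >&2
    return 1
  fi
}
```

In the script above, it would take the place of the `export OMP_NUM_THREADS=7` line, e.g. `job_prologue 56 || exit 1`.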
>>
>> On Wednesday, July 16, 2025 at 4:30:02 PM UTC-4 Michela Benazzi wrote:
>>
>> Thank you, Dr. Krack,
>>
>> I will give it a try! 
>>
>> Michela
>>
>> On Wednesday, July 16, 2025 at 12:01:08 PM UTC-4 Krack Matthias wrote:
>>
>> Hi Michela
>>
>> You can try to activate the TRACE input key 
>> <https://manual.cp2k.org/trunk/CP2K_INPUT/GLOBAL.html#CP2K_INPUT.GLOBAL.TRACE> in 
>> the &GLOBAL section (or TRACE_MASTER if the output becomes messy because of 
>> the large number of MPI ranks).
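For illustration, the two keywords sit in the &GLOBAL section as shown below. The snippet writes an example fragment with a shell heredoc. The file name and the PROJECT/RUN_TYPE values are placeholders. In a real input, these keywords belong in the existing &GLOBAL section rather than in a second one. As reported at the top of this thread, TRACE output can become very large, so it is probably best kept to short diagnostic runs.

```shell
# Write an example &GLOBAL section to a scratch file for illustration only.
# PROJECT and RUN_TYPE values are placeholders.
cat > global_trace_example.inp <<'EOF'
&GLOBAL
  PROJECT en1
  RUN_TYPE MD
  ! print a call trace ...
  TRACE T
  ! ... but only from the master MPI rank
  TRACE_MASTER T
&END GLOBAL
EOF
```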
>>
>> HTH
>>
>> Matthias
>>
>> *From: *cp... at googlegroups.com <cp... at googlegroups.com> on behalf of 
>> Michela Benazzi <bnzmi... at gmail.com>
>> *Date: *Wednesday, 16 July 2025 at 17:31
>> *To: *cp2k <cp... at googlegroups.com>
>> *Subject: *[CP2K:21665] Implementation Suggestion: job not running due 
>> to some SLURM parameters
>>
>> Good morning,
>>
>> I recently noticed that I was not allocating nearly enough memory for my
>> jobs (25 GB allocated, while 32 GB were actually being used).
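This is a quick way to compare the requested memory with what the job actually used. SLURM's `--mem` is specified per node, in megabytes by default. `sacct` reports peak resident memory in its MaxRSS field, e.g. `sacct -j <jobid> --format=JobID,ReqMem,MaxRSS`. MaxRSS refers to the largest single task, so with 8 tasks on one node the node total can be considerably higher. The helpers below (`to_mib` and `mem_exceeded`, both hypothetical names) are a minimal sketch. They only handle integer sizes with an optional K/M/G/T suffix, treated as 1024-based units.

```shell
# Hypothetical helper: convert a SLURM-style size (e.g. 36G, 25600,
# 33554432K) to MiB. No suffix is treated as megabytes, like --mem.
to_mib() {
  local v="$1" num unit
  num="${v%[KkMmGgTt]}"        # strip one trailing unit letter, if any
  unit="${v#"$num"}"
  case "$unit" in
    K|k)    echo $(( num / 1024 )) ;;
    M|m|"") echo "$num" ;;
    G|g)    echo $(( num * 1024 )) ;;
    T|t)    echo $(( num * 1024 * 1024 )) ;;
  esac
}

# Succeeds (exit 0) if the used memory exceeds the requested memory.
mem_exceeded() {
  local requested used
  requested=$(to_mib "$1")
  used=$(to_mib "$2")
  [ "$used" -gt "$requested" ]
}
```

For the case described here, `mem_exceeded 25G 32G && echo "request more memory"` prints the warning.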
>>
>> Those jobs failed without any trace: no output, restart, log, or error
>> files at all.
>>
>> Is there any existing way (or, if not, could a feature be implemented) so
>> that:
>>
>> 1. The job keeps writing output until the memory limit is exceeded
>>
>> 2. An error message explaining the cause is written to the .out or .err
>> files
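Whether CP2K can write anything depends on how it is stopped. A process killed with SIGKILL, as the kernel's OOM killer does, cannot catch the signal and write a final message. SLURM's accounting, where enabled, still records why a job ended. The sketch below (`explain_state` is a hypothetical helper) maps a few common job states from `sacct -j <jobid> -X -n -o State` to a short explanation. The list of states is not exhaustive.

```shell
# Hypothetical helper: turn a SLURM job State into a human-readable hint.
# Patterns end in * because sacct may append details, e.g. "CANCELLED by 1234".
explain_state() {
  case "$1" in
    OUT_OF_MEMORY*) echo "killed for exceeding its memory request; increase --mem" ;;
    TIMEOUT*)       echo "hit its --time limit; request more time" ;;
    CANCELLED*)     echo "cancelled by the user or an administrator" ;;
    FAILED*)        echo "exited with a non-zero status; check the .out/.err files" ;;
    COMPLETED*)     echo "completed normally" ;;
    *)              echo "state '$1': see the sacct documentation" ;;
  esac
}
```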
>>
>> Thank you!
>>
>> Michela
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "cp2k" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to cp2k+uns... at googlegroups.com.
>> To view this discussion visit 
>> https://groups.google.com/d/msgid/cp2k/d4c2d7a3-fb48-4c39-b185-6ebfa692aa35n%40googlegroups.com 
>> <https://groups.google.com/d/msgid/cp2k/d4c2d7a3-fb48-4c39-b185-6ebfa692aa35n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cp2k.org/archives/cp2k-user/attachments/20250724/14c24f7d/attachment-0001.htm>

