To gather the SLURM logs and configuration details needed for a SLURM support case, use the “capture_slurm_logs.sh” script. It is installed on the admin0 (Shaheen-III) and admin10 (TDS) nodes in:
/root/hpe/advdeployment
To run the script (shown here on admin10, with slurm1 as the argument):
{
admin10:~/hpe/advdeployment # ./capture_slurm_logs.sh slurm1
Capturing SLURM logs and relevant information...
slurm.conf 100% 7522 5.0MB/s 00:00
messages 100% 806KB 115.8MB/s 00:00
slurmctld.log 100% 213KB 67.1MB/s 00:00
scp: /var/log/slurmctld.log: No such file or directory
slurmdbd.log 100% 628 513.9KB/s 00:00
Compressing and creating a .tgz file...
SLURM_logs_2025.02.17-11.54.09/
SLURM_logs_2025.02.17-11.54.09/README_FIRST.txt
SLURM_logs_2025.02.17-11.54.09/journalctl.txt
SLURM_logs_2025.02.17-11.54.09/scontrol_show_config.txt
SLURM_logs_2025.02.17-11.54.09/slurm.conf
SLURM_logs_2025.02.17-11.54.09/messages
SLURM_logs_2025.02.17-11.54.09/slurmctld.log
SLURM_logs_2025.02.17-11.54.09/slurmdbd.log
Upload 'SLURM_logs_2025.02.17-11.54.09.tgz' to the case.
admin10:~/hpe/advdeployment #
}
The “No such file or directory” message from scp can be ignored for now. The script takes no more than 12 seconds to collect the logs and information.
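Before uploading, it can be useful to confirm that the archive contains the files listed in the sample output above. The following is a minimal sketch only and is not part of capture_slurm_logs.sh: the function name check_slurm_archive is hypothetical, and the expected file list is copied from the sample run, so it may differ on other systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with capture_slurm_logs.sh).
# Lists the contents of a .tgz archive and reports any file from the
# sample output above that is absent.
# Returns 0 if all files are present, 1 if any are missing,
# and 2 if the archive cannot be read.
check_slurm_archive() {
    local archive="$1"
    local names name missing=0

    # Reduce each archive entry to its base name (drop the directory prefix).
    names="$(tar -tzf "$archive" | sed 's|.*/||')" || return 2
    [ -n "$names" ] || return 2

    # Expected files, assumed from the sample run in this document.
    for name in README_FIRST.txt journalctl.txt scontrol_show_config.txt \
                slurm.conf messages slurmctld.log slurmdbd.log; do
        if ! grep -qxF -- "$name" <<<"$names"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, after sourcing the function, running check_slurm_archive SLURM_logs_2025.02.17-11.54.09.tgz prints nothing and returns 0 when every expected file is present.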