#SBATCH --nodelist=node[01-09]
#SBATCH --nodelist=node01
Slurm display full job name
sacct -u username --format=JobID,JobName%30
sacct -u username --format=JobID,TIME,JobName%30,Start%12,Elapsed,NCPU,CPUTime
Source:
https://stackoverflow.com/questions/42217102/expand-columns-to-see-full-jobname-in-slurm
Slurm sbatch exclude nodes or node list
Both forms work: a bracketed range or an explicit comma-separated list.
#SBATCH --exclude=node[01-09]
#SBATCH --exclude=node01,node02,node03,node04,node05,node07,node08,node09,node10
Delete a series of jobs in slurm
Delete a series of jobs by job ID. Change xxxxxxx to the start job ID and the end job ID in the script. Don’t worry about jobs in between that are not yours: you don’t have the authorization to delete them, so they will be skipped automatically.
./delJobs.sh
The script can be downloaded here:
https://gist.github.com/fa5a9bdfa9192339259100019afcee0a.git
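In case the gist is not reachable, here is a minimal sketch of what such a script could look like. It is not the gist itself: the function name `cancel_range` and the `CANCEL_CMD` variable are my own placeholders, and it defaults to a dry run that only prints the `scancel` commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a delJobs.sh-style helper (not the original gist).

# Cancel every job ID in the inclusive range START..END.
# CANCEL_CMD is an assumed override: by default the commands are only printed
# (dry run); set CANCEL_CMD=scancel to really cancel the jobs.
cancel_range() {
    local start="$1" end="$2"
    local id="$start"
    while [ "$id" -le "$end" ]; do
        # scancel reports an error for jobs you do not own and the loop moves on
        ${CANCEL_CMD:-echo scancel} "$id"
        id=$((id + 1))
    done
}

# Dry run:       cancel_range 1234567 1234570
# Real cancel:   CANCEL_CMD=scancel cancel_range 1234567 1234570
```

Run the dry run first and check the printed range before switching to the real `scancel`.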
Slurm system multiple job submission template
There are many ways to submit Slurm jobs in parallel; here I will share the one I use the most. This template loops through a list of entries and submits them all at once. It is especially practical when you need to run hundreds of samples at the same time.
Pay attention to the limits imposed by your admin; 500 jobs is usually the limit they gave us.
You can loop within your Slurm submission script to request multiple sessions, or parallelize within your code, but when dealing with a large number of samples I like my way better: I have better control over individual jobs, and combining it with parallelism within each job powers it up even more. If one node mysteriously fails (which can happen, especially when you run hundreds of samples), I can easily see which one and resubmit it. Please feel free to choose whatever you like; whichever way works for you is the best way.
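To stay under the job limit mentioned above, it helps to check how many jobs you already have queued before submitting a big batch. A rough sketch, assuming a limit of 500 (use your site's real number); `jobs_left` is a hypothetical helper name, and the count from `squeue` is only an approximation of how your scheduler counts jobs:

```shell
# Print how many more jobs fit under LIMIT given QUEUED jobs already in the queue.
jobs_left() {
    local limit="$1" queued="$2"
    local left=$((limit - queued))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# On the cluster (squeue -h drops the header line, -u filters by user):
#   queued=$(squeue -h -u "$USER" | wc -l)
#   jobs_left 500 "$queued"
```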
You will need two files: one is the loop function, the other is your Slurm template. Here is the usage:
– Have your sample list as a txt file with one column containing your sample names; in this template it is called sampleList.txt.
– Have your yourSlurmScript.sh composed well, and replace the places where your sample name will go with “Z”. (You can use any character that is not present in your yourSlurmScript.sh; I find that a capital “Z” is never present in my code, and “X” is also a common choice.)
– Put your yourSlurmScript.sh file name into the batchSubmit.sh script, and run it as below:
./batchSubmit.sh # you can change the name to whatever you want
1. Loop function, batchSubmit.sh:
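The gist embed does not show up here, so below is a minimal sketch of the loop idea described above, not the original script. The function names `render_job` and `submit_all` and the `SUBMIT` variable are my own placeholders. It assumes the placeholder is the literal “Z” and that sample names contain no characters special to sed (such as `/` or `&`). It relies on `sbatch` reading the batch script from standard input when no file name is given.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a batchSubmit.sh-style loop (not the original gist).

# Fill the template: replace every "Z" with the sample name ($1) in file $2.
render_job() {
    sed "s/Z/$1/g" "$2"
}

# Read one sample name per line from $1 and submit one job per sample,
# using the template in $2. SUBMIT is an assumed override: it defaults to
# sbatch; set SUBMIT=cat to print the filled-in scripts instead (dry run).
submit_all() {
    local samples="$1" template="$2" sample
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue          # skip blank lines
        render_job "$sample" "$template" | ${SUBMIT:-sbatch}
    done < "$samples"
}

# Dry run:  SUBMIT=cat; submit_all sampleList.txt yourSlurmScript.sh
# Submit:   SUBMIT=sbatch; submit_all sampleList.txt yourSlurmScript.sh
```

Because every sample becomes its own job, a failed sample can be resubmitted alone by putting just that name in a one-line list.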
2. Prepare yourSlurmScript.sh
Something very important here: ALWAYS rsync your files into the tmp folder assigned to your node and run your job there; don’t use cp, especially when your jobs are “heavy”. Or I promise you your server admin will ask you out for a serious talk…
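As a rough illustration of the rsync advice, here is a sketch of what a template could look like. It is not the original gist: the resource requests, `DATA_DIR`, `RESULT_DIR` and the `run_job` function are all hypothetical, and the location of node-local scratch differs between clusters (I assume `$TMPDIR`, falling back to `/tmp`).

```shell
#!/usr/bin/env bash
#SBATCH --job-name=Z
#SBATCH --output=Z.%j.out
#SBATCH --cpus-per-task=4
#SBATCH --time=04:00:00

# "Z" is the placeholder that the submit loop swaps for one sample name.
SAMPLE="Z"
# Hypothetical locations; adjust to your own cluster layout.
DATA_DIR="$HOME/data"
RESULT_DIR="$HOME/results"
# Node-local scratch (assumed): per-job TMPDIR if the site sets one, else /tmp.
WORK_DIR="${TMPDIR:-/tmp}/${USER:-user}_${SLURM_JOB_ID:-manual}_$SAMPLE"

run_job() {
    mkdir -p "$WORK_DIR" "$RESULT_DIR"
    rsync -a "$DATA_DIR/$SAMPLE/" "$WORK_DIR/"     # stage input onto the node
    cd "$WORK_DIR" || exit 1
    # ... your analysis command for "$SAMPLE" goes here ...
    rsync -a "$WORK_DIR/" "$RESULT_DIR/$SAMPLE/"   # copy results back
    cd / && rm -rf "$WORK_DIR"                     # clean up the scratch space
}

# Only do the work when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_job
fi
```

The trailing slashes on the rsync source paths matter: they copy the contents of the folder rather than the folder itself.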
The two scripts above can be downloaded here:
https://gist.github.com/b533a6151d8fb607a51b397ad0eb2b2c.git
https://gist.github.com/f51685c1c6277de8785374b09cffb5b5.git