06/03/2020 (updated 02/01/2023) by bioinfocore

Liftover bigwig files

Blogs, Unix

There are two ways to lift over bigWig files from one genome build to another: use CrossMap, or run the conversion step by step as shown below. (CrossMap does essentially the same thing as the individual steps.)

CrossMap method (hg19 to hg38 as an example):

pip install CrossMap

CrossMap.py bigwig hg19ToHg38.over.chain input.bw output   # "output" is a prefix; CrossMap appends the .bw extension

Genome liftover chain files can be downloaded here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/ (pick the directory that matches your source genome build)

Step by step method (bw -> bedGraph -> liftOver -> bw):

bigWigToBedGraph input.bw input.bedGraph

liftOver input.bedGraph hg19ToHg38.over.chain input_hg38.bedgraph unMapped

fetchChromSizes hg38 > hg38.chrom.sizes

LC_COLLATE=C sort -k1,1 -k2,2n input_hg38.bedgraph > input_hg38.sorted.bedgraph

bedGraphToBigWig input_hg38.sorted.bedgraph hg38.chrom.sizes output.bw
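
One caveat with the step-by-step route: liftOver can map two hg19 intervals onto overlapping hg38 positions, and bedGraphToBigWig stops with an error on overlapping input. Below is a minimal sketch of one way to deal with it, assuming it is acceptable to keep the first interval and drop any later one that overlaps it (averaging the overlapping values would be another option). The coordinates and values are made up for illustration.

```shell
# Toy lifted-over bedGraph (made-up values): the 2nd and 3rd intervals overlap.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\t1.5\nchr1\t300\t400\t2.0\nchr1\t350\t450\t9.9\nchr2\t10\t20\t0.5\n' \
    > "$tmpdir/input_hg38.bedgraph"

# Sort by chromosome, then start; keep an interval only if it starts at or
# after the end of the last interval kept on the same chromosome.
LC_COLLATE=C sort -k1,1 -k2,2n "$tmpdir/input_hg38.bedgraph" \
  | awk '$1 != chrom || $2 + 0 >= end { print; chrom = $1; end = $3 + 0 }' \
  > "$tmpdir/input_hg38.sorted.bedgraph"

cat "$tmpdir/input_hg38.sorted.bedgraph"
```

The filtered, sorted file can then be passed to bedGraphToBigWig as in the last step above.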

bigWigToBedGraph, liftOver, fetchChromSizes and bedGraphToBigWig are all UCSC utilities, which can be downloaded here: http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads

Source:
http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads
http://crossmap.sourceforge.net/#

06/02/2020 (updated 10/21/2021) by bioinfocore

Run Unix commands in python

Between languages, Python, Unix
Python, Unix

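To run a Unix command from Python:

import os
os.system('your unix command')  # the string is run through the shell
os.system('ls')  # output goes to the terminal; the return value is a status code, not the output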

Source:
https://code.tutsplus.com/articles/how-to-run-unix-commands-in-your-python-program--cms-25926

05/15/2020 (updated 10/21/2021) by bioinfocore

Unix insert a line to the beginning of a file

Unix

Insert a line at the beginning of a file using sed:

sed '1 s/^/This is my first line\n/' inFile.txt > outFile.txt
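
The \n in the replacement is understood by GNU sed; other sed implementations may print a literal n instead. Here is a sketch of a variant that relies only on printf and cat, and that also works when the input file is empty. The file contents are created just for the demo.

```shell
tmpdir=$(mktemp -d)
printf 'second line\nthird line\n' > "$tmpdir/inFile.txt"

# Write the new first line, then append the original file unchanged.
{ printf '%s\n' 'This is my first line'; cat "$tmpdir/inFile.txt"; } > "$tmpdir/outFile.txt"

first_line=$(head -n 1 "$tmpdir/outFile.txt")
echo "$first_line"
```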

Source:
https://linuxconfig.org/how-to-insert-line-to-the-beginning-of-file-on-linux

05/15/2020 (updated 10/21/2021) by bioinfocore

Unix skip/remove the first or last line

Unix

Skip or remove the first line:

tail -n +2 file.txt

Skip the last line:

head -n -1 file.txt
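
A negative count for head -n is a GNU extension and is not available in every head implementation. A sketch of a portable alternative, assuming a POSIX sed: sed '$d' deletes the last line, and chaining it after tail drops both the first and the last line. The sample file is made up.

```shell
tmpdir=$(mktemp -d)
printf 'header\nrow1\nrow2\nfooter\n' > "$tmpdir/file.txt"

# Drop the last line without relying on GNU head.
without_last=$(sed '$d' "$tmpdir/file.txt")

# Drop both the first and the last line.
body_only=$(tail -n +2 "$tmpdir/file.txt" | sed '$d')

printf '%s\n' "$body_only"
```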

Source:
https://unix.stackexchange.com/questions/55755/print-file-content-without-the-first-and-last-lines

05/05/2020 (updated 10/21/2021) by bioinfocore

Unix remove repeated rows

Unix

One simple line to remove repeated rows from a txt file:

awk '!seen[$0]++' fileIn.txt > fileOut.txt
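
How the one-liner works: seen is an associative array keyed by the whole line ($0). The first time a line appears its count is 0, so !seen[$0]++ is true and the line is printed; later copies are skipped. Unlike sort -u, the original order is preserved. A small demonstration with made-up data:

```shell
tmpdir=$(mktemp -d)
printf 'b\na\nb\nc\na\n' > "$tmpdir/fileIn.txt"

# Print each distinct line once, in order of first appearance.
awk '!seen[$0]++' "$tmpdir/fileIn.txt" > "$tmpdir/fileOut.txt"

deduped=$(cat "$tmpdir/fileOut.txt")
printf '%s\n' "$deduped"
```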

05/04/2020 (updated 10/21/2021) by bioinfocore

Unix regex grep extract substring

Unix

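Using grep to extract substrings:

grep -oP '_\K.+(?=\.bw$)'

# -o prints only the matched part; -P enables Perl-style regex (GNU grep).
# \K discards everything matched so far, so the output starts right after the underscore.
# (?=\.bw$) is a lookahead: the match ends just before a trailing ".bw", which is not printed.
# e.g. extract the part of each bigwig file name between the underscore and ".bw"
ls folder/*.bw | grep -oP '_\K.+(?=\.bw$)'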
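
A small demonstration on made-up file names, assuming GNU grep built with PCRE support (the -P option). In this sketch the dot before bw is escaped and the lookahead is anchored with $, so only a literal .bw extension at the end of the name ends the match.

```shell
# Hypothetical bigwig file names, one per line; notes.txt should not match.
names=$(printf 'GSM001_liver.bw\nGSM002_heart_rep2.bw\nnotes.txt\n' \
  | grep -oP '_\K.+(?=\.bw$)')

printf '%s\n' "$names"
```

Because .+ is greedy and the match starts at the first underscore, a name with several underscores keeps everything after the first one.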

Source:
https://unix.stackexchange.com/questions/437405/opposite-of-k-to-keep-the-stuff-right

04/22/2020 (updated 10/21/2021) by bioinfocore

Unix store a list of files into an array variable

Unix

Store a list of files in an array variable and loop through it:

files=($(ls -lah yourFolder/* | cut -d' ' -f X)) # X is the column that holds the file name; often 14 on a local computer, but it differs between systems.
for item in "${files[@]}"
do
  echo "$item"
done

Or read the list from a file, then loop:

while read -r sample
do
    sample_list="$sample_list $sample"
done < sampleList.txt

for sample in $sample_list
do
    echo "$sample"
done
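
Parsing ls -l output breaks on file names that contain spaces, and the column number changes between systems. A sketch of an alternative, assuming bash: a glob fills the array directly, and mapfile (bash 4 or later) reads a list file into an array, one line per element. The folder and file names below are created only for the demo.

```shell
# Requires bash (arrays and mapfile are not POSIX sh).
tmpdir=$(mktemp -d)
mkdir "$tmpdir/yourFolder"
touch "$tmpdir/yourFolder/a.bw" "$tmpdir/yourFolder/b c.bw"
printf 'sample1\nsample2\n' > "$tmpdir/sampleList.txt"

# The glob expands to one array element per file, spaces included.
files=("$tmpdir"/yourFolder/*)
for item in "${files[@]}"; do
  echo "$item"
done

# Read the sample list into an array, one line per element.
mapfile -t samples < "$tmpdir/sampleList.txt"
for sample in "${samples[@]}"; do
  echo "$sample"
done
```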

Source:
https://stackoverflow.com/questions/9954680/how-to-store-directory-files-listing-into-an-array

04/16/2020 (updated 10/21/2021) by bioinfocore

Delete a series of jobs in slurm

Unix
slurm, Unix

Delete a series of jobs by job id. Change xxxxxxx in the script to the first and the last job id of the range. (Don’t worry about jobs in between that are not yours: you are not authorized to delete them, so they are skipped automatically.)

./delJobs.sh
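
The linked script is not reproduced on this page. As a rough idea of what such a script can look like (a sketch, not the author's delJobs.sh): loop over the id range and call scancel on each id. The names START_ID, END_ID, CANCEL_CMD and cancel_range are assumptions made for this illustration. CANCEL_CMD defaults to a dry run that only prints the commands.

```shell
# Sketch only: cancel every job id from START_ID to END_ID inclusive.
START_ID=${START_ID:-1000001}            # hypothetical first job id
END_ID=${END_ID:-1000003}                # hypothetical last job id
CANCEL_CMD=${CANCEL_CMD:-echo scancel}   # dry run by default; set CANCEL_CMD=scancel to really cancel

cancel_range() {
  id=$1
  while [ "$id" -le "$2" ]; do
    # Unquoted on purpose so that "echo scancel" splits into two words.
    $CANCEL_CMD "$id"
    id=$((id + 1))
  done
}

cancelled=$(cancel_range "$START_ID" "$END_ID")
printf '%s\n' "$cancelled"
```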

The script can be downloaded here:
https://gist.github.com/fa5a9bdfa9192339259100019afcee0a.git

04/16/2020 (updated 10/21/2021) by bioinfocore

Slurm system multiple job submission template

Blogs, Unix
Blogs, slurm, Unix

There are many ways to submit Slurm jobs in parallel; here I will share the one I use the most. This template loops through a list of entries and submits them all at once. It is especially practical when you need to run hundreds of samples at the same time.

Pay attention to the limits imposed by your admin; 500 jobs is usually the limit they give us.
You can also loop within your Slurm submission script to request multiple sessions, or parallelize within your code. When dealing with a large number of samples, though, I like my way better: I have better control over individual jobs, and combining it with parallelization within each job powers it up even more. If one node mysteriously fails (which can happen, especially when you run hundreds of samples), I can easily see which one and resubmit it. Please feel free to choose whatever you like; whichever way works for you is the best way.

You will need two files: one is the loop function, the other is your Slurm template. Here is the usage:

– Have your sample list as a txt file with one column containing your sample names; in this template it is called sampleList.txt.
– Have your yourSlurmScript.sh ready, and put “Z” wherever your sample name should go. (You can use any character that does not otherwise appear in yourSlurmScript.sh. I find that capital “Z” is never present in my code; “X” is also a common choice.)
– Put your yourSlurmScript.sh file name into the batchSubmit.sh script, and run it as below:

./batchSubmit.sh # you can change the name to whatever you want

1. Loop function, batchSubmit.sh:
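
The embedded script did not survive on this page (it is available at the gist links further down). As a stand-in, here is a minimal sketch of the idea described above, not the author's actual batchSubmit.sh: for each name in sampleList.txt, replace every Z in the template with the sample name and hand the result to sbatch. SUBMIT_CMD is an assumption of this sketch; it defaults to cat, so the generated script is printed instead of submitted. The demo files are created on the fly.

```shell
# Demo inputs (made up); in real use these two files already exist.
workdir=$(mktemp -d)
printf 'sampleA\nsampleB\n' > "$workdir/sampleList.txt"
printf '#!/bin/bash\n#SBATCH --job-name=Z\necho "processing Z"\n' > "$workdir/yourSlurmScript.sh"

SUBMIT_CMD=${SUBMIT_CMD:-cat}   # set SUBMIT_CMD=sbatch to really submit

while IFS= read -r sample; do
  [ -n "$sample" ] || continue            # skip empty lines
  job="$workdir/job_$sample.sh"
  # Replace the placeholder Z with the sample name.
  # Assumes sample names contain no "/" or "&", which are special to sed.
  sed "s/Z/$sample/g" "$workdir/yourSlurmScript.sh" > "$job"
  $SUBMIT_CMD "$job"
done < "$workdir/sampleList.txt"
```

Keeping one generated script per sample makes it easy to resubmit a single failed job later.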

2. Prepare yourSlurmScript.sh

Something very important here: ALWAYS rsync your files into the tmp folder assigned to your node and run your job there; don’t use cp, especially when your jobs are “heavy”. Otherwise, I promise you, your server admin will ask you in for a serious talk…

The two scripts above can be downloaded here:
https://gist.github.com/b533a6151d8fb607a51b397ad0eb2b2c.git
https://gist.github.com/f51685c1c6277de8785374b09cffb5b5.git

04/13/2020 (updated 10/21/2021) by bioinfocore

Change Unix character encoding

Unix
html, Unix

This usually happens when you transfer files between systems, for example when you “scp” or “rsync” a file from your local machine to a Linux server. The difference shows up when your file contains special characters (e.g. ø, ó, ä …); especially in an HTML file, they will be displayed as garbled characters. Converting the file with iconv fixes it:

iconv -f iso-8859-1 -t utf-8 input.html > fixed_input.html
mv fixed_input.html input.html

Sometimes, even if you have made sure that your file is encoded in UTF-8 locally, it will still be treated as ISO-8859-1 after the transfer. This happens.
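
Running the conversion on a file that is already UTF-8 would double-encode it. Here is a sketch of a guarded version, assuming an iconv that exits with a non-zero status on invalid input (as the glibc one does): the file is converted only if it fails to validate as UTF-8. The demo file is generated here with a single ISO-8859-1 é byte.

```shell
tmpdir=$(mktemp -d)
printf 'caf\351\n' > "$tmpdir/input.html"      # \351 is é in ISO-8859-1

# Converting UTF-8 to UTF-8 fails if the file is not valid UTF-8.
if iconv -f utf-8 -t utf-8 "$tmpdir/input.html" > /dev/null 2>&1; then
  echo "already UTF-8, nothing to do"
else
  # Replace the original only if the conversion succeeded.
  iconv -f iso-8859-1 -t utf-8 "$tmpdir/input.html" > "$tmpdir/fixed_input.html" \
    && mv "$tmpdir/fixed_input.html" "$tmpdir/input.html"
fi

converted=$(cat "$tmpdir/input.html")
```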

bioinfo core

Index & solution of bioInfo utilities
