Skip or remove the first line:
tail -n +2 file.txt
Skip the last line:
head -n -1 file.txt
Source:
https://unix.stackexchange.com/questions/55755/print-file-content-without-the-first-and-last-lines
When nothing needs to be returned:
from joblib import Parallel, delayed
import multiprocessing
# Number of cores available to use
num_cores = multiprocessing.cpu_count()
# If your function takes only 1 argument
def yourFunction(x):
    # anything in your loop
    return XXX
Parallel(n_jobs=num_cores)(delayed(yourFunction)(x) for x in inputList)
# If your function takes more than 1 argument
def yourFunction(input1, input2):
    # anything in your loop
    return XXX
# the nested loop below calls yourFunction on every (input1, input2) combination
Parallel(n_jobs=num_cores)(delayed(yourFunction)(input1, input2) for input1 in list1 for input2 in list2)
When results need to be returned, simply assign the call to a variable; the results are saved as a list:
results = Parallel(n_jobs=num_cores)(delayed(yourFunction)(x) for x in inputList)
When each call returns a DataFrame to be concatenated later, use mp.Pool:
import multiprocessing as mp
import pandas as pd
with mp.Pool(processes=max(1, num_cores - 1)) as pool:
    # argvList holds one argument per call; each call returns a DataFrame
    resultList = pool.map(yourFunction, argvList)
results_df = pd.concat(resultList)
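As a minimal sketch of how the nested "for input1 in list1 for input2 in list2" loop pairs up arguments, the snippet below uses only the standard library. It swaps joblib for multiprocessing.pool.ThreadPool (threads rather than processes, chosen here only so the example runs without extra packages). The function add_pair and the two input lists are made up for illustration.

```python
from itertools import product
from multiprocessing.pool import ThreadPool

def add_pair(input1, input2):
    # Hypothetical stand-in for yourFunction with two arguments.
    return input1 + input2

list1 = [1, 2]
list2 = [10, 20, 30]

# "for input1 in list1 for input2 in list2" visits every combination,
# which is the same sequence itertools.product generates.
pairs = [(a, b) for a in list1 for b in list2]
assert pairs == list(product(list1, list2))

# starmap unpacks each tuple into the function's arguments and
# returns the results as a list in the same order as the inputs.
with ThreadPool(processes=2) as pool:
    results = pool.starmap(add_pair, pairs)

print(results)
```

If the two lists should instead be walked in lockstep (first with first, second with second), zip(list1, list2) would be the pairing to use rather than the nested loop.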
Source:
https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop
https://blog.dominodatalab.com/simple-parallelization/
https://stackoverflow.com/questions/36794433/python-using-multiprocessing-on-a-pandas-dataframe
One-liner to remove duplicate lines from a text file, keeping the first occurrence of each line:
awk '!seen[$0]++' fileIn.txt > fileOut.txt
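To make the logic of the awk one-liner easier to follow, here is a rough Python equivalent: a line is printed only the first time it is seen, and the original order is kept. The function name unique_lines and the sample data are illustrative only.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving order."""
    seen = set()
    for line in lines:
        if line not in seen:   # same role as !seen[$0] in the awk one-liner
            seen.add(line)     # same role as the ++ that marks the line as seen
            yield line

sample = ["b", "a", "b", "c", "a"]
deduped = list(unique_lines(sample))
print(deduped)
```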
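Using grep to extract substrings:
grep -oP 'G*_\K(.+)(?=\.bw)'
# \K discards everything matched before it, so the output starts right after the underscore;
# the lookahead (?=\.bw) ends the match just before ".bw" without including it.
# -P (Perl-compatible regular expressions) requires GNU grep.
# eg. extract bigwig file names
ls folder/ | grep -oP 'G*_\K(.+)(?=\.bw)'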
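The same extraction can be sketched in Python. Python's built-in re module does not support \K, so a capture group takes its place here; the file names below are hypothetical and the helper name extract_label is made up for this example.

```python
import re

# Python's re has no \K, so a capture group plays its role here;
# the lookahead (?=\.bw) stops the match just before the extension.
PATTERN = re.compile(r"_(.+)(?=\.bw)")

def extract_label(filename):
    """Return the text between the first underscore and '.bw', or None."""
    match = PATTERN.search(filename)
    return match.group(1) if match else None

# Hypothetical file names, for illustration only.
names = ["GSM001_liver.bw", "GSM002_heart_rep1.bw", "notes.txt"]
labels = [extract_label(n) for n in names]
print(labels)
```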
Source:
https://unix.stackexchange.com/questions/437405/opposite-of-k-to-keep-the-stuff-right
Rename only one or a few pandas dataframe columns:
df.rename(columns={"A": "a", "B": "c"}, inplace=True)
Rename one or a few index labels:
df.rename(index={"A": "a", "B": "c"}, inplace=True)
Source:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
Store a list of files in an array variable, and loop through the list:
files=(yourFolder/*) # a glob avoids parsing ls output, whose column layout differs between systems
for item in "${files[@]}"
do
    echo "$item"
done
Or read the list from a file, then loop:
while read -r sample
do
    sample_list="$sample_list $sample"
done < sampleList.txt
for sample in $sample_list
do
    echo "$sample"
done
Source:
https://stackoverflow.com/questions/9954680/how-to-store-directory-files-listing-into-an-array
Use os.listdir() to list the entries (files and directories) in the current directory:
import os
arr = os.listdir()
print(arr)
Use glob to list files by shell-style wildcard pattern (not a regular expression):
import glob
listFiles = []
for file in glob.glob("*.txt"):
    listFiles.append(file)
The source below has an excellent answer to this question, with further usage examples.
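A small self-contained sketch of the difference between glob wildcards and true regular expressions, run against a temporary directory. All file names are invented for the example.

```python
import glob
import os
import re
import tempfile

# Build a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.txt", "b.txt", "c.csv", "sample_01.txt"]:
        open(os.path.join(tmp, name), "w").close()

    # glob: "*" is a wildcard matching any run of characters.
    txt_files = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(tmp, "*.txt")))

    # A real regular expression needs the re module on top of os.listdir().
    numbered = sorted(f for f in os.listdir(tmp)
                      if re.fullmatch(r"sample_\d+\.txt", f))

print(txt_files)
print(numbered)
```

glob returns paths in arbitrary order, which is why the results are sorted before use.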
Source:
https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory
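Add an empty row with or without a name (DataFrame.append was removed in pandas 2.0, so assign through .loc instead):
import numpy as np
df.loc['NameOfNewRow'] = np.nan # name the new row
df.loc[len(df)] = np.nan # do not name the new row; assumes the default 0..n-1 integer index
Add an empty column:
df['new'] = np.nan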
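For pandas versions without DataFrame.append (it was removed in 2.0), one possible alternative that also returns a new frame rather than modifying the original is reindex. This is only a sketch: the helper names and the example data are made up, and it assumes the new row name is not already in the index.

```python
import pandas as pd

def add_named_empty_row(df, name):
    # reindex fills any label missing from the original index with NaN.
    return df.reindex(list(df.index) + [name])

def add_unnamed_empty_row(df):
    # Renumber rows 0..n-1, then ask for one extra position n.
    return df.reset_index(drop=True).reindex(range(len(df) + 1))

# Hypothetical example data.
df = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]}, index=["r1", "r2"])
named = add_named_empty_row(df, "NameOfNewRow")
unnamed = add_unnamed_empty_row(df)
print(named)
print(unnamed)
```

Note that integer columns are converted to float in the result, because NaN cannot be stored in a plain integer column.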
Source:
https://stackoverflow.com/questions/39998262/append-an-empty-row-in-dataframe-using-pandas
https://stackoverflow.com/questions/16327055/how-to-add-an-empty-column-to-a-dataframe
Replace nan in a numpy array with zero or any other number:
import numpy as np
a = np.array([1, 2, 3, 4, np.nan])
# nan is replaced with 0 by default; copy=True (the default) returns a new array, copy=False replaces in place
a = np.nan_to_num(a, copy=True)
# to replace nan with another number, e.g. 10
np.nan_to_num(a, copy=False, nan=10)
Replace inf or -inf with the largest or most negative finite floating-point values, or with any other numbers:
a = np.array([1, 2, 3, 4, np.inf])
# inf is replaced with the largest finite floating-point value by default
a = np.nan_to_num(a, copy=True)
# to replace inf with another number, e.g. 10
a = np.nan_to_num(a, copy=True, posinf=10)
# the same goes for -inf, using neginf
a = np.nan_to_num(a, copy=True, posinf=10, neginf=-10)
The nan, posinf and neginf parameters only work with numpy 1.17 or later.
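A short runnable sketch that puts the options side by side on one array containing nan, inf and -inf. The array values are arbitrary, and it assumes numpy 1.17 or later for the nan, posinf and neginf keywords.

```python
import numpy as np

a = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: nan -> 0.0, inf -> largest finite float, -inf -> most negative finite float.
defaults = np.nan_to_num(a)

# Custom replacements (needs numpy >= 1.17).
custom = np.nan_to_num(a, nan=0.0, posinf=10.0, neginf=-10.0)

# copy=True (the default) leaves the input untouched.
still_has_nan = bool(np.isnan(a).any())

# copy=False writes the replacements into the input array itself.
np.nan_to_num(a, copy=False, nan=-1.0, posinf=10.0, neginf=-10.0)

print(defaults, custom, a)
```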
Source:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
Delete a range of jobs by job id. Change xxxxxxx in the script to the start job id and the end job id. (Don't worry about jobs in between that are not yours: you don't have the authorization to delete them, so they will be skipped automatically.)
./delJobs.sh
The script can be downloaded here:
https://gist.github.com/fa5a9bdfa9192339259100019afcee0a.git