To run a Unix command in Python:
import os
os.system('your unix command')  # returns the command's exit status
os.system('ls')
Source:
https://code.tutsplus.com/articles/how-to-run-unix-commands-in-your-python-program--cms-25926
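os.system only returns the exit status and prints the output straight to the terminal. When you need to capture the command's output, the standard-library subprocess module is the usual tool; a minimal sketch:

```python
import subprocess

# Run a command and capture its output as text instead of printing it
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)

print(result.returncode)      # 0
print(result.stdout.strip())  # hello
```

Passing the command as a list of arguments avoids invoking a shell, which is generally safer than os.system for anything built from user input.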
Remove rows with duplicated index values, keeping only the first or the last occurrence:
df = df.loc[~df.index.duplicated(keep='first')]
df = df.loc[~df.index.duplicated(keep='last')]
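A quick sketch of the two keep options on a toy DataFrame (the frame and its index values here are made up for illustration):

```python
import pandas as pd

# Index value "a" appears twice
df = pd.DataFrame({"val": [1, 2, 3]}, index=["a", "a", "b"])

first = df.loc[~df.index.duplicated(keep="first")]  # keeps val=1 for "a"
last = df.loc[~df.index.duplicated(keep="last")]    # keeps val=2 for "a"

print(first["val"].tolist())  # [1, 3]
print(last["val"].tolist())   # [2, 3]
```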
When there is no need to return anything:
from joblib import Parallel, delayed
import multiprocessing
# Number of cores available to use
num_cores = multiprocessing.cpu_count()
# If your function takes only 1 argument
def yourFunction(input):
    # anything in your loop
    return XXX
Parallel(n_jobs=num_cores)(delayed(yourFunction)(input) for input in inputList)
# If your function takes more than 1 argument
def yourFunction(input1, input2):
    # anything in your loop
    return XXX
Parallel(n_jobs=num_cores)(delayed(yourFunction)(input1, input2) for input1 in list1 for input2 in list2)
When you need to return results, simply assign the call to a variable; the results are collected as a list:
results = Parallel(n_jobs=num_cores)(delayed(yourFunction)(input) for input in inputList)
When each worker returns a data.frame that you later concatenate together, use mp.Pool:
import multiprocessing as mp
import pandas as pd
with mp.Pool(processes=num_cores - 1) as pool:
    resultList = pool.map(yourFunction, argvList)
results_df = pd.concat(resultList)
Source:
https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop
https://blog.dominodatalab.com/simple-parallelization/
https://stackoverflow.com/questions/36794433/python-using-multiprocessing-on-a-pandas-dataframe
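A minimal runnable sketch of the Pool-plus-concat pattern above. It uses ThreadPool, which shares the same .map API as mp.Pool, so it runs in any session; swap in mp.Pool for CPU-bound work. The make_frame worker and its inputs are made up for illustration:

```python
from multiprocessing.pool import ThreadPool

import pandas as pd

def make_frame(n):
    # Hypothetical worker: builds a one-row DataFrame per input
    return pd.DataFrame({"n": [n], "square": [n * n]})

# ThreadPool has the same .map interface as multiprocessing.Pool
with ThreadPool(processes=4) as pool:
    resultList = pool.map(make_frame, [1, 2, 3])

results_df = pd.concat(resultList, ignore_index=True)
print(results_df["square"].tolist())  # [1, 4, 9]
```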
Rename only one or a few pandas dataframe columns:
df.rename(columns={"A": "a", "B": "c"}, inplace=True)
Rename one or a few index:
df.rename(index={"A": "a", "B": "c"}, inplace=True)
Source:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
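A toy example (columns made up) showing that rename only touches the keys you pass; everything else is left alone:

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

# Only "A" and "B" are renamed; "C" is untouched
df = df.rename(columns={"A": "a", "B": "c"})
print(list(df.columns))  # ['a', 'c', 'C']
```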
Use os.listdir(), listing files in the current directory:
import os
arr = os.listdir()
print(arr)
Use glob, listing by wildcard (glob) pattern — note these are shell-style patterns, not full regular expressions:
import glob
listFiles = []
for file in glob.glob("*.txt"):
    listFiles.append(file)
The linked source has an excellent answer to this question; further usage can be found there.
Source:
https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory
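Both approaches side by side in a throwaway temporary directory (the file names are made up):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a few empty files to list
    for name in ("a.txt", "b.txt", "notes.md"):
        open(os.path.join(tmpdir, name), "w").close()

    everything = sorted(os.listdir(tmpdir))                       # all entries
    txt_only = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))   # wildcard match

print(everything)                                  # ['a.txt', 'b.txt', 'notes.md']
print([os.path.basename(p) for p in txt_only])     # ['a.txt', 'b.txt']
```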
Add an empty row, with or without a name (note that append is not inplace, so assign the result; DataFrame.append was removed in pandas 2.0, where pd.concat is the replacement):
df = df.append(pd.Series(name='NameOfNewRow'))    # name the new row
df = df.append(pd.Series(), ignore_index=True)    # do not name the new row
Add empty column:
df['new'] = pd.Series(dtype='float64')
Source:
https://stackoverflow.com/questions/39998262/append-an-empty-row-in-dataframe-using-pandas
https://stackoverflow.com/questions/16327055/how-to-add-an-empty-column-to-a-dataframe
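On pandas 2.0 and later, where append is gone, the same effect can be sketched with pd.concat; the frame here is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0]})

# Named empty row: concat a one-row all-NaN frame with the desired index label
empty_row = pd.DataFrame({"x": [np.nan]}, index=["NameOfNewRow"])
df = pd.concat([df, empty_row])

# Empty (all-NaN) column
df["new"] = np.nan

print(df.shape)  # (3, 2)
```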
Replace nan in a numpy array with zero or any number:
import numpy as np
a = np.array([1, 2, 3, 4, np.nan])
# with copy=True (the default) a new array is returned; copy=False replaces in place; nan becomes 0 by default
a = np.nan_to_num(a, copy=True)
# if you want it changed to any other number, e.g. 10:
np.nan_to_num(a, copy=False, nan=10)
Replace inf or -inf with the most positive or negative finite floating-point value, or any number:
a = np.array([1, 2, 3, 4, np.inf])
# changed to the largest finite floating-point value by default
a = np.nan_to_num(a, copy=True)
# if you want it changed to any other number, e.g. 10:
a = np.nan_to_num(a, copy=True, posinf=10)
# the same goes for neginf
a = np.nan_to_num(a, copy=True, posinf=10, neginf=-10)
The parameters posinf and neginf only work when your numpy version is 1.17 or higher.
Source:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
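Both behaviors on one made-up array (assuming numpy >= 1.17 for the keyword arguments):

```python
import numpy as np

a = np.array([1.0, np.nan, np.inf, -np.inf])

# Default: nan -> 0, inf -> largest finite float, -inf -> most negative finite float
cleaned = np.nan_to_num(a)
print(cleaned[1])  # 0.0

# Custom fill values (requires numpy >= 1.17)
custom = np.nan_to_num(a, nan=0.0, posinf=10.0, neginf=-10.0)
print(custom.tolist())  # [1.0, 0.0, 10.0, -10.0]
```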
When loading a pickle file that was saved with Python 2 into Python 3, you might get errors like:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position X: ordinal not in range(128)
This is due to an incompatibility between Python 2 and Python 3. The easiest fix is to add encoding='latin1':
pickle.load(file, encoding='latin1')
There are definitely other ways, but this is the simplest one.
Filter a dataframe by the sum of rows:
df = df[df.sum(axis=1) > 0]
df = df.loc[df.sum(axis=1) > 0,:]
Filter by sum of columns:
df = df.loc[:,df.sum() > 0]
df = df.loc[:,df.sum(axis=0) > 0]
Source:
https://stackoverflow.com/questions/40425484/filter-dataframe-in-pandas-on-sum-of-rows
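Both filters on a toy frame (values made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 0, 2], "b": [0, 0, -2]})

rows_kept = df[df.sum(axis=1) > 0]         # row sums are [1, 0, 0] -> only row 0 survives
cols_kept = df.loc[:, df.sum(axis=0) > 0]  # column sums are a=3, b=-2 -> only "a" survives

print(rows_kept.index.tolist())    # [0]
print(cols_kept.columns.tolist())  # ['a']
```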
Remove rows with nan/null/missing values:
df = df.dropna(axis=0, how='any') # Remove if any value is na
df = df.dropna(axis=0, how='all') # Remove if all values are na
Remove columns with nan/null/missing values:
df = df.dropna(axis=1, how='any') # Remove if any value is na
df = df.dropna(axis=1, how='all') # Remove if all values are na
The default is inplace=False; if you want to remove in place, add inplace=True.
Source:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
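The four variants differ only in axis and how; here is how any vs all behave on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [np.nan, np.nan, np.nan],  # entirely missing column
})

any_rows = df.dropna(axis=0, how="any")  # every row has at least one NaN, so all are dropped
all_cols = df.dropna(axis=1, how="all")  # only column "c" is all-NaN, so only it is dropped

print(len(any_rows))              # 0
print(all_cols.columns.tolist())  # ['a', 'b']
```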