Python run loops in parallel

When you don't need to return anything:

from joblib import Parallel, delayed
import multiprocessing

# Number of cores available to use
num_cores = multiprocessing.cpu_count()

# If your function takes only 1 argument
def yourFunction(item):
    # anything in your loop
    return XXX

Parallel(n_jobs=num_cores)(delayed(yourFunction)(item) for item in inputs)


# If your function takes more than 1 argument
def yourFunction(input1, input2):
    # anything in your loop
    return XXX

Parallel(n_jobs=num_cores)(delayed(yourFunction)(input1, input2) for input1 in list1 for input2 in list2)

When you need to return results, simply assign the call to a variable; the outputs are collected into a list:

results = Parallel(n_jobs=num_cores)(delayed(yourFunction)(item) for item in inputs)

When each call should return a DataFrame to be concatenated afterwards, use mp.Pool:

import multiprocessing as mp
import pandas as pd

with mp.Pool(processes=num_cores - 1) as pool:
    resultList = pool.map(yourFunction, argvList)

results_df = pd.concat(resultList)

Source:
https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop
https://blog.dominodatalab.com/simple-parallelization/
https://stackoverflow.com/questions/36794433/python-using-multiprocessing-on-a-pandas-dataframe

Pandas add an empty row or column to a dataframe with index

Add an empty row, with or without a name (note that append returns a new DataFrame rather than modifying in place; it was deprecated in pandas 1.4 and removed in 2.0):

df = df.append(pd.Series(name='NameOfNewRow'))  # name the new row
df = df.append(pd.Series(), ignore_index=True)  # don't name the new row

Add empty column:

df['new'] = pd.Series()
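On pandas 2.0+, where append has been removed, the same effect can be had with pd.concat. A minimal sketch (the DataFrame contents are illustrative):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# add an empty (all-NaN) row named 'NameOfNewRow' by concatenating
# a one-row, zero-column frame; concat fills the missing columns with NaN
empty_row = pd.DataFrame([{}], index=['NameOfNewRow'])
df = pd.concat([df, empty_row])

# add an empty column
df['new'] = np.nan

print(df.shape)  # (3, 3)
```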

Source:
https://stackoverflow.com/questions/39998262/append-an-empty-row-in-dataframe-using-pandas
https://stackoverflow.com/questions/16327055/how-to-add-an-empty-column-to-a-dataframe

Python NumPy replace nan in an array with 0 or another number

Replace nan in a NumPy array with zero or any other number:

import numpy as np

a = np.array([1, 2, 3, 4, np.nan])

# copy=True (the default) returns a new array; copy=False replaces in place.
# nan is replaced with 0.0 by default.
a = np.nan_to_num(a, copy=True)

# to replace nan with another number, e.g. 10:
np.nan_to_num(a, copy=False, nan=10)

Replace inf or -inf with the largest (or most negative) finite floating-point value, or with any number:

a = np.array([1, 2, 3, 4, np.inf])

# inf is replaced with the largest finite floating-point value by default
a = np.nan_to_num(a, copy=True)

# to replace inf with another number, e.g. 10:
a = np.nan_to_num(a, copy=True, posinf=10)

# the same works for -inf via neginf:
a = np.nan_to_num(a, copy=True, posinf=10, neginf=-10)

The nan, posinf, and neginf keyword arguments require NumPy 1.17 or later.
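Putting the above together in one runnable sketch (requires NumPy >= 1.17 for the keyword arguments):

```python
import numpy as np

a = np.array([1.0, np.nan, np.inf, -np.inf])

# defaults: nan -> 0.0, inf -> largest finite float, -inf -> most negative
b = np.nan_to_num(a)
print(b[1])  # 0.0

# custom replacements for all three special values
c = np.nan_to_num(a, nan=10.0, posinf=100.0, neginf=-100.0)
print(c)  # [   1.   10.  100. -100.]
```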

Source:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html

Pickle UnicodeDecodeError when loading Python 2 pickles in Python 3

When loading a pickle file that was saved with Python 2 into Python 3, you might see an error like:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position X: ordinal not in range(128)

This is caused by an incompatibility between Python 2 and Python 3 string handling. The easiest fix is to add encoding='latin1':

pickle.load(file, encoding='latin1')

There are other ways to handle it, but this is the simplest.
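A sketch of where the argument goes. This example can't reproduce the error itself (that requires a pickle actually written by Python 2), but protocol 2 is the format Python 2 produced, and latin1 maps every byte 0x00-0xFF to a code point, so decoding never fails:

```python
import pickle
import io

data = {'name': b'caf\xe9', 'value': 42}

# write in protocol 2, the highest protocol Python 2 supported
buf = io.BytesIO()
pickle.dump(data, buf, protocol=2)
buf.seek(0)

# encoding only affects how Python 2 str objects are decoded; bytes stay bytes
loaded = pickle.load(buf, encoding='latin1')
print(loaded['value'])  # 42
```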

Source:
https://stackoverflow.com/questions/11305790/pickle-incompatibility-of-numpy-arrays-between-python-2-and-3

Pandas remove rows or columns with null/nan/missing values

Remove rows with nan/null/missing values:

df = df.dropna(axis=0, how='any') # Remove if any value is na
df = df.dropna(axis=0, how='all') # Remove if all values are na

Remove columns with nan/null/missing values:

df = df.dropna(axis=1, how='any') # Remove if any value is na
df = df.dropna(axis=1, how='all') # Remove if all values are na

The default is inplace=False, which returns a new DataFrame; to drop rows or columns in place, pass inplace=True.
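A small runnable example of the four variants (the DataFrame contents are illustrative):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 6.0],
                   'c': [7.0, 8.0, 9.0]})

print(df.dropna(axis=0, how='any').shape)  # (1, 3): only the last row has no NaN
print(df.dropna(axis=0, how='all').shape)  # (3, 3): no row is entirely NaN
print(df.dropna(axis=1, how='any').shape)  # (3, 1): only column 'c' has no NaN
print(df.dropna(axis=1, how='all').shape)  # (3, 3): no column is entirely NaN
```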

Source:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html