Pandas merge or sum values in rows with the same index

When the values are numeric, aggregate them with an operator such as sum:

df.groupby(level=0).sum()

When the values are strings, or the aggregation needs a custom function, use apply:

# Common usage: merge the values of each group into one ';'-separated string
df[1].groupby(level=0).apply(lambda values: ';'.join(values))
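
As a rough, self-contained sketch of grouping on a repeated index, the example below builds a small made-up frame and aggregates it three ways. The frame contents, the index labels "g1"/"g2", and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame with a repeated index; column 0 is numeric, column 1 is text.
df = pd.DataFrame(
    {0: [1, 2, 3, 4], 1: ["a", "b", "c", "d"]},
    index=["g1", "g1", "g2", "g2"],
)

# Operator-style aggregation: sum the numeric values that share an index label.
summed = df[0].groupby(level=0).sum()

# Function-style aggregation: pass any reducing function to agg().
ranges = df[0].groupby(level=0).agg(lambda values: values.max() - values.min())

# Non-numeric values: join the strings of each group with ';'.
joined = df[1].groupby(level=0).agg(";".join)
```

Each result is a Series indexed by the unique index labels, so the three can be recombined with pd.concat if a single frame is needed.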


Python matplotlib: all about fonts

Export fonts to PDF as editable text rather than outlined shapes, so that Illustrator recognizes them:

import matplotlib
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42

Change the default font for all text:

import matplotlib as mpl
mpl.rcParams['font.family'] = 'Arial'

Change the font style for labels, tick marks, and titles (for each plot, without changing the default):

import matplotlib.pyplot as plt
plt.xlabel('XXXX', fontsize=12, fontname='Arial')
plt.ylabel('XXXXX', fontsize=12, fontname='Arial')
plt.xticks(fontsize=12, fontname='Arial')
plt.yticks(fontsize=12, fontname='Arial')
plt.title('XXXXX', fontsize=14, fontname='Arial')

Change legend font style: (for each plot, not changing the default)

plt.legend(prop=matplotlib.font_manager.FontProperties(family='Arial', size=12, weight='bold', style='normal'))
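
One possible middle ground between changing the global default and styling every call is matplotlib.rc_context, which applies rcParams only inside a with block. The sketch below is illustrative: it uses "DejaVu Sans" (bundled with matplotlib) so that it runs even where Arial is not installed, and the plotted data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: "DejaVu Sans" stands in for "Arial", which may not be installed.
settings = {"font.family": "DejaVu Sans", "pdf.fonttype": 42, "ps.fonttype": 42}

with matplotlib.rc_context(settings):
    # Everything drawn inside this block picks up the temporary settings.
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4], label="demo")
    ax.set_xlabel("x")
    ax.set_title("rc_context demo")
    ax.legend()
    family_inside = matplotlib.rcParams["font.family"]
    fonttype_inside = matplotlib.rcParams["pdf.fonttype"]

plt.close(fig)
```

Outside the with block the previous rcParams values are restored, so other figures in the same session keep their own fonts.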


Python Jupyter notebook share the variable across notebooks

In the first notebook:

yourVar = 'data or your variable'
%store yourVar
del yourVar # deletes the variable from this notebook only, not from the store

In the second notebook:

# Overwrites any existing variable with the same name in this notebook.
%store -r yourVar

To list or delete stored variables:

# List the variables in the store
%store
# Delete the variable from the store
%store -d yourVar


Ignore row with only NaN in plotHeatmap – deepTools

If the output of computeMatrix contains NaN values, the generated heatmap is not sorted and the warning "Mean of empty slice" appears.

To avoid this, replace the missing values with 0 in the computeMatrix step by using the --missingDataAsZero option.

computeMatrix scale-regions -S xxx.bw -R xxx.bed --missingDataAsZero -m xxx -b xxx -a xxx --numberOfProcessors xx -o xxx.gz

plotHeatmap -m xxx.gz -out xxx.png


Python return os.system and subprocess output as a string

To run a Unix command from Python, we can call os.system() directly. However, os.system() returns only the exit status. To capture the command's output as a string, for example the file names printed by ls, use subprocess.check_output() instead.

Note that subprocess.check_output() returns a bytes object rather than a string, so the result must be decoded to obtain a string.

import subprocess

fileList = subprocess.check_output('ls someFolder/*', shell=True).decode('utf-8').strip().split('\n')
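
As an alternative sketch, subprocess.run with text=True does the decoding itself, and passing the command as a list avoids shell=True. The helper name list_files and the temporary files are made up for illustration. Note also that ls on a folder prints bare names, whereas the ls someFolder/* form above prints paths.

```python
import subprocess
import tempfile
from pathlib import Path


def list_files(folder):
    """Return the names in `folder` as reported by `ls`, one per list item."""
    result = subprocess.run(
        ["ls", folder],       # argument list, so no shell quoting is needed
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()


# Demonstration on a throwaway directory holding two empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        Path(tmp, name).touch()
    file_list = list_files(tmp)
```

With check=True a failing command raises an exception rather than silently returning an empty list, which is usually easier to debug.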