01/11/2023, updated 01/19/2023 by bioinfocore

Rasterize points in scatterplot

Use this when a plot has too many points to store as vector objects but you still want the legend and labels saved as editable text in the PDF. It works for large heatmaps too.

ax.scatter(..., rasterized=True)
ax.pcolormesh(..., rasterized=True)
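A minimal sketch of the full workflow, assuming randomly generated data and a hypothetical output file name (scatter_rasterized.pdf). Only the points are rasterized; the axes, labels, and legend stay as vector objects. The dpi passed to savefig sets the resolution of the rasterized layer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration: 20,000 random points
rng = np.random.default_rng(0)
x = rng.normal(size=20_000)
y = rng.normal(size=20_000)

fig, ax = plt.subplots(figsize=(4, 4))
# rasterized=True embeds the points as a bitmap inside the PDF
points = ax.scatter(x, y, s=1, label="points", rasterized=True)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# dpi only affects the rasterized points; text and axes remain vector
out_file = "scatter_rasterized.pdf"  # hypothetical file name
fig.savefig(out_file, dpi=300)
plt.close(fig)
```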

Reference:
https://btjanaka.net/blog/matplotlib-figures/

06/27/2022, updated 01/19/2023 by bioinfocore

Python select items from a list using a boolean array

numpy, Python

from itertools import compress
import numpy as np

arr = np.array([x == 'TRUE' for x in xList])  # Turn a list of strings into a boolean array
list(compress(yList, arr))  # Keep the items of yList where arr is True
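A small worked example, assuming two made-up lists of equal length (x_list and y_list are illustrative names, not from the original snippet). It also shows NumPy boolean indexing, which should give the same result.

```python
from itertools import compress
import numpy as np

# Made-up inputs for illustration
x_list = ['TRUE', 'FALSE', 'TRUE', 'FALSE']
y_list = ['geneA', 'geneB', 'geneC', 'geneD']

# Turn the list of strings into a boolean mask
mask = np.array([x == 'TRUE' for x in x_list])

# compress keeps the items whose mask entry is truthy
selected = list(compress(y_list, mask))
print(selected)  # ['geneA', 'geneC']

# Alternative: NumPy boolean indexing
selected_np = np.array(y_list)[mask].tolist()
```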

Source:
https://www.geeksforgeeks.org/python-itertools-compress/

06/24/2022 by bioinfocore

Slurm sbatch specify nodes or node list

slurm, Unix

#SBATCH --nodelist=node[01-09]
#SBATCH --nodelist=node01

06/24/2022 by bioinfocore

Pandas merge or sum values in rows with the same index

Pandas, Python

When the values are numeric and you apply an aggregation such as sum:

df.groupby(level=0).sum()
df.groupby(df.index).sum()
df[specifiedColumn].groupby(level=0).sum()

When the values are strings and you apply a custom function:

# Common usage: merge the values of rows sharing an index into one string
df[1].groupby(level=0).apply(lambda xList: ';'.join(xList))
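A short worked example of both cases, assuming a made-up DataFrame with a repeated index (the column names count and label are hypothetical).

```python
import pandas as pd

# Made-up data: index 'g1' appears twice
df = pd.DataFrame(
    {'count': [1, 2, 3, 4], 'label': ['a', 'b', 'c', 'd']},
    index=['g1', 'g1', 'g2', 'g3'],
)

# Numeric column: sum the rows that share an index value
summed = df['count'].groupby(level=0).sum()

# String column: join the rows that share an index value
joined = df['label'].groupby(level=0).apply(lambda s: ';'.join(s))

print(summed)
print(joined)
```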

Source:
https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.core.groupby.GroupBy.apply.html
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.sum.html

06/17/2022 by bioinfocore

Python Pandas merge multiple row/column values into one row/column

Pandas, Python

Merge multiple row or column values into one row or column:

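# Merge the selected rows into one row (values joined per column)
df.loc[rowSelected, :].apply(lambda x: ';'.join(x.astype(str)), axis=0)
# Merge the selected columns into one column (values joined per row)
df.loc[:, colSelected].apply(lambda x: ';'.join(x.astype(str)), axis=1)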
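A small worked example of both directions, assuming a made-up DataFrame and written with explicit .loc[rows, :] and .loc[:, columns] slicing. The row and column labels are hypothetical.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': ['x', 'y', 'z'], 'C': [0.5, 1.5, 2.5]},
    index=['r1', 'r2', 'r3'],
)

# Merge rows r1 and r2 into one row: axis=0 applies the function to each column
merged_row = df.loc[['r1', 'r2'], :].apply(
    lambda col: ';'.join(col.astype(str)), axis=0
)

# Merge columns A and B into one column: axis=1 applies the function to each row
merged_col = df.loc[:, ['A', 'B']].apply(
    lambda row: ';'.join(row.astype(str)), axis=1
)

print(merged_row)
print(merged_col)
```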

Source:
https://stackoverflow.com/questions/33098383/merge-multiple-column-values-into-one-column-in-python-pandas

10/21/2021, updated 10/22/2021 by bioinfocore

Python matplotlib: all about fonts

Blogs, Python

Export fonts to PDF as text rather than outlined shapes, so that Illustrator recognizes them as editable text:

import matplotlib
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42

Change the default font for all text:

import matplotlib
matplotlib.rcParams['font.family'] = 'Arial'

Change the font for axis labels, tick labels, and titles (per plot, without changing the default):

import matplotlib.pyplot as plt
plt.xlabel('XXXX', fontsize=12, fontname='Arial')
plt.ylabel('XXXXX', fontsize=12, fontname='Arial')

plt.xticks(fontname='Arial')
plt.yticks(fontname='Arial')

plt.title('XXXXX', fontsize=14, fontname='Arial')

Change the legend font (per plot, without changing the default):

plt.legend(prop=matplotlib.font_manager.FontProperties(family='Arial', size=12, weight='bold', style='normal'))
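The settings above can be combined into one script. This is a minimal sketch assuming made-up data and a hypothetical output file name (font_demo.pdf). If Arial is not installed, matplotlib should fall back to its default font with a warning. Here the legend font is passed as a dict, which plt.legend accepts in place of a FontProperties object.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Keep text as editable TrueType fonts in PDF/PS output
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42

font_name = 'Arial'

# Made-up data for illustration
plt.plot([1, 2, 3], [2, 4, 8], label='example')

# Per-plot font settings for labels, tick labels, and title
plt.xlabel('Time', fontsize=12, fontname=font_name)
plt.ylabel('Signal', fontsize=12, fontname=font_name)
plt.xticks(fontname=font_name)
plt.yticks(fontname=font_name)
plt.title('Font demo', fontsize=14, fontname=font_name)

# Legend font given as a dict of font properties
plt.legend(prop={'family': font_name, 'size': 12, 'weight': 'bold'})

ax = plt.gca()
plt.savefig('font_demo.pdf')  # hypothetical file name
```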

Source:
https://jonathansoma.com/lede/data-studio/matplotlib/exporting-from-matplotlib-to-open-in-adobe-illustrator/
https://stackoverflow.com/questions/20753782/default-fonts-in-seaborn-statistical-data-visualization-in-ipython
https://stackoverflow.com/questions/47112522/matplotlib-how-to-set-legends-font-type
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html

06/10/2021, updated 10/21/2021 by bioinfocore

Unix generate a random string

Unix

head /dev/urandom | tr -dc A-Za-z0-9 | head -c10

# Assigning it to a variable
ranNum=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c10)
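For use inside a Python script, a rough equivalent can be sketched with the standard library. This is an alternative technique, not a translation of the shell pipeline: it draws 10 characters from the same A-Za-z0-9 alphabet using the secrets module.

```python
import secrets
import string

# Same alphabet as the tr filter: A-Z, a-z, 0-9
alphabet = string.ascii_letters + string.digits

# Draw 10 characters from the OS random source
random_string = ''.join(secrets.choice(alphabet) for _ in range(10))
print(random_string)
```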

Source:
https://unix.stackexchange.com/questions/230673/how-to-generate-a-random-string

04/26/2021, updated 10/21/2021 by bioinfocore

Unix rearrange columns

Unix

awk 'BEGIN {FS="\t"; OFS="\t"} {print $2, $3, $4, $1}' input.txt > output.txt

FS and OFS set the input and output field separators, respectively (a tab in both cases here).
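The same reordering can be sketched in Python for cases where the step lives inside a script. The function name rearrange_columns and its order parameter are hypothetical. Unlike awk, this sketch raises an IndexError on lines with fewer than four fields instead of printing empty fields.

```python
def rearrange_columns(in_path, out_path, order=(1, 2, 3, 0)):
    """Write tab-separated columns of in_path to out_path in the given 0-based order.

    The default order (1, 2, 3, 0) mirrors awk's print $2, $3, $4, $1.
    """
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            # Split on tabs (FS) and join with tabs (OFS)
            fields = line.rstrip('\n').split('\t')
            dst.write('\t'.join(fields[i] for i in order) + '\n')

# Usage, with the file names from the awk command:
# rearrange_columns('input.txt', 'output.txt')
```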

Source:
https://unix.stackexchange.com/questions/344541/easiest-way-to-rearrange-columns-and-manipulate-text-file

03/16/2021, updated 10/21/2021 by bioinfocore

Liftover bam files

Blogs, Unix

The most straightforward way is using CrossMap.

Taking hg19 to hg38 as an example:

pip install CrossMap

CrossMap.py bam -a hg19ToHg38.over.chain input.bam output
# The .bam extension is added to the output name automatically

Genome liftover chain files can be downloaded from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/ (change the source assembly in the URL according to your needs).

The CrossMap website suggests always using the ‘-a’ option.

Source:
http://crossmap.sourceforge.net/#convert-bam-cram-sam-format-files

01/07/2021, updated 10/21/2021 by bioinfocore

Python Jupyter notebook share the variable across notebooks

In the first notebook:

yourVar = 'data or your variable'
%store yourVar
del yourVar # only deletes the variable in this notebook but not in store

In the second notebook:

# Restore yourVar; an existing variable with the same name is overwritten
%store -r yourVar
yourVar

To list and delete the stored variable:

# List the variables in the store
%store
# Delete a variable from the store
%store -d yourVar

Source:
https://stackoverflow.com/questions/35935670/share-variables-between-different-jupyter-notebooks
https://ipython.org/ipython-doc/rel-0.12/config/extensions/storemagic.html

bioinfo core

Index & solution of bioInfo utilities
