Post-processing & Visualization

TumorDecon provides functions for creating the following plots, for digital cytometry performed with the LM22 signature matrix (or up-regulated genes for the 22 cell types in LM22):

Box plots that show immune cells frequencies in descending order, so that users can easily recognize the most frequent immune cells within a group of samples/patients
Bar charts that show the estimated percentage of each immune cell within a group of individual samples
Pair plots that illustrate the correlation between different immune cells
Cluster heat maps that group cells based on their euclidean distance

WARNING: The following 4 functions for plotting data currently only work for results generated with LM22 cell signatures, as the names of the cell types are hard-coded into the color plots. If you are working with a different group of cell types, you will need to define your own dictionary for combing cell types, with the combine_celltypes() function, and then drop any column names with non-LM22 cell types. An example of this is provided further down this page.

cell_frequency_boxplot(sample_cell_freq, title='', title_fontsize=10, save_as=None, axes_style=None, font_scale=1, xylabel_fontsize=10, rcParams={'figure.figsize': (15, 7)})

Create boxplot of the output from the tumor_deconvolve() function for digital cytometry. Boxplot shows distribution of frequencies across all samples (patients), for each cell type.

Parameters:

sample_cell_freq – Pandas DataFrame. Output file from the tumor_deconvolve() function for digital cytometry. Rows are patient IDs, columns are cell typs
title – String. Plot title
title_fontsize – Int. Font-size for title.
save_as – String. Filename to save plot to. File format is inferred from filename’s extension (‘.png’, ‘.pdf’, ‘.eps’, etc). If None, plot is not saved.
axes_style – Dictionary. Parameters to pass to sns.set_style, to customize plot. Valid dictionary keys can be found by running “sns.axes_style()” with no arguments
font_scale – Float. Scaling for plot elements such as tick labels
xylabel_fontsize – Int. Font-size for axis labels.
rcParams – Dictionary of parameters to pass as the rc argument to sns.set function. Valid dictionary keys can be found at https://matplotlib.org/stable/tutorials/introductory/customizing.html

Returns:

Matplotlib boxplot

Example 1: Boxplot. The data plotted in this example is all patients from the result of running Cibersort on the Colorectal Adenocarcinoma RNA Seq v2 data set from cBioPortal.org

>>> td.cell_frequency_boxplot(results, save_as="boxplots.png")

cell_frequency_barchart(sample_cell_freq, title=' ', title_fontsize=10, save_as=None, axes_style=None, font_scale=1, xylabel_fontsize=10, rcParams={'figure.figsize': (15, 7)})

Create barcharts of the output from the tumor_deconvolve() function for digital cytometry. Plot shows the cell type distribution for each individual sample (patient) in the output file.

Parameters:

sample_cell_freq – Pandas DataFrame. Output file from the tumor_deconvolve() function for digital cytometry. Rows are patient IDs, columns are cell typs
title – String. Plot title
title_fontsize – Int. Font-size for title.
save_as – String. Filename to save plot to. File format is inferred from filename’s extension (‘.png’, ‘.pdf’, ‘.eps’, etc). If None, plot is not saved.
axes_style – Dictionary. Parameters to pass to sns.set_style, to customize plot. Valid dictionary keys can be found by running “sns.axes_style()” with no arguments
font_scale – Float. Scaling for plot elements such as tick labels
xylabel_fontsize – Int. Font-size for axis labels.
rcParams – Dictionary of parameters to pass as the rc argument to sns.set function. Valid dictionary keys can be found at https://matplotlib.org/stable/tutorials/introductory/customizing.html

Returns:

Matplotlib stacked barchart

Example 2: Barchart. The data plotted in this example is the first 12 patients from the result of running Cibersort on the Colorectal Adenocarcinoma RNA Seq v2 data set from cBioPortal.org

>>> td.cell_frequency_barchart(results, save_as="barcharts.png")

pair_plot(sample_cell_freq, title='', title_fontsize=10, font_scale=1, save_as=None, figsize=(20, 20))

Create pair plots from the output of the tumor_deconvolve() function for digital cytometry.

Parameters:

sample_cell_freq – Pandas DataFrame. Output file from the tumor_deconvolve() function for digital cytometry. Rows are patient IDs, columns are cell typs
title – String. Plot title
title_fontsize – Int. Font-size for title.
save_as – String. Filename to save plot to. File format is inferred from filename’s extension (‘.png’, ‘.pdf’, ‘.eps’, etc). If None, plot is not saved.
font_scale – Float. Scaling for plot elements such as tick labels
figsize – Tuple. (x,y) size of the figure

Returns:

Matplotlib pair plots

Example 3: Pair Plots. The data plotted in this example is the first 12 patients from the result of running Cibersort on the Colorectal Adenocarcinoma RNA Seq v2 data set from cBioPortal.org

>>> td.pair_plot(results, save_as="pairplots.png")

hierarchical_clustering(sample_cell_freq, title='', title_fontsize=10, font_scale=1, save_as=None, figsize=(20, 5))

Create hierarchical clustered heatmap of the output from the tumor_deconvolve() function for digital cytometry. Plot clusters samples/patients by similar cellular profile distributions.

Parameters:

sample_cell_freq – Pandas DataFrame. Output file from the tumor_deconvolve() function for digital cytometry. Rows are patient IDs, columns are cell typs
title – String. Plot title
title_fontsize – Int. Font-size for title.
save_as – String. Filename to save plot to. File format is inferred from filename’s extension (‘.png’, ‘.pdf’, ‘.eps’, etc). If None, plot is not saved.
axes_style – Dictionary. Parameters to pass to sns.set_style, to customize plot. Valid dictionary keys can be found by running “sns.axes_style()” with no arguments
figsize – Tuple. (x,y) size of the figure

Returns:

Matplotlib hierarchical clustered heatmap

Example 4: Cluster Maps. The data plotted in this example is all patients from the result of running Cibersort on the Colorectal Adenocarcinoma RNA Seq v2 data set from cBioPortal.org

>>> td.hierarchical_clustering(results, save_as="clustermaps.png")

combine_celltypes(df, cols_to_combine=None)

Given a Pandas DataFrame (typically the output from the tumor_deconvolve() function) and a dictionary of column names (typically cell subtypes to combine into a single celltype), create a new DataFrame where the frequencies of related cell types are summed up and combined into a single column.

Parameters:

df – Pandas Dataframe. Output of td.tumor_deconvolve()
cols_to_combine – Dictionary. Keys are the desired names of any new cell type columns, values are an arary of current column names to combine under the key’s name (all unmentioned column names are left as they are). Default value is the dictionary for combining common cell types from LM22

Return type:

Pandas DataFrame

Example: Assume we have run Cibersort with LM22, and saved the output into the variable results. LM22 contains 3 types of macrophages (M0, M1, M2). If we are just interested in the distribution of ALL macrophages, we can combine these into one general column called “Macrophages”:

>>> print(results[['Macrophages M0', 'Macrophages M1', 'Macrophages M2']])
Patient_ID       Macrophages M0  Macrophages M1  Macrophages M2
TCGA-3L-AA1B-01        0.159410        0.035259        0.046693
TCGA-4N-A93T-01        0.051245        0.000000        0.013803
TCGA-4T-AA8H-01        0.000000        0.000000        0.019603
TCGA-5M-AAT4-01        0.119760        0.000000        0.016929

>>> dict = {'Macrophages':['Macrophages M0', 'Macrophages M1', 'Macrophages M2']}
>>> results2 = td.combine_celltypes(results, cols_to_combine=dict)
>>> print(results2)
Patient_ID       B cells naive  B cells memory  Plasma cells  ...  Eosinophils  Neutrophils  Macrophages
TCGA-3L-AA1B-01       0.076577        0.000000      0.019106  ...     0.000000          0.0     0.241362
TCGA-4N-A93T-01       0.004291        0.054434      0.045281  ...     0.007008          0.0     0.065048
TCGA-4T-AA8H-01       0.013573        0.000000      0.118549  ...     0.033994          0.0     0.019603
TCGA-5M-AAT4-01       0.000000        0.066355      0.000000  ...     0.000000          0.0     0.136689

Example: Combining cell types for results generated from a custom signature matrix. Again assume we have run Cibersort with LM22, and saved the output into the variable results:

>>> dict = {'CD8 T cells':['CD8_subtype_1', 'CD8_subtype_2'],
            'CD4 T cells':['CD4_subtype_1', 'CD4_subtype_2'],
            'B cells':['B_cell_subtype_1', 'B_cell_subtype_2'],
            'Monocytes':['mono_subtype_1', 'mono_subtype_2'],
            'NK cells':['NK_subtype_1', 'NK_subtype_2'],
            'Neutrophils':['Neutrophils_subtype_1', 'Neutrophils_subtype_2'],
            'Endothelial':['Endothelial_subtype_1', 'Endothelial_subtype_2'],
            'Fibroblasts':['Fibroblast_subtype_1', 'Fibroblast_subtype_2'],
            }
>>> results2 = td.combine_celltypes(results, cols_to_combine=dict).drop(axis=1, columns=["Endothelial","Fibroblasts"])
>>> td.cell_frequency_barchart(results2)

Example: If no argument is passed in for cols_to_combine, the function will attempt to use a sensible categorization based on the cell types present in the LM22 signature matrix:

>>> print(results.columns)
Index(['B cells naive', 'B cells memory', 'Plasma cells', 'T cells CD8',
       'T cells CD4 naive', 'T cells CD4 memory resting',
       'T cells CD4 memory activated', 'T cells follicular helper',
       'T cells regulatory (Tregs)', 'T cells gamma delta', 'NK cells resting',
       'NK cells activated', 'Monocytes', 'Macrophages M0', 'Macrophages M1',
       'Macrophages M2', 'Dendritic cells resting',
       'Dendritic cells activated', 'Mast cells resting',
       'Mast cells activated', 'Eosinophils', 'Neutrophils'],
      dtype='object', name='Patient_ID')

>>> results_simplified = td.combine_celltypes(results)
WARNING: No dictionary defined for combining columns... Attempting to use default dict for LM22 signatures
>> print(results_simplified.columns)
Index(['Plasma cells', 'T cells gamma delta', 'Monocytes', 'Eosinophils',
       'Neutrophils', 'B cells', 'CD4 T cells', 'CD8 T cells', 'NK cells',
       'Macrophages', 'Mast cells', 'Dendritic cells'],
      dtype='object', name='Patient_ID')