[BC]2 Tutorial

Defining genomic signatures with Non-Negative Matrix Factorization

Carl Herrmann & Andres Quintero

13 September 2021

Welcome to the [BC]2 Tutorial - Defining genomic signatures with Non-Negative Matrix Factorization! This tutorial will guide you through all the steps needed to extract genomic signatures from high-dimensional data using Non-Negative Matrix Factorization (NMF).

The tutorial will run over one complete day (Monday, 13 September 2021) from 9:00 am to 4:00 pm.


Is this tutorial for me?

The aim of this tutorial is to learn how to use the R package ButchR to identify signatures in different types of genomic data using NMF. To explore the results of an NMF analysis, we will provide a ready-to-use Docker image with RStudio, ButchR, and pre-loaded publicly available datasets (including bulk and single-cell RNA-seq data), as well as an interactive application. The tutorial will show how to run an NMF-based analysis from start to end.

If you are a computational biologist working with large-scale omics datasets (e.g. RNA-seq, ATAC-seq, …) and looking for ways to reduce the dimensionality of the data to a small set of informative signatures, this tutorial is for you.


Although we will start at a very basic level, we strongly encourage absolute beginners who have never worked with R to complete a short online R introduction course on DataCamp ("Introduction to R"). It covers the basic concepts of what R is and how to perform simple operations with it.

To avoid software compatibility and installation issues, the practical sessions of the tutorial will be run from a Docker image. Please follow the instructions given in Run Docker image to install Docker and run the Docker image for the tutorial before Monday, 13 September 2021.


Activity                                                                              Time
Session 1 - Introduction
  Ice breaker: Course expectations                                                    9:00 - 9:30
  Introduction to Non-Negative Matrix Factorization (NMF) and its usage in genomics   9:30 - 10:15
  Coffee break and discussion                                                         10:15 - 10:45
Session 2 - Matrix decomposition
  How to use ButchR with Docker                                                       10:45 - 11:15
  Pre-processing data to use with NMF                                                 11:15 - 11:45
  Matrix decomposition with ButchR                                                    11:45 - 12:15
  Lunch break                                                                         12:15 - 13:30
Session 3 - Results interpretation
  Selection of optimal factorization rank                                             13:30 - 14:00
  Signature identification                                                            14:00 - 14:30
  Feature extraction and enrichment analysis                                          14:30 - 15:00
  Interactive analysis with ShinyButchR                                               15:00 - 15:30
Session 4 - Discussion
  Discussion and concluding remarks                                                   15:30 - 16:00
[BC]2 Welcome lecture                                                                 17:00

Preparation before tutorial

Run Docker image

Please complete the following three steps before Monday, 13 September 2021.

  1. Install Docker
  2. Run Docker image hdsu/butcher-bc2
  3. Test Docker image hdsu/butcher-bc2

Please document your progress in this Google Sheet

Practical sessions

Session 2 - Matrix decomposition

In the "Matrix decomposition" session, we will guide you through the steps to perform an NMF decomposition using the R package ButchR on a publicly available RNA-seq dataset from the human hematopoietic system (Corces et al. 2016).

  1. Pre-processing data to use with NMF
  2. Matrix decomposition with ButchR
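
To give a feeling for what a matrix decomposition does before the session, the following is a minimal sketch of NMF in plain Python. It factorizes a non-negative matrix V (features x samples) into two non-negative matrices W (features x signatures) and H (signatures x samples) using the classical multiplicative update rules. This is an illustration only and not ButchR's implementation: the function names (nmf, matmul, frobenius_error), the toy matrix, and all parameter values are assumptions made for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction error V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate v (features x samples) as w @ h with non-negative w and h."""
    rng = random.Random(seed)
    n_features, n_samples = len(v), len(v[0])
    # Random positive initialization (hypothetical choice for this sketch).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_features)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_features)]
    return w, h

# Toy matrix: 4 features x 4 samples with two obvious blocks (made-up values).
v = [[5, 5, 0, 0],
     [4, 4, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 2, 2]]

w1, h1 = nmf(v, rank=2, n_iter=1)
err_first = frobenius_error(v, w1, h1)
w, h = nmf(v, rank=2, n_iter=200)
err_final = frobenius_error(v, w, h)
print("error after 1 iteration:", round(err_first, 4))
print("error after 200 iterations:", round(err_final, 4))
```

The reconstruction error shrinks as the updates proceed, while W and H stay non-negative throughout. In the session itself the decomposition is run with ButchR in R, which should be used for any real analysis.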

Session 3 - Results interpretation

In the "Results interpretation" session, we will analyze the NMF decomposition results and learn how to extract relevant features from the inferred molecular signatures.

  1. Selection of optimal factorization rank
  2. Signature identification
  3. Feature extraction and enrichment analysis
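
As a rough illustration of the last two steps, the sketch below takes small W and H matrices, assigns each sample to the signature with the highest exposure in H, and lists the features with the largest weights in each column of W. This is a deliberate simplification under stated assumptions and does not reproduce the methods ButchR provides for these tasks: the function names (assign_samples, top_features), the gene and sample names, and all numeric values are hypothetical.

```python
def assign_samples(h, sample_names):
    """Map each sample to the index of the signature with the highest exposure in H."""
    assignments = {}
    for j, name in enumerate(sample_names):
        exposures = [row[j] for row in h]
        assignments[name] = max(range(len(exposures)), key=exposures.__getitem__)
    return assignments

def top_features(w, feature_names, n_top=2):
    """For each signature (column of W), list the n_top features with the largest weights."""
    rank = len(w[0])
    result = []
    for k in range(rank):
        ranked = sorted(range(len(w)), key=lambda i: w[i][k], reverse=True)
        result.append([feature_names[i] for i in ranked[:n_top]])
    return result

# Hypothetical toy matrices: 4 genes x 2 signatures (W), 2 signatures x 3 samples (H).
w = [[0.9, 0.0],
     [0.8, 0.1],
     [0.0, 0.7],
     [0.1, 0.6]]
h = [[1.0, 0.9, 0.1],
     [0.0, 0.2, 1.2]]
genes = ["geneA", "geneB", "geneC", "geneD"]
samples = ["s1", "s2", "s3"]

sample_to_signature = assign_samples(h, samples)
signature_genes = top_features(w, genes, n_top=2)
print(sample_to_signature)  # {'s1': 0, 's2': 0, 's3': 1}
print(signature_genes)      # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Picking the single largest exposure or weight is the crudest possible rule. It ignores samples and features that contribute to several signatures, which is one reason the session covers dedicated approaches for signature identification and feature extraction.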


To conclude the tutorial, we ask you to select one of the four assignments found in the Docker image and run an NMF decomposition. Then use the resulting matrices to perform UMAP and identify how the NMF signatures are associated with the annotation variable Celltype found in the metadata of each dataset.

All the matrices have been previously normalized and are ready to use with ButchR.

These datasets are a small sample from four different tissues of a scRNA-seq human atlas (Han, X., Zhou, Z., Fei, L. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020). https://doi.org/10.1038/s41586-020-2157-4).

The data and the assignments used in the tutorial can also be found here.


The course will take place onsite.

Registration starts at 8:30 at the Kollegienhaus. Please don't forget your Covid Certificate (see [BC]2 Covid-19 Protection Plan).

Technical pre-requisites

Attendees are expected to bring their own laptop with Docker pre-installed. To avoid delays in setting up the container during the practical sessions, please download the Docker image for the workshop beforehand by opening a command-line terminal (e.g., PowerShell or Terminal) and running the command "docker pull hdsu/butchr". A complete overview of how to install Docker can be found here: https://docs.docker.com/desktop/.

Please check that you can run the hdsu/butchr image without any error messages!