[BC]2 Tutorial

Defining genomic signatures with Non-Negative Matrix Factorization

Carl Herrmann & Andres Quintero

13 September 2021

NMF analysis

Applying NMF

After preprocessing the data, we are now ready to start an NMF analysis.


Call wrapper function

The wrapper function for the NMF solvers in the ButchR package is run_NMF_tensor. It is called as follows:

Applying Non-Negative Matrix Factorization (NMF) to normalized transcriptome data (RNA-seq). Using:

Depending on the choice of parameters (dimensions of the input matrix, number of iterations), this step may take some time. Note that the algorithm updates the user about the progress in the iterations.

##----------------------------------------------------------------------------##
##                             run NMF                                        ##
##----------------------------------------------------------------------------##
factorization_ranks <- 2:10
rna_nmf_exp <- run_NMF_tensor(X                     = corces_rna_norm,
                              ranks                 = factorization_ranks,
                              method                = "NMF",
                              n_initializations     = 10,
                              iterations            = 10^4,
                              convergence_threshold = 40, 
                              extract_features = TRUE)
Click for Answer
## [1] "2021-09-02 16:23:14 UTC"
## Factorization rank:  2 
## [1] "NMF converged after  71,106,122,177,111,84,168,66,84,106 iterations"
## [1] "2021-09-02 16:23:19 UTC"
## Factorization rank:  3 
## [1] "NMF converged after  141,152,77,129,177,112,95,159,118,91 iterations"
## [1] "2021-09-02 16:23:27 UTC"
## Factorization rank:  4 
## [1] "NMF converged after  122,244,107,213,94,138,239,138,140,93 iterations"
## [1] "2021-09-02 16:23:36 UTC"
## Factorization rank:  5 
## [1] "NMF converged after  182,187,151,174,310,244,192,164,187,191 iterations"
## [1] "2021-09-02 16:23:47 UTC"
## Factorization rank:  6 
## [1] "NMF converged after  159,148,142,215,208,166,153,273,267,355 iterations"
## [1] "2021-09-02 16:23:59 UTC"
## Factorization rank:  7 
## [1] "NMF converged after  220,171,205,313,116,249,168,163,145,295 iterations"
## [1] "2021-09-02 16:24:13 UTC"
## Factorization rank:  8 
## [1] "NMF converged after  217,176,199,289,257,127,216,153,132,138 iterations"
## [1] "2021-09-02 16:24:26 UTC"
## Factorization rank:  9 
## [1] "NMF converged after  167,209,176,155,186,253,241,136,161,191 iterations"
## [1] "2021-09-02 16:24:39 UTC"
## Factorization rank:  10 
## [1] "NMF converged after  304,265,312,368,301,214,172,295,161,279 iterations"
## No optimal K could be determined from the Optimal K stat
rna_nmf_exp
Click for Answer
## class: ButchR_NMF 
## Original matrix dimension:  21811 45 
## Factorization performed for ranks:  2 3 4 5 6 7 8 9 10 
## Optimal K based on factorization metrics:  Please select manualy
##  
## Running parameters: 
## method =  NMF  
## n_initializations =  10  
## iterations =  10000  
## stop_threshold =  40  
## extract_features =  TRUE

Normalize W matrix

To make the features in the W matrix comparable, the factorization is normalized to make all columns of W sum 1.

rna_norm_nmf_exp <- normalizeW(rna_nmf_exp)

Accessor functions

Several functions to access the results are available:

HMatrix

Returns the matrix H for the optimal decomposition (i.e. the one with the minimal residual) for a specific factorization rank k. The number of rows of the matrix H corresponds to the chosen factorization rank.

Extract the matrix H for k=2 and find its dimension:

Hk2 <- HMatrix(rna_norm_nmf_exp, k = 2)
class(Hk2)
Click for Answer
## [1] "matrix" "array"
dim(Hk2)
## [1]  2 45

Inspect exposure values of the first 5 samples in the H matrix k=2:

Hk2[, 1:5]
Click for Answer
X5852.HSC X6792.HSC X7256.HSC X7653.HSC X5852.MPP
73737.72 73937.34 66949.25 74760.02 76202.74
20470.68 21970.29 26204.85 20569.97 18053.79

If no value for k is supplied, the function returns a list of matrices, one for every factorization rank.

Hlist <- HMatrix(rna_norm_nmf_exp)
class(Hlist)
Click for Answer
## [1] "list"
length(Hlist)
## [1] 9
Hlist$k2[, 1:5]
Click for Answer
X5852.HSC X6792.HSC X7256.HSC X7653.HSC X5852.MPP
73737.72 73937.34 66949.25 74760.02 76202.74
20470.68 21970.29 26204.85 20569.97 18053.79

WMatrix

Returns the matrix W for the optimal decomposition (i.e. the one with the minimal residual) for a specific factorization rank k. The number of columns of the matrix W corresponds to the chosen factorization rank.

Extract the matrix W for k=2 and find its dimension:

Wk2 <- WMatrix(rna_norm_nmf_exp, k = 2)
class(Wk2)
Click for Answer
## [1] "matrix" "array"
dim(Wk2)
## [1] 21811     2

Inspect the contribution of the first 5 genes to the signatures in matrix W k=2:

Wk2[1:5, ]
Click for Answer
V1 V2
A1BG 1.99e-05 2.00e-05
A1BG-AS1 1.21e-05 9.50e-06
A1CF 0.00e+00 1.34e-05
A2M 3.88e-05 8.75e-05
A2M-AS1 5.25e-05 4.86e-05

If no value for k is supplied, the function returns a list of matrices, one for every factorization rank.

Wlist <- WMatrix(rna_norm_nmf_exp)
class(Wlist)
Click for Answer
## [1] "list"
length(Wlist)
## [1] 9
Wlist$k2[1:5, ]
Click for Answer
V1 V2
A1BG 1.99e-05 2.00e-05
A1BG-AS1 1.21e-05 9.50e-06
A1CF 0.00e+00 1.34e-05
A2M 3.88e-05 8.75e-05
A2M-AS1 5.25e-05 4.86e-05