After preprocessing the data, we are now ready to start an NMF analysis.
The wrapper function for the NMF solvers in the ButchR package is
run_NMF_tensor
. It is called as follows:
Applying Non-Negative Matrix Factorization (NMF) to normalized transcriptome data (RNA-seq). Using:
Depending on the choice of parameters (dimensions of the input matrix, number of iterations), this step may take some time. Note that the algorithm updates the user about the progress in the iterations.
##----------------------------------------------------------------------------##
## run NMF ##
##----------------------------------------------------------------------------##
factorization_ranks <- 2:10
rna_nmf_exp <- run_NMF_tensor(X = corces_rna_norm,
ranks = factorization_ranks,
method = "NMF",
n_initializations = 10,
iterations = 10^4,
convergence_threshold = 40,
extract_features = TRUE)
## [1] "2021-09-02 16:23:14 UTC"
## Factorization rank: 2
## [1] "NMF converged after 71,106,122,177,111,84,168,66,84,106 iterations"
## [1] "2021-09-02 16:23:19 UTC"
## Factorization rank: 3
## [1] "NMF converged after 141,152,77,129,177,112,95,159,118,91 iterations"
## [1] "2021-09-02 16:23:27 UTC"
## Factorization rank: 4
## [1] "NMF converged after 122,244,107,213,94,138,239,138,140,93 iterations"
## [1] "2021-09-02 16:23:36 UTC"
## Factorization rank: 5
## [1] "NMF converged after 182,187,151,174,310,244,192,164,187,191 iterations"
## [1] "2021-09-02 16:23:47 UTC"
## Factorization rank: 6
## [1] "NMF converged after 159,148,142,215,208,166,153,273,267,355 iterations"
## [1] "2021-09-02 16:23:59 UTC"
## Factorization rank: 7
## [1] "NMF converged after 220,171,205,313,116,249,168,163,145,295 iterations"
## [1] "2021-09-02 16:24:13 UTC"
## Factorization rank: 8
## [1] "NMF converged after 217,176,199,289,257,127,216,153,132,138 iterations"
## [1] "2021-09-02 16:24:26 UTC"
## Factorization rank: 9
## [1] "NMF converged after 167,209,176,155,186,253,241,136,161,191 iterations"
## [1] "2021-09-02 16:24:39 UTC"
## Factorization rank: 10
## [1] "NMF converged after 304,265,312,368,301,214,172,295,161,279 iterations"
## No optimal K could be determined from the Optimal K stat
rna_nmf_exp
## class: ButchR_NMF
## Original matrix dimension: 21811 45
## Factorization performed for ranks: 2 3 4 5 6 7 8 9 10
## Optimal K based on factorization metrics: Please select manualy
##
## Running parameters:
## method = NMF
## n_initializations = 10
## iterations = 10000
## stop_threshold = 40
## extract_features = TRUE
To make the features in the W matrix comparable, the factorization is normalized to make all columns of W sum 1.
rna_norm_nmf_exp <- normalizeW(rna_nmf_exp)
Several functions to access the results are available:
HMatrix
Returns the matrix H
for the optimal decomposition (i.e. the one with
the minimal residual) for a specific factorization rank k
. The number
of rows of the matrix H
corresponds to the chosen factorization rank.
Extract the matrix H for k=2 and find its dimension:
Hk2 <- HMatrix(rna_norm_nmf_exp, k = 2)
class(Hk2)
## [1] "matrix" "array"
dim(Hk2)
## [1] 2 45
Inspect exposure values of the first 5 samples in the H matrix k=2:
Hk2[, 1:5]
X5852.HSC | X6792.HSC | X7256.HSC | X7653.HSC | X5852.MPP |
---|---|---|---|---|
73737.72 | 73937.34 | 66949.25 | 74760.02 | 76202.74 |
20470.68 | 21970.29 | 26204.85 | 20569.97 | 18053.79 |
If no value for k
is supplied, the function returns a list of
matrices, one for every factorization rank.
Hlist <- HMatrix(rna_norm_nmf_exp)
class(Hlist)
## [1] "list"
length(Hlist)
## [1] 9
Hlist$k2[, 1:5]
X5852.HSC | X6792.HSC | X7256.HSC | X7653.HSC | X5852.MPP |
---|---|---|---|---|
73737.72 | 73937.34 | 66949.25 | 74760.02 | 76202.74 |
20470.68 | 21970.29 | 26204.85 | 20569.97 | 18053.79 |
WMatrix
Returns the matrix W
for the optimal decomposition (i.e. the one with
the minimal residual) for a specific factorization rank k
. The number
of columns of the matrix W
corresponds to the chosen factorization
rank.
Extract the matrix W for k=2 and find its dimension:
Wk2 <- WMatrix(rna_norm_nmf_exp, k = 2)
class(Wk2)
## [1] "matrix" "array"
dim(Wk2)
## [1] 21811 2
Inspect the contribution of the first 5 genes to the signatures in matrix W k=2:
Wk2[1:5, ]
V1 | V2 | |
---|---|---|
A1BG | 1.99e-05 | 2.00e-05 |
A1BG-AS1 | 1.21e-05 | 9.50e-06 |
A1CF | 0.00e+00 | 1.34e-05 |
A2M | 3.88e-05 | 8.75e-05 |
A2M-AS1 | 5.25e-05 | 4.86e-05 |
If no value for k
is supplied, the function returns a list of
matrices, one for every factorization rank.
Wlist <- WMatrix(rna_norm_nmf_exp)
class(Wlist)
## [1] "list"
length(Wlist)
## [1] 9
Wlist$k2[1:5, ]
V1 | V2 | |
---|---|---|
A1BG | 1.99e-05 | 2.00e-05 |
A1BG-AS1 | 1.21e-05 | 9.50e-06 |
A1CF | 0.00e+00 | 1.34e-05 |
A2M | 3.88e-05 | 8.75e-05 |
A2M-AS1 | 5.25e-05 | 4.86e-05 |