IRTG Course

Introduction to R for genomics

Carl Herrmann & Carlos Ramirez

8-9 December 2021

Standard Preprocessing using Seurat

Some standard steps are usually carried out in scRNA-seq analysis before any further analysis, such as quality control (QC), dimensionality reduction and marker visualization. Here, we will use the Seurat R package to perform these steps. Seurat is increasingly becoming the most popular tool for this, but there are other options, such as SingleCellExperiment in R and scanpy in Python. First, we need to create a Seurat object.

Creating Seurat object

We create a Seurat object using the CreateSeuratObject function as follows. Use R's help operator (?CreateSeuratObject) to get information about the parameters that need to be provided to the function.


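library(Seurat)

# pbmc.mtx: count matrix with genes in rows and cells in columns
pbmc.seurat <- CreateSeuratObject(
  counts = pbmc.mtx,    # raw count matrix
  project = 'PBMC',     # project name
  assay = 'RNA',        # name given to this assay
  min.cells = 1,        # keep genes detected in at least 1 cell
  min.features = 1      # keep cells with at least 1 detected gene
)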

The variable pbmc.seurat now contains a Seurat object that can be passed to the other functions of the package. Printing the variable shows the number of genes (features) and cells (samples):

pbmc.seurat

## An object of class Seurat 
## 12673 features across 500 samples within 1 assay 
## Active assay: RNA (12673 features, 0 variable features)
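The min.cells and min.features arguments discard rarely detected genes and nearly empty cells when the object is created. The following is a minimal sketch of that kind of filtering on a hypothetical toy matrix, using only the Matrix package rather than Seurat itself. Applying the cell filter before the gene filter is an assumption made for this illustration, not a copy of Seurat's internal code.

```r
library(Matrix)

# Hypothetical toy count matrix: 4 genes x 3 cells
counts <- Matrix(c(0, 2, 0,
                   0, 0, 0,
                   1, 3, 0,
                   5, 0, 0),
                 nrow = 4, ncol = 3, byrow = TRUE, sparse = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

min.cells    <- 1  # a gene must be detected in at least this many cells
min.features <- 1  # a cell must have at least this many detected genes

# Keep cells with enough detected genes (non-zero entries per column)
cells.keep <- colSums(counts > 0) >= min.features
counts <- counts[, cells.keep, drop = FALSE]

# Keep genes detected in enough of the remaining cells
genes.keep <- rowSums(counts > 0) >= min.cells
counts <- counts[genes.keep, , drop = FALSE]

dim(counts)
```

Here cell3 has no detected genes and is dropped, after which gene2 is not detected in any remaining cell and is dropped as well, leaving a 3 x 2 matrix.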

Exploring the Seurat object

Seurat objects can be seen as containers for different kinds of data. At this step, ours contains only the gene expression matrix, but a Seurat object can also store metadata, processed data and data from other assays, for example scATAC-seq, CITE-seq or unspliced transcripts.

We can explore the metadata of the Seurat object using the $ operator in combination with the Tab key for auto-completion. For example, when the Seurat object is created, the total number of counts per cell is calculated as a quality metric and added to the metadata as nCount_RNA. We can access this metric as follows:

pbmc.seurat$nCount_RNA %>% head
##             1151             1347             4584             1268 
##             5676             2761
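Besides nCount_RNA, CreateSeuratObject also adds nFeature_RNA, the number of genes with non-zero counts in each cell. Both are simple column summaries of the count matrix. A minimal sketch on a hypothetical toy matrix, computed directly with the Matrix package instead of Seurat:

```r
library(Matrix)

# Hypothetical toy count matrix: 3 genes x 4 cells
counts <- sparseMatrix(i = c(1, 3, 2, 3, 1),
                       j = c(1, 1, 2, 3, 4),
                       x = c(4, 1, 7, 2, 3),
                       dims = c(3, 4),
                       dimnames = list(c("ACTB", "CD3E", "MS4A1"),
                                       paste0("cell", 1:4)))

# Total counts per cell (what nCount_RNA stores)
nCount <- colSums(counts)

# Number of detected genes per cell (what nFeature_RNA stores)
nFeature <- colSums(counts > 0)

nCount
nFeature
```

On the real object, colSums(pbmc.seurat@assays$RNA@counts) should reproduce the values of pbmc.seurat$nCount_RNA.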

Similarly, the @ operator gives access to the different slots of the object. For example, we can extract the original count matrix that we used to create the Seurat object as follows:

pbmc.seurat@assays$RNA@counts[1:5, 1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
## AL627309.1                .                .                .                .
## AP006222.2                .                .                .                .
## LINC00115                 .                .                .                .
## NOC2L                     .                .                .                .
## PLEKHN1                   .                .                .                .
##            AAATCAACCCTATT-1
## AL627309.1                .
## AP006222.2                .
## LINC00115                 .
## NOC2L                     .
## PLEKHN1                   .
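The count matrix is stored as a sparse matrix of class dgCMatrix: only non-zero values are kept in memory and zeros are printed as dots. Since most genes are not detected in most cells, this saves a lot of memory. A small sketch with a hypothetical, mostly-zero matrix, using the Matrix package:

```r
library(Matrix)

# Hypothetical dense matrix (1000 genes x 200 cells) with only 3 non-zero entries
dense <- matrix(0, nrow = 1000, ncol = 200)
dense[cbind(c(1, 50, 999), c(3, 100, 200))] <- c(5, 2, 8)

# Convert to the compressed sparse column format (dgCMatrix)
sparse <- as(dense, "CsparseMatrix")
class(sparse)

# Number of stored (non-zero) values
nnzero(sparse)

# Fraction of entries that are zero
1 - nnzero(sparse) / prod(dim(sparse))

# Memory used by each representation
object.size(dense)
object.size(sparse)
```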

Extracting expression values

Next, we visualize the counts of a single gene to see how they are distributed. We look at the expression of the housekeeping gene beta-actin (ACTB) and plot a histogram of its count values. We use the function FetchData, which extracts the values of selected features from the Seurat object, and then plot them as a histogram.

actin <- FetchData(pbmc.seurat, vars = 'ACTB')  # data.frame: one row per cell, column ACTB
hist(actin$ACTB, main = 'ACTB', xlab = 'Counts')
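FetchData returns a data.frame with one row per cell and one column per requested feature. The sketch below builds a data.frame of the same shape by hand from a hypothetical toy count matrix, to make that layout explicit, and computes the histogram bins without drawing them (plot = FALSE). It is an illustration with made-up values, not a replacement for FetchData.

```r
library(Matrix)

# Hypothetical toy counts: 2 genes x 5 cells
counts <- Matrix(c(3, 0, 7, 1, 0,
                   0, 2, 0, 0, 4),
                 nrow = 2, ncol = 5, byrow = TRUE, sparse = TRUE,
                 dimnames = list(c("ACTB", "CD3E"), paste0("cell", 1:5)))

# One row per cell (cell names as row names), one column per gene
actin <- data.frame(ACTB = as.numeric(counts["ACTB", ]),
                    row.names = colnames(counts))

# Histogram bins of the ACTB counts; plot = FALSE only computes them
h <- hist(actin$ACTB, breaks = seq(0, 8, by = 2), plot = FALSE)
h$counts
```

With real data, a large first bin is typical, because many cells have zero or very few counts for a given gene.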


Create a Seurat object

Load a Seurat object using the following command:

pbmc.seurat <- readRDS(url(''))
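readRDS restores a single R object that was previously written with saveRDS, which is how a prepared Seurat object can be shared. A minimal local round trip, using a hypothetical list as a stand-in for a Seurat object:

```r
# Hypothetical small object standing in for a Seurat object
obj <- list(project = "PBMC", counts = matrix(1:6, nrow = 2))

# Save to a temporary .rds file and read it back
path <- tempfile(fileext = ".rds")
saveRDS(obj, path)
obj2 <- readRDS(path)

identical(obj, obj2)
```

For remote files, the readRDS help page notes that url connections may need to be wrapped in gzcon(), because saveRDS compresses its output by default.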


Find and display the metadata in the Seurat object: which command would you use?
a) %>% head
b) pbmc.seurat@metadata %>% head
c) pbmc.seurat[[ ]] %>% head
TIP: You can have a look at the documentation of Seurat objects in the GitHub Wiki. Two options are correct.
Answer: %>% head
pbmc.seurat[[ ]] %>% head


Extract and print the first 5 rows and columns of the count matrix from the Seurat object.
a) GetAssayData(pbmc.seurat, slot='counts')[1:5, 1:5]
b) pbmc.seurat@assays$RNA@counts[,1:5] %>% head(n=5)
c) pbmc.seurat[1:5, 1:5]
TIP: Two options are correct.


The file at the following URL:

contains 200 cells down-sampled from the 10x PBMC dataset, stored in TSV format. Load the count matrix using the following command:

pbmc.mtx <- read.table('', sep = '\t')
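read.table returns a data.frame. The sketch below runs the same step on a hypothetical toy file, assuming the TSV stores genes in rows and cell barcodes in its header line. By default (check.names = TRUE), read.table would rewrite a barcode suffix such as "-1" as ".1", so check.names = FALSE is used here. Converting the result to a sparse dgCMatrix is optional and only matches the format used in the counts slot; the barcodes and gene values are made up.

```r
library(Matrix)

# Write a hypothetical 3-gene x 2-cell count table to a temporary TSV file
toy <- data.frame(`BARCODE1-1` = c(0, 3, 1),
                  `BARCODE2-1` = c(2, 0, 0),
                  row.names = c("ACTB", "CD3E", "MS4A1"),
                  check.names = FALSE)
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE)

# The header line has one field fewer than the data rows,
# so the first column is used as row names (gene symbols).
# check.names = FALSE keeps the "-1" barcode suffix unchanged.
mtx <- read.table(path, sep = "\t", header = TRUE, check.names = FALSE)

# Optional: convert to a sparse genes x cells matrix
mtx.sparse <- as(as.matrix(mtx), "CsparseMatrix")
dim(mtx.sparse)
colnames(mtx.sparse)
```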

Create a Seurat object using the count matrix.

How many features and cells are present in the count matrix?

  1. 300 features and 250 samples
  2. 19359 features and 200 samples
  3. 11167 features and 200 samples