Open Positions

Scientific associate for cellular deconvolution benchmarking project

We are looking for a scientific associate for a one-year project to implement a cloud-based benchmarking platform for testing cellular deconvolution methods on RNA-seq and DNA methylation data. The full job description is available here.
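To illustrate what such a benchmark involves, the minimal Python sketch below simulates pseudo-bulk samples by mixing cell-type reference profiles with known proportions. It then estimates the proportions back with non-negative least squares (`scipy.optimize.nnls`) and scores the estimates by root-mean-square error. This is a hypothetical toy, not the platform itself. The signature matrix is random, NNLS stands in for the deconvolution methods that would actually be tested, and the function names (`simulate_pseudobulk`, `deconvolve_nnls`, `benchmark`) are made up for this example.

```python
import numpy as np
from scipy.optimize import nnls


def simulate_pseudobulk(signature, proportions, noise_sd, rng):
    """Mix reference profiles (genes x cell types) with known proportions, then add noise."""
    bulk = signature @ proportions
    noisy = bulk + rng.normal(0.0, noise_sd, size=bulk.shape)
    return np.clip(noisy, 0.0, None)  # keep expression values non-negative


def deconvolve_nnls(signature, bulk):
    """Estimate cell-type proportions by non-negative least squares, rescaled to sum to 1."""
    coef, _residual_norm = nnls(signature, bulk)
    total = coef.sum()
    return coef / total if total > 0 else coef


def benchmark(signature, n_samples=50, noise_sd=0.1, seed=0):
    """Mean root-mean-square error between true and estimated proportions over simulated samples."""
    rng = np.random.default_rng(seed)
    n_types = signature.shape[1]
    errors = []
    for _ in range(n_samples):
        truth = rng.dirichlet(np.ones(n_types))  # random proportions summing to 1
        bulk = simulate_pseudobulk(signature, truth, noise_sd, rng)
        estimate = deconvolve_nnls(signature, bulk)
        errors.append(np.sqrt(np.mean((estimate - truth) ** 2)))
    return float(np.mean(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical reference: 200 genes x 4 cell types drawn at random.
    signature = rng.gamma(2.0, 1.0, size=(200, 4))
    for sd in (0.0, 0.1, 0.5):
        print(f"noise_sd={sd}: mean RMSE {benchmark(signature, noise_sd=sd):.4f}")
```

Setting `noise_sd=0.0` should give an error close to zero. This is a useful sanity check before swapping in other methods or real reference profiles.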


Group picture as of November 2019

Ana Luísa Costa

PhD student

Daria Doncevic

Master student

Carl Herrmann

Group leader

Carlos Ramirez


Andres Quintero

PhD student

Ashwini Sharma



  • Calvin Chan
  • Qi Wang
  • Nils Kurzawa
  • Sebastian Steinhauser
  • Paul Saary
  • Ron Schwessinger
  • Christian Heyer
  • Asma Hamid
  • Clothilde Chenal
  • Jérémie Perrin

We are happy to welcome motivated students for lab rotations as well as bachelor's and master's theses.


Applications of non-negative matrix factorization to single-cell and bulk genomic datasets

We use epigenomics datasets to characterize neuroblastoma subtypes

Cohort stratification and chromatin feature extraction using neural networks.

OpenLab Epigenomics

The Epigenomics OpenLab is a joint effort of the DKFZ and the Medical Faculty to support groups in processing their epigenomics datasets. We offer assistance and expertise, as well as access to our processing pipelines, and are happy to host external members and guide them through the analysis.

Please contact us if you are looking for assistance.


Members of the lab are involved in teaching in the Molecular Biotechnology Bachelor's and Master's programs at Heidelberg University.


Winter semester 2019 / 2020

Summer semester 2019

Bachelor thesis projects 2020

Here are some possible topics/projects for students who want to do their bachelor thesis in our group during the summer semester 2020:

Topic 1: using single-cell data to interpret expression data from patients

An increasing number of single-cell RNA-seq datasets are being generated, raising the resolution of transcriptomics to the level of individual cells. These datasets reveal the mixture of cell types within a tissue sample and have been used to build cell-type atlases of mouse embryos. At the same time, thousands of bulk RNA-seq datasets are available that lack this resolution. We are implementing methods to re-interpret bulk datasets using single-cell information, for example by mapping patient data onto trajectories defined from single-cell expression. The project would contribute to the development of this method, in particular the visualization of the data, and apply it to a large set of pediatric tumor types. Comparison with datasets from normal tissue would be used to validate the method.

Main aspects:

  • data analysis of sequencing data
  • interactive visualization using Shiny
  • comparative genomics (mouse/human)
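One very simple way to relate a bulk profile to single-cell data is to average the single-cell expression within each cell state (for example, states ordered along a trajectory). The bulk sample is then matched to the state whose average profile correlates best with it. The Python sketch below shows this nearest-centroid idea under those assumptions. It is not the method under development in the group, and the function names (`state_centroids`, `map_bulk_to_states`) and toy data are hypothetical.

```python
import numpy as np


def state_centroids(sc_expr, labels):
    """Average single-cell expression (cells x genes) within each cell-state label."""
    states = np.unique(labels)
    centroids = np.vstack([sc_expr[labels == s].mean(axis=0) for s in states])
    return states, centroids


def map_bulk_to_states(bulk, states, centroids):
    """Pearson correlation (on log1p values) of one bulk profile with every state centroid.

    Returns the best-matching state and the full vector of correlations."""
    log_bulk = np.log1p(bulk)
    scores = np.array([np.corrcoef(log_bulk, np.log1p(c))[0, 1] for c in centroids])
    return states[np.argmax(scores)], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 3 states along a trajectory, 100 genes, 30 cells per state.
    profiles = rng.gamma(2.0, 5.0, size=(3, 100))
    labels = np.repeat(np.arange(3), 30)
    cells = rng.poisson(profiles[labels])  # 90 cells x 100 genes (counts)
    states, centroids = state_centroids(cells, labels)
    patient = 20 * profiles[1]  # a "bulk" sample resembling state 1
    best, scores = map_bulk_to_states(patient, states, centroids)
    print("best-matching state:", best, "correlations:", np.round(scores, 2))
```

Correlation on log-transformed values ignores overall library size, which differs strongly between bulk and single-cell data. The full correlation vector, not only the best match, indicates where along the ordered states a sample sits.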


Topic 2: single-cell multi-omics integration using auto-encoder strategies

In the last three years, a new wave of technologies that allow profiling of multiple molecular levels in single cells at the same time has emerged, e.g., CITE-seq, scCAT-seq, scNMT-seq and scDam&T. It is therefore crucial to develop new methods that take multiple layers of information into account simultaneously in order to find clusters of cells, identify interactions between these layers, and generate signatures or factors underlying the differences between cells.

Auto-encoders are a popular way to perform non-linear dimensionality reduction and to extract relevant features from a dataset. They can be applied, e.g., to a single-cell dataset and compared with methods based on linear approaches such as principal component analysis or non-negative matrix factorization. Such approaches can also be used to integrate multi-omics datasets. The goal of the project is to explore the possibilities of auto-encoders for integrating single-cell RNA-seq and single-cell ATAC-seq data from different in-house and published datasets. The results of these integrations will then be compared with other methods, e.g., those implemented in popular R packages or based on integrative non-negative matrix factorization.

Main aspects:

  • machine learning using Python
  • handling of large single-cell datasets
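To make the linear versus non-linear contrast concrete, here is a minimal NumPy sketch of a one-hidden-layer auto-encoder (tanh encoder, linear decoder, trained with full-batch Adam on the mean squared reconstruction error). The rank-k PCA reconstruction error, computed by SVD, is shown alongside it for comparison. Everything here is an illustrative assumption: the architecture, learning rate, optimizer and toy data are arbitrary choices, and the function names are hypothetical. A real single-cell integration model would usually be built with a deep-learning framework rather than hand-written gradients.

```python
import numpy as np


def pca_reconstruction_error(x, k):
    """Mean squared error of the best rank-k linear reconstruction of centered data (via SVD)."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k]
    return float(np.mean((centered - approx) ** 2))


def train_autoencoder(x, k, lr=0.01, epochs=2000, seed=0):
    """Tanh encoder + linear decoder, trained with full-batch Adam on the mean squared error.

    Returns the (n_samples x k) embedding and the training loss recorded at every epoch."""
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=0)  # center features, as PCA does
    d = x.shape[1]
    params = [rng.normal(0.0, 0.1, (d, k)), np.zeros(k),  # encoder weights, bias
              rng.normal(0.0, 0.1, (k, d)), np.zeros(d)]  # decoder weights, bias
    m = [np.zeros_like(p) for p in params]  # Adam first-moment estimates
    v = [np.zeros_like(p) for p in params]  # Adam second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        w1, b1, w2, b2 = params
        z = np.tanh(x @ w1 + b1)  # encoder: non-linear k-dimensional code
        err = z @ w2 + b2 - x  # reconstruction minus input
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size  # gradient of the mean squared error
        g_pre = (g_out @ w2.T) * (1.0 - z ** 2)  # back-propagate through tanh
        grads = [x.T @ g_pre, g_pre.sum(axis=0), z.T @ g_out, g_out.sum(axis=0)]
        for p, g, m_i, v_i in zip(params, grads, m, v):
            m_i[...] = beta1 * m_i + (1 - beta1) * g
            v_i[...] = beta2 * v_i + (1 - beta2) * g ** 2
            m_hat = m_i / (1 - beta1 ** t)
            v_hat = v_i / (1 - beta2 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in-place parameter update
    w1, b1 = params[0], params[1]
    return np.tanh(x @ w1 + b1), losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))  # hidden 2-D structure
    x = np.tanh(latent @ rng.normal(size=(2, 20))) + 0.05 * rng.normal(size=(200, 20))
    embedding, losses = train_autoencoder(x, k=2)
    print(f"auto-encoder MSE: {losses[-1]:.4f}")
    print(f"PCA (k=2) MSE:    {pca_reconstruction_error(x, 2):.4f}")
```

Comparing the final auto-encoder loss with `pca_reconstruction_error(x, k)` for the same `k` gives a first, rough sense of whether the non-linear model captures structure that a linear projection misses. On real data, this comparison would need held-out cells.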


Topic 3: improving stratification of schizophrenia patients using multi-omics datasets

Schizophrenia is a severe disease whose diagnosis is mostly based on clinical interviews. Within a large consortium, we are working to improve this by identifying molecular signatures based on multiple omics data types, for example DNA methylation and gene expression (RNA-seq). Integrating these data types will likely improve patient stratification compared with approaches based on a single data type. The goal of the project would be to implement several strategies for this data integration (neural networks, integrative linear methods, …), to identify patient groups, and to benchmark these approaches against stratification based on a single data type.

Main aspects:

  • data processing of primary RNA-seq and methylation data
  • implementation of data integration strategies using neural networks and matrix factorization
  • biological validation of the signatures using literature-based knowledge
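As one example of the integrative linear methods mentioned above, the sketch below implements a joint non-negative matrix factorization. Each omics layer (e.g., expression values and methylation beta values, both non-negative) gets its own feature loadings `W_v`, while all layers share a single patient-factor matrix `H`. Patients can then be grouped, for instance, by the largest entry in each column of `H`. This is an assumed, simplified formulation using Lee-Seung multiplicative updates on the summed squared error, not the consortium's pipeline. The function names and toy data sizes are hypothetical.

```python
import numpy as np


def joint_nmf(views, k, n_iter=500, seed=0, eps=1e-10):
    """Joint NMF: approximate every non-negative view X_v (features x patients) as W_v @ H.

    H (k x patients) is shared by all views; each W_v (features_v x k) is view-specific.
    Multiplicative updates (Lee-Seung) for the summed squared Frobenius error."""
    rng = np.random.default_rng(seed)
    h = rng.random((k, views[0].shape[1]))
    ws = [rng.random((x.shape[0], k)) for x in views]
    for _ in range(n_iter):
        # Update each view-specific loading matrix W_v with H held fixed.
        ws = [w * (x @ h.T) / (w @ h @ h.T + eps) for w, x in zip(ws, views)]
        # Update the shared patient factors using the evidence from all views at once.
        numer = sum(w.T @ x for w, x in zip(ws, views))
        denom = sum(w.T @ w @ h for w in ws) + eps
        h = h * numer / denom
    return ws, h


def joint_error(views, ws, h):
    """Summed squared reconstruction error over all views."""
    return sum(float(np.sum((x - w @ h) ** 2)) for x, w in zip(views, ws))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical toy cohort: 20 patients in two groups sharing 2 latent factors.
    h_true = np.repeat(np.array([[1.0, 0.1], [0.1, 1.0]]), 10, axis=0).T  # 2 x 20
    expression = rng.gamma(2.0, 1.0, (50, 2)) @ h_true  # 50 genes x 20 patients
    methylation = rng.random((30, 2)) @ h_true / 1.1  # 30 CpGs, values in [0, 1)
    ws, h = joint_nmf([expression, methylation], k=2)
    print("patient groups:", h.argmax(axis=0))
    print("reconstruction error:", joint_error([expression, methylation], ws, h))
```

Sharing `H` is what makes the factorization integrative: each patient factor must explain all data types at once. Because the objective simply sums squared errors, a view with larger values dominates the fit, so in practice each view would typically be rescaled before factorization.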


Topic 4: differential “in-silico phenotyping” of tumor and normal tissues

The availability of large RNA-seq datasets of tumor tissue and matching normal tissue makes comparative studies possible. In particular, recent approaches can infer the activity of pathways and transcription factors from transcriptomic data. These activity estimates can be used to understand how pathways and master regulators are jointly activated or show mutually exclusive patterns. In recent projects, we have for example described how mesenchymal phenotypes appear to be tightly related to the activation of pathways such as the RAS pathway. The goal of the project is to conduct a large-scale analysis of the activity patterns of pathways and master regulators, and to understand how these patterns are perturbed in tumor tissues compared with their normal counterparts. We will focus in particular on processes related to ferroptosis across various tumor types, to describe how this process is related to other pathways.

Main aspects:

  • large-scale processing of transcriptomic data from TCGA
  • implementation of statistical methods to study differential correlation
  • visual representation of the data and interactive data mining