Here is a list of possible topics for Bachelor theses in 2023 in our group. These are examples; the topics may evolve slightly depending on your precise interests and on the status of the projects at the beginning of the thesis!
We have developed an interpretable deep-learning model (more precisely, a family of autoencoder models) that maps gene expression profiles to cellular functions, pathways or phenotypes. Using this model, we can recover known functions for specific human tissues, and we can also simulate interventions such as gene knock-outs or the overexpression of specific genes. Hence, we can in principle predict the function of specific genes, or the effect of modulating their expression. We would like to apply this model to uncover unknown functions of specific genes by allowing the model to “learn” new associations between genes and functions. To do so, our OntoVAE model needs to be modified, and these modifications need to be tested using leave-one-out strategies. As a biological application, we want to test whether our model can predict new functions for the BCAT gene, which has a number of different roles in cancer metabolism. We are collaborating on this with the group of Bernhard Radlwimmer at the DKFZ.
Doncevic, D. and Herrmann, C. (2022), “Biologically informed variational autoencoders allow predictive modeling of genetic and drug induced perturbations”, bioRxiv, 22 September.
Francois, L., Boskovic, P., Knerr, J., He, W., Sigismondo, G., Schwan, C., More, T.H., et al. (2022), “BCAT1 redox function maintains mitotic fidelity”, Cell Reports, Vol. 41 No. 3, p. 111524.
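To make the idea of a simulated gene knock-out in the autoencoder topic above more concrete, here is a minimal sketch in Python. It is not the OntoVAE implementation or its API: it assumes a toy linear model in which a hypothetical binary annotation mask restricts which genes can contribute to which functional term, and random weights stand in for trained parameters. All names (mask, weights, term_activities, knock_out) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes, n_terms = 6, 3

# Hypothetical annotation mask: mask[g, t] = 1 if gene g is annotated to term t.
mask = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
])

# Random weights stand in for trained parameters (assumption);
# multiplying by the mask removes all links that are not annotated.
weights = rng.normal(size=(n_genes, n_terms)) * mask


def term_activities(expression):
    """Map a gene expression vector to one activity value per term."""
    return expression @ weights


def knock_out(expression, gene):
    """Return a copy of the profile with one gene's expression set to zero."""
    perturbed = expression.copy()
    perturbed[gene] = 0.0
    return perturbed


expression = rng.uniform(1.0, 5.0, size=n_genes)
baseline = term_activities(expression)
perturbed = term_activities(knock_out(expression, gene=0))

# Change in term activity caused by the simulated knock-out of gene 0.
effect = perturbed - baseline
print(effect)
```

In this toy setting, only terms to which gene 0 is annotated can change after the knock-out. This is why predicting new functions requires relaxing the mask so that additional gene-to-term links can be learned, and why a leave-one-out test (hiding a known annotation and checking whether it is recovered) is a natural validation.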
CTCF is a transcription factor with a number of functions in genome organisation, chromatin structuring and gene activation. It also appears to play a role in viral integration into the genome. As such, it is a multi-functional transcription factor. In particular, CTCF defines the boundaries of so-called topologically associating domains (TADs), but it also binds to gene promoters. We would like to conduct a comprehensive analysis of CTCF binding site landscapes across multiple tissues and cell types to understand what drives these different functions along the genome. We will integrate multiple chromatin components (histone marks, DNA methylation, sequence motifs, …) to understand the differences between these functions and whether co-factors play a role. In addition, we would like to describe both the universal and the tissue-specific features of CTCF binding.
Single-cell experiments make it possible to observe and measure the variability of expression between cells. While cells from different cell types show different transcriptional programs, stochastic behavior is also observed between cells from isogenic pools. This transcriptional noise is believed to contribute to fast adaptation to changing environmental conditions. Using single-cell chromatin accessibility datasets, we can now also measure chromatin variability. In previous studies, we have observed a high epigenetic variability (Liu et al., 2019). Using a new method, we can integrate scRNA-seq and scATAC-seq data to reconstruct gene regulatory networks and study the level of regulatory variability across different cells. In this project, we want to (1) understand which genes are subject to high transcriptional and epigenetic variability, and (2) reconstruct gene regulatory networks and connect these two levels of variability using a simple dynamical model. This will be done using published data that include both modalities (scRNA/scATAC), and data that we will obtain from CD8 T-cells in the course of the project.
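As an illustration of what "measuring variability between cells" can mean in practice, here is a minimal Python sketch that computes the squared coefficient of variation (CV², variance divided by squared mean) for each gene across cells. CV² is one commonly used noise measure and is chosen here as an assumption; it is not necessarily the measure used in the project, and the count matrix is a made-up toy example.

```python
import numpy as np


def gene_variability(counts):
    """Per-gene mean, variance and CV^2 from a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)  # sample variance across cells

    # CV^2 is undefined for genes with zero mean expression: leave those as NaN.
    cv2 = np.full_like(mean, np.nan)
    expressed = mean > 0
    cv2[expressed] = var[expressed] / mean[expressed] ** 2
    return mean, var, cv2


# Toy matrix (assumption): 4 cells x 3 genes.
counts = np.array([
    [10, 0, 5],
    [10, 4, 5],
    [10, 0, 5],
    [10, 4, 5],
])

mean, var, cv2 = gene_variability(counts)

# Index of the gene with the highest cell-to-cell variability.
most_variable = int(np.nanargmax(cv2))
print(cv2, most_variable)
```

The same function could be applied to an accessibility matrix (cells x regions) to obtain a comparable score for chromatin variability. Real single-cell data would additionally need normalisation and a correction for the dependence of the variance on the mean, which this sketch leaves out.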