Responsible teachers:
The goal of the Data Analysis Module during the Summer Semester is to provide hands-on experience in the analysis of large-scale datasets and first insights into using computational tools to produce a reproducible data analysis.
After this module, you will have
We have defined 5 topics in data and image analysis; each project will comprise up to 5 different sub-projects. Most of the time, these 5 sub-projects are very similar to each other but analyze slightly different datasets.
You can find a description of the 5 topics here:
Topic 01 : Image analysis (Karl Rohr - Python)
Topic 03 : Proteome-wide Screen for RNA-dependent Proteins (Maiwen Caudron-Herger - R)
Topic 05 : Drug repurposing for cancer treatment (Carl Herrmann - R)
You will find a description of the projects and a list of supervisors/tutors in these description files.
Important information:
A tutor will be assigned to each project. Each team within a project will have a weekly online meeting with its tutor on Wednesday between 10 am and 1 pm, lasting 20-30 minutes.
VERY IMPORTANT: because the time the tutor can dedicate to your project each week is limited, you should prepare your meeting carefully. We have provided a template which should help you organize your weekly meeting efficiently!
Each student will have an individual evaluation! This will take into account the 2 presentations listed above, as well as the report (markdown report / jupyter notebook depending on the projects).
Here are the relevant points taken into account during the project proposal presentation:
During the oral presentations, each student will be asked to explain part of the analysis, especially to explain the code! So everyone should make sure to be involved in the project.
Final reports will be submitted (or “committed”) to the GitHub repository of the group as a .Rmd or .ipynb file.
Important note: Science is collaboration, so please make sure to share your insights and knowledge with other groups! You are free to choose how to do so, e.g. WhatsApp or Slack groups.
Depending on the project, you will use either R (Topics 02/03/04/05) or Python (Topic 01).
You will use RStudio and create an R Markdown document.
This document will consist of a mixture of plain text (explanations about the analysis, comments, …) and code pieces (called chunks). The advantage is that the report will be generated automatically (as either a pdf or an html document) when compiling the markdown file. All plots will be created automatically and dynamically from the code pieces in the markdown document.
Have a look at this tutorial or this one to get started with RMarkdown; RMarkdown is very easy to generate with RStudio.
Similar to R Markdown documents, the Jupyter Notebook offers a way to mix markdown text with Python code. Installing Jupyter Notebook requires an installation of Python. Just follow the instructions on the previous link.
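To make the "markdown text plus Python code" idea concrete, here is a minimal sketch of what a notebook file contains. It assumes the standard nbformat 4 JSON layout and uses only the Python standard library; the file name `example_report.ipynb`, the helper `make_cell` and the cell contents are made up for illustration and are not part of the course material.

```python
import json


def make_cell(cell_type, source):
    """Return a minimal notebook cell (nbformat 4 layout, hypothetical helper)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally store their outputs and an execution counter.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        # Explanations and comments about the analysis go into markdown cells.
        make_cell("markdown", "# Example report\nWe sum a small list of values."),
        # The analysis itself goes into code cells.
        make_cell("code", "values = [1, 2, 3]\nprint(sum(values))"),
    ],
}

# Illustrative file name; the resulting file can be opened with Jupyter Notebook.
with open("example_report.ipynb", "w", encoding="utf-8") as handle:
    json.dump(notebook, handle, indent=1)
```

In practice you would create and edit cells in the Jupyter interface rather than writing the file by hand; the sketch is only meant to show that text and code live side by side in one document.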
Git is a system to handle collaborative projects, in which each member of the team is contributing to the project. You can check this website for a simple intro to Git/GitHub.
Git can be used either from the command line or through GitHub Desktop, a GUI manager which makes committing changes, etc. very easy.
This tool will help you (and us…) track the progress of your project.
Introduction to GitHub
Here are some slides on GitHub
These are the Python Notebook files with Python intro provided by David Schwarzenbacher