When you connect to the VM using the ssh command, you are in the bash console. You can recognize the bash console by its prompt (i.e. the left part of the input line), which should look like this:
user5@powerfulrutherford-30fb1:~$
Each participant has a home directory on the virtual machine, for example:
/home/user5/
Each user can jump directly into their home directory using the simple command
cd
Sometimes, we will use R to process the data; to activate R, simply type the following into the bash console:
/usr/bin/R
Now, you will be in the R console, which you can recognize by looking at the prompt, which now looks like this:
>
To get back to the bash console, simply type the following in the R console (and answer n to any question):
q()
You should be back in the bash console (check the prompt)!
In the tutorial, we have written the commands in the grey boxes in such a way that you can copy/paste the command into your bash console, and execute it. If you do so, make sure that you understand how the command is structured and what it does!
You can also type the command yourself in the console; in that case, make sure to respect the blank spaces inside the command!
Each user has 2 folders in their home directory:
data: this folder contains all the data (fastq / bam / ...) that you will need for the analysis
analysis: this folder will be used to store all the outputs of your analysis.
The data folder has the following structure:
data
├── ext_data
│   ├── genome.fa
│   ├── genome.fa.fai
│   ├── hg38.genome
│   ├── MA0139.1.jaspar
│   └── motifs.jaspar
├── fastqdata
│   ├── ATACseq
│   └── ChIPseq
└── processed
    ├── ATACseq
    ├── CTCF
    └── H3K4me3
ext_data: contains files needed for the analysis
fastqdata: contains some of the raw and trimmed fastq files
processed: contains some of the pre-processed files (like the aligned bam files)
During the analysis, we will create further subdirectories in the analysis directory.
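As a minimal sketch of how such subdirectories can be created, the helper below uses mkdir -p, which creates a directory together with any missing parent directories and does not complain if it already exists. The function name make_analysis_dirs and the subdirectory names (CTCF/peaks, ATACseq/peaks) are hypothetical, chosen only for illustration; the tutorial defines the actual names at each step.

```bash
# Hypothetical helper: create example output folders below a base directory.
# The subdirectory names are illustrative assumptions, not the tutorial's layout.
make_analysis_dirs() {
    base="$1"
    # -p creates missing parent directories and is silent if they already exist
    mkdir -p "$base/CTCF/peaks" "$base/ATACseq/peaks"
}
```

For example, calling make_analysis_dirs ~/analysis would create the folders inside your analysis directory; running it a second time is harmless.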
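+++ GOOD PRACTICE ADVICE +++
Give your files explicit names that keep the sample name and record the processing steps already applied, for example: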
## original fastq file
CTCF_rep1_IP.fq.gz
## aligned file
### bad name
IP.bam
### good name
CTCF_rep1_IP.bam
## after filtering
### bad name
filtered_file.bam
### good name
CTCF_rep1_IP.mapq_filtered.dup_removed.bam
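One way to keep names consistent is to derive every output name from the original fastq name instead of typing it by hand. The sketch below uses bash parameter expansion to strip the .fq.gz suffix; the variable names (fastq, sample, aligned, filtered) are assumptions made for this example.

```bash
# Original fastq file
fastq="CTCF_rep1_IP.fq.gz"

# Remove the trailing .fq.gz to obtain the sample name
sample="${fastq%.fq.gz}"

# Append one suffix per processing step
aligned="${sample}.bam"
filtered="${sample}.mapq_filtered.dup_removed.bam"

echo "$aligned"     # CTCF_rep1_IP.bam
echo "$filtered"    # CTCF_rep1_IP.mapq_filtered.dup_removed.bam
```

Because each name is built from the previous one, the sample name and the list of applied steps can never drift apart.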
We will use a number of software tools for the analysis; we have prepared a conda virtual environment containing all required tools. You first need to activate this environment by running the following command (in the bash console):
conda activate chipatac
You should see the prompt change from
user5@powerfulrutherford-30fb1:~$
to
(chipatac) user5@powerfulrutherford-30fb1:~$
Now you can use all tools described in the tutorial!
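Besides looking at the prompt, you can check the active environment from the command line. This is a small sketch relying on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the currently active environment; the variable env_status is only introduced for this example.

```bash
# conda stores the name of the active environment in CONDA_DEFAULT_ENV;
# the variable is empty or unset when no environment is active
if [ "${CONDA_DEFAULT_ENV:-}" = "chipatac" ]; then
    env_status="active"
else
    env_status="not active"
fi

echo "chipatac environment: $env_status"
```

If the message says "not active", run the activation command again before using any of the tools.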
You are working on a server with 26 CPU cores and 512 GB of memory. Some steps of the analysis are quite computationally intensive and can take a long time, especially if several users run them simultaneously! In that case, take a break while they are running and get a fresh cup of coffee; you might need it ;-)
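To get an idea of how busy the server is before launching a heavy step, you can query the number of cores and the current load. This is a minimal sketch using standard Linux commands (nproc, getconf, uptime); the variable name cores is an assumption for this example, and the output depends on the machine and on what other users are running.

```bash
# Number of CPU cores available (falls back to getconf if nproc is missing)
cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
echo "CPU cores: $cores"

# Load averages over the last 1, 5 and 15 minutes; as a rough guide, values
# close to the number of cores suggest the server is already heavily used
if command -v uptime >/dev/null 2>&1; then
    uptime
fi
```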