One important step of the secondary analysis in ChIP-seq is motif analysis. It can be run on peaks obtained from histone marks, such as H3K27ac, in order to find transcription factors that potentially bind in regions carrying specific histone marks. It also makes sense to run motif analysis on transcription factor ChIP-seq peaks; the reasons are:
We will run a simple motif analysis on the CTCF ChIP-seq peaks, using a motif analysis tool called peak-motifs from the RSAT (Regulatory Sequence Analysis Tools) toolbox.
We will proceed with the following analysis steps:
Extracting the top CTCF peaks: since we have a large number of CTCF peaks (> 50,000), we will extract the top 5000 peaks from the narrowPeak file, based on the enrichment score.
Getting fasta sequences: for the motif analysis, we need the fasta sequences of the selected peaks; these can be obtained using the bedtools toolbox.
Running motif discovery: we will submit these sequences to peak-motifs and run a motif discovery analysis.
Identifying discovered motifs: we will then use a feature of peak-motifs to compare the discovered motifs with known motifs from the motif database JASPAR.
The narrowPeak file obtained from MACS2 contains the enrichment value of each peak in column 7 (signalValue); we will sort the file in decreasing order according to this column, take the top 5000 lines, and then extract the corresponding sequences with bedtools:
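## go back to your home folder
cd
mkdir -p analysis/rsat/CTCF
## sort peaks on column 7 only, in decreasing numeric order, and keep the top 5000
sort -k7,7nr analysis/MACS2/CTCF/CTCF_peaks.narrowPeak \
| head -n 5000 > analysis/rsat/CTCF/CTCF_peaks.top5000.narrowPeak
## extract the genomic sequence of each selected peak (read from the rsat folder written above)
bedtools getfasta -fi data/ext_data/genome.fa \
-bed analysis/rsat/CTCF/CTCF_peaks.top5000.narrowPeak > analysis/rsat/CTCF/CTCF_peaks.top5000.fa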
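Before moving on, it can be worth checking that both output files contain what you expect. The following is a minimal sketch, not part of the original protocol: the helper names `count_peaks` and `count_fasta` are hypothetical, and the file paths are assumed to be the ones used in the commands above. Both counts should normally be 5000, although bedtools may skip peaks whose chromosome name is absent from the genome fasta file, in which case the number of sequences will be lower.

```shell
# Count the peaks in a narrowPeak/BED file (one peak per line).
count_peaks() {
    wc -l < "$1" | tr -d ' '
}

# Count the sequences in a fasta file (one header line starting with '>' per sequence).
count_fasta() {
    grep -c '^>' "$1"
}

# Paths assumed from the commands above.
peaks=analysis/rsat/CTCF/CTCF_peaks.top5000.narrowPeak
fasta=analysis/rsat/CTCF/CTCF_peaks.top5000.fa

# Only report when both files exist, so the snippet is safe to paste anywhere.
if [ -f "$peaks" ] && [ -f "$fasta" ]; then
    echo "peaks:     $(count_peaks "$peaks")"
    echo "sequences: $(count_fasta "$fasta")"
fi
```

If the two numbers differ, look at the warnings printed by bedtools; a mismatch between chromosome names in the peak file and in the genome file is a common cause.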
Using CyberDuck, open the directory analysis/rsat/CTCF, and download the fasta file CTCF_peaks.top5000.fa to your local disk.
Check your email; you should have an email from rsat with Job submitted in the subject.
Once the analysis is complete, you will see a full report; check the following sections, and explore the report!
Which motifs have been identified?
Can you find a CTCF-like motif?
Can you find other motifs besides CTCF?