I am trying to plot a heatmap of read density around a feature of interest (TSSs) very common in genomics papers. something like this (B):
However, I am struggling a bit in getting to look "right". A bit of background: I have mapped ChIP-seq reads for pol2 and calculate the coverage, per nucleotide, using bedtools.
coverageBed -d -abam $bamFile -b $TSSs > $coverage.bed
# output:
chr1 67108226 67110226 uc001dct.3 16 + 1 10
chr1 67108226 67110226 uc001dct.3 16 + 2 10
chr1 67108226 67110226 uc001dct.3 16 + 3 10
chr1 67108226 67110226 uc001dct.3 16 + 4 10
chr1 67108226 67110226 uc001dct.3 16 + 5 8
chr1 67108226 67110226 uc001dct.3 16 + 6 8
chr1 67108226 67110226 uc001dct.3 16 + 7 8
chr1 67108226 67110226 uc001dct.3 16 + 8 8
chr1 67108226 67110226 uc001dct.3 16 + 9 8
chr1 67108226 67110226 uc001dct.3 16 + 10 8
Then in R, the genomic position, in column 7, is converted to relative position to the TSS and read counts normalized to the library size. This is converted to a numeric matrix with each row being a TSS and each column the relative nucleotide position. For the plotting the matrix is ordered number of reads per TSS, and the values logged. This is the outcome:
heatmap(cov.mlog, Rowv=NA, Colv=NA, scale="column", labCol = FALSE, labRow = FALSE, col=brewer.pal(9, "Greens"), margins = c(5, 5))
Although the average coverage plot looks as one would expect for pol2, the heatmap is, well, not great. My question is what am I doing wrong and how to improve it? In this and this paper, the coverage is calculate for a bin of nucleotides and not per nucleotide as I am doing. Would that improve visualization? How to do it in bedtools? Should the sorting of the matrix be done differently?
I am aware that could use ngsplot for this, but I am trying to avoid it because my implementation would fit better with my other analysis.
Thank you!