Hi there,
This question is very basic but I need to ensure that I'm going on the right way. I need to calculate the number of reads falling inside my bed intervals and the number of reads falling outside them. After reading this thread (https://www.biostars.org/p/11832/), I decided to try with this command:
intersectBed -abam my_file.bam -b my_file.bed -wa -f 1 | coverageBed -abam stdin -b my_file.bed
I would like to know what is the difference between using the previous command, or using only the second part:
coverageBed -abam my_file.bam -b my_file.bed
The output is quite different for some hits:
- First command output:
1 50331576 50331667 (.. gene names..) 0 0 91 0.0000000
1 39845848 39846030 (..gene names..) 70 178182 0.9780220
- Second command output:
1 50331576 50331667 (..gene names..) 47 91 91 1.0000000
1 39845848 39846030 (..gene names..) 143 182182 1.0000000
I think that for first command I get only those reads falling strictly within interval, while for the second one also include reads that partially covering the intervals? This is true?
For other hand, I would like also to get the number of reads falling outside the intervals. I can make a new bed file using bedtools complement, but if I use -v option of bedtools intersect would be OK? Like this:
intersectBed -v -abam my_file.bam -b my_file.bed -wa -f 1 | c ...
↧