Quantcast
Channel: Post Feed
Viewing all articles
Browse latest Browse all 3764

What Is The Fastest Method To Determine The Number Of Positions In A Bam File With >N Coverage?

$
0
0

I have two very large BAM files (high depth, human, whole genome). I have a seemingly simple question. I want to know how many positions in each are covered by at least N reads (say 20). For now I am not concerned about requiring a minimum mapping quality for each alignment or a minimum read quality for the reads involved.

Things I have considered:

  • samtools mpileup (then piped to awk to assess the minimum depth requirement, then piped to wc -l). This seemed slow...
  • samtools depth (storing the output to disk so that I can assess coverage at different cutoffs later). Even if I divide the genome into ~133 evenly sized pieces, this seems very slow...
  • bedtools coverage?
  • bedtools genomecov?
  • bedtools multicov?
  • bamtools coverage?

Any idea which of these might be fastest for this question? Something else I haven't thought of? I can use parallel processes to ensure that the performance bottleneck is disk access but want that access to be as efficient as possible. It seems that some of these tools are doing more than I need for this particular task...


Viewing all articles
Browse latest Browse all 3764

Trending Articles