Coveragebed, Depth/Breadth Of Coverage

June 17, 2011, 3:47 pm

I'm using coverageBed to calculate the depth and breadth of coverage, but I'm not sure I'm doing this right. I want to calculate the two values for each human chromosome.

For example, I've created a bed file with 1 chromosome. When I input my BAM file and the BED file, I get the following output:

chr1    0       249250621       103718897       224950839       249250621       0.9025086

I know the first 3 fields are from my chr BED file, the 4th field is the # of reads, 5th is # of bases covered, 6th is length of chromosome (redundant to field 3), and the last column is the fraction of bases covered (5th field/6th field).

So the 7th/last field gives the breadth of coverage, but I don't see a depth of coverage value. How do I get a depth of coverage?
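
One common back-of-the-envelope estimate of mean depth is (number of reads x read length) / region length. The sketch below applies it to the coverageBed columns described above. The function name `mean_depth` and the 100 bp read length are assumptions for illustration, not values from the post, and the estimate is only rough when read lengths vary or reads hang off the ends of the region.

```shell
# Rough mean depth per region from coverageBed-style output.
# Columns assumed: 1 = chrom, 4 = number of reads, 6 = region length.
# $1 = read length in bases (an assumption you must supply yourself).
mean_depth() {
  read_len="$1"; shift
  awk -v len="$read_len" '
    $6 > 0 { printf "%s\t%.2f\n", $1, $4 * len / $6 }' "$@"
}

# Example: the line from the post with a hypothetical 100 bp read length.
printf 'chr1\t0\t249250621\t103718897\t224950839\t249250621\t0.9025086\n' | mean_depth 100
```

With these numbers the estimate comes out at roughly 41.6x for chr1.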

↧

Discrepancy In Samtools Mpileup/Depth And Bedtools Genomecoveragebed Counts

March 27, 2013, 1:05 pm

I am getting different counts for the number of reference bases covered by aligned reads when using samtools depth/mpileup and the BEDTools genomeCoverageBed command. I am using samtools-0.1.19 and bedtools-2.17.0.

samtools mpileup -ABQ0 -d10000000 -f ref.fas qry.bam > qry.mpileup
samtools depth -q0 -Q0 qry.bam > qry.depth

genomeCoverageBed -ibam qry.bam -g ref.genome -dz > qry.dz
wc -l qry.[dm]*
  1026779 qry.depth
  1027173 qry.dz
  1026779 qry.mpileup

Any ideas? Thanks
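
A first step could be to list exactly which positions appear in one file but not the other. This is a minimal sketch, not an explanation of the discrepancy. It assumes that `-dz` reports zero-based positions (as the bedtools help text describes) while samtools depth reports one-based positions, so it shifts the `-dz` coordinate by one before comparing. The function name `only_in_dz` is hypothetical.

```shell
# Print positions present in the genomeCoverageBed -dz output but absent
# from the samtools depth output.
# $1 = samtools depth file (chrom, 1-based pos, depth)
# $2 = -dz file            (chrom, 0-based pos, depth)  <- assumed layout
only_in_dz() {
  awk 'BEGIN { OFS = "\t" }
    NR == FNR { seen[$1, $2]; next }               # remember depth positions
    !(($1, $2 + 1) in seen) { print $1, $2 + 1, $3 }' "$1" "$2"
}

# usage: only_in_dz qry.depth qry.dz | head
```

Looking at the reads that overlap a few of the reported positions may show what the two tools treat differently.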

↧

Raw Counts From Cufflinks Output

February 13, 2013, 2:30 am

Hi, I want to ask how to get the raw counts from the output of cufflinks. One way to do this is to use the fpkm.

raw counts = FPKM * (length of that transcript/1000) * (# of mapped reads / 1e6)

The FPKM and length of transcript are in the cufflinks FPKM Tracking Files. But how about the # of mapped reads?

For instance, we have a foo.bam. samtools view -c (-f|-F) flag foo.bam can do this job, but I am not quite sure which flag I should set when the data is single-end or paired-end.

Thanks!
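
The formula above can be written out as a small helper. This is only a sketch of the arithmetic as stated in the post. The function name `fpkm_to_count` and the example values are hypothetical, and the result is an approximation at best, because it depends on which length and which mapped-read total are plugged in.

```shell
# raw counts = FPKM * (transcript length / 1000) * (mapped reads / 1e6)
# $1 = FPKM, $2 = transcript length in bases, $3 = number of mapped reads
fpkm_to_count() {
  awk -v fpkm="$1" -v len="$2" -v mapped="$3" \
    'BEGIN { printf "%.1f\n", fpkm * (len / 1000) * (mapped / 1e6) }'
}

# Hypothetical example: FPKM 10, 2000-base transcript, 20 million mapped reads.
fpkm_to_count 10 2000 20000000   # prints 400.0
```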

↧

Bedtools intersect tab and bed files

August 14, 2014, 6:42 am

How can you call bedtools intersect on a tab file and a BED file without getting the typical error:

"Differing number of BED fields encountered at line: #. Exiting..."

My bed file has 15 columns and my tab file has 18
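
Before changing the bedtools call, it may help to check whether every line within each file really has the same number of tab-separated fields. This is a minimal sketch, and the function name `field_counts` is hypothetical. It prints each field count followed by how many lines have it, so a file with consistent columns yields a single output line.

```shell
# Tabulate the number of tab-separated fields per line.
# Output: <field count> TAB <number of lines with that count>
field_counts() {
  awk -F'\t' '{ n[NF]++ } END { for (k in n) print k "\t" n[k] }' "$@" | sort -n
}

# usage: field_counts my.bed ; field_counts my.tab
```

More than one output line for a single file points at ragged rows, for example a header line, trailing tabs, or space-separated lines mixed in with tab-separated ones.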

↧

Getting All Reads That Align To A Region In Compact Bed Format Using Bedtools?

January 16, 2013, 2:49 pm

I'm trying to find all the reads (by name) from a BAM file that align to various regions in a bed file. Right now I can do this with bedtools using intersectBed:

intersectBed -abam reads.bam -wo -f 1 -b regions.bed -bed

From this one can parse all the read ids that land in every interval in regions.bed, but it's not very compact. Is there a way to get bedtools to natively transform this into a more compact format, e.g.

chr1 x y .... read_id1,read_id2,read_id3

where chr1 x y is a given interval in regions.bed and the comma separated read_id1,... is the list of read ids from reads.bam that fall in that interval. In this compact format, the output BED file would have at most as many entries as there are regions in regions.bed, whereas with the -wo option it can be even larger than the number of reads in reads.bam. Thanks.
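
If no native option turns out to fit, the -wo output can be collapsed afterwards. This is a minimal sketch with awk. The function name `collapse_reads` is hypothetical, and the column positions are assumptions that depend on how many columns your bedtools version writes for each read, so check them against a few lines of real output first.

```shell
# Collapse per-read overlap lines into one line per region.
# $1 = column holding the read name
# $2 = column where the region record (chrom, start, end) begins
collapse_reads() {
  name_col="$1"; region_col="$2"; shift 2
  awk -v name_col="$name_col" -v region_col="$region_col" 'BEGIN { OFS = "\t" }
    {
      key = $region_col OFS $(region_col + 1) OFS $(region_col + 2)
      if (!(key in ids)) { order[++n] = key; ids[key] = $name_col }
      else ids[key] = ids[key] "," $name_col
    }
    END { for (i = 1; i <= n; i++) print order[i], ids[order[i]] }' "$@"
}

# usage, assuming 6 read columns followed by the region record:
#   intersectBed -abam reads.bam -wo -f 1 -b regions.bed -bed | collapse_reads 4 7
```

Regions are printed in the order they are first seen, and regions with no reads do not appear at all.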

↧

How To Combine Fpkm Values From Cufflinks With Contigs From De Novo Assembly Program Velvet/Oases?

November 23, 2011, 2:32 pm

Hi all,

I am working on RNA-seq data analysis. I've finished running Tophat and Cufflinks to get FPKM values from Illumina paired-end reads. In parallel, I've run Velvet to get contig sequences through de novo assembly, and Gmap to see whether the assembled sequences map to the reference genome (this reference genome is not complete for now, but somewhat useful). Now I am trying to combine all of this information so that I have, for each contig, its sequence and the corresponding FPKM value. Some suggested that I convert the Cufflinks and Gmap outputs to BED files and then use intersectBed to look for overlaps. However, I am not sure how to keep all of the information in the bedtools output. By default, intersectBed seems to report the overlapped regions using the 'A' file as a template, so I can't see any information from the 'B' file. Is there a solution for this? Please let me know. I would appreciate your suggestions!

↧

Reproduce Encode/Cshl Long Rna-Seq Data Visualization Viewed In Ucsc, But Failed? [Done]

October 5, 2012, 12:53 am

Motivation

The ENCODE data has come out, and luckily they provide both .bam and .bigwig files. So it occurred to me to try to reproduce the data visualization with BEDTools and other related tools.

Result

I'll first upload the difference between my version and the official version. Top to bottom:

Black: my-version-POSitive-strand.bigwig
Blue: Official-version-POSitive-strand.bigwig
Red: Official-version-REVerse-strand.bigwig
Grey: my-version-REVerse-strand.bigwig

From the image, we can see that my-version data and official-version data roughly share the same peaks; however, my-version peaks are somehow masked by a uniform noise. And it drives me crazy. Note that I know not all bioinformatics work can be reproduced, but this issue does not involve many algorithms, decisions, etc. Therefore, it should be reproducible, I think.

Data Set

The ENCODE/CSHL long RNA-seq data set can be found here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/

Here I use the K562 chromatin subcellular fraction (Rep4) as an example:

BAM ...

↧

Intersectbed Provides An Empty Output

August 16, 2013, 10:53 pm

Hi,

I've downloaded the recent Cygwin version 1.7.24 and am trying to run BEDTools, but I get an empty file as my output. When I run the same command line and files on a colleague's computer, also through Cygwin, I get a file containing the overlaps I seek. Is the new Cygwin not compatible with BEDTools? I've put the command line we used below:

./intersectbed -a Gene_body.bed -b EdgeR1.bed -wao > yyy.temp

Any help would be appreciated.

↧

Tutorial: Piping With Samtools, Bwa And Bedtools

April 26, 2012, 4:14 pm

In this tutorial I will introduce some concepts related to unix piping. Piping is a very useful feature for avoiding the creation of single-use intermediate files. It is assumed that bedtools, samtools, and bwa are installed. Let's begin with a typical command to do paired-end mapping with bwa (./ means look in the current directory only):

# -t 4 is for using 4 threads/cores
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq > ./s1_1.sai
bwa aln -t 4 ./hg19.fasta ./s1_2.fastq > ./s1_2.sai
bwa sampe ./hg19.fasta ./s1_1.sai ./s1_2.sai ./s1_1.fastq ./s1_2.fastq > s1.sam

Suppose we wish to compress sam to bam, sort, remove duplicates, and create a bed file.

samtools view -Shu s1.sam > s1.bam
samtools sort s1.bam s1_sorted
samtools rmdup -s s1_sorted.bam s1_sorted_nodup.bam
bamToBed -i s1_sorted_nodup.bam > s1_sorted_nodup.bed

The workflow above creates many files that are only used once (such as s1.bam), and we can use the unix pipe utility to reduce the number of intermediate files created. The pipe function is the character | and what it does is ...

↧

How To Find The Closest Distance From Bed Files Between Genes And Repeats That Are Upstream

January 7, 2014, 3:36 am

How can I use closestBed from bedtools to find the closest locations between two bed files? The important bit here is that I want them to be upstream and in the correct orientation.

When I use the -s option, it does not report anything (everything is -1).

Then I checked the -D a option. It is returning some results but not sure if it is the right thing.

The other thing to mention is that my genes bed file (let's call it gene.bed) is organized as

chr1 123 234 +
chr1 456 789 -

rather than end position being smaller to indicate the negative strand.

Whereas my repeats.bed file is organized as

chr1 239 456
chr3 456 987

Does bedtools get confused with this?

Which options should I use if I want to find the distance to the nearest repeat that is upstream and in the correct orientation?
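
One thing worth checking: the BED format puts strand in the sixth column, after name and score, so a strand in column 4 would be read as a name. This is a minimal sketch that pads the four-column layout shown above out to six columns. The function name `to_bed6`, the `gene<N>` names and the score of 0 are placeholders, not values from the post.

```shell
# Convert "chrom start end strand" lines into six-column BED:
# chrom, start, end, name, score, strand.
to_bed6() {
  awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, "gene" NR, 0, $4 }' "$@"
}

# Example with the two gene lines from the post:
printf 'chr1 123 234 +\nchr1 456 789 -\n' | to_bed6
```

The repeats file shown above carries no strand at all, so any strand-matching option has nothing to compare against on that side.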

↧

Intersectbed/Coveragebed -Split Purify Exon?

September 15, 2012, 1:58 am

all.reads.bam file records mapped RNA-seq reads data, including:

exon:exon junction
exon body
intron body
exon:intron junction

Q1: When calculating RPKM for a given RefSeq gene using the reads at all positions, will the following command count only the exon:exon junction reads and ignore all other reads?

coverageBed -abam all.reads.bam -b refseq.genes.BED12.bed -s -split >coverage.bed

I'm confused by the manual (page 62):

When dealing with RNA-seq reads, for example, one typically wants to only tabulate coverage for the portions of the reads that come from exons (and ignore the interstitial intron sequence). The -split command allows for such coverage to be performed.

If "-split" is set and an exon:exon read (for example, "30M3000N46M") exists in the -abam BAM file, the 3000N will NOT be wrongly intersected when running the intersectBed command. But what about the coverageBed command? I do hope the 3000N will not be counted, which makes sense, and I also hope the intron body reads and other reads will NOT be ignored.

Q2: If one just wants to calculate the RPKM of exons, does it mean one should prepare a -b file recording all the exon information, and run it like this:

coverageBed -abam all.reads.bam -b ...

↧

Convert .Txt Into Bed Files

July 21, 2011, 8:13 pm

I used paired-end sequence data for copy number variation study; and eventually get .txt files as output. I'm hoping to use Bedtools to compare my results with others.

Can I convert .txt files into .bed files? (I don't see option in Bedtools)

If Bedtools is not working, what software can I use for data comparison?

The lines of my txt file look like this:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

thx

edit: I converted the txt files into bedpe format, which looks like

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started to compare two bedpe, looking for overlapping region, using the command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see the error:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

My bed file has only three columns, but it seems 6 are required. What's the problem here? thx
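
For the original txt lines, a conversion to plain BED can be sketched with awk. The function name `cnv_txt_to_bed` is hypothetical. The listed lengths match end - start + 1 (for example 6173000 - 6169901 + 1 = 3100), which suggests 1-based inclusive coordinates, so the sketch subtracts 1 from the start to get BED's 0-based start. It also assumes chromosome names contain neither ':' nor '-'.

```shell
# Convert "type  chrom:start-end  length" lines to BED:
# chrom, start (0-based), end, type, length.
cnv_txt_to_bed() {
  awk 'BEGIN { OFS = "\t" }
    {
      split($2, a, /[:-]/)            # a[1] = chrom, a[2] = start, a[3] = end
      print a[1], a[2] - 1, a[3], $1, $3
    }' "$@"
}

# Example with the first line from the post:
printf 'deletion\tchr9:6169901-6173000\t3100\n' | cnv_txt_to_bed
```

As for the error quoted above, it says pairToPair wants at least 6 columns. Three-column files like the ones shown are plain BED rather than BEDPE, so a tool that compares BED intervals may be the better fit.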

↧

What Is The Fastest Method To Determine The Number Of Positions In A Bam File With >N Coverage?

May 21, 2013, 10:16 am

I have two very large BAM files (high depth, human, whole genome). I have a seemingly simple question. I want to know how many positions in each are covered by at least N reads (say 20). For now I am not concerned about requiring a minimum mapping quality for each alignment or a minimum read quality for the reads involved.

Things I have considered:

samtools mpileup (then piped to awk to assess the minimum depth requirement, then piped to wc -l). This seemed slow...
samtools depth (storing the output to disk so that I can assess coverage at different cutoffs later). Even if I divide the genome into ~133 evenly sized pieces, this seems very slow...
bedtools coverage?
bedtools genomecov?
bedtools multicov?
bamtools coverage?

Any idea which of these might be fastest for this question? Something else I haven't thought of? I can use parallel processes to ensure that the performance bottleneck is disk access but want that access to be as efficient as possible. It seems that some of these tools are doing more than I need for this particular task...
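
One option that avoids storing per-position output is the default histogram that bedtools genomecov writes: chrom (or "genome"), depth, number of bases at that depth, chrom size, fraction. That file is small, so different cutoffs can be evaluated later without touching the BAM again. This is a minimal sketch for summing it. The function name `bases_at_least` is hypothetical, and I have not benchmarked this against the other tools listed.

```shell
# Count bases covered by at least N reads from a genomecov histogram.
# $1 = minimum depth N; remaining arguments = histogram file(s), or stdin.
bases_at_least() {
  min_depth="$1"; shift
  awk -v n="$min_depth" '
    $1 == "genome" && $2 >= n { total += $3 }   # use the genome-wide rows only
    END { printf "%.0f\n", total }' "$@"
}

# usage: bedtools genomecov -ibam sample.bam > sample.hist
#        bases_at_least 20 sample.hist
```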

↧

Reporting The Bam Reads Overlapping A Set Of Intervals With Bedtools

November 8, 2011, 1:51 am

I am trying to use bedtools to pull out the reads falling directly within a set of BED coordinates. While this command does it successfully:

intersectBed -abam mybam.bam -b intervals.gff -wa -wb -f 1 | coverageBed -abam stdin -b intervals.gff

I find that it loses key information that I need. I'd like to get a listing of the BAM reads -- getting at least their ID -- split by exon. In other words, all the read IDs that fall into the first interval in intervals.gff, all the read IDs that fall into the second interval in intervals.gff... ideally, it would also report the CIGAR string for these reads, but I'd settle for just the ID.

Is there a way to report these reads, such that it's easy to tell from the output which set of reads landed in a given interval in the input BED file?

Thank you.

↧

Genbank to bed conversion for bedtools analysis

August 20, 2014, 5:28 pm

Hi,

I need to use bedtools to obtain the coverage across two bam files for comparison. To do this, I need a .bed file of the genome features (of the reference genome used to generate the bams). However, I only have genbank format, and whatever else I can get from NCBI etc. (not .bed). Is there a tool or script to convert the annotation in genbank to .bed? I don't want to have to start writing a script when it must have been done already.

Thanks,

Theo

↧

Calculating Exome Coverage

April 3, 2014, 2:00 am

Edit to make the post clearer (mapping done via Bowtie2): my problem is that counting exome coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.

1) My first step is to build a .bed file of my Illumina paired-end reads, returning only the positions that fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d | awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?
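
One possibility to rule out, offered as an assumption to check rather than a diagnosis: if some intervals in the exon file overlap one another, a per-interval tally can count the same base more than once, while a per-position report counts it once. This is a minimal sketch that compares the summed interval length with the length after merging overlaps. The function names `raw_length` and `merged_length` are hypothetical.

```shell
# Sum of interval lengths as listed (overlapping bases counted repeatedly).
raw_length() {
  awk '{ total += $3 - $2 } END { printf "%.0f\n", total }' "$@"
}

# Total length after merging overlapping intervals on each chromosome.
merged_length() {
  sort -k1,1 -k2,2n "$@" | awk '
    $1 != chr || $2 + 0 > cur_end {        # new chromosome or a gap: close the block
      total += cur_end - cur_start
      chr = $1; cur_start = $2 + 0; cur_end = $3 + 0
      next
    }
    $3 + 0 > cur_end { cur_end = $3 + 0 }  # overlap: extend the current block
    END { total += cur_end - cur_start; printf "%.0f\n", total }'
}

# usage: raw_length illuminaexonregions.bed ; merged_length illuminaexonregions.bed
```

If the two numbers differ for the exon file, the same bases are being tallied under more than one exon record.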

↧

How To Get Annotation For Bed File From Another Bed File

November 23, 2012, 7:52 pm

Hello All,

I have a bed file (with Chr, Start, End, Name, Score and Strand)

Chr1 5678 5680 NA 7  +
Chr1 700  800  NA 8  -
Chr1 900  1200 NA 10 -

and would like to know, how can I get the annotation for the name column from another bed file

Chr1 5500 6000 Gene1 x +
Chr1  500 1000 Gene2 x -

or any standard genome file formats like gbk or .fna files or for that matter another bed file? So my output file will be a bed file with Chr, Start, End, Name and Strand.

Chr1 5678 5680 Gene1 7 +
Chr1 700  800  Gene2 8 -
Chr1 900  1200 Gene2 10 -

Any easy and standard way to do this??

Bedtools usually operates on the features themselves, but I'm not sure whether annotation from one bed file can be carried over into the other based on overlapping features.

Thanks in advance!
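
For small files, the name transfer can be sketched directly in awk, without bedtools. This is an illustration under assumptions, not a standard method. The function name `annotate_names` is hypothetical, it takes the first overlapping annotation on the same chromosome, it ignores strand, and it compares every feature against every annotation, so it will be slow on large inputs.

```shell
# Replace column 4 of each feature with the name of the first overlapping
# annotation on the same chromosome.
# $1 = annotation BED (chrom, start, end, name, ...)
# $2 = feature BED    (chrom, start, end, name, score, strand)
annotate_names() {
  awk 'BEGIN { OFS = "\t" }
    NR == FNR {                              # first file: load annotations
      n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; name[n] = $4
      next
    }
    {
      for (i = 1; i <= n; i++)
        if ($1 == chr[i] && $2 + 0 < e[i] && $3 + 0 > s[i]) { $4 = name[i]; break }
      print $1, $2, $3, $4, $5, $6
    }' "$1" "$2"
}

# usage: annotate_names genes.bed features.bed > annotated.bed
```

Run on the two example files above, this reproduces the desired three-line output, with tabs between the columns.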

↧

Bedgraph Not Displayed In Igv

January 24, 2014, 8:24 am

Hi, I am new to this and am facing a problem. I was trying to make a bedgraph file using the bedtools genomecov command. The command was:

bedtools genomecov -ibam filename.sorted.bam -g chromosome sizes.txt > O.bedgraph

I got a bedgraph file that is much smaller than expected: 500kb instead of ~6Mb. And when I load that 500kb file into IGV, I see nothing. Please help me out.

↧

Help With Exception When Using Bedtools Coveragebed With Paired Alignment. [Resolved]

January 3, 2014, 5:32 am

I use bwa mem to align paired reads to a few hundred microbial contigs; then I sort the alignment and try to get the coverage using bedtools genomecov -ibam alignments.paired.sorted.bam -bg >ranges.txt, which fails with an exception:

*** glibc detected *** bedtools: double free or corruption (out): 0x0000000001c5f270 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3d7b2750c6]
bedtools[0x45ab43]
bedtools[0x45b146]
bedtools[0x45c163]
bedtools[0x45e2ed]
bedtools[0x434c4b]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3d7b21ecdd]

If I run the same command on an unpaired alignment, everything is OK. So I am really not sure where my mistake is... maybe bedtools can't digest the paired alignment?

-- edit: works with the latest versions of these tools. Here are the ones that failed:

$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.0-r313
Contact: Heng Li <lh3@sanger.ac.uk>

$ bedtools -version
bedtools v2.16.1

↧

Intersectbed Overlap

November 23, 2011, 9:20 am

Hi,

I have a question about intersectBed. Is it possible to extract only alignments like this:

chromosome ===============================================================
BED/BAM A               ==============              =================
BED FILE B               ============
RESULT                  ==============

But no alignments like this (even if the read overlaps 100% of the feature, I don't want to extract these reads):

chromosome ===============================================================
BED/BAM A    =========================              =================
BED FILE B               =============
RESULT

So, extract only the reads that have 90-95% of their sequence overlapping 90-95% of the feature.

Is it clear?

Thanks,
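
The condition described here is a reciprocal overlap: the shared stretch must cover at least a given fraction of the read and of the feature. In bedtools that is what -f combined with -r is documented to require. The awk sketch below only spells out the arithmetic. The function name `reciprocal_overlap` and the six-column input layout (chrA, startA, endA, chrB, startB, endB) are assumptions for illustration.

```shell
# Keep only pairs whose overlap covers at least a fraction F of BOTH intervals.
# $1 = minimum fraction (e.g. 0.9); input: chrA startA endA chrB startB endB
reciprocal_overlap() {
  min_frac="$1"; shift
  awk -v f="$min_frac" '
    $1 == $4 {
      s = ($2 > $5) ? $2 : $5            # overlap start
      e = ($3 < $6) ? $3 : $6            # overlap end
      ov = e - s
      if (ov > 0 && ov >= f * ($3 - $2) && ov >= f * ($6 - $5)) print
    }' "$@"
}

# usage: reciprocal_overlap 0.9 pairs.txt
```

A read much longer than the feature, as in the second diagram, fails the test on the read side even though the feature is fully covered.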

↧