Different coverage from bedtools and in vcf file - HELP PLEASE

July 2, 2014, 6:12 am

Dear all, I have trouble understanding how bedtools computes coverage. I have a VCF file (generated by the Illumina somatic caller) and I created a BED file from it. When I then compute coverage with bedtools, I get a different coverage for the same coordinates. Could you help me explain why? Probably some reads were filtered out by the VCF calling algorithm.

Coverage in the VCF:

chr13 32890572 DP=45048
chr13 32893197 DP=1494
chr13 32899359 DP=12809
chr13 32900176 DP=57850
chr13 32900177 DP=61728

and in my bedtools output:

chr13 32890572 32890573 47857 1 1 1.0000000
chr13 32893197 32893198 1686 1 1 1.0000000
chr13 32899359 32899360 15673 1 1 1.0000000
chr13 32900176 32900177 65461 1 1 1.0000000
chr13 32900177 32900178 65461 1 1 1.0000000

Note that my VCF coverage is before filtering. Does anybody know why these differ? And can you explain (mathematically) how bedtools computes coverage? Is it the sum of all coverage in the interval divided by the length of the interval? Or is it just the sum of all coverage in the interval? Thank you so much for clarification. Paul. ...
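As a rough illustration of the arithmetic being asked about, the sketch below recomputes the four values that the default coverage output appends to each interval: the number of reads overlapping the interval, the number of bases covered by at least one read, the interval length, and the covered fraction. The function name and the toy reads are hypothetical, and coordinates are assumed to be 0-based and half-open as in BED. Note also that VCF positions are 1-based, so a BED line such as `32890572 32890573` describes the base one position to the right of VCF position 32890572; that may be one source of the difference, alongside any read filtering done by the caller.

```python
def coverage_columns(start, end, reads):
    """Return (n_reads, bases_covered, length, fraction) for one interval.

    start/end: 0-based, half-open interval (BED convention).
    reads: list of (read_start, read_end) on the same chromosome.
    """
    length = end - start
    covered = [False] * length
    n_reads = 0
    for r_start, r_end in reads:
        lo, hi = max(start, r_start), min(end, r_end)
        if lo < hi:  # the read overlaps the interval by at least 1 bp
            n_reads += 1
            for pos in range(lo, hi):
                covered[pos - start] = True
    bases_covered = sum(covered)
    return n_reads, bases_covered, length, bases_covered / length


print(coverage_columns(10, 20, [(5, 12), (11, 15), (30, 40)]))
```

For a 1 bp interval the first value is simply the number of reads spanning that base, with no averaging over the interval length.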

↧

bedtools intersect - something wrong with chromosome numbers >= 10?

May 31, 2014, 2:28 am

Hi!

I have an alignment (.bam) of reads to the mm9 genome. I sorted it with samtools sort so that I can later use the -sorted option with bedtools. I also created a .bed file with regions of interest, in which I want to count the number of reads that mapped to them. I tried this: I converted the .bam to .bed with bedtools bamtobed and then intersected the two files, counting the number of hits (bedtools intersect -a regions_of_interest.bed -b alignment_sorted.bed -c -sorted > Neg2H_counts.bedgraph). The problem is that it looks fine for all chromosomes with numbers from 1 to 9 (and X), but all counts for all regions of interest on chromosomes with higher numbers (chr10, chr11, etc.) are 0. There is no biological reason for that; in fact, the highest signal should be on chr11. What could be wrong here? I am fairly new to all these tools.

UPDATE
I tried to do the same intersection with bedmap and the result is identical... So there is probably something wrong with my files. What could it be?
I also tried sorting the alignment-derived BED file in the same way as I sorted the file with the regions of interest, and it doesn't help.
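One thing that may be worth checking is whether both files list their chromosomes in the same order, since sorted sweep algorithms assume a shared ordering. A BAM sorted with samtools follows the order of the BAM header (often chr1, chr2, ..., chr10), whereas a text sort on the chromosome column gives chr1, chr10, chr11, ..., chr2. The sketch below compares the order of first appearance in two files; the function names are hypothetical and this may not be the cause in this particular case.

```python
def chrom_order(bed_lines):
    """Return chromosome names in order of first appearance."""
    order, seen = [], set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        if chrom not in seen:
            seen.add(chrom)
            order.append(chrom)
    return order


def same_relative_order(order_a, order_b):
    """True if chromosomes present in both files appear in the same order."""
    shared = set(order_a) & set(order_b)
    return [c for c in order_a if c in shared] == [c for c in order_b if c in shared]


regions = ["chr1\t0\t10", "chr2\t0\t10", "chr10\t0\t10"]   # header-style order
reads = ["chr1\t5\t6", "chr10\t5\t6", "chr2\t5\t6"]        # lexicographic order
print(same_relative_order(chrom_order(regions), chrom_order(reads)))
```

If the orders differ, re-sorting both files with the same key before intersecting is the usual remedy.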

↧

Running BedTools on Linux Cluster: Permission Denied

August 3, 2014, 10:50 am

I have been having some problems running the BedTools binaries on a Linux cluster. I have the binaries in my own $HOME/bin directory, and when I try to run bedtools I get this error message:

-bash: bedtools: Permission Denied

I followed the instructions here and still got the same error message.

Any clue what to do?
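A "Permission denied" message from the shell often means that the file is missing its execute bit, although a home directory mounted without execute permission would produce the same message. The sketch below covers only the first case: it adds the owner-execute bit if it is absent. The helper name is hypothetical and the path in the usage note is an assumption based on the post.

```python
import os
import stat


def ensure_executable(path):
    """Add the owner-execute bit to path if missing; return the resulting mode."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        os.chmod(path, mode | stat.S_IXUSR)
    return os.stat(path).st_mode
```

It could be called as `ensure_executable(os.path.expanduser("~/bin/bedtools"))`, which has the same effect as running chmod u+x on that file.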

↧

Memory Efficient Bedtools Sort And Merge With Millions Of Entries?

May 8, 2013, 6:52 am

I would like to know if there is a memory-efficient way of sorting and merging a large number of BED files, each containing millions of entries, into a single BED file in which duplicated or partially overlapping entries are merged so that every entry is unique.

I have tried the following but it blows up in memory beyond the 32G I have available here:

find /my/path -name '*.bed.gz' | xargs gunzip -c | ~/src/bedtools-2.17.0/bin/bedtools sort | ~/src/bedtools-2.17.0/bin/bedtools merge | gzip -c > bed.all.gz

Any suggestions?
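One possible approach, sketched below, is to sort each file on its own first (which needs far less memory than sorting everything at once) and then combine the sorted files with a streaming k-way merge that keeps only one record per input in memory. The sketch assumes every input is already sorted by chromosome name (as text) and then by start (numerically), and that there are no header or blank lines; the function names are hypothetical. Adjacent intervals are merged as well as overlapping ones.

```python
import heapq


def parse(line):
    """Turn a BED line into a (chrom, start, end) tuple."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def merge_sorted_beds(*sorted_streams):
    """Lazily merge already-sorted BED streams and yield merged intervals."""
    records = heapq.merge(*(map(parse, stream) for stream in sorted_streams))
    current = None
    for chrom, start, end in records:
        if current and chrom == current[0] and start <= current[2]:
            # Overlapping or adjacent: extend the interval being built.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                yield current
            current = (chrom, start, end)
    if current:
        yield current


file_a = ["chr1\t0\t10", "chr1\t20\t30", "chr2\t5\t8"]
file_b = ["chr1\t5\t15", "chr1\t30\t40"]
for interval in merge_sorted_beds(file_a, file_b):
    print(*interval, sep="\t")
```

The streams can be any iterables of lines, for example file handles opened with `gzip.open(path, "rt")`.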

↧

How Do You Get The Quality Score And Coverage For Every Single Position Of A Reference Assembly

January 31, 2012, 2:12 pm

Hi,

I am trying to extract the coverage and the average quality score for each position of a reference assembly in bam/sam format. I have managed to get the coverage using BEDtools

 genomeCoverageBed -ibam mybamfile.bam -g my_genome -d > my_coverage.txt

but am at a loss on how to get some measure of the quality of the base calls at each position. I was thinking that I could use the bcftools to get a variant call formatted file

samtools mpileup -uf ref.fa mybamfile.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

but this only provides the sites for which there are SNPs. Any advice greatly appreciated.

Joseph

↧

Intersectbed Tool Generating Empty File

August 28, 2012, 10:13 am

I have used the Bedtools command intersectBed to check the overlap between two bed files. A is my INDEL file and B is my Reference file. But it is producing an empty output file. I thought the problem was that the file B is much larger than file A. But I tried changing the file order and it is still not creating any output.

Here is the reference B file (larger):

gff_seqname      0        1395    gene    0    +
gff_seqname      0        1395    exon    0    +
gff_seqname    1397    2498    gene    0    +
gff_seqname    1397    2498    exon    0    +
gff_seqname    2524    3619    gene    0    +

Here is my A file with just 51 INDELS:

NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    174708    174713    -GCCGG:2/6
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1078686    1078686    +A:105/112
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1229123    1229125    -CT:800/870
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234830    1234830    +AT:134/134
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234833    1234834    -A:134/134

Here is my command:

intersectBed -a SOD_pal_BWA_GMM.PE.sorted.bam.sorted_cleaned_GMM.bam.sorted.hr.bam.raw.bed  -b sodalis_galaxy.bed  -wa -wb  >test13.bed
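In the excerpts above, the first column of the reference file is "gff_seqname" while the INDEL file uses "NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME". Overlaps are only reported between records whose chromosome names match exactly, so a quick check of the names shared by both files may explain the empty output. A minimal sketch, with hypothetical function names:

```python
def chrom_names(bed_lines):
    """Collect the distinct values of the first column."""
    return {line.split()[0] for line in bed_lines if line.strip()}


def shared_chroms(a_lines, b_lines):
    """Chromosome names that occur in both files."""
    return chrom_names(a_lines) & chrom_names(b_lines)


reference = ["gff_seqname\t0\t1395\tgene\t0\t+",
             "gff_seqname\t1397\t2498\tgene\t0\t+"]
indels = ["NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME\t174708\t174713\t-GCCGG:2/6"]
print(shared_chroms(indels, reference))
```

An empty set here means no record can ever intersect, whatever the coordinates are.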

↧

Is It Possible To Filter Only Bookend Reads From A Bed File?

January 28, 2014, 3:58 am

I have a bed file with many fragments, some overlapping, some on their own and some adjacent to each other (book-ended) features.

I know I can group overlapping and book-ended features using bedtools like this:

bedtools cluster -i fragments.bed

However, I was wondering if anyone knew of a way of obtaining from the input file only the fragments that have a book-ended (directly adjacent) neighbour.

Any ideas?

Best regards
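One way to read the request is: keep a fragment if its end equals the start of another fragment on the same chromosome, or its start equals the end of another. The sketch below implements that reading; it assumes half-open BED coordinates and fragments with start < end, and the function name is hypothetical.

```python
def bookended(intervals):
    """Return the intervals that touch another one end-to-start."""
    starts = {(chrom, start) for chrom, start, end in intervals}
    ends = {(chrom, end) for chrom, start, end in intervals}
    return [(chrom, start, end) for chrom, start, end in intervals
            if (chrom, end) in starts or (chrom, start) in ends]


fragments = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 15, 30),
             ("chr1", 40, 50), ("chr2", 50, 60)]
print(bookended(fragments))
```

Fragments that merely overlap, such as chr1 15-30 above, are not reported.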

↧

How Can I Merge Intervals ?

August 7, 2012, 10:18 am

Hello everybody, I would be grateful if you could help me fix my problem. I have a table like this:

Chromosome   start   end    info1    info2
chr01        1       100    15       35
chr01        150     300    15       39
chr01        299     750    16       39

I would like to merge the intervals that overlap (lines 2 and 3) and those that are close to each other (lines 1 and 2), and in addition perform some operations based on the other columns. For example, I would like to merge the three lines above into one interval (chr1 1-750), sum the info1 column (15+15+16), and take the mean of the info2 column, (35+39+39)/3. The output I'd like would look like this:

chr1 1-750  46  37.66

I know that Bedtools (and the Galaxy tool too) can merge intervals, but it only accepts BED format with just 3 columns: chr, start and end!
Thanks in advance for your help!
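The merge described above can be sketched in a few lines. The version below assumes the rows are sorted by chromosome and start, treats two intervals as mergeable when the gap between them is at most `max_gap` bases (a hypothetical parameter; 50 is enough to join lines 1 and 2 of the example), sums info1 and averages info2.

```python
def merge_with_stats(rows, max_gap=0):
    """rows: (chrom, start, end, info1, info2), sorted by chrom then start."""
    merged = []
    for chrom, start, end, info1, info2 in rows:
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)
            last["info1"].append(info1)
            last["info2"].append(info2)
        else:
            merged.append({"chrom": chrom, "start": start, "end": end,
                           "info1": [info1], "info2": [info2]})
    return [(m["chrom"], m["start"], m["end"],
             sum(m["info1"]), sum(m["info2"]) / len(m["info2"]))
            for m in merged]


rows = [("chr01", 1, 100, 15, 35),
        ("chr01", 150, 300, 15, 39),
        ("chr01", 299, 750, 16, 39)]
for chrom, start, end, total, mean in merge_with_stats(rows, max_gap=50):
    print(chrom, f"{start}-{end}", total, f"{mean:.2f}")
```

With `max_gap=0` only lines 2 and 3 are joined, so the choice of gap decides whether "close" intervals are merged.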

↧

Coveragebed, Depth/Breadth Of Coverage

June 17, 2011, 3:47 pm

I'm using coverageBed to calculate the depth and breadth of coverage, but I'm not sure I'm doing this right. I want to calculate the two values for each human chromosome.

For example, I've created a bed file with 1 chromosome. When I input my BAM file and the BED file, I get the following output:

chr1    0       249250621       103718897       224950839       249250621       0.9025086

I know the first 3 fields are from my chr BED file, the 4th field is the # of reads, 5th is # of bases covered, 6th is length of chromosome (redundant to field 3), and the last column is the fraction of bases covered (5th field/6th field).

So the 7th/last field gives the breadth of coverage, but I don't see a depth of coverage value. How do I get a depth of coverage?
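One common definition of depth of coverage is the sum of the per-base depths divided by the length of the region. As a sketch, assuming per-base depths are available as three-column lines (chromosome, position, depth), which is the layout written by `genomeCoverageBed -d`, the mean depth per chromosome could be computed as follows. The function name and the toy numbers are hypothetical.

```python
from collections import defaultdict


def mean_depth_per_chrom(depth_lines, chrom_sizes):
    """Sum the per-base depths and divide by each chromosome's length."""
    totals = defaultdict(int)
    for line in depth_lines:
        chrom, _pos, depth = line.split()
        totals[chrom] += int(depth)
    return {chrom: totals[chrom] / size for chrom, size in chrom_sizes.items()}


lines = ["chr1\t1\t2", "chr1\t2\t4", "chr1\t3\t0", "chr2\t1\t5"]
print(mean_depth_per_chrom(lines, {"chr1": 4, "chr2": 1}))
```

Dividing by the number of covered bases instead of the chromosome length would give the mean depth over covered positions only, which is a different quantity.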

↧

How to get the rRNA ratio from a RNAseq dataset

September 30, 2014, 9:25 am

Hello,

I want to know if there is any way to get the rRNA ratio in my miRNA-seq fastq data using bedtools and the BED file output by miRdeep2. Thank you very much!

I have a gtf file, a genome.fa, a bed file from the miRdeep2. Thanks!

↧

Convert .Txt Into Bed Files

July 21, 2011, 8:13 pm

I used paired-end sequence data for a copy number variation study and eventually got .txt files as output. I'm hoping to use Bedtools to compare my results with others.

Can I convert .txt files into .bed files? (I don't see an option for this in Bedtools.)

If Bedtools does not work for this, what software can I use for data comparison?

The lines of my .txt files look like this:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

thx

edit: I converted the txt files into bedpe format, which looks like

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started to compare two bedpe, looking for overlapping region, using the command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see the errors:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

My BED file has only three columns, but it seems to require 6. What's the problem here? thx
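A conversion from the .txt lines shown earlier to BED can be sketched as below. It assumes the input coordinates are 1-based and inclusive, which the length column supports (6173000 - 6169901 + 1 = 3100), so the start is shifted by one to match BED's 0-based, half-open convention. The function name is hypothetical. As for the error, BEDPE describes pairs of intervals and needs at least six columns (chrom1, start1, end1, chrom2, start2, end2); a three-column file is plain BED, for which a BED-vs-BED tool such as intersectBed is the usual choice rather than pairToPair.

```python
def cnv_line_to_bed(line):
    """Convert 'deletion  chr9:6169901-6173000  3100' to a 4-column BED line."""
    kind, region, _length = line.split()
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    # 1-based inclusive -> 0-based half-open
    return f"{chrom}\t{start - 1}\t{end}\t{kind}"


print(cnv_line_to_bed("deletion    chr9:6169901-6173000    3100"))
```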

↧

error with bedtools slop

April 17, 2014, 2:28 am

Hi,

I am trying to run bedtools slop on my .bed file with hg19.genome:

bedtools slop -i H3K27me3.bed -g hg19.genome -b 30

I get the following error:

Less than the req'd two fields were encountered in the genome file (genomes/hg19.genome) at line 2. Exiting.

Any suggestions?

Thanks in advance

Samad
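The message suggests that line 2 of the genome file could not be split into a chromosome name and a size. A genome file is expected to hold two tab-separated columns per line, so spaces instead of tabs, a blank line, or a header-like line are plausible culprits. The sketch below lists the lines that do not fit that layout; the function name is hypothetical.

```python
def bad_genome_lines(lines):
    """Return (line_number, text) for lines that are not 'chrom<TAB>size'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        fields = text.split("\t")
        if len(fields) < 2 or not fields[1].strip().isdigit():
            bad.append((number, text))
    return bad


genome = ["chr1\t249250621\n", "chr2 243199373\n", "\n"]
print(bad_genome_lines(genome))
```

Here line 2 is flagged because it uses a space rather than a tab, and line 3 because it is empty.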

↧

Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?

October 12, 2013, 3:47 am

I want to create a gene data set (as big as possible), hence I am using several gene annotations. However, genes in different annotations overlap (it's the same gene). To reduce bias, I overlap the different annotations and, if genes overlap, keep only one gene.

Question:

To ensure this overlap, I was thinking of extending the gene coordinates. Is this necessary? If so, how big should the extension be (5 bp/100 bp)?

Example:

I want to create an lncRNA data set (in the following steps it will be used to search for genomic features).
Input:

GENCODE lncRNA annotation (version 18 - 04/09/2013);
Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).

Workflow:

Extract GENCODE genes start/end coordinates;
Extract Cabili genes start/end coordinates;
Extend Cabili coordinates ( -/+ nbp );
Use BedTools intersect;
If genes intersect leave GENCODE gene (as it's a newer annotation (though this step is really subjective)).

I do realize that this extension question depends on the situation and on how reliable the annotation is, but I still hope that someone could suggest something.
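The workflow above can be sketched so that the extension becomes a parameter to experiment with. In this minimal version every GENCODE gene is kept, and a Cabili gene is added only if its extended interval overlaps no GENCODE gene. Strand is ignored, the all-against-all comparison is only suitable for small inputs, and the names and `n_bp` parameter are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def combine_annotations(gencode, cabili, n_bp=0):
    """Keep all GENCODE genes plus Cabili genes not within n_bp of one."""
    kept = []
    for chrom, start, end in cabili:
        extended = (chrom, max(0, start - n_bp), end + n_bp)
        if not any(overlaps(extended, gene) for gene in gencode):
            kept.append((chrom, start, end))
    return list(gencode) + kept


gencode = [("chr1", 100, 200)]
cabili = [("chr1", 205, 300), ("chr1", 500, 600)]
print(len(combine_annotations(gencode, cabili, n_bp=0)))
print(len(combine_annotations(gencode, cabili, n_bp=10)))
```

Running it with several values of `n_bp` and counting how many genes are dropped each time gives a feel for how sensitive the final set is to the extension.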

↧

Comparative Snp Analysis

January 7, 2013, 7:33 pm

Hello, I am trying to compare the degree of A-to-G editing in a near-isogenic pair of cell lines. I have two biological replicates and have mapped them with Bowtie and BWA, followed by a samtools mpileup | VarScan analysis. After this, I used bedtools intersect to extract variants that are not annotated in dbSNP but are in Alu repeats. Here is where I have some doubts, mainly two questions:

QUESTION 1:

In the VCF file (VarScan output),

#CHROM  POS     ID      REF     ALT     QUAL    FILTER    INFO    FORMAT  Sample1    Sample2
   chrM    73      .           G       A       PASS     DP=238  GT:GQ:DP           1/1:71:121  1/1:69:117

What exactly is the meaning of

FORMAT   Sample1    Sample2
GT:GQ:DP 1/1:71:121  1/1:69:117
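In the VCF format, the FORMAT column lists colon-separated keys, and each sample column holds the values in the same order. GT is the genotype (1/1 means both alleles are the first ALT allele), GQ is the genotype quality and DP is the read depth for that sample at this position. The pairing can be made explicit with a small helper (hypothetical name):

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key with the matching value of one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


print(parse_sample("GT:GQ:DP", "1/1:71:121"))
print(parse_sample("GT:GQ:DP", "1/1:69:117"))
```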

QUESTION 2:

I have a higher number of editing sites "called" in sample 1 than in sample 2 in the 1st biological replicate (about 16% difference). However, this difference is reversed in the 2nd biological replicate. What is the proper way of comparing the degree of RNA editing in two different samples? Is there a quantitative procedure? I have naively compared them with bedtools intersect, with and without the -v option. Is this the correct way to go about it?

Many thanks. G.

↧

N Closest Genes To A Given Location

September 25, 2012, 1:11 pm

Hi,

This is basically an extension of the following question already asked in biostar (http://biostars.org/post/show/53561/python-finding-gene-closest-to-a-given-location/).

Let us say I have a list of genomic regions (as a bed file), and also a list of genes (as a bed file). For each genomic region I want to find the 5 (or N to be general) closest genes. How would I try to do that? Any suggestions?

Thanks!
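For small inputs, the N closest genes can be found by computing the gap between the region and every gene on the same chromosome and keeping the N smallest. The sketch below does that with `heapq.nsmallest`; overlapping genes get distance 0, coordinates are assumed to be half-open BED intervals, and the function names are hypothetical. For genome-scale data a sorted sweep or an interval index would be more appropriate.

```python
import heapq


def distance(a, b):
    """Gap in bp between two (start, end) intervals; 0 if they overlap."""
    return max(0, a[0] - b[1], b[0] - a[1])


def n_closest(region, genes, n=5):
    """region: (chrom, start, end); genes: (chrom, start, end, name)."""
    chrom, start, end = region
    same_chrom = [g for g in genes if g[0] == chrom]
    return heapq.nsmallest(
        n, same_chrom, key=lambda g: distance((start, end), (g[1], g[2])))


genes = [("chr1", 0, 50, "A"), ("chr1", 150, 160, "B"),
         ("chr1", 1000, 1100, "C"), ("chr2", 100, 200, "D"),
         ("chr1", 210, 300, "E")]
print([g[3] for g in n_closest(("chr1", 100, 200), genes, n=2)])
```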

↧

Intersect Gene Annotation With Specific Position Or Genomic Interval

August 29, 2013, 6:42 am

Hi,

I have several genomic intervals and I want to check whether they overlap known genes. I have a GTF file with the coordinates of gene exons. My idea was to use intersectBed from bedtools, but I have a little problem with small genomic intervals that overlap intron coordinates and not exons (it does not report the gene in which the interval lies). Is it possible to tell intersectBed to take introns into account? Or is there another tool?

Thanks
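One way around this is to collapse the exons of each gene into a single span from the first exon start to the last exon end, so that introns are included, and intersect against those spans instead. The sketch below assumes standard nine-column GTF lines with a `gene_id "..."` attribute and writes BED-style tuples (0-based start); the function name is hypothetical.

```python
import re


def gene_spans(gtf_lines):
    """Collapse exon records into one (chrom, start, end, gene, '.', strand) per gene."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if not match:
            continue
        key = (fields[0], match.group(1), fields[6])
        start, end = int(fields[3]), int(fields[4])
        lo, hi = spans.get(key, (start, end))
        spans[key] = (min(lo, start), max(hi, end))
    # GTF is 1-based inclusive; BED is 0-based half-open.
    return [(chrom, lo - 1, hi, gene, ".", strand)
            for (chrom, gene, strand), (lo, hi) in sorted(spans.items())]


gtf = ['chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
       'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
       'chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id "G1";']
print(gene_spans(gtf))
```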

↧

Determining Each Samples Coverage Area

May 16, 2013, 7:06 am

This is the first time I am working with NGS data. I've got a BAM file with mapped reads for my samples and a BED file with the regions in hg19 that were targeted (using an Ion Torrent AmpliSeq panel). Are there any tools that can output something similar to this:

Sample        Amplicon           Chromosome           Start_coordinate_of_coverage             End_coordinate_of_coverage
Sample1       amp_001                chr6                 1,000,000                                   1,000,250
Sample2       amp_001                chr6                 1,000,111                                   1,000,255
Sample1       amp_002                chr6                 1,000,200                                   1,000,333

I basically want to know for each gene what coverage we have for each sample.

EDIT: changed column headings, I'm looking for coordinates that have coverage, not depth at each exon.
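As a sketch of the requested table, the function below takes one amplicon and the aligned read intervals of one sample and returns the outermost covered coordinates inside the amplicon. It assumes the reads have already been converted to (chrom, start, end) tuples, for example from a BAM-to-BED conversion, and it does not report gaps in coverage between the two ends. The function name and toy reads are hypothetical.

```python
def covered_extent(amplicon, reads):
    """Return (first_covered, last_covered) within the amplicon, or None."""
    chrom, a_start, a_end = amplicon
    hits = [(max(a_start, start), min(a_end, end))
            for r_chrom, start, end in reads
            if r_chrom == chrom and start < a_end and a_start < end]
    if not hits:
        return None
    return min(start for start, _ in hits), max(end for _, end in hits)


amp_001 = ("chr6", 1000000, 1000250)
sample_reads = [("chr6", 999950, 1000100), ("chr6", 1000111, 1000255),
                ("chr7", 1, 2)]
print(covered_extent(amp_001, sample_reads))
```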

↧

Bedtools Intersectbed

November 17, 2011, 10:15 am

Apologies if this is blatantly obvious!

I would like to compare coordinates in setA with those of setB. The output should have the same number of coordinates as setA and tell me how many nucleotides of each setA coordinate are overlapped by any coordinate in setB.

For example, a large coordinate in setA may be overlapped by two setB coordinates, but I want to know how many nucleotides of the setA coordinate are covered by both setB coordinates in total.

I know how to do this in Galaxy, as there is the handy 'Coverage' tool in 'Operate on Genomic Intervals'. However, I want to do this on the command line. I have been trying to get BEDTools to do this using 'intersectBed', but I can only seem to get just the overlapping setA coordinates (using -u), or the nucleotide overlap for multiple setB coordinates on separate lines (using -wao), or a count of how many setB coordinates overlap each setA coordinate (using -c).

The setB coordinates do not overlap each other, so I guess I could tally up the setB coordinates that overlap the same setA coordinate.

Can BEDTools do what I want, or is there another command-line way of doing it?

Thank you!

PS I have also sent this to the BEDTools discussion, so apologies for any double postings!
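The tally mentioned above is straightforward when the setB coordinates do not overlap each other: for each setA interval, add up the length of its overlap with every setB interval. A minimal sketch, with a hypothetical function name and half-open coordinates:

```python
def covered_bases(a_interval, b_intervals):
    """Total bases of a_interval overlapped by non-overlapping b_intervals."""
    chrom, a_start, a_end = a_interval
    total = 0
    for b_chrom, b_start, b_end in b_intervals:
        if b_chrom == chrom:
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


set_b = [("chr1", 90, 110), ("chr1", 150, 160),
         ("chr1", 195, 300), ("chr2", 100, 200)]
print(covered_bases(("chr1", 100, 200), set_b))
```

Calling it once per setA interval gives one number per input coordinate, including 0 for intervals with no overlap. If setB did contain overlapping intervals, they would have to be merged first to avoid counting bases twice.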

↧

Given gene ID and genomic coordinates, how can I create a GFF formatted file?

June 19, 2014, 2:39 pm

I have downloaded a list of coordinates of yeast genes from Xu et al., 2009 (see table S3). Unfortunately, it is not in a standard format, so it does not appear to be compatible with the programs I would like to use, i.e. HOMER, bedops or bedtools. I was wondering if anyone could help me get it into GFF format using Unix tools or R (other languages are also welcome if the code is just copy and paste)? I tried to recreate what I saw on the Ensembl website, but these programs still did not recognize it as GFF. Here is the beginning of the file (there are actually ~7K lines):

ID chr strand start end type name commonName endConfidence source
ST0001 1 + 9369 9601 SUTs SUT001 SUT001 bothEndsMapped Manual
ST0002 1 + 30073 30905 CUTs CUT001 CUT001 bothEndsMapped Automatic
ST0003 1 + 31153 32985 ORF-T YAL062W GDH3 bothEndsMapped Manual
ST0004 1 + 33361 34897 ORF-T YAL061W BDH2 bothEndsMapped Manual
ST0005 1 + 35097 36393 ORF-T YAL060W BDH1 bothEndsMapped Manual
ST0006 1 + 36545 37329 ORF-T YAL059W ECM1 bothEndsMapped Manual
ST0007 1 + 37409 39033 ORF-T YAL058W CNE1 bothEndsMapped Manual
ST0008 1 ...
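A conversion of such rows could look like the sketch below. GFF has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). Which table column goes where is a choice made for illustration here: the table's source and type columns are reused as they are, score and phase are set to ".", and the ID, name and commonName columns become the ID, Name and Alias attributes. The coordinates are assumed to be 1-based and inclusive already, which the post does not state, and the chromosome names may need to be changed to match the reference used by the downstream tool.

```python
def row_to_gff(row):
    """Convert one whitespace-separated table row into a GFF line."""
    (feature_id, chrom, strand, start, end,
     feature_type, name, common_name, _end_confidence, source) = row.split()
    attributes = f"ID={feature_id};Name={name};Alias={common_name}"
    return "\t".join([chrom, source, feature_type, start, end,
                      ".", strand, ".", attributes])


print("##gff-version 3")
print(row_to_gff("ST0003 1 + 31153 32985 ORF-T YAL062W GDH3 bothEndsMapped Manual"))
```

The header row of the table has to be skipped before calling the function on each line.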

↧

Help With Exception When Using Bedtools Coveragebed With Paired Alignment. [Resolved]

January 3, 2014, 5:32 am

I use bwa mem to align paired reads to a few hundred microbial contigs; then I sort the alignment and try to get the coverage using bedtools genomecov -ibam alignments.paired.sorted.bam -bg >ranges.txt, which fails with an exception:

*** glibc detected *** bedtools: double free or corruption (out): 0x0000000001c5f270 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3d7b2750c6]
bedtools[0x45ab43]
bedtools[0x45b146]
bedtools[0x45c163]
bedtools[0x45e2ed]
bedtools[0x434c4b]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3d7b21ecdd]

If I run the same command on an unpaired alignment, everything is OK. So I am really not sure where my mistake is... maybe bedtools can't digest the paired alignment?

-- edit: works with the latest versions of these tools. Here are the ones that failed:

$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.0-r313
Contact: Heng Li <lh3@sanger.ac.uk>

$ bedtools -version
bedtools v2.16.1

↧
