Different coverage from bedtools and in vcf file - HELP PLEASE
bedtools intersect - something wrong with chromosome numbers >= 10?
Hi!
I have an alignment (.bam) of reads to the mm9 genome. I sorted it with samtools sort so that I can later use the -sorted option with bedtools. I also created a .bed file with regions of interest, in which I want to count the number of reads that mapped to them. I converted the .bam to .bed with bedtools bamtobed and then intersected the two files, counting the number of hits:
bedtools intersect -a regions_of_interest.bed -b alignment_sorted.bed -c -sorted > Neg2H_counts.bedgraph
The problem is that it looks fine for all chromosomes with single-digit numbers (and X), but the counts for all regions of interest on chromosomes with higher numbers (chr10, chr11, etc.) are 0. There is no biological reason for that; in fact, the highest signal should be on chr11. What could be wrong here? I am fairly new to all these tools.
UPDATE
I tried to do the same intersection with bedmap and the result is identical... So there probably is something wrong with my files - what could it be?
I also tried sorting the alignment-derived BED file the same way I sorted the file with regions of interest, and it doesn't help.
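One possible cause (an assumption, the post does not confirm it) is that the two files list chromosomes in different orders. samtools sort follows the reference order in the BAM header, while a plain text sort of a BED file is lexicographic, and the -sorted option expects both inputs to use the same order. A minimal sketch of how the two orders differ for chromosome names like the ones in the question:

```python
import re

def natural_key(chrom):
    """Sort key giving chr1, chr2, ..., chr10, ..., then unnumbered names."""
    m = re.fullmatch(r"chr(\d+)", chrom)
    # Numbered chromosomes sort first, by number; the rest sort by name after them.
    return (0, int(m.group(1)), "") if m else (1, 0, chrom)

chroms = ["chr1", "chr2", "chr9", "chr10", "chr11", "chrX"]

lexicographic = sorted(chroms)                # order produced by a plain text sort
natural = sorted(chroms, key=natural_key)     # order a genome header might use

print(lexicographic)
print(natural)
```

In the lexicographic order chr10 and chr11 come before chr2, so a tool that walks both files in parallel can skip them if the other file uses the numeric order.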
Running BedTools on Linux Cluster: Permission Denied
I have been having some problems running the BedTools binaries on a Linux cluster. I have the binaries in my own $HOME/bin directory, and when I try to run bedtools I get this error message:
-bash: bedtools: Permission Denied
I followed the instructions here and still got the same error message.
Any clue what to do?
Memory Efficient Bedtools Sort And Merge With Millions Of Entries?
I would like to know if there is a memory-efficient way of sorting and merging a large number of bed files, each containing millions of entries, into a single bed file in which duplicated or partially overlapping entries are merged so that every entry is unique.
I have tried the following but it blows up in memory beyond the 32G I have available here:
find /my/path -name '*.bed.gz' | xargs gunzip -c | ~/src/bedtools-2.17.0/bin/bedtools sort | ~/src/bedtools-2.17.0/bin/bedtools merge | gzip -c > bed.all.gz
Any suggestions?
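One idea, sketched here under the assumption that each input file can first be sorted on its own by chromosome and start (the original pipeline sorts everything in one go): already-sorted streams can be combined with heapq.merge, which holds only one record per file in memory, and overlapping entries can be merged on the fly. The function names are hypothetical.

```python
import heapq

def parse(line):
    """Turn a tab-delimited BED line into (chrom, start, end)."""
    chrom, start, end = line.split("\t")[:3]
    return chrom, int(start), int(end)

def merge_sorted_streams(streams):
    """Yield merged (chrom, start, end) intervals from several sorted inputs."""
    # heapq.merge is lazy: it never loads a whole stream into memory.
    records = heapq.merge(*(map(parse, s) for s in streams))
    current = None
    for chrom, start, end in records:
        if current and chrom == current[0] and start <= current[2]:
            # Overlapping or touching: extend the open interval.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                yield current
            current = (chrom, start, end)
    if current:
        yield current

a = ["chr1\t0\t10\n", "chr1\t50\t60\n"]
b = ["chr1\t5\t20\n", "chr2\t0\t5\n"]
merged = list(merge_sorted_streams([a, b]))
```

The lists stand in for open file handles; any iterable of lines works, as long as every input uses the same chromosome order.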
How Do You Get The Quality Score And Coverage For Every Single Position Of A Reference Assembly
Hi,
I am trying to extract the coverage and the average quality score for each position of a reference assembly in bam/sam format. I have managed to get the coverage using BEDtools
genomeCoverageBed -ibam mybamfile.bam -g my_genome -d > my_coverage.txt
but am at a loss on how to get some measure of the quality of the base calls at each position. I was thinking that I could use the bcftools to get a variant call formatted file
samtools mpileup -uf ref.fa mybamfile.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
but this only provides the sites for which there are SNPs. Any advice greatly appreciated.
Joseph
Intersectbed Tool Generating Empty File
I have used the Bedtools command intersectBed to check the overlap between two bed files. A is my INDEL file and B is my Reference file. But it is producing an empty output file. I thought the problem was that the file B is much larger than file A. But I tried changing the file order and it is still not creating any output.
Here is the reference B file (larger):
gff_seqname 0 1395 gene 0 +
gff_seqname 0 1395 exon 0 +
gff_seqname 1397 2498 gene 0 +
gff_seqname 1397 2498 exon 0 +
gff_seqname 2524 3619 gene 0 +
Here is my A file with just 51 INDELS:
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 174708 174713 -GCCGG:2/6
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1078686 1078686 +A:105/112
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1229123 1229125 -CT:800/870
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1234830 1234830 +AT:134/134
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1234833 1234834 -A:134/134
here is my command:
intersectBed -a SOD_pal_BWA_GMM.PE.sorted.bam.sorted_cleaned_GMM.bam.sorted.hr.bam.raw.bed -b sodalis_galaxy.bed -wa -wb >test13.bed
Is It Possible To Filter Only Bookend Reads From A Bed File?
I have a bed file with many fragments, some overlapping, some on their own and some adjacent to each other (book-ended) features.
I know I can group overlapping and book-ended features using bedtools, like
bedtools cluster -i fragments.bed
However I was wondering if anyone knew of a way of obtaining from the input file only the fragments that contain book-ended adjacent fragments.
Any ideas?
Best regards
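A minimal sketch of one way to do this outside bedtools, assuming "book-ended" means that one fragment's end equals another fragment's start on the same chromosome. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

def bookended(fragments):
    """Return fragments whose start or end touches another fragment's end or start."""
    starts = defaultdict(set)
    ends = defaultdict(set)
    for chrom, start, end in fragments:
        starts[chrom].add(start)
        ends[chrom].add(end)
    # Keep a fragment if something starts where it ends, or ends where it starts.
    return [
        (chrom, start, end)
        for chrom, start, end in fragments
        if end in starts[chrom] or start in ends[chrom]
    ]

fragments = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 30, 40), ("chr2", 10, 20)]
touching = bookended(fragments)
```

This assumes every fragment has start < end; a zero-length fragment would match itself.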
How Can I Merge Intervals ?
Hello everybody, I would be grateful if you could help me fix my problem. I have a table like this:
Chromosome start end info1 info2
chr01 1 100 15 35
chr01 150 300 15 39
chr01 299 750 16 39
I would like to merge the intervals that overlap (lines 2 and 3) and those that are close to each other (lines 1 and 2), and also perform some operations on the other columns. For example, I would like to merge the three lines above into one interval (chr1 1-750), sum the info1 column (15+15+16), and take the mean of the info2 column, (35+39+39)/3. The output I would like looks like this:
chr1 1-750 46 37.66
I know that Bedtools (and the Galaxy tool too) can merge intervals, but it only accepts BED format with just 3 columns: chr, start and end!
Thanks in advance for your help !
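A minimal sketch of the merge described above, assuming the rows are already sorted by chromosome and start. The max_gap parameter is hypothetical: the question does not say how close two intervals must be to count as "closest", so 50 is used here only because it joins lines 1 and 2 of the example.

```python
def merge_with_stats(rows, max_gap=50):
    """Merge sorted (chrom, start, end, info1, info2) rows that overlap or lie
    within max_gap bases; sum info1 and average info2 per merged interval."""
    merged = []
    for chrom, start, end, info1, info2 in rows:
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)
            last["info1"] += info1
            last["info2"].append(info2)
        else:
            merged.append({"chrom": chrom, "start": start, "end": end,
                           "info1": info1, "info2": [info2]})
    return [(m["chrom"], m["start"], m["end"], m["info1"],
             sum(m["info2"]) / len(m["info2"])) for m in merged]

rows = [("chr01", 1, 100, 15, 35),
        ("chr01", 150, 300, 15, 39),
        ("chr01", 299, 750, 16, 39)]
result = merge_with_stats(rows)
```

For the three example rows this gives a single interval from 1 to 750 with info1 summed to 46 and info2 averaged to 113/3.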
Coveragebed, Depth/Breadth Of Coverage
I'm using coverageBed to calculate the depth and breadth of coverage, but I'm not sure I'm doing this right. I want to calculate the two values for each human chromosome.
For example, I've created a bed file with 1 chromosome. When I input my BAM file and the BED file, I get the following output:
chr1 0 249250621 103718897 224950839 249250621 0.9025086
I know the first 3 fields are from my chr BED file, the 4th field is the # of reads, 5th is # of bases covered, 6th is length of chromosome (redundant to field 3), and the last column is the fraction of bases covered (5th field/6th field).
So the 7th/last field gives the breadth of coverage, but I don't see a depth of coverage value. How do I get a depth of coverage?
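The summary row above does not carry enough information for depth. A common definition of mean depth is the sum of per-base depths divided by the chromosome length. A minimal sketch, assuming per-base input lines of the form chrom, position, depth (the layout written by genomeCoverageBed -d) that cover every position; the function name is hypothetical.

```python
from collections import defaultdict

def depth_and_breadth(per_base_lines):
    """Return {chrom: (mean_depth, breadth)} from 'chrom<TAB>pos<TAB>depth' lines
    that list every position of each chromosome."""
    total = defaultdict(int)    # summed depth per chromosome
    covered = defaultdict(int)  # positions with depth > 0
    length = defaultdict(int)   # number of positions seen
    for line in per_base_lines:
        chrom, _pos, depth = line.split("\t")
        depth = int(depth)
        total[chrom] += depth
        covered[chrom] += depth > 0
        length[chrom] += 1
    return {c: (total[c] / length[c], covered[c] / length[c]) for c in total}

lines = ["chr1\t1\t0", "chr1\t2\t4", "chr1\t3\t2", "chr1\t4\t2"]
stats = depth_and_breadth(lines)
```

Here four positions with depths 0, 4, 2 and 2 give a mean depth of 2.0 and a breadth of 0.75.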
How to get the rRNA ratio from a RNAseq dataset
Hello,
I want to know if there is any way using the bedtools and miRdeep2 output bed file to get the rRNA ratio in my miRNAseq fastq data. Thank you very much!
I have a gtf file, a genome.fa, a bed file from the miRdeep2. Thanks!
Convert .Txt Into Bed Files
I used paired-end sequence data for copy number variation study; and eventually get .txt files as output. I'm hoping to use Bedtools to compare my results with others.
Can I convert .txt files into .bed files? (I don't see option in Bedtools)
If Bedtools is not working, what software can I use for data comparison?
The lines of my txt files look like this:
deletion chr9:6169901-6173000 3100
deletion chr9:7657401-7658800 1400
deletion chr9:8847501-8848600 1100
deletion chr9:10010201-10011600 1400
deletion chr9:10126601-10127700 1100
thx
edit: I converted the txt files into bedpe format, which looks like
chr21 18542801 18543500
chr21 18545701 18545900
chr21 19039901 19040600
chr21 19164301 19169400
chr21 19366001 19370200
chr21 19639601 19640300
chr21 20493701 20495700
chr21 20581401 20583000
chr21 20880901 20882700
chr21 21558601 21559700
Then I started to compare two bedpe, looking for overlapping region, using the command like:
pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe
Then I see the errors:
It looks as though you have less than 6 columns. Are you sure your files are tab-delimited?
My bed file has only three columns, but it seems to require 6. What's the problem here? thx
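The three-column file shown is plain BED rather than BEDPE, which is presumably why pairToPair asks for 6 columns. For the conversion itself, here is a minimal sketch that turns the original txt lines into tab-delimited BED. It assumes the txt coordinates are 1-based and inclusive (the third column, 3100, equals end - start + 1 for the first line), so the BED start is start - 1. The function name is hypothetical.

```python
import re

# type, chrom:start-end, length
LINE = re.compile(r"(\S+)\s+(\w+):(\d+)-(\d+)\s+(\d+)")

def txt_to_bed(line):
    """Convert 'deletion chr9:6169901-6173000 3100' to a tab-delimited BED line."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    kind, chrom, start, end, _length = m.groups()
    # Assumption: input is 1-based inclusive, so BED start = start - 1.
    return "\t".join([chrom, str(int(start) - 1), end, kind])

bed_line = txt_to_bed("deletion chr9:6169901-6173000 3100")
```

The event type is kept as a fourth (name) column, so two such files can then be compared with an ordinary BED intersection.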
error with bedtools slop
Hi,
I am trying to run a bedtools slop on my.bed file and hg19.genome
bedtools slop -i H3K27me3.bed -g hg19.genome -b 30
I get the following error:
Less than the req'd two fields were encountered in the genome file (genomes/hg19.genome) at line 2. Exiting.
Any suggestions?
Thanks in advance
Samad
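The message suggests that line 2 of the genome file does not split into two tab-separated fields, for example because it uses spaces or is blank (an assumption, the file itself is not shown). A minimal sketch for finding such lines; the function name is hypothetical.

```python
def bad_genome_lines(lines):
    """Return (line_number, text) for lines that are not 'name<TAB>size'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # A valid line has a name and an integer size separated by a tab.
        if len(fields) < 2 or not fields[1].isdigit():
            bad.append((number, line.rstrip("\n")))
    return bad

example = ["chr1\t249250621\n", "chr2 243199373\n", "\n"]
problems = bad_genome_lines(example)
```

In practice the lines would come from the opened genome file; the example list has a space-separated line and a blank line, and both are reported.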
Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?
I want to create a gene data-set that is as big as possible, so I am using several gene annotations. However, genes in different annotations overlap (it's the same gene). To reduce bias, I intersect the different annotations and, where genes overlap, keep only one of them.
Question:
To ensure this overlap I was thinking of extending the gene coordinates. Is this necessary? If so, how big should the extension be (5bp/100bp)?
Example:
Want to create lncRNA data-set (in the following steps it will be used to search for genomic features).
Input:
- GENCODE lncRNA annotation (version 18 - 04/09/2013);
- Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).
Workflow:
- Extract GENCODE genes start/end coordinates;
- Extract Cabili genes start/end coordinates;
- Extend Cabili coordinates ( -/+ nbp );
- Use BedTools intersect;
- If genes intersect leave GENCODE gene (as it's a newer annotation (though this step is really subjective)).
I do realize that this extension question depends on the situation and how reliable annotation is, but still hope that someone could suggest something.
Comparative Snp Analysis
Hello, I am trying to compare the degree of A-to-G editing in a near-to-isogenic pair of cell lines. I have two biological replicates and have mapped with Bowtie and BWA, followed by a samtools mpileup | VarScan analysis. After this, I have used bedtools intersect to extract variants not annotated in dbSNP, but are in Alu repeats. Here is where I have some doubts, mainly two questions: QUESTION 1: In the vcf file (VarScan output),
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
chrM 73 . G A PASS DP=238 GT:GQ:DP 1/1:71:121 1/1:69:117
What exactly is the meaning of
FORMAT Sample1 Sample2
GT:GQ:DP 1/1:71:121 1/1:69:117
QUESTION 2:
I have a higher number of editing sites "called" in sample 1 than in sample 2 in the 1st biological replicate (about 16% difference). However, this difference is reversed in the 2nd biological replicate. What is the proper way of comparing the degree of RNA editing in two different samples? Is there a quantitative procedure? I have naively compared them with bedtools intersect, using or omitting option -v. Is this the correct way to go about it?
Many thanks. G.
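On question 1: in VCF, the FORMAT column lists colon-separated keys, and each sample column holds the values in the same order. GT is the genotype (1/1 is homozygous for the first ALT allele), GQ is the genotype quality and DP is the read depth for that sample. A minimal sketch that pairs them up; the function name is hypothetical.

```python
def parse_sample(format_field, sample_field):
    """Pair the colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))

sample1 = parse_sample("GT:GQ:DP", "1/1:71:121")
sample2 = parse_sample("GT:GQ:DP", "1/1:69:117")
```

The values stay strings here; numeric fields such as DP need an explicit int() before any arithmetic.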
N Closest Genes To A Given Location
Hi,
This is basically an extension of the following question already asked in biostar (http://biostars.org/post/show/53561/python-finding-gene-closest-to-a-given-location/).
Let us say I have a list of genomic regions (as a bed file), and also a list of genes (as a bed file). For each genomic region I want to find the 5 (or N to be general) closest genes. How would I try to do that? Any suggestions?
Thanks!
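A minimal sketch of the idea in plain Python, assuming half-open BED-style coordinates and genes already restricted to the region's chromosome. The function names and tuple layouts are hypothetical.

```python
import heapq

def distance(region, gene):
    """Gap in bases between two half-open (start, end) intervals; 0 if they overlap."""
    return max(0, gene[0] - region[1], region[0] - gene[1])

def n_closest(region, genes, n=5):
    """Return the n genes on one chromosome closest to region.
    region is (start, end); genes is a list of (start, end, name)."""
    return heapq.nsmallest(n, genes, key=lambda g: distance(region, g[:2]))

genes = [(0, 50, "a"), (90, 110, "b"), (500, 600, "c"), (130, 140, "d")]
closest = n_closest((100, 120), genes, n=2)
```

heapq.nsmallest avoids sorting the whole gene list when N is small compared with the number of genes.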
Intersect Gene Annotation With Specific Position Or Genomic Interval
Hi,
I have several genomic intervals and I want to check whether they overlap known genes. I have a gtf file with the coordinates of the gene exons. My idea was to use intersectBed from bedtools, but I have a small problem with short genomic intervals that overlap intron coordinates and not exons (it does not report the gene in which the interval lies). Is it possible to tell intersectBed to take introns into account? Or is there another tool?
Thanks
N.
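One way around this, sketched under the assumption that the exon records can be read as (chrom, start, end, gene_id) tuples, is to collapse the exons of each gene into a single span and intersect against those spans instead. The function name is hypothetical.

```python
def gene_spans(exons):
    """Collapse (chrom, start, end, gene_id) exon records into one span per gene,
    from the first exon start to the last exon end, so introns are included."""
    spans = {}
    for chrom, start, end, gene_id in exons:
        if gene_id in spans:
            _, s, e = spans[gene_id]
            spans[gene_id] = (chrom, min(s, start), max(e, end))
        else:
            spans[gene_id] = (chrom, start, end)
    return spans

exons = [("chr1", 100, 200, "g1"), ("chr1", 500, 600, "g1"), ("chr2", 10, 20, "g2")]
spans = gene_spans(exons)
```

An interval at chr1:300-350 misses both exons of g1 but falls inside its span of 100 to 600.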
Determining Each Samples Coverage Area
First time I am working with NGS data. I've got a BAM file with mapped reads for my samples and a BED file with the regions in HG19 that were targeted (used an Ion-torrent ampliseq panel). Are there any tools that can output something similar to this:
Sample Amplicon Chromosome Start_coordinate_of_coverage End_coordinate_of_coverage
Sample1 amp_001 chr6 1,000,000 1,000,250
Sample2 amp_001 chr6 1,000,111 1,000,255
Sample1 amp_002 chr6 1,000,200 1,000,333
I basically want to know for each gene what coverage we have for each sample.
EDIT: changed column headings, I'm looking for coordinates that have coverage, not depth at each exon.
Bedtools Intersectbed
Apologies if this is blatantly obvious!
I would like to compare coordinates in setA with those of setB. The output should have the same number of coordinates as setA and tell me how many nucleotides of each setA coordinate are overlapped by any coordinate in setB.
For example, a large coordinate in setA may be overlapped by two setB coordinates, but I want to know how many nucleotides of the setA coordinate are covered by both setB coordinates in total.
I know how to do this on GALAXY as there is the handy 'Coverage' tool in 'Operate on Genomic Intervals'. However, I want to do this on the command line. I have been trying to get BEDTools to do this using 'intersectBed', but I can only seem to get just the overlapping setA coords (using -u), or get the nucleotide overlap for multiple setB coordinates on separate lines (using -wao), or a count of how many setB coordinates overlap each setA coordinate (using -c).
SetB coordinates are non-overlapping themselves, so I guess I could tally up those setB coordinates that overlap the same setA coordinate.
Can BEDTools do what I want, or is there another command line way of doing what I want?
Thank you!
PS I have also sent this to the BEDTools discussion list, so apologies for any double postings!
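The tally mentioned above can be sketched in a few lines, assuming half-open coordinates and, as stated in the question, setB intervals that do not overlap each other. The function name is hypothetical.

```python
def covered_bases(a, b_intervals):
    """Bases of interval a = (start, end) overlapped by non-overlapping b intervals."""
    a_start, a_end = a
    # Each overlap is the shared stretch between a and one b interval, or 0 if none.
    return sum(
        max(0, min(a_end, b_end) - max(a_start, b_start))
        for b_start, b_end in b_intervals
    )

total = covered_bases((100, 200), [(90, 120), (150, 160), (300, 400)])
```

If the setB intervals did overlap each other, they would have to be merged first, otherwise shared bases would be counted twice.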
Given gene ID and genomic coordinates, how can I create a GFF formatted file?
Help With Exception When Using Bedtools Coveragebed With Paired Alignment. [Resolved]
I use bwa mem to align paired reads to a few hundred microbial contigs. I then sort the alignment and try to get the coverage with
bedtools genomecov -ibam alignments.paired.sorted.bam -bg > ranges.txt
which fails with an exception:
*** glibc detected *** bedtools: double free or corruption (out): 0x0000000001c5f270 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3d7b2750c6]
bedtools[0x45ab43]
bedtools[0x45b146]
bedtools[0x45c163]
bedtools[0x45e2ed]
bedtools[0x434c4b]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3d7b21ecdd]
If I run the same command on an unpaired alignment, everything is OK. So I am really not sure where my mistake is... maybe bedtools can't digest the paired alignment?
-- edit: works with the latest versions of these tools. Here are the ones that failed:
$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.0-r313
Contact: Heng Li <lh3@sanger.ac.uk>
$ bedtools -version
bedtools v2.16.1