Multi Thread Bedtools
Question about number of reads within intervals
General Considerations For Genomic Overlaps?
Heatmap Of Read Coverage Around Tsss
coverageBed -d -abam $bamFile -b $TSSs > $coverage.bed
# output:
chr1 67108226 67110226 uc001dct.3 16 + 1 10
chr1 67108226 67110226 uc001dct.3 16 + 2 10
chr1 67108226 67110226 uc001dct.3 16 + 3 10
chr1 67108226 67110226 uc001dct.3 16 + 4 10
chr1 67108226 67110226 uc001dct.3 16 + 5 8
chr1 67108226 67110226 uc001dct.3 16 + 6 8
chr1 67108226 67110226 uc001dct.3 16 + 7 8
chr1 67108226 67110226 uc001dct.3 16 + 8 8
chr1 67108226 67110226 uc001dct.3 16 + 9 8
chr1 67108226 67110226 uc001dct.3 16 + 10 8
Then in R, the position in column 7 (the 1-based offset within each interval) is converted to a position relative to the TSS, and the read counts are normalized to the library size. The result is converted to a numeric matrix in which each row is a TSS and each column is a relative nucleotide position. For plotting, the matrix is ordered by the number of reads per TSS and the values are log-transformed. This is the outcome:
heatmap(cov.mlog, Rowv=NA, Colv= ...
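The matrix-building step described above can be sketched in Python as follows. This is only an illustration, not the poster's R code: it assumes all intervals have the same width, that the TSS sits in the middle of each interval, and that normalization means reads per million plus a pseudocount of 1 before taking the log (the function name and both parameters are made up for this example).

```python
import math
from collections import defaultdict

def coverage_matrix(lines, library_size, pseudocount=1.0):
    """Turn `coverageBed -d` lines into (names, matrix).

    Each row is one TSS interval; column j is the j-th base in the
    direction of transcription. Values are log(depth per million + pseudocount).
    """
    depth = defaultdict(dict)   # name -> {column index: raw depth}
    width = {}                  # name -> interval width
    for line in lines:
        f = line.split()
        if len(f) < 8:
            continue
        start, end, name, strand = int(f[1]), int(f[2]), f[3], f[5]
        offset, d = int(f[6]), int(f[7])   # offset is 1-based within the interval
        width[name] = end - start
        # Mirror minus-strand intervals so columns always run 5' -> 3'.
        idx = offset - 1 if strand == "+" else width[name] - offset
        depth[name][idx] = d

    # Order rows by total raw reads per TSS (ascending), as in the post.
    names = sorted(depth, key=lambda n: sum(depth[n].values()))
    scale = 1e6 / library_size   # assumption: reads-per-million normalization
    matrix = [
        [math.log(depth[n].get(i, 0) * scale + pseudocount) for i in range(width[n])]
        for n in names
    ]
    return names, matrix
```

Under the symmetric-window assumption, column j corresponds to relative position j - width // 2.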
Discrepancy In Samtools Mpileup/Depth And Bedtools Genomecoveragebed Counts
I am getting different counts for the number of reference bases covered by aligned reads when using samtools depth/mpileup and the BEDTools genomeCoverageBed command. I am using samtools-0.1.19 and bedtools-2.17.0.
samtools mpileup -ABQ0 -d10000000 -f ref.fas qry.bam > qry.mpileup
samtools depth -q0 -Q0 qry.bam > qry.depth
genomeCoverageBed -ibam qry.bam -g ref.genome -dz > qry.dz
wc -l qry.[dm]*
1026779 qry.depth
1027173 qry.dz
1026779 qry.mpileup
Any ideas? Thanks
How To Check Whole Genome With Bigwigsummary ?
Hi,
I have a question about the bigWigSummary tool.
I have my start and end positions and my bigWig file, but I want to check the whole genome instead of going chromosome by chromosome. Is there an option to use the tool that way?
I know that for each chromosome I have to use:
bigWigSummary -type=X bigwigfile chrN start end datapoints
I want to check from chr1 to chrX.
Thanks in Advance.
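One way to cover every chromosome is to loop over a list of chromosome names and sizes and call bigWigSummary once per chromosome. The Python sketch below only builds and runs the per-chromosome command shown in the question; the helper names, the default `mean` type, and the idea of supplying sizes as (name, size) pairs are assumptions for illustration.

```python
import subprocess

def summary_commands(chrom_sizes, bigwig, data_points=1, summary_type="mean"):
    """Build one bigWigSummary command per chromosome, spanning 0..size."""
    return [
        ["bigWigSummary", f"-type={summary_type}", bigwig,
         chrom, "0", str(size), str(data_points)]
        for chrom, size in chrom_sizes
    ]

def run_all(commands):
    """Run each command and collect its stdout, keyed by chromosome name.

    Requires bigWigSummary on PATH; chromosomes without data return
    whatever the tool prints for them.
    """
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[cmd[3]] = proc.stdout.strip()
    return results
```

For example, `run_all(summary_commands([("chr1", 249250621), ("chrX", 155270560)], "signal.bw"))` would return one summary string per chromosome (the file name and sizes here are placeholders).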
Bed File Of Mapq Sliding Window On A Bam File?
input.bam file. I think bedtools or bedops are the way to go:
http://bedtools.readthedocs.org/en/latest/content/tools/bamtobed.html
http://bedops.readthedocs.org/en/latest/content/reference/file-management/conversion/bam2bed.html
Other than simply running bamtobed/bam2bed, I would like to be able to define a sliding window size and step for the windows, say size=1000 and step=200.
I would also like to generate the bam2bed information only from a list of regions in regions.bed. E.g., something like:
mapq_sliding_windows --bam input.bam --wsize 1000 --wstep 200 --regions regions.bed > mapq_sliding_windows.bed
EDITED:
Thank you Aaron for your answer. I got it working, but it's slow for my 30x WGS BAMs:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" > hg19.genome
bedtools makewindows -g hg19.genome -w 1000 -s 200 > hg19.windows.bed
bedtools map -a hg19.windows.bed -b <(bedtools bamtobed -i input.bam | grep -v chrM) -c 5 -o mean > ...
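To make the window arithmetic explicit, here is a small Python sketch of the same idea: each read is assigned to every sliding window it overlaps, and the mean MAPQ is computed per window. It is not a replacement for the bedtools pipeline above and is not tuned for 30x WGS data; it assumes reads are already available as (chrom, start, end, mapq) tuples in 0-based half-open coordinates (as bamtobed reports them) and it ignores chromosome ends, so the last windows are not truncated.

```python
from collections import defaultdict

def mean_mapq_windows(reads, size=1000, step=200):
    """Mean MAPQ per sliding window [k*step, k*step + size)."""
    acc = defaultdict(lambda: [0, 0])   # (chrom, k) -> [sum of MAPQ, read count]
    for chrom, start, end, mapq in reads:
        # Smallest and largest window index overlapping [start, end).
        first = max(0, (start - size) // step + 1)
        last = (end - 1) // step
        for k in range(first, last + 1):
            acc[(chrom, k)][0] += mapq
            acc[(chrom, k)][1] += 1
    return [
        (chrom, k * step, k * step + size, total / n)
        for (chrom, k), (total, n) in sorted(acc.items())
    ]
```

Windows with no reads are simply absent from the result.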
How To Get Annotation For Bed File From Another Bed File
Hello All,
I have a bed file (with Chr, Start, End, Name, Score and Strand)
Chr1 5678 5680 NA 7 +
Chr1 700 800 NA 8 -
Chr1 900 1200 NA 10 -
and would like to know, how can I get the annotation for the name column from another bed file
Chr1 5500 6000 Gene1 x +
Chr1 500 1000 Gene2 x -
or from any standard genome file format like gbk or .fna files, or for that matter another bed file? So my output file will be a bed file with Chr, Start, End, Name, Score and Strand.
Chr1 5678 5680 Gene1 7 +
Chr1 700 800 Gene2 8 -
Chr1 900 1200 Gene2 10 -
Is there any easy and standard way to do this?
Bedtools usually operates on features, but I am not sure whether annotation from one bed file can be transferred to another based on overlapping features.
Thanks in advance!
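The desired result can be illustrated with a short Python sketch that copies the gene name onto every query interval it overlaps. This is a naive all-against-all comparison meant only to show the logic on the example above; the function name and the optional `same_strand` switch are assumptions, and multiple overlapping genes are joined with commas.

```python
def annotate(queries, genes, same_strand=False):
    """Replace the name of each query interval with the overlapping gene name(s).

    Intervals are (chrom, start, end, name, score, strand) tuples in BED
    coordinates; queries without any overlap keep their original name.
    """
    out = []
    for chrom, start, end, name, score, strand in queries:
        hits = [
            g[3] for g in genes
            if g[0] == chrom and g[1] < end and start < g[2]
            and (not same_strand or g[5] == strand)
        ]
        out.append((chrom, start, end, ",".join(hits) if hits else name, score, strand))
    return out

queries = [("Chr1", 5678, 5680, "NA", 7, "+"),
           ("Chr1", 700, 800, "NA", 8, "-"),
           ("Chr1", 900, 1200, "NA", 10, "-")]
genes = [("Chr1", 5500, 6000, "Gene1", "x", "+"),
         ("Chr1", 500, 1000, "Gene2", "x", "-")]
annotated = annotate(queries, genes)
```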
Tool: Bedtools: Analyzing Genomic Features
- how many reads map upstream/downstream of one or more locations in the genome?
- how many reads cover a certain base in the genome?
- which sections of the genome are not overlapping with target intervals?
- what are the sequences specified by the coordinates?
- ...
Intersectbed Tool Generating Empty File
I have used the Bedtools command intersectBed to check the overlap between two bed files. A is my INDEL file and B is my Reference file. But it is producing an empty output file. I thought the problem was that the file B is much larger than file A. But I tried changing the file order and it is still not creating any output.
Here is the reference B file (larger):
gff_seqname 0 1395 gene 0 +
gff_seqname 0 1395 exon 0 +
gff_seqname 1397 2498 gene 0 +
gff_seqname 1397 2498 exon 0 +
gff_seqname 2524 3619 gene 0 +
Here is my A file with just 51 INDELS:
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 174708 174713 -GCCGG:2/6
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1078686 1078686 +A:105/112
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1229123 1229125 -CT:800/870
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1234830 1234830 +AT:134/134
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME 1234833 1234834 -A:134/134
here is my command:
intersectBed -a SOD_pal_BWA_GMM.PE.sorted.bam.sorted_cleaned_GMM.bam.sorted.hr.bam.raw.bed -b sodalis_galaxy.bed -wa -wb >test13.bed
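One thing worth checking: bedtools only reports overlaps between records whose chromosome names match exactly, and in the excerpts shown the first column differs between the two files (gff_seqname versus NC_0077121_...). If that holds for the full files, it could explain the empty output. A minimal Python sketch to test this, with made-up function names:

```python
def chrom_names(bed_lines):
    """Collect the distinct values of column 1, skipping header lines."""
    return {
        line.split()[0]
        for line in bed_lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    }

def shared_chroms(a_lines, b_lines):
    """Chromosome names present in both files; empty means no overlap is possible."""
    return chrom_names(a_lines) & chrom_names(b_lines)
```

With real files you would pass open file handles, e.g. `shared_chroms(open("a.bed"), open("b.bed"))` (the file names are placeholders).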
Intersectbed - Overlap Analysis Using Vcf And Bed Files
I am trying to do an overlap analysis between 200 Danish exomes (VCF courtesy: Zev) and 10 different gene regions.
I would like to know what percentage overlaps between my regions of interest (mygenes.bed, a total of 36 lines representing the regions) and a VCF file (Danish_*.flt.vcf.gz).
I have tried this command and got a result: intersectBed -a Danish1.flt.vcf.gz -b mygenes.bed > D1result.txt
Danish1.flt.vcf.gz: here mygenes.bed: here D1overlapped.txt: here
My assumption is that the output should have no more lines than the mygenes.bed file. But in many instances I am getting more than 36 lines of output. Maybe I am missing something important, or maybe another tool or option in bedtools can do this task more efficiently. Please let me know your thoughts.
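With the VCF as the A file, each output line corresponds to a variant that overlaps a region, so several variants falling in the same region yield several lines. The Python sketch below counts variants per region instead, which gives at most one number per BED line. It is a simplified illustration that treats every variant as a single base at its 1-based POS (so indels are not handled properly); the function name is hypothetical.

```python
def variants_per_region(variants, regions):
    """Count single-base variants falling inside each BED region.

    variants: (chrom, pos) with 1-based VCF positions.
    regions:  (chrom, start, end, name) in 0-based half-open BED coordinates.
    """
    counts = {}
    for chrom, start, end, name in regions:
        # 1-based pos p occupies [p-1, p), so it is inside when start < p <= end.
        counts[name] = sum(
            1 for vchrom, pos in variants if vchrom == chrom and start < pos <= end
        )
    return counts

def fraction_regions_hit(counts):
    """Share of regions that contain at least one variant."""
    return sum(1 for n in counts.values() if n > 0) / len(counts) if counts else 0.0
```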
Extract Only Paired-End Reads That Map A Specific Interval
Hi,
Is it possible to extract paired-end reads that map to a specific interval (from a BAM file)? I tried with intersectBed:
intersectBed -abam align.bam -b interval.gff3 -wa > result.bam
here's the result :
But I only want reads that map to the feature in bold blue (one of the paired reads is enough). For example, I don't want the reads that map either side of this feature (red arrow).
Is it possible with intersectBed or another program?
Thanks,
N.
Convert Bamtobed Score
Hey,
just a short question....is there a possibility to set the score in the bed file to "1" and not to the alignment score?? arguments -tag and -ed only use BAM alignment tags... ?!? :/
Cheers!
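If no suitable option turns up, the score column can simply be overwritten after conversion. A minimal Python sketch, assuming tab-separated bamtobed output with the score in column 5 (the function name is made up):

```python
def set_score(bed_lines, score="1"):
    """Yield BED lines with column 5 replaced by a constant score."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5:          # leave shorter lines untouched
            fields[4] = score
        yield "\t".join(fields)
```

This can be applied to the stream coming out of bamtobed, for example by reading `sys.stdin` and printing each yielded line.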
Help With Exception When Using Bedtools Coveragebed With Paired Alignment. [Resolved]
I use bwa mem to align paired reads to a few hundred microbial contigs; then I sort the alignment and try to get coverage using bedtools genomecov -ibam alignments.paired.sorted.bam -bg >ranges.txt, which fails with an exception:
*** glibc detected *** bedtools: double free or corruption (out): 0x0000000001c5f270 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3d7b2750c6]
bedtools[0x45ab43]
bedtools[0x45b146]
bedtools[0x45c163]
bedtools[0x45e2ed]
bedtools[0x434c4b]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3d7b21ecdd]
If I run the same on a non-paired alignment, everything is OK. So I am really not sure where my mistake is... maybe bedtools can't digest the paired alignment?
-- edit: works with the latest versions of these tools. Here are the ones that failed:
$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.0-r313
Contact: Heng Li <lh3@sanger.ac.uk>
$ bedtools -version
bedtools v2.16.1
How To Install Bedtools In A User Directory
Bedtools Intersectbed
Apologies if this is blatantly obvious!
I would like to compare coordinates in setA with those of setB. The output should have the same number of coordinates as setA and tell me how many nucleotides of each setA coordinate are overlapped by any coordinate in setB.
For example, a large coordinate in setA may be overlapped by two setB coordinates, but I want to know how many nucleotides of the setA coordinate are covered by both setB coordinates in total.
I know how to do this on GALAXY as there is the handy 'Coverage' tool in 'Operate on Genomic Intervals'. However, I want to do this on the command line. I have been trying to get BEDTools to do this using 'intersectBed', but I can only seem to get just the overlapping setA coords (using -u), or the nucleotide overlap for multiple setB coordinates on separate lines (using -wao), or a count of how many setB coordinates overlap each setA coordinate (using -c).
SetB coordinates are non-overlapping themselves, so I guess I could tally up those setB coordinates that overlap the same setA coordinate.
Can BEDTools do what I want, or is there another command-line way of doing it?
Thank you!
PS I have also sent this to the BEDTools discussion, so apologies for any double postings!
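The tally described above can be sketched in Python: for every setA interval, clip each overlapping setB interval to the setA boundaries and sum the covered bases. The sketch also guards against overlapping setB intervals so no base is counted twice, although the post says setB is non-overlapping. The function name is hypothetical. bedtools also ships a coverage tool (coverageBed) that reports covered bases per interval; check its documentation for which file plays which role in your version.

```python
def covered_bases(a_intervals, b_intervals):
    """For each A interval, report how many of its bases are covered by any B interval.

    Intervals are (chrom, start, end) in 0-based half-open coordinates.
    The output has exactly one row per A interval.
    """
    out = []
    for chrom, start, end in a_intervals:
        # Clip every overlapping B interval to the A interval, sorted by start.
        clipped = sorted(
            (max(start, bs), min(end, be))
            for bc, bs, be in b_intervals
            if bc == chrom and bs < end and start < be
        )
        covered, cursor = 0, start
        for s, e in clipped:
            s = max(s, cursor)        # skip bases already counted
            if e > s:
                covered += e - s
                cursor = e
        out.append((chrom, start, end, covered))
    return out
```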
Getting Unmapped Reads: Comparing Fastq To Bam
Given a FASTQ file and a BAM file of aligned reads, is there an efficient way to get all reads that are in the original FASTQ but not in the BAM? Perhaps using bedtools, i.e.:
unmapped_script original.fastq aligned.bam > unmapped.fastq
should create an unmapped.fastq file, which is a subset of original.fastq containing only those entries that do not appear in aligned.bam
thank you.
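The filtering itself needs only the set of read names found in the BAM. A minimal Python sketch, assuming the names have already been extracted (for instance from the first column of `samtools view` output), that FASTQ records span exactly four lines, and that a trailing /1 or /2 should be ignored when comparing names; the function name is made up.

```python
def unmapped_records(fastq_lines, aligned_names):
    """Yield 4-line FASTQ records whose read name is not in aligned_names."""
    it = iter(fastq_lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        name = header[1:].split()[0]          # drop '@' and any description
        if name.endswith(("/1", "/2")):       # assumption: strip mate suffix
            name = name[:-2]
        if name not in aligned_names:
            yield header, seq, plus, qual
```

Because the lookup is a set membership test, memory use is driven by the number of aligned read names, not by the FASTQ size.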
How To Extract Scores From Bedgraph File Using Bed Tools
file1
chr1 10 20 name 0 +
file2
chr1 12 14 2.5
chr1 14 15 0.5
How could I extract average scores for file1 using file2, like below? I am trying to extract average phastCons scores (file2) for the intervals in file1.
chr1 10 20 name 0 + 1.5
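The expected value of 1.5 is the plain (unweighted) mean of the bedGraph scores overlapping the interval, (2.5 + 0.5) / 2. A small Python sketch of that calculation, with a hypothetical function name, is shown below; `bedtools map` with `-c 4 -o mean` is the kind of operation that computes such a per-interval mean on the command line, but check its documentation for sorting requirements.

```python
def mean_scores(features, bedgraph):
    """Append the mean score of all overlapping bedGraph records to each feature.

    features: tuples starting with (chrom, start, end, ...).
    bedgraph: (chrom, start, end, score) tuples.
    Features without overlapping records get None.
    """
    out = []
    for feat in features:
        chrom, start, end = feat[0], feat[1], feat[2]
        scores = [
            s for c, bs, be, s in bedgraph
            if c == chrom and bs < end and start < be
        ]
        mean = sum(scores) / len(scores) if scores else None
        out.append(feat + (mean,))
    return out

file1 = [("chr1", 10, 20, "name", 0, "+")]
file2 = [("chr1", 12, 14, 2.5), ("chr1", 14, 15, 0.5)]
result = mean_scores(file1, file2)
```

Note that this mean is per record, not per base; a base-weighted mean would give a different number for this example.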
Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?
I want to create a gene data-set (as big as possible), hence I am using several gene annotations. However, genes in different annotations overlap (they are the same gene). To reduce bias, I overlap the different annotations and, if genes overlap, keep only one gene.
Question:
To ensure this overlap I was thinking of expanding the gene coordinates. Is this necessary? If so, how big should the extension be (5bp/100bp)?
Example:
I want to create an lncRNA data-set (in the following steps it will be used to search for genomic features).
Input:
- GENCODE lncRNA annotation (version 18 - 04/09/2013);
- Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).
Workflow:
- Extract GENCODE genes start/end coordinates;
- Extract Cabili genes start/end coordinates;
- Extend Cabili coordinates ( -/+ nbp );
- Use BedTools intersect;
- If genes intersect, keep the GENCODE gene (as it's a newer annotation, though this step is really subjective).
I do realize that this extension question depends on the situation and on how reliable the annotation is, but I still hope that someone could suggest something.
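The workflow above can be expressed as a short Python sketch, which also makes it easy to try different extension sizes and see how many genes are dropped. It assumes genes are (chrom, start, end, name) tuples, uses the names `primary` for GENCODE and `secondary` for Cabili purely for illustration, and ignores strand.

```python
def merge_annotations(primary, secondary, extend=0):
    """Keep all primary genes plus secondary genes that do not overlap any primary gene.

    Secondary genes are extended by `extend` bp on both sides only for the
    overlap test; the coordinates that are kept are the original ones.
    """
    kept = list(primary)
    for chrom, start, end, name in secondary:
        s, e = max(0, start - extend), end + extend
        overlaps = any(
            pc == chrom and ps < e and s < pe
            for pc, ps, pe, _ in primary
        )
        if not overlaps:
            kept.append((chrom, start, end, name))
    return kept
```

Running it with extend=0, 5 and 100 and comparing the sizes of the results is one way to see whether the extension matters for a given pair of annotations.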
Getting Number Of Reads In Intervals With Bedtools
What is the correct way to get the total number of reads strictly contained in each interval in a GFF from a BAM file while enforcing strandedness? What I am looking for is very close to this intersectBed feature:
-c For each entry in A, report the number of overlaps with B.
- Reports 0 for A entries that have no overlap with B.
- Overlaps restricted by -f and -r.
Except that I'd like the number of overlaps in A for each entry in B (i.e. the other way around). If I do:
intersectBed -abam mybam.bam -b mygff.gff -s -f 1 -wb
Then my understanding is that this will report the entry in B for each overlap with A. But I'd like each entry in B to be outputted exactly once, with the number of reads from A that are contained strictly within it. I'm not sure how to enforce strict containment here.
Is coverageBed the solution to this? Or multicov? I'm not sure how to enforce strict containment using coverageBed; it's not clear to me from the docs whether that's the default. Thanks.
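To pin down what "strictly contained, same strand" means, here is a reference implementation in Python that can be used to check the output of whichever bedtools command ends up being used. It is a naive sketch, not an efficient solution: it assumes reads and intervals are already available as tuples in 0-based half-open coordinates (GFF coordinates are 1-based and would need converting), and the function name is hypothetical.

```python
def count_contained(intervals, reads):
    """Count, for each interval, the reads lying entirely inside it on the same strand.

    intervals and reads are (chrom, start, end, strand) tuples.
    Every interval is reported exactly once, with 0 if nothing is contained.
    """
    counts = []
    for chrom, start, end, strand in intervals:
        n = sum(
            1 for rchrom, rstart, rend, rstrand in reads
            if rchrom == chrom and rstrand == strand
            and start <= rstart and rend <= end
        )
        counts.append((chrom, start, end, strand, n))
    return counts
```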