genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph
unionBedGraphs -header -i sample1.bedgraph sample2. ...
Extracting Genomic Coverage Information Across Different Samples
Which Of The Genes Are Enriched With Repeat Elements
chr start end gene number_of_repeats
chr1 100 200 gene1 70
chr1 190 240 gene1 40
chr1 250 400 gene1 100
chr2 500 600 gene2 150
If I sort and merge them, I will get:
chr1 100 240 gene1 90
chr1 250 400 gene1 100
chr2 500 600 gene2 150
So gene1 will have 190 repeats (90 + 100) and gene2 will have 150.
Or
b) Should I count the number of repeats for each short sequence, without any merging? ...
Random shuffling of features leaving gene models intact
I am looking for a tool that can randomly shuffle gff features into intergenic regions, but leaving the gene-models 'intact', so that at least all features of a gene are placed on the same contig and related features are placed inside the interval of their parent region. Bedtools shuffle doesn't seem to do that, I am trying:
shuffleBed -i genes.gff3 -excl genes.gff3 -g chromsizes.txt -f 0
This command distributes sub-features to different contigs and leads to invalid gene-models. If I add -chrom, features are placed on the same contig, but not all features can be placed at all, and the resulting gene-models are still not valid. Does anyone maybe have some R-code for this use-case?
GTF2/GFF3 "feature" types and expression analysis
- exon vs CDS
- transcript vs mRNA
Bedtools on Cygwin problem.
NijbroekK@UTWKS11498 /cygdrive/g/Stage_Enschede/methods/methods_Bedtoolsnew
$ make clean
* Cleaning-up BamTools API
* Cleaning up.
NijbroekK@UTWKS11498 /cygdrive/g/Stage_Enschede/methods/methods_Bedtoolsnew
$ make
Building BEDTools:
=========================================================
DETECTED_VERSION = v2.20.1
CURRENT_VERSION = v2.20.1
* Creating BamTools API
- Building in src/utils/bedFile
* compiling bedFile.cpp
bedFile.cpp:1:0: warning: -fPIC ignored for target (all code is position independent) [enabled by default]
/*****************************************************************************
^
- Building in src/utils/BinTree
* compiling BinTree.cpp
BinTree.cpp:1:0: warning: -fPIC ignored for target (all code is position independent) [enabled by default]
#include "BinTree.h"
^
In file included from ../../utils//FileRecordTools/FileReaders/BufferedStreamMgr.h:16:0,
from ../../utils//FileRecordTools/FileRecordMgr.h:19,
from ../../utils//FileRecordTools/FileRecordMergeMgr.h:11,
from ../../utils//Contexts/ContextBase.h:23,
from ../../utils//Contexts/ContextIntersect.h:11,
from BinTree.h:20,
from BinTree.cpp:1:
../.. ...
Getting The Average Coverage From The Coverage Counts At Each Depth.
genomeCoverageBed
for my analysis. And I used the following command:
genomeCoverageBed -ibam file.bam -g ~/refs/human_g1k_v37.fasta > coverage.txt
As many are aware, the output of the file looks something like this:
genome 0 26849578 100286070 0.26773
genome 1 30938928 100286070 0.308507
genome 2 21764479 100286070 0.217024
genome 3 11775917 100286070 0.117423
genome 4 5346208 100286070 0.0533096
genome 5 2135366 100286070 0.0212927
genome 6 785983 100286070 0.00783741
genome 7 281282 100286070 0.0028048
genome 8 106971 100286070 0.00106666
genome 9 47419 100286070 0.000472837
genome 10 27403 100286070 0.000273248
To find the coverage, I multiplied col2 (depth) with col3 (number of bases in genome with that depth) and then summed the entire column. Then, I divided it by genome length to get the coverage. In this case, col2 * col3 is:
0
30938928
43528958
35327751
21384832
10676830
4715898
1968974
855768
426771
274030
And the sum is: 150098740. Since the genome length is 1002860 ...
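The arithmetic described above can be checked with a short script. This is only a sketch in Python, using the histogram rows and the genome length (column 4) exactly as pasted in the question; the variable names are illustrative and not part of bedtools. One caveat: the base counts in the rows shown add up to less than the genome length, so rows for depths above 10 appear to be missing from the excerpt and the mean computed here is a lower bound.

```python
# Histogram rows copied from the genomeCoverageBed output above:
# (depth, number of bases at that depth). The genome length is column 4.
histogram = [
    (0, 26849578), (1, 30938928), (2, 21764479), (3, 11775917),
    (4, 5346208), (5, 2135366), (6, 785983), (7, 281282),
    (8, 106971), (9, 47419), (10, 27403),
]
genome_length = 100286070

# Sum of depth * bases over all rows, then divide by the genome length.
total_depth = sum(depth * bases for depth, bases in histogram)
bases_listed = sum(bases for _, bases in histogram)
mean_coverage = total_depth / genome_length

print(total_depth)                   # 150098740, the sum quoted above
print(round(mean_coverage, 4))       # 1.4967
print(genome_length - bases_listed)  # bases not accounted for by the rows shown
```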
bedtools: extracting no coverage regions
Hello,
I am not sure if this has been answered before as I looked and couldn't find a simple answer.
I have a bam file, and all I want is to annotate all regions with 0 coverage in bed format. Is that possible?
Thank you,
Adrian
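One possible approach, offered as a sketch rather than an answer from the thread: genomeCoverageBed has a -bga option that writes bedGraph intervals including those with zero depth, so the uncovered regions are the lines whose fourth column is 0. The Python below only does that filtering step. The function name zero_coverage_regions and the sample bedGraph lines are made up for illustration.

```python
def zero_coverage_regions(bedgraph_lines):
    """Yield (chrom, start, end) for bedGraph lines whose depth is 0."""
    for line in bedgraph_lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        if float(depth) == 0:
            yield chrom, int(start), int(end)


# Hypothetical lines, in the shape produced by: genomeCoverageBed -bga -ibam sample.bam
example = [
    "chr1\t0\t100\t0",
    "chr1\t100\t250\t3",
    "chr1\t250\t400\t0",
    "chr2\t0\t50\t1",
]

for chrom, start, end in zero_coverage_regions(example):
    print(chrom, start, end, sep="\t")
```

The printed lines are already three-column BED, since bedGraph uses the same 0-based, half-open coordinates.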
Counting Features In A Bed File
I have a file in the following BED format
Chr1 1022071 1022105 +
Chr1 1022071 1022105 +
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
I am trying to get the counts of each feature represented in this file.
mergeBed -i R5_chr.bed -n -s -d 0 > Output/R5_chr_counts.bed
I am interested in the counts of the features and I do not want to merge features by any number of base pairs. Then the output should be as follows
Chr1 1022071 1022105 2 +
Chr1 1022072 1022106 4 -
Any suggestions on how to achieve this using bedtools or in bash or awk? Thanks in advance!
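Since nothing should be merged, counting identical lines is enough. A minimal sketch in Python, assuming a feature is defined by all four fields (chromosome, start, end, strand) and using the six records shown above as input:

```python
from collections import Counter

# The six records from the question (chrom, start, end, strand).
records = """\
Chr1 1022071 1022105 +
Chr1 1022071 1022105 +
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
Chr1 1022072 1022106 -
"""

# Count lines that are identical in all four fields; nothing is merged.
counts = Counter(
    tuple(line.split()) for line in records.splitlines() if line.strip()
)

# Counter keeps first-seen order, so the output follows the input order.
for (chrom, start, end, strand), n in counts.items():
    print(chrom, start, end, n, strand, sep="\t")
```

Each distinct record keeps its own strand in the output, so the second feature is reported with a count of 4 on the minus strand.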
How To Count Genes In Genomic Regions Using A Gtf/Gff3 And A Bed File Of Regions
I'd like to count the number of unique genes in a gff file falling within a list of genomic regions. With bedtools I can count the number of regions within the gff which is almost what I want, but not quite.
bedtools intersect -a regions.bed -b my.gff -c
UPDATE:
I should have made my question a bit more specific. I have a modified Ensembl-style GTF file (not a GFF) that has unique transcript IDs. This means that simply selecting unique fields in the 9th column of the GTF file actually counts transcript IDs.
To circumvent this problem I first truncated the gtf file:
cat my.gff | sed -e 's/;.*//' > delete.me.gtf
Then I ran the bedtools map command:
bedtools map -a regions.bed -b delete.me.gtf -c 9 -o count_distinct > counts.genes_in_windows.bed
I almost forgot to delete the intermediate file:
rm delete.me.gtf
There is probably a way to make this a oneliner, without the intermediate file, but I have a dissertation to write!
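For comparison, the same count can be done without an intermediate file by reading the gene ID straight out of the attribute column. This is a sketch in Python, not the bedtools route: it assumes attributes of the form gene_id "..." and tab-delimited GTF lines, and the regions and GTF lines below are invented for illustration.

```python
import re


def gene_id(attributes):
    """Return the gene_id value from a GTF attribute string, or None."""
    match = re.search(r'gene_id "([^"]+)"', attributes)
    return match.group(1) if match else None


def count_genes(regions, gtf_lines):
    """Count distinct gene_ids overlapping each (chrom, start, end) region.

    Regions are 0-based half-open (BED); GTF coordinates are 1-based inclusive.
    """
    genes = [set() for _ in regions]
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[3]) - 1, int(fields[4])
        gid = gene_id(fields[8])
        for i, (r_chrom, r_start, r_end) in enumerate(regions):
            if gid and chrom == r_chrom and start < r_end and end > r_start:
                genes[i].add(gid)
    return [len(g) for g in genes]


# Hypothetical input: two regions, three genes, four transcripts.
regions = [("chr1", 0, 1000), ("chr1", 2000, 3000)]
gtf = [
    'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tsrc\texon\t501\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T3";',
    'chr1\tsrc\texon\t2501\t2600\t.\t+\t.\tgene_id "G3"; transcript_id "T4";',
]
print(count_genes(regions, gtf))
```

The nested loop compares every GTF line with every region, which is fine for a small example but slow for genome-wide files.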
Extract coverage per feature from a bam and bed to a file
Profile Coverage Of Rnaseq Samples?
Hi all,
I have a quick question:
How can I visualize aligned paired-end reads from RNAseq datasets in UCSC browser?
I already mapped the reads and assembled the transcripts with Tophat/Cufflinks but I'm not sure how to proceed to visualize the mappings
After sorting the BAM files and fixing the mate pairs, I tried to compute the coverage using the following commands:
genomeCoverageBed -bg -split -ibam F.T0.rep2-accepted_hits-fS.bam -g ~/conversion_util/chrom.hg19.sizes > F.T0.rep2-accepted_hits-fS.bg
bedGraphToBigWig F.T0.rep2-accepted_hits-fS.bg ~/conversion_util/chrom.hg19.sizes F.T0.rep2-accepted_hits-fS.bw
But I was not able to visualize the mappings properly. Here I paste a screenshot of how it looks:
Do you know where the mistake is?
Thanks!
Renaming SNPs or SNP matching
1 exm2253575 0 881627 G A
1 exm269 0 881918 A G
1 exm340 0 888659 T C
1 exm348 0 889238 A G
1 exm2264981 0 894573 G A
1 exm773 0 909238 G C
1 exm782 0 909309 C T
1 exm912 0 949608 A G
1 exm991 0 977028 T G
1 exm1024 0 978762 A G
And I have all of the SNPs in dbSNP 138 downloaded as a large VCF file:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10019 rs376643643 TA T . . RS=376643643;RSPOS=10020;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG
1 10054 rs373328635 CAA C,CA . . RS=373328635;RSPOS=10055;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000210;WGT=1;VC=DIV;R5;OTHERKG;NOC
1 10109 rs376007522 A T . . RS=376007522;RSPOS=10109;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1 10139 rs368469931 A T . . RS=368469931;RSPOS=10139;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1 10144 rs144773400 TA T . . RS=1447734 ...
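If the goal is to swap the exm IDs for rs IDs, one simple idea is to match on chromosome and position. The Python sketch below assumes the first file has six whitespace-separated columns (chromosome, ID, genetic distance, position, allele 1, allele 2) and that a position match is good enough; it does not check alleles or strand, and it keeps only the last record when several VCF entries share a position. The row with the ID exm_example is hypothetical, added only so that one row finds a match in the VCF lines quoted above.

```python
def build_lookup(vcf_lines):
    """Map (chrom, pos) -> rs ID from VCF data lines."""
    lookup = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, snp_id = line.split()[:3]
        lookup[(chrom, pos)] = snp_id
    return lookup


def rename(map_lines, lookup):
    """Replace the ID (column 2) when chromosome and position match."""
    renamed = []
    for line in map_lines:
        chrom, snp_id, cm, pos, a1, a2 = line.split()
        renamed.append((chrom, lookup.get((chrom, pos), snp_id), cm, pos, a1, a2))
    return renamed


vcf = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "1 10109 rs376007522 A T . . RS=376007522",
    "1 10139 rs368469931 A T . . RS=368469931",
]
snps = [
    "1 exm_example 0 10109 A T",  # hypothetical row, matches the first VCF line
    "1 exm269 0 881918 A G",      # from the question, no match in this excerpt
]
for row in rename(snps, build_lookup(vcf)):
    print(*row, sep="\t")
```

Rows without a match keep their original ID, so nothing is lost when the VCF does not cover a position.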
Getting All Reads That Align To A Region In Compact Bed Format Using Bedtools?
I'm trying to find all the reads (by name) from a BAM file that align to various regions in a bed file. Right now I can do this with bedtools using intersectBed:
intersectBed -abam reads.bam -wo -f 1 -b regions.bed -bed
From this one can parse all the read ids that land in every interval in regions.bed, but it's not very compact. Is there a way to get bedtools to natively transform this into a more compact format, e.g.
chr1 x y .... read_id1,read_id2,read_id3
where chr1 x y is a given interval in regions.bed and the comma separated read_id1,... is the list of read ids from reads.bam that fall in that interval. In this compact format, the output BED file would have at most as many entries as there are regions in regions.bed, whereas with the -wo option it can be even larger than the number of reads in reads.bam. Thanks.
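The compact format described above amounts to grouping read ids by interval. A minimal sketch of that grouping in Python, assuming the -wo output has already been parsed into (region, read id) pairs; the column positions in the real output depend on the input files, so the parsing step is left out and the pairs below are invented.

```python
def collapse_reads(pairs):
    """Group read ids by region, keeping first-seen order and dropping repeats."""
    grouped = {}
    for region, read_id in pairs:
        # An inner dict is used as an ordered set of read ids.
        grouped.setdefault(region, {})[read_id] = None
    return {region: list(reads) for region, reads in grouped.items()}


# Hypothetical (region, read id) pairs parsed from the intersect output.
pairs = [
    (("chr1", 100, 200), "read_id1"),
    (("chr1", 100, 200), "read_id2"),
    (("chr1", 500, 900), "read_id4"),
    (("chr1", 100, 200), "read_id3"),
    (("chr1", 100, 200), "read_id1"),  # a repeated read id is kept once
]

for (chrom, start, end), reads in collapse_reads(pairs).items():
    print(chrom, start, end, ",".join(reads), sep="\t")
```

The output has one line per region that received at least one read, which matches the upper bound mentioned in the question.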
bedtools intersect - something wrong with chromosome numbers >= 10?
Hi!
I have an alignment (.bam) of reads to the mm9 genome. I sorted it with samtools sort, so that later I can use the -sorted option with bedtools. I also created a .bed file with regions of interest, in which I want to count the number of reads that mapped to them. I tried this: I converted the .bam to .bed with bedtools bamtobed, and then intersected them, counting the number of hits (bedtools intersect -a regions_of_interest.bed -b alignment_sorted.bed -c -sorted > Neg2H_counts.bedgraph). The problem is, it looks fine for all chromosomes with numbers from 0 to 9 (and X), but all counts for all regions of interest on chromosomes with higher numbers (chr10, chr11, etc.) are 0. There is no biological reason for that; in fact the highest signal should be on chr11. What could be wrong here? I am fairly new to all these tools.
UPDATE
I tried to do the same intersection with bedmap and the result is identical... So there probably is something wrong with my files - what could it be?
I also tried sorting the alignment-derived bed-file in the same way, as I did with the files with regions of interest and it doesn't help.
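One thing worth checking, as a guess rather than a confirmed diagnosis for these files: the -sorted option expects both inputs to list chromosomes in the same order. A plain lexicographic sort puts chr10 before chr2, while a BAM sorted by samtools follows the order of its header, so the two files can disagree on everything after the first chromosome. The Python sketch below compares the chromosome order of two BED inputs; the function names and sample lines are illustrative only.

```python
def chrom_order(bed_lines):
    """Return chromosomes in order of first appearance."""
    seen = {}
    for line in bed_lines:
        if line.strip():
            seen.setdefault(line.split()[0], None)
    return list(seen)


def same_relative_order(order_a, order_b):
    """True if the chromosomes shared by both lists appear in the same order."""
    shared = set(order_a) & set(order_b)
    return [c for c in order_a if c in shared] == [c for c in order_b if c in shared]


# Hypothetical inputs: one file sorted lexicographically, one numerically.
regions = ["chr1\t10\t20", "chr10\t10\t20", "chr2\t10\t20"]
reads = ["chr1\t5\t40", "chr2\t5\t40", "chr10\t5\t40"]

print(chrom_order(regions))
print(chrom_order(reads))
print(same_relative_order(chrom_order(regions), chrom_order(reads)))
```

If the check reports a mismatch, sorting both files with the same rule before intersecting is the usual remedy.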
Extract Only Paired-End Reads That Map A Specific Interval
Hi,
Is it possible to extract paired-end reads that map to a specific interval (from a bam file)? I tried with intersectBed:
intersectBed -abam align.bam -b interval.gff3 -wa > result.bam
Here's the result:
But I only want reads that map to the feature in bold blue (one of the paired reads is enough). For example, I don't want the reads that map on either side of this feature (red arrow).
Is it possible with intersectBed or another program?
Thanks,
N.
Snps Comparison
Hello,
I would like to compare SNPs from different methods:
- number of SNPs
- SNP positions (positions where method A has SNPs but B does not, and vice versa, and positions where both have SNPs)
I would like to get an output file which contains all of the above information, and I would also like to see the differences visualized, i.e. somewhere I could load the two files which contain the SNPs and the alignment.
In which format do the SNPs have to be stored, and which tools have to be used in order to make such a comparison possible?
Thank you in advance.
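Whatever file format is chosen, the comparison itself comes down to set operations on (chromosome, position) pairs. A minimal sketch in Python, with made-up positions standing in for the calls of the two methods:

```python
# Hypothetical SNP calls from two methods, as (chromosome, position) pairs.
method_a = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
method_b = {("chr1", 100), ("chr2", 40), ("chr2", 75)}

shared = method_a & method_b   # called by both methods
only_a = method_a - method_b   # called by A but not B
only_b = method_b - method_a   # called by B but not A

print("A:", len(method_a), "B:", len(method_b))
print("shared:", sorted(shared))
print("only A:", sorted(only_a))
print("only B:", sorted(only_b))
```

This compares positions only; two methods that call different alleles at the same position would still count as shared here.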
How To Install Bedtools In A User Directory
To Group Items In Bed Files
For example, we now have a bed file:
chr1 23455 45678
chr1 23446 45663
chr1 23449 45669
chr1 30000 31000
Is there any way to group the first three lines, while leaving the last line alone? I know Bedtools has a mergeBed function that merges overlapping spans, but it would also include the last line.
This may sound like a purely computational question, but I'm just curious whether tools are already available to tackle such questions.
thx
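One way to separate the first three lines from the last is to require a reciprocal overlap: two intervals are grouped only if the overlap covers most of both of them. The Python sketch below is a greedy illustration of that idea; the 0.9 threshold and the function names are assumptions, and each interval is compared only with the first member of each existing group.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b."""
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if a[0] != b[0] or overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def group_intervals(intervals, min_fraction=0.9):
    """Assign each interval to the first group whose first member it
    overlaps reciprocally by at least min_fraction, else start a new group."""
    groups = []
    for interval in intervals:
        for group in groups:
            if reciprocal_overlap(group[0], interval) >= min_fraction:
                group.append(interval)
                break
        else:
            groups.append([interval])
    return groups


# The four intervals from the question.
intervals = [
    ("chr1", 23455, 45678),
    ("chr1", 23446, 45663),
    ("chr1", 23449, 45669),
    ("chr1", 30000, 31000),
]
for group in group_intervals(intervals):
    print(group)
```

The short interval lies inside the long ones but covers under 5% of them, so it ends up in a group of its own.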
Creating Bed File For Lncrna Using Gencode Gtf File
Hi all,
I want to get a bed file of lncRNAs based on the GENCODE GTF file.
I downloaded the file "gencode.v16.long_noncoding_RNAs.gtf.gz" and extracted the chr, start, and end info from it, then used mergeBed to merge the overlapping lncRNAs. Is that correct? I know we can merge exon genomic positions using this kind of method,
but for lncRNAs I am not so sure. Is there any place that already offers this kind of bed file?
Actually, we should get 22444 long non-coding RNA loci transcripts, but there are only 11817 genomic regions after the merging process.
If anyone knows the answer, could you give me some help?
Converting Bam To Bedgraph For Viewing On Ucsc?
I'm trying to go from a BAM file to a representation viewable in UCSC, ideally bedGraph. I am trying to use Bedtools's genomeCoverage like this:
genomeCoverageBed -ibam accepted_hits.sorted.bam -bg -trackline -split -g ... > mytrack.bedGraph
I'm not sure what the -g argument is supposed to be or how to generate it. The documentation does not explicitly say what it is supposed to be, though it gives an example where it is some sort of BED file. I am simply looking for a bedGraph or other UCSC-friendly compact representation that will allow me to visualize read densities using UCSC from the BAM.
EDIT: When I generate a bedGraph and put it in UCSC, I get tracks that look like this: not a histogram. How can I make it a histogram?
How can I generate the genome file for use with genomeCoverageBed? Also, is this the best way to get a UCSC viewable file with Bedtools? To clarify, I want to visualize the BAM as a histogram. I'm not sure this is possible with bedGraph? Thank you.
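On the genome file: as far as the bedtools documentation describes it, -g takes a tab-delimited text file with one chromosome name and its length per line. One way to produce it is from the @SQ lines of the BAM header. The Python sketch below assumes the header text is already available (for example from samtools view -H) and that each @SQ line carries SN and LN tags; the header shown is a made-up example, not taken from the question.

```python
def genome_file_lines(sam_header_text):
    """Build 'name<TAB>length' lines from the @SQ records of a SAM header."""
    lines = []
    for record in sam_header_text.splitlines():
        if not record.startswith("@SQ"):
            continue
        # Each tag looks like SN:chrA or LN:1000; split on the first colon only.
        tags = dict(field.split(":", 1) for field in record.split("\t")[1:])
        lines.append(f"{tags['SN']}\t{tags['LN']}")
    return lines


# Hypothetical header text with two reference sequences.
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrA\tLN:1000\n@SQ\tSN:chrB\tLN:500\n"
print("\n".join(genome_file_lines(header)))
```

Writing those lines to a file gives something that can be passed to -g, and the same two-column layout is what bedGraphToBigWig expects as its chromosome sizes file.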