Channel: Post Feed
Viewing all 3764 articles

How To Use Bedtools Window To Overlap Upstream For Positive Strand

Hi,

I am trying to use bedtools window. It is explained in the bedtools manual, but I am still a bit confused and thought a confirmation would be good. I have no biological background.

I have divided my BED file into two based on the strand information (for example, posStrand.bed and negStrand.bed).

I would like to screen for overlaps with LINEs within 5000 bp upstream of the features in my posStrand.bed file.

  1. In this case, shall I use the -l or the -r option of bedtools window?
  2. Since all features are on the + strand, do I need to use the -sw option?
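
A minimal sketch of the coordinate logic behind the question, assuming the usual convention that "upstream" of a + strand feature means smaller coordinates (the side that `-l` extends in bedtools window), and that `-sw` only matters when features on both strands are mixed in one file. The function name `upstream_window` is hypothetical and is not part of bedtools.

```python
def upstream_window(start, end, strand, size=5000):
    """Return (win_start, win_end): the feature extended by `size` bp upstream.

    Assumption: coordinates are BED-style (0-based start, end exclusive).
    For '+' features upstream is to the left (smaller coordinates);
    for '-' features upstream is to the right (larger coordinates).
    """
    if strand == "+":
        return max(0, start - size), end
    if strand == "-":
        return start, end + size
    raise ValueError(f"unexpected strand: {strand!r}")


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


# A '+' gene at 10000-10500 and a LINE at 6000-6300 (hypothetical values)
win = upstream_window(10000, 10500, "+")
print(win, overlaps(win[0], win[1], 6000, 6300))
```

For a file that contains only + strand features, extending only to the left (for example `-l 5000 -r 0`) should therefore cover the upstream side without needing `-sw`.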

Intersectbed/Coveragebed -Split Purify Exon?

The all.reads.bam file records mapped RNA-seq read data, including:
  1. exon:exon junction
  2. exon body
  3. intron body
  4. exon:intron junction

Q1: When calculating RPKM for a given RefSeq gene using reads from all of these positions, will the following command count only exon:exon junction reads and ignore all other reads?

coverageBed -abam all.reads.bam -b refseq.genes.BED12.bed -s -split > coverage.bed

I'm confused by the manual (page 62):

When dealing with RNA-seq reads, for example, one typically wants to only tabulate coverage for the portions of the reads that come from exons (and ignore the interstitial intron sequence). The -split command allows for such coverage to be performed.

If "-split" is set and an exon:exon read (for example, "30M3000N46M") exists in the -abam BAM file, the 3000N will NOT be wrongly intersected when running the intersectBed command. But what about the coverageBed command? I hope the 3000N will not be counted, which makes sense, and I also hope the intron body reads and other reads will NOT be ignored.

Q2: If one just wants to calculate an exon's RPKM, does it mean one should prepare a -b file that records all the exon information, and run it like this: coverageBed -abam all.reads.bam -b ...

how to get -nms for bedtools

I'd like to merge bed files and preserve the names of the merged features using bedtools -nms option.

However, this option (-nms) is deprecated in the newer bedtools.

The documentation says I can use -o option to get -nms behavior.

How do I write the new bedtools merge command to get the equivalent of:

 

bedtools merge -i file.bed -nms
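
A minimal sketch of what "merge and keep the names" means, assuming a sorted BED file whose fourth column holds the feature name. In recent bedtools releases the commonly suggested replacement is along the lines of `bedtools merge -i file.bed -c 4 -o collapse -delim ";"`; that command should be checked against `bedtools merge -h` for the installed version. The function `merge_with_names` below is a hypothetical pure-Python illustration of the same idea, not bedtools code.

```python
def merge_with_names(records, delim=";"):
    """Merge overlapping or book-ended intervals and collapse their names.

    records: iterable of (chrom, start, end, name) tuples.
    Returns a list of (chrom, start, end, joined_names) tuples.
    """
    merged = []
    for chrom, start, end, name in sorted(records):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the current merged interval and remember the name
            last[2] = max(last[2], end)
            last[3].append(name)
        else:
            merged.append([chrom, start, end, [name]])
    return [(c, s, e, delim.join(names)) for c, s, e, names in merged]


example = [
    ("chr1", 100, 200, "a"),
    ("chr1", 150, 300, "b"),
    ("chr1", 400, 500, "c"),
]
for row in merge_with_names(example):
    print("\t".join(map(str, row)))
```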

 

 

bedtools: extracting no coverage regions

Hello,

I am not sure if this has been answered before as I looked and couldn't find a simple answer.

I have a BAM file, and all I want is to annotate all regions with 0 coverage in BED format. Is that possible?

Thank you,

Adrian
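
One possible approach, sketched under the assumption that coverage has first been written as a bedGraph that includes zero-depth intervals (for example with `bedtools genomecov -ibam in.bam -bga`, where `in.bam` is a placeholder name). The hypothetical helper `zero_coverage_regions` then keeps only the intervals whose depth is 0.

```python
def zero_coverage_regions(bedgraph_lines):
    """Return (chrom, start, end) for every bedGraph interval with depth 0.

    Assumes four whitespace-separated columns: chrom, start, end, depth.
    Track, browser and comment lines are skipped.
    """
    regions = []
    for line in bedgraph_lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        if float(depth) == 0:
            regions.append((chrom, int(start), int(end)))
    return regions


sample = [
    "chr1\t0\t100\t0",
    "chr1\t100\t250\t7",
    "chr1\t250\t300\t0",
]
for chrom, start, end in zero_coverage_regions(sample):
    print(f"{chrom}\t{start}\t{end}")
```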

 

Bedtools: Top N Most Similar Regions When Comparing Two Bed/Wig/Bam Files?

Is there an easy way of finding, probably with bedtools, given a window size, the top N most correlated regions when comparing two bed/wig files? For example, in comparing two bed/wig/bam files that have PolII data for 2 conditions, to give the top N windows where the wiggle profiles are most similar?

Does Bedtools Intersect -V Consider Unmapped Reads "As Not In B"

bedtools intersect -v -abam my.bam -b myregions.gff > notinmyregions.bam

Would we see reads with 4 in the FLAG field (i.e. unmapped reads) in notinmyregions.bam?

Extract coverage per feature from a bam and bed to a file

Hi, a simple task... or it should be. I need to extract the average coverage per feature from a BAM file. I have a GenBank file and a BED file for the reference the BAM was mapped to. If I map with e.g. Geneious, I can see good, variable coverage over the reference genome. I have tried GATK (could not get it to run) and Bedtools (genomecov and coverage). coverage will give me an output file, but all the features have zero coverage.

Here's the top of the .bed file:

track name="Example E.coli"
o26chr.gb 189 255 thrL gene 0 +
o26chr.gb 189 255 thrL CDS 0 +
o26chr.gb 336 2799 thrA gene 0 +
o26chr.gb 336 2799 thrA CDS 0 +
o26chr.gb 2800 3733 thrB gene 0 +
o26chr.gb 2800 3733 thrB CDS 0 +
o26chr.gb 3733 5020 thrC gene 0 +
o26chr.gb 3733 5020 thrC CDS 0 +
o26chr.gb 5233 5530 yaaX gene 0 +

Here's the top of the output from bedtools coverage -ibam file.bam -b file.bed:

o26chr.gb 1047122 1048841 poxB gene 0 - 0 0 1719 0.0000000
o26chr.gb 1047122 1048841 poxB CDS 0 - 0 0 1719 0.0000000
o26chr.gb 2096828 2097287 gene 0 + 0 0 459 0.0000000
o26chr.gb 3144900 3148635 yfaL gene 0 - 0 0 3735 0.0000000
o26chr.gb 3144900 3148635 yfaL CDS 0 - 0 0 3735 0.0000000
o26chr.gb 4194149 4194368 tdcR gene 0 + 0 0 219 0.00 ...

Bedtools "Segmentation Fault" While Working With Genome.Fa

I wanted to use BEDTools to extract genomic sequences (fastaFromBed). My BED file has all 24 chromosomes, hence I want to use the whole genome (merged from chromosome.fa). I tried:

fastaFromBed -fi genome.fa -bed all.chromosomes.bed -fo output

but got:

Segmentation fault (core dumped)

I then tried every chromosome.fa separately and it worked:

fastaFromBed -fi chromosome${i}.fa -bed all.chromosomes.bed -fo output

Of course I am getting the annoying:

WARNING. chromosome (chr..) was not found in the FASTA file. Skipping.

But it's still better than nothing and really fast. I prefer to use BEDTools for sequence extraction, so I am wondering whether it is possible to solve this segmentation fault. It seems that a large genome.fa file can't be handled by BEDTools, as I also tried nucBed and got the same thing, or it might be some genome merging problem.

EDITED: This is the bed file I used for intersectBed, closestBed and fastaFromBed ([www.box.com][1]). There were problems only with fastaFromBed and only when I tried to use the whole genome.fa (~3.15GB). As I mentioned before, I used every chromosome separately and got warnings, but there was no segmentation fault and the output was fine. I am wondering whether it might be a genome.fa problem (used cat to me ...

How Can I Compare And Merge Bed Files

I have three BED files with chrNo, start, end position and type. I need to compare the chrNo, start and end position of each entry in one file with the two other files and write the common ones to a new file. Can anyone suggest how I can do this efficiently? I wrote a simple Perl script, but as the files are huge it takes a lot of time and is not feasible. Thanks in advance.

Example files:

file1.bed:

1 20 30

1 100 120

1 200 300

file2.bed:

1 2 5

1 25 34

1 200 300

file3.bed:

1 30 33

1 200 300

1 500 600

common.bed

1 30 34 --> coordinates overlapping within 5bp are considered the same, but the outermost coordinates of the three are taken in the common file

1 200 300

how to run subtract command in java

I want to run the bedtools subtract command from Java. Could somebody tell me how to do this?

Thank you very much.

How To Find The Closest Distance From Bed Files Between Genes And Repeats That Are Upstream

How can I use closestBed from bedtools to find the closest locations between two BED files? The important bit here is that I want them to be upstream and in the correct orientation.

When I use the -s option, it does not report anything (everything is -1).

Then I checked the -D a option. It returns some results, but I am not sure if it is the right thing.

The other thing to mention is that my genes BED file (let's call it gene.bed) is organized as

chr1 123 234 +
chr1 456 789 -

rather than the end position being smaller to indicate the negative strand.

Whereas my repeats.bed file is organized as

chr1 239 456
chr3 456 987

Does bedtools get confused by this?

Which options should I use if I want to find the distance to the nearest repeat that is upstream and in the correct orientation?
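
A note on the file layout above, with a hedged sketch. In the BED format, strand is the sixth column (after name in column 4 and score in column 5), so a four-column file with `+`/`-` in column 4 is likely read as having a name of "+" and no strand at all; that may be why `-s` reports nothing, and the three-column repeats file carries no strand either. The hypothetical helper `to_bed6` below pads such lines to six columns with a made-up name and a score of 0. Options such as `-D a` combined with `-id` (ignore downstream) may then be worth checking in `bedtools closest -h` for the installed version.

```python
def to_bed6(line, name_prefix="gene"):
    """Convert 'chrom start end strand' into a tab-delimited BED6 line.

    The name (column 4) is invented from the prefix, chromosome and start;
    the score (column 5) is set to 0. Both are placeholders.
    """
    chrom, start, end, strand = line.split()
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    name = f"{name_prefix}_{chrom}_{start}"
    return "\t".join([chrom, start, end, name, "0", strand])


for raw in ["chr1 123 234 +", "chr1 456 789 -"]:
    print(to_bed6(raw))
```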

Converting Bam To Bedgraph For Viewing On Ucsc?

I'm trying to go from a BAM file to a representation viewable in UCSC, ideally bedGraph. I am trying to use Bedtools's genomeCoverage like this:

genomeCoverageBed -ibam accepted_hits.sorted.bam -bg -trackline -split -g ... > mytrack.bedGraph

I'm not sure what the -g argument is supposed to be or how to generate it. The documentation does not explicitly say what it is supposed to be, though it gives an example where it is some sort of BED file. I am simply looking for a bedGraph or other UCSC-friendly compact representation that will allow me to visualize read densities from the BAM in UCSC.

EDIT: When I generate a bedGraph and put it in UCSC, I get tracks that look like this:

[image not available]

not a histogram. How can I make it a histogram? How can I generate the genome file for use with genomeCoverageBed? Also, is this the best way to get a UCSC-viewable file with Bedtools? To clarify, I want to visualize the BAM as a histogram. I'm not sure whether this is possible with bedGraph. Thank you.

Convert .Txt Into Bed Files

I used paired-end sequence data for a copy number variation study and eventually got .txt files as output. I'm hoping to use Bedtools to compare my results with others.

Can I convert .txt files into .bed files? (I don't see an option for this in Bedtools.)

If Bedtools does not work for this, what software can I use for data comparison?

My txt lines look like this:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

thx
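
A minimal sketch of one way to turn lines like the ones above into BED, assuming the three whitespace-separated fields are event type, `chrom:start-end`, and length. The lengths shown match `end - start + 1` (for example 6173000 - 6169901 + 1 = 3100), which suggests 1-based inclusive coordinates, so the sketch subtracts 1 from the start to get BED's 0-based start. The function name `cnv_line_to_bed` is hypothetical.

```python
def cnv_line_to_bed(line):
    """Convert 'deletion chr9:6169901-6173000 3100' into a BED4 line.

    Assumes the input region is 1-based and inclusive; BED uses a
    0-based start and an exclusive end, so only the start is shifted.
    """
    kind, region, _length = line.split()
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    return f"{chrom}\t{start - 1}\t{end}\t{kind}"


print(cnv_line_to_bed("deletion    chr9:6169901-6173000    3100"))
```

Applying this to every line of the .txt file and writing the results out tab-delimited should give a file that intersectBed can read.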

edit: I converted the txt files into bedpe format, which looks like

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started to compare two bedpe files, looking for overlapping regions, using a command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see the error:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

My bed files have only three columns, and it seems six are required. What's the problem here? thx

Reporting The Bam Reads Overlapping A Set Of Intervals With Bedtools

I am trying to use bedtools to pull out the reads falling directly within a set of BED coordinates. While this command does it successfully:

intersectBed -abam mybam.bam -b intervals.gff -wa -wb -f 1 | coverageBed -abam stdin -b intervals.gff

I find that it loses key information that I need. I'd like to get a listing of the BAM reads -- getting at least their ID -- split by exon. In other words, all the read IDs that fall into the first interval in intervals.gff, all the read IDs that fall into the second interval in intervals.gff... ideally, it would also report the CIGAR string for these reads, but I'd settle for just the ID.

Is there a way to report these reads, such that it's easy to tell from the output which set of reads landed in a given interval in the input BED file?

Thank you.

Does Windowbed Extend Reads?

I am using WindowBed, part of the BedTools suite, to align reads to a reference file, and I obtained a very interesting result. I am trying to rule out an analysis artifact that could be caused by extending the reads or by aligning read midpoints rather than 5' ends. It is my understanding that WindowBed aligns the 5' end of the read to the reference point, rather than mapping the read midpoint, or extending the 3' end of the read and mapping the midpoint. Am I correct in this assumption that the 5' end of the read is in fact what is being aligned?

Any help here would be appreciated. The BedTools manual, which is very good, doesn't seem to address this.

Thanks


Bedtools Genomecoveragebed Usage : How To Create A Genome File?

I am using BEDTOOLS and the following command to get the coverage file:

$ ./genomeCoverageBed -ibam ~/GG_project/trim/ecoli.bam -g > ~/GG_project/trim/coverage

where ecoli.bam is my sorted bam file, and coverage is my output file

Where do I get the genome file, and how do I create one? Specifically, I need an ecoli.genome file.
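
For context, a bedtools genome file is a two-column, tab-delimited text file with one line per sequence: name, then length in bp. A minimal sketch, assuming the BAM header has been dumped to text (for example with `samtools view -H ecoli.bam`) and contains `@SQ` lines with `SN` and `LN` fields. The function name `genome_lines_from_sam_header` is hypothetical. Recent bedtools versions may also read sequence lengths directly from the BAM header when `-ibam` is used, which is worth checking in `genomeCoverageBed -h`.

```python
def genome_lines_from_sam_header(header_text):
    """Build 'name<TAB>length' lines from the @SQ records of a SAM header."""
    lines = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like 'SN:chr1' or 'LN:1000'
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lines.append(f"{fields['SN']}\t{fields['LN']}")
    return lines


# Hypothetical header with made-up sequence names and lengths
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:contigA\tLN:1000\n@SQ\tSN:contigB\tLN:250\n"
print("\n".join(genome_lines_from_sam_header(header)))
```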

Renaming SNPs or SNP matching

This should be easy to do by now, but... we have SNP data from an Illumina exome array given to us in PLINK format. The BIM file looks like this:

1 exm2253575 0 881627 G A
1 exm269 0 881918 A G
1 exm340 0 888659 T C
1 exm348 0 889238 A G
1 exm2264981 0 894573 G A
1 exm773 0 909238 G C
1 exm782 0 909309 C T
1 exm912 0 949608 A G
1 exm991 0 977028 T G
1 exm1024 0 978762 A G

And I have all of the SNPs in dbSNP 138 downloaded as a large VCF file:

#CHROM POS ID REF ALT QUAL FILTER INFO
1 10019 rs376643643 TA T . . RS=376643643;RSPOS=10020;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG
1 10054 rs373328635 CAA C,CA . . RS=373328635;RSPOS=10055;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000210;WGT=1;VC=DIV;R5;OTHERKG;NOC
1 10109 rs376007522 A T . . RS=376007522;RSPOS=10109;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1 10139 rs368469931 A T . . RS=368469931;RSPOS=10139;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1 10144 rs144773400 TA T . . RS=1447734 ...

How To Create A Read Density Profile Within A Interval?

Hi!

I need some help: I have to create a read density profile with a window size of 1 kb (how many times a sequence is detected after NGS). I have to use SAM and BEDtools. I think I can use genomeCov in BEDtools, but I don't have a genome reference.

So, if anybody is able to help me...

Thanks
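
A minimal sketch of the idea of a 1 kb density profile, assuming each read is represented by its chromosome and 0-based start position (for example taken from the RNAME and POS columns of a SAM file, with POS shifted by 1). The helper `window_counts` is hypothetical. With bedtools itself, a comparable result might be obtained by building windows with `bedtools makewindows -g genome.txt -w 1000` and counting reads per window, where `genome.txt` is a placeholder for a chromosome-size file.

```python
from collections import Counter


def window_counts(read_starts, window=1000):
    """Count reads per fixed-size window.

    read_starts: iterable of (chrom, pos) with 0-based positions.
    Returns {(chrom, window_start): count}; a read is assigned to the
    window that contains its start position.
    """
    counts = Counter()
    for chrom, pos in read_starts:
        counts[(chrom, (pos // window) * window)] += 1
    return dict(counts)


reads = [("chr1", 10), ("chr1", 999), ("chr1", 1000), ("chr2", 2500)]
for (chrom, start), n in sorted(window_counts(reads).items()):
    print(f"{chrom}\t{start}\t{start + 1000}\t{n}")
```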

How Do You Get The Quality Score And Coverage For Every Single Position Of A Reference Assembly

Hi,

I am trying to extract the coverage and the average quality score for each position of a reference assembly in bam/sam format. I have managed to get the coverage using BEDtools

 genomeCoverageBed -ibam mybamfile.bam -g my_genome -d > my_coverage.txt

but I am at a loss as to how to get some measure of the quality of the base calls at each position. I was thinking that I could use bcftools to get a VCF file:

samtools mpileup -uf ref.fa mybamfile.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

but this only provides the sites for which there are SNPs. Any advice greatly appreciated.

Joseph

Correlation Of Fpkm And Length Normalized Transcript Mapped Read Count

Hello, in the process of estimating expression for a 16 human tissue dataset ("Human Body Map 2.0 GSE30611") I used different methods to estimate the expression of the genes. After mapping against the hg19 genome version, I used the UCSC-provided RefSeq annotation for hg19 to count mapped reads for ~40,000 human genes in two ways:
  1. Counting with cufflinks, which outputs a Fragments Per Kilobase Per Million mapped fragments (FPKM) value for each transcript. The FPKM value basically accounts for library size and also the length of the transcript comprising all the annotated exons, plus some additional likelihood estimator to assign reads (see here).
  2. Counting mapped reads with bedtools and dividing a transcript's mapped count by the sum of all its exon lengths. This gives a length-normalized expression estimate to compare between genes.
However, the correlation of (1.) and (2.) is always around ~0.65 between the same tissues (technically the same experiment). I would expect this correlation to be > 0.9.

Below, I plotted (2.) against (1.) for all ~40,000 transcripts. It seems like normal length normalization is simply overestimating some expression. Can someone she ...