Converting Gff To Bed With Bedtools?

January 20, 2013, 1:51 pm

I use bedtools's sortBed utility to sort BED files for various operations. It also accepts GFF files as input. However, when I feed it a GFF file, as in:

sortBed -i myfile.gff

it outputs it as GFF, not BED. Is there a way to make bedtools sort and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need to use a different subutility of bedtools to achieve this? thanks.
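
A minimal sketch of one way to get sorted BED out of a GFF file, assuming a standard 9-column, tab-delimited GFF. It uses awk and sort rather than a bedtools option, so it does not settle whether sortBed itself can emit BED. The function name gff_to_sorted_bed and the choice to put the feature type into the BED name column are assumptions made for illustration.

```shell
# Hypothetical helper: convert GFF (1-based, inclusive) to BED6
# (0-based start, half-open end), then sort by chromosome and start.
gff_to_sorted_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 9 { next }               # skip comments and short lines
         { print $1, $4 - 1, $5, $3, $6, $7 }  # chrom, start, end, type, score, strand
        ' "$1" |
    LC_ALL=C sort -k1,1 -k2,2n
}

# Usage (file names are placeholders):
#   gff_to_sorted_bed myfile.gff > myfile.sorted.bed
```

The GFF score is passed through unchanged, so a "." score stays a "." in the BED output; stricter BED consumers may want an integer there.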

↧

How To Use Bedtools Window To Overlap Upstream For Positive Strand

November 21, 2013, 12:48 pm

Hi,

I am trying to use bedtools window. It is explained in the bedtools manual, but I am still a bit confused and thought a confirmation would be good. I also have no biological background.

I have divided my BED file into two based on strand (for example, posStrand.bed and negStrand.bed).

I would like to screen for LINEs that overlap the 5000 bp upstream of the features in my posStrand.bed file.

In this case, should I use the -l or the -r option of bedtools window?
Since all features are on the + strand, do I need to use the -sw option?
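
A minimal sketch of the coordinate logic only, not a replacement for bedtools window. As I read the bedtools manual, -l extends features toward lower coordinates, -r toward higher coordinates, and -sw makes those directions strand-aware, so for + strand features "upstream" is the left side. The helper name upstream_window and the BED6 layout (strand in column 6) are assumptions for illustration.

```shell
# Hypothetical helper: extend each BED6 feature by w bp on its upstream side.
# + strand: upstream is toward lower coordinates (start shrinks, floored at 0).
# - strand: upstream is toward higher coordinates (end grows; not clipped
#           to the chromosome length in this sketch).
upstream_window() {
    awk -v w="$1" 'BEGIN { FS = OFS = "\t" }
        $6 == "+" { s = $2 - w; if (s < 0) s = 0; $2 = s }
        $6 == "-" { $3 = $3 + w }
        { print }' "$2"
}

# Usage: upstream_window 5000 posStrand.bed
```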

↧

Splice Junction file intersection with genome annotation

June 16, 2014, 10:38 pm

Hello, I have a tab delimited format Splice Junction file and the file looks something like this:

chr1 11212 12009 1 1 0 0 2 48
chr1 11672 12009 1 1 0 0 1 31
chr1 11845 12009 1 1 0 0 1 28
chr1 12228 12612 1 1 1 0 1 32
chr1 12722 13220 1 1 1 0 3 9
chr1 14830 14969 2 2 1 0 218 50
chr1 15039 15795 2 2 1 0 98 50
chr1 15948 16606 2 2 1 1 10 48
chr1 16766 16857 2 2 1 0 24 44
chr1 16766 16875 2 2 0 0 2 36

The task is to filter out lines in which Column 6 has value 1, Column 7 has value 1 and Column 8 has value 10 or greater. I have been going through the bedtools documentation but I am not quite sure on how to get started, I would appreciate a few pointers on how to get going. My input file is going to be in the tab delimited format and I also have the Gencode V.19 GTF file for annotation. Thanks!

*** Edit ***

Column 1: chromosome
Column 2: first base of the intron (1-based)
Column 3: last base of the intron (1-based)
Column 4: strand
Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
Column 7: number of uniquely mapping reads crossing the junction
Column 8: number of multi-mapping reads crossing th ...
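
A minimal sketch of the column test described above, using awk rather than bedtools. It assumes "filter out" means "select" the matching lines; the function name select_junctions is hypothetical.

```shell
# Hypothetical helper: print lines where column 6 is 1, column 7 is 1
# and column 8 is at least 10. Works for tab- or space-delimited input.
select_junctions() {
    awk '$6 == 1 && $7 == 1 && $8 >= 10' "$1"
}

# To drop the matching lines instead, negate the condition:
#   awk '!($6 == 1 && $7 == 1 && $8 >= 10)' junctions.tab
```

On the ten example rows above, only the chr1 15948 16606 line satisfies all three conditions.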

↧

Bedtools To Compare A Vcf File From Samtools Mpileup With Dbsnp?

December 1, 2011, 7:43 pm

Hello,

I have one big VCF file that was generated by samtools mpileup, comparing 6 cell lines to see whether there are SNP differences between them.

I would like to use bedtools for intersecting. How can I do it? Do you have some scripts for that?

Thanks

↧

how to run subtract command in java

September 3, 2014, 2:20 pm

I want to run the subtract command from Java. Could somebody tell me how to do this?

Thank you very much.

↧

Changing Column Order In Bed File

August 31, 2012, 3:27 pm

Here is my data with A, B, C and D columns in my bed file.

   A.     B.     C.     D.
  Chr 1.  1.    12.     +
  Chr 2.  24.   56.     +

How can I move column D to position 1, where column A is right now?
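
A minimal sketch of the reordering with awk, assuming a 4-column, tab-delimited file. The function name move_col4_first is hypothetical.

```shell
# Hypothetical helper: print column 4 first, followed by columns 1 to 3.
move_col4_first() {
    awk 'BEGIN { FS = OFS = "\t" } { print $4, $1, $2, $3 }' "$1"
}

# Usage: move_col4_first input.bed > reordered.txt
```

Note that the result no longer has the chromosome in column 1, so tools that expect BED will probably not accept it as such.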

↧

Calculating Exome Coverage

April 3, 2014, 2:00 am

Edit to make the post clearer (mapping done via Bowtie2): my problem is that counting exome coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the two methods is correct.

1) My first step is to build a .bed file of my Illumina paired-end reads, keeping only the positions that fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new data file via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new data file via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d | awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?
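
One possible explanation, offered only as an assumption and not confirmed by anything in the post: if some target intervals overlap each other, a sum over per-interval results counts the shared bases more than once, while a per-base report counts each position once. The sketch below shows that effect on a BED file; the function names naive_bases and unique_bases are hypothetical.

```shell
# Hypothetical helper: sum of interval lengths, overlaps counted repeatedly.
naive_bases() {
    awk '{ n += $3 - $2 } END { print n + 0 }' "$1"
}

# Hypothetical helper: number of distinct bases after merging overlapping
# (or touching) intervals on the same chromosome.
unique_bases() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" |
    awk '{
        if ($1 != chr || $2 > end) {   # a new, non-overlapping block starts
            total += end - start       # flush the previous block
            chr = $1; start = $2 + 0; end = $3 + 0
        } else if ($3 > end) {
            end = $3 + 0               # extend the current block
        }
    } END { total += end - start; print total + 0 }'
}
```

Running both on the exon regions file would show whether the two totals differ at all; if they are equal, this explanation does not apply.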

↧

What Is The Best Way To Run Bedtools In Parallel With Blocking

January 15, 2013, 3:27 pm

Say I am working on a server with a shared file system and 4 quad-core nodes (16 cores total; I/O is not an issue). I want to run coverageBed across 20 files. Currently I have a shell script that does this sequentially. It is possible to just background the commands so they run in parallel, but I am not sure how to block in Bash (the next step requires counting between the files). Assuming I/O is not a bottleneck, what are ways of taking advantage of multiple nodes/cores when running bedtools (or any other sequential commands, for that matter)?

From my rudimentary understanding of parallel programming, the concept I am trying to get at is how to 'block' so that the next command after coverageBed is not executed until all coverageBed runs are done.

I was thinking of wrapping the shell commands in a Python script with a queue of coverageBed commands and a function that feeds commands 4 at a time (since the nodes are quad-core) and only returns when the queue is empty. Is there a better way of doing this?
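
A minimal sketch of blocking in the shell itself, using the built-in wait on a single machine. The function name run_batch is hypothetical, and the command it receives stands in for whatever wrapper actually runs coverageBed on one file.

```shell
# Hypothetical helper: run one command per input file in the background,
# then block until every one of those background jobs has exited.
run_batch() {
    cmd=$1; shift              # first argument: command or function to run
    for f in "$@"; do
        "$cmd" "$f" &          # launch in the background
    done
    wait                       # returns only after all background jobs finish
}

# Usage (coverage_one_file is a placeholder for your own wrapper):
#   run_batch coverage_one_file sample*.bed
#   echo "all runs done, safe to start the counting step"
```

This starts every job at once. To cap the number of simultaneous jobs, common options are the -P flag of GNU or BSD xargs, or GNU parallel; spreading work across nodes needs a scheduler or ssh on top of this.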

↧

bedtools intersect - something wrong with chromosome numbers >= 10?

May 31, 2014, 2:28 am

Hi!

I have an alignment (.bam) of reads to the mm9 genome. I sorted it with samtools sort, so that I can later use the -sorted option with bedtools. I also created a .bed file with regions of interest, in which I want to count the number of reads that mapped to them. I tried this: I converted the .bam to .bed with bedtools bamtobed, and then intersected the two files, counting the number of hits (bedtools intersect -a regions_of_interest.bed -b alignment_sorted.bed -c -sorted > Neg2H_counts.bedgraph). The problem is that it looks fine for all chromosomes with numbers from 1 to 9 (and X), but the counts for all regions of interest on chromosomes with higher numbers (chr10, chr11, etc.) are 0. There is no biological reason for that; in fact, the highest signal should be on chr11. What could be wrong here? I am fairly new to all these tools.

UPDATE
I tried to do the same intersection with bedmap and the result is identical... So there probably is something wrong with my files - what could it be?
I also tried sorting the alignment-derived bed-file in the same way, as I did with the files with regions of interest and it doesn't help.
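
A minimal sketch of one thing to check, under the assumption (not confirmed by the post) that the two files are sorted in different chromosome orders. A coordinate-sorted BAM follows the chromosome order of its header, while a plain text sort puts chr10 before chr2; the bedtools documentation recommends sort -k1,1 -k2,2n for inputs used with -sorted. The function name sort_bed is hypothetical.

```shell
# Hypothetical helper: sort a BED file by chromosome name (byte order),
# then numerically by start position.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Usage: apply the same sort to both inputs before intersecting.
#   sort_bed regions_of_interest.bed > regions.sorted.bed
#   sort_bed alignment_sorted.bed    > alignment.resorted.bed
```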

↧

Renaming SNPs or SNP matching

September 30, 2014, 8:22 am

This should be easy to do by now, but... we have SNP data from an Illumina exome array given to us in PLINK format. The BIM file looks like this:

1       exm2253575      0       881627  G       A
1       exm269  0       881918  A       G
1       exm340  0       888659  T       C
1       exm348  0       889238  A       G
1       exm2264981      0       894573  G       A
1       exm773  0       909238  G       C
1       exm782  0       909309  C       T
1       exm912  0       949608  A       G
1       exm991  0       977028  T       G
1       exm1024 0       978762  A       G

And I have all of the SNPs in dbSNP 138 downloaded as a large VCF file:

#CHROM POS ID REF ALT QUAL FILTER INFO
1 10019 rs376643643 TA T . . RS=376643643;RSPOS=10020;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG
1 10054 rs373328635 CAA C,CA . . RS=373328635;RSPOS=10055;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000210;WGT=1;VC=DIV;R5;OTHERKG;NOC
1 10109 rs376007522 A T . . RS=376007522;RSPOS=10109;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1 10139 rs368469931 A T . . RS=368469931;RSPOS=10139;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG
1 10144 rs144773400 TA T . . RS=1447734 ...
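
A minimal sketch of matching by chromosome and position with awk, assuming both files use the same chromosome naming and the same genome build. It ignores alleles and strand, so it can mislabel a marker where several dbSNP records share a position or where indel coordinates are involved. The function name rename_bim_ids is hypothetical, and the whole VCF index is held in memory, which may be heavy for the full dbSNP file.

```shell
# Hypothetical helper: replace BIM marker names (column 2) with the VCF ID
# found at the same chromosome:position; unmatched markers keep their name.
rename_bim_ids() {
    # $1 = VCF file, $2 = PLINK BIM file
    awk 'BEGIN { OFS = "\t" }
         NR == FNR {                        # first file: the VCF
             if ($0 !~ /^#/) id[$1 ":" $2] = $3
             next
         }
         {                                  # second file: the BIM
             key = $1 ":" $4
             if (key in id) $2 = id[key]
             $1 = $1                        # force tab-delimited output
             print
         }' "$1" "$2"
}
```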

↧

How Can I Compare And Merge Bed Files

July 22, 2012, 1:46 pm

I have three BED files with chrNo, start position, end position and type. I need to compare the chrNo, start and end position of each entry in one file with the two other files and write the common ones to a new file. Can anyone suggest how I can do this efficiently? I wrote a simple Perl script, but as the files are huge it takes a lot of time and is not feasible. Thanks in advance

Example files:

file1.bed:

1 20 30
1 100 120
1 200 300

file2.bed:

1 2 5
1 25 34
1 200 300

file3.bed:

1 30 33
1 200 300
1 500 600

common.bed:

1 30 34 --> coordinates overlapping within 5 bp are considered the same, but the outermost coordinates of the 3 are taken in the common file
1 200 300

↧

To Group Items In Bed Files

January 20, 2012, 5:50 pm

For example, we now have a bed file:

chr1 23455 45678
chr1 23446 45663
chr1 23449 45669
chr1 30000 31000

Is there any way to group the first three lines while leaving the last line alone? I know bedtools has the mergeBed function, which merges overlapping spans but would also include the last line.

This may sound like a purely computational question, but I'm just curious whether there are already tools available to tackle such questions.

thx
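
A minimal sketch of one way to read "group", assuming two intervals belong together when both their start and their end lie within a tolerance of the first interval seen in the group. The function name group_similar and the tolerance parameter are hypothetical. The grouping is greedy over the sorted input, so an unrelated interval that sorts between two similar ones will split them into separate groups.

```shell
# Hypothetical helper: append a group number to each interval. A new group
# starts when the chromosome changes or when start or end differs from the
# group's first interval by more than tol bp.
group_similar() {
    # $1 = tolerance in bp, $2 = BED file
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$2" |
    awk -v tol="$1" '
        function abs(x) { return x < 0 ? -x : x }
        {
            if ($1 != chr || abs($2 - s) > tol || abs($3 - e) > tol) {
                g++; chr = $1; s = $2 + 0; e = $3 + 0
            }
            print $0 "\t" g
        }'
}

# Usage: group_similar 20 input.bed
```

With a tolerance of 20 bp, the first three example lines above end up in group 1 and the 30000-31000 line in group 2.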

↧

Get The Idea Of Splicing From Reads Mapped In Rna-Seq

January 30, 2014, 6:49 am

I've got a set of 100 BAM files from a public experiment. I want to get an idea of the splicing in each of them for three exons, without resorting to an in-depth procedure like Cufflinks or DEXSeq.

Let's say that my exons are named 1, 2 and 3, and I want to know in how many samples there is a splicing event involving exon 2. Looking through the threads, I found that by using coverageBed with my BED file of the three exons I could get some idea per BAM file:

coverageBed -split -abam my_alignment -b exons_to.bed

Am I correct?

I was also thinking of getting the reads mapped in flanking end positions of read 1 and start of read 3 with samtools

What do you think about it? Any idea will be kindly appreciated

Thanks in advance!

↧

Converting Sam Files To Bam Files - Reproduce Results Nature Paper: Transcriptome Genetics Using Second Generation Sequencing In A Caucasian Population

February 9, 2012, 9:02 am

I want to reproduce the results from the following Nature paper: Transcriptome genetics using second generation sequencing in a Caucasian population
http://www.nature.com/nature/journal/vaop/ncurrent/full/nature08903.html

I downloaded their SAM files from the group's website: http://funpopgen.unige.ch/data/ceu60

I downloaded a reference FASTA and .fai file from: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/

The main problem seems to be that I'm not able to convert these SAM files into proper "working" BAM files, so that I can get BED files, which are the input format for FluxCapacitor (http://flux.sammeth.net/). I tried the following steps (as there is no "proper" header in the SAM files, I have to do some additional steps):

samtools view -bt human_b36_male.fa.gz.fai first.sam> first.bam
samtools sort first.bam first.bam.sorted
samtools index first.bam.sorted
samtools index aln-sorted.bam

When I the ...

↧

Profile Coverage Of Rnaseq Samples?

February 14, 2013, 3:51 pm

Hi all,

I have a quick question:

How can I visualize aligned paired-end reads from RNAseq datasets in UCSC browser?

I already mapped the reads and assembled the transcripts with Tophat/Cufflinks but I'm not sure how to proceed to visualize the mappings

After sorting the BAM files and fixing the mate pairs, I tried to compute the coverage using the following commands:

genomeCoverageBed -bg -split -ibam F.T0.rep2-accepted_hits-fS.bam -g ~/conversion_util/chrom.hg19.sizes > F.T0.rep2-accepted_hits-fS.bg
bedGraphToBigWig F.T0.rep2-accepted_hits-fS.bg ~/conversion_util/chrom.hg19.sizes F.T0.rep2-accepted_hits-fS.bw

But I was not able to visualize the mappings properly. Here I paste a screenshot of what it looks like:

Do you know where the mistake is?

Thanks!

↧

Extracting Genomic Coverage Information Across Different Samples

March 21, 2014, 1:39 pm

Hello, I have 3 BAM files that I want to compare against each other. For example, I have a reference file with 10,000 sequences, and paired-end reads sequenced for 3 different samples:

1) Sample 1 is 100% the same as the reference, so we expect all reads to map to it.
2) Sample 2 is 80% similar to the reference, so 20% of the reference sequences won't have any reads.
3) Sample 3 is 60% similar to the reference, so 40% of the reference sequences won't have any reads.

My goal is to identify which reference sequences do not have any reads mapped in samples 2 and 3, that is, the 20% of reference sequences for sample 2 and the 40% for sample 3. Also, in some cases, for a reference sequence that is approximately 10 kb long, sample 1 maps to the entire 10 kb, sample 2 maps to the first 5 kb and sample 3 maps to the last 3 kb, so I need to identify the partially covered regions of those reference sequences as well.

I have the mapped, sorted BAM files for all three samples. I am looking into using bedtools but am not sure which bedtools utility will give the answer I need. I have the following commands, which might do something similar, but they output differences at every base.

genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph

genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph

unionBedGraphs -header -i sample1.bedgraph sample2. ...
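
A minimal sketch of pulling out uncovered stretches from a bedGraph, assuming the file was produced with the -bga option of genomeCoverageBed, which to my knowledge also reports intervals of zero depth (plain -bg omits them). The function name zero_coverage_regions is hypothetical.

```shell
# Hypothetical helper: print the intervals of a 4-column bedGraph whose
# depth (column 4) is zero, as 3-column BED.
zero_coverage_regions() {
    awk 'BEGIN { OFS = "\t" } $4 == 0 { print $1, $2, $3 }' "$1"
}

# Usage (file names are placeholders):
#   genomeCoverageBed -bga -ibam sample2.bam > sample2.bga.bedgraph
#   zero_coverage_regions sample2.bga.bedgraph > sample2.uncovered.bed
```

A reference sequence with no reads at all would then appear as a single zero-depth interval spanning its full length, and a partially covered one as one or more shorter intervals.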

↧

Convert .Txt Into Bed Files

July 21, 2011, 8:13 pm

I used paired-end sequence data for copy number variation study; and eventually get .txt files as output. I'm hoping to use Bedtools to compare my results with others.

Can I convert .txt files into .bed files? (I don't see option in Bedtools)

If Bedtools is not working, what software can I use for data comparison?

my lines of txt is just like:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

thx

edit: I converted the txt files into bedpe format, which looks like

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started to compare two bedpe, looking for overlapping region, using the command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see the errors:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

My BED file has only three columns, but it seems six are required. What's the problem here? thx
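
A minimal sketch of converting the original .txt lines to BED with awk, assuming the coordinates are 1-based and inclusive (the third column agrees with that: 6173000 - 6169901 + 1 = 3100). The function name cnv_txt_to_bed is hypothetical. The output is ordinary BED with the event type as a name column, not BEDPE, so it would be compared with an interval tool such as intersectBed rather than pairToPair.

```shell
# Hypothetical helper: turn "type<TAB>chr:start-end<TAB>length" into
# "chr<TAB>start-1<TAB>end<TAB>type" (BED uses a 0-based start).
# Chromosome names that themselves contain ":" or "-" are skipped.
cnv_txt_to_bed() {
    awk 'BEGIN { OFS = "\t" }
         {
             n = split($2, p, "[:-]")     # chr9:6169901-6173000 -> 3 parts
             if (n == 3) print p[1], p[2] - 1, p[3], $1
         }' "$1"
}

# Usage: cnv_txt_to_bed calls.txt > calls.bed
```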

↧

How To Combine Fpkm Values From Cufflinks With Contigs From De Novo Assembly Program Velvet/Oases?

November 23, 2011, 2:32 pm

Hi all,

I am working on RNA-seq data analysis. I've finished running Tophat and Cufflinks to get FPKM values for each read from Illumina paired-end sequencing. In parallel, I've run Velvet to get contig sequences through de novo assembly, and Gmap to see whether the assembled sequences map to the reference genome (this reference genome is not complete for now, but somewhat useful). Now I am trying to combine all the information so that I have the sequence of a contig together with the FPKM value corresponding to that contig. Some suggested I could convert the Cufflinks and Gmap outputs to BED files and then use intersectBed to see if there is any overlap. However, I am not sure how to keep all the information in the bedtools output. By default, intersectBed seems to report the overlapping region using the 'A' file as a template, so I couldn't see any information from the 'B' file. Is there any solution for this? Please let me know. I would appreciate your suggestions!
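
A minimal sketch of what "keeping both records" looks like, written as a naive awk join so the logic is visible. Within bedtools itself, the -wa and -wb options of intersectBed write the original A and B entries for each overlap, as far as I know. The function name overlap_join is hypothetical; the loop compares every A record with every B record, so it is only practical for small files, and it assumes the B file is not empty.

```shell
# Hypothetical helper: for every overlapping pair, print the full A record
# followed by the full B record on one line.
overlap_join() {
    # $1 = A.bed, $2 = B.bed (both tab-delimited, 0-based half-open)
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR {                      # first file read: B, kept in memory
             b[++n] = $0; bc[n] = $1; bs[n] = $2 + 0; be[n] = $3 + 0
             next
         }
         {                                # second file read: A
             for (i = 1; i <= n; i++)
                 if ($1 == bc[i] && $2 + 0 < be[i] && $3 + 0 > bs[i])
                     print $0, b[i]
         }' "$2" "$1"
}

# Usage: overlap_join gmap_contigs.bed cufflinks_transcripts.bed
```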

↧

Which Of The Genes Are Enriched With Repeat Elements

November 14, 2013, 3:12 am

I would like to know which of my genes are enriched with repeat elements such as LINE/SINE/ERV. I have a BAM file and the repeats in BED format. As far as I know, a BAM file contains the alignment of each short read from the FASTQ file. I am trying to figure out the best approach to find out which genes (+- 1000 bp) have more repeat elements. I am thinking about two approaches but am not sure which one is better.

a) Should I convert the BAM file into a BED file and then use bedtools merge, so that I can overlap the result with the repeats file using bedtools window with the -c, -l and -r options? Then I would know how many repeats overlap or lie near the short reads, and could count this number for each gene. For example,

chr   start  end gene number_of_repeats
chr1 100  200  gene1 70
chr1 190  240  gene1 40
chr1 250  400  gene1 100
chr2 500  600  gene2 150

if i sort and merge them i will get

chr1 100  240  gene1 90
chr1 250  400  gene1 100
chr2 500  600  gene2 150

So gene1 will have 190 (90 + 100) repeats and gene2 will have 150. Or b) should I count the number of repeats for each short sequence without any merging? ...
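
A minimal sketch of the per-gene tally for approach b), summing the counts of the unmerged rows. It assumes a whitespace-delimited table with one header row, the gene in column 4 and the count in column 5, as in the first example table above. The function name repeats_per_gene is hypothetical.

```shell
# Hypothetical helper: sum column 5 per gene (column 4), skipping the header.
repeats_per_gene() {
    awk 'NR > 1 { total[$4] += $5 }
         END { for (g in total) print g "\t" total[g] }' "$1" |
    LC_ALL=C sort
}

# Usage: repeats_per_gene counts.txt
```

On the unmerged example rows this gives 210 for gene1 (70 + 40 + 100) and 150 for gene2, which differs from the merged total because repeats shared by overlapping regions are counted once per region.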

↧

How To Rearrange Paired End Bam File?

May 16, 2013, 10:17 am

Hello all,

I have a paired-end BAM file and I want to use bedtools on it. After merging, the paired-end read alignments no longer lie next to each other, which causes problems in the bedtools step. Is there any tool available to rearrange the paired-end read alignments in a BAM file?

Thanks, Deeps

↧