Converting Gff To Bed With Bedtools?

April 17, 2014, 6:02 am

I use bedtools's sortBed utility to sort BED files for various operations. It also accepts GFF files as input. However, when I feed it a GFF file, as in:

sortBed -i myfile.gff

it outputs GFF, not BED. Is there a way to make bedtools sort and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need to use a different subutility of bedtools to achieve this? Thanks.
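One way to get sorted BED out of a GFF file without relying on any particular bedtools flag is a small awk conversion followed by a sort. The sketch below is only an illustration: the function name gff_to_bed is hypothetical, and the choice to put the GFF feature type in the BED name column is an assumption. It relies on the fact that GFF start coordinates are 1-based and inclusive while BED starts are 0-based, so the start is shifted by one.

```shell
# Hypothetical helper: read tab-delimited GFF on stdin, write sorted 6-column BED.
# Assumption: the GFF feature type (column 3) is reused as the BED name.
gff_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 8 { next }               # skip comment and malformed lines
         { print $1, $4 - 1, $5, $3, $6, $7 }  # 1-based start -> 0-based start
        ' |
    LC_ALL=C sort -k1,1 -k2,2n                 # sort by chromosome, then start
}
```

It could then be used as gff_to_bed < myfile.gff > myfile.sorted.bed, where the output file name is only an example.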

↧

How Can I Include One Bed File In Another Bed File ?

April 17, 2014, 6:02 am

Hello, I have two BED files that share some common features; let's call the first A.bed (the bigger file) and the second B.bed (the smaller file). I would like a new BED file that adds everything in B.bed to A.bed. I don't need the intersection; what I need is closer to a merge. I checked the bedtools manual but couldn't find a way to merge two BED files. Can someone help?

Thanks in advance
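If the goal is simply the union of the records in both files, one possible reading of the request, it can be sketched with standard Unix tools alone. The function name bed_union is hypothetical, and the sketch assumes that exact duplicate lines should be kept only once.

```shell
# Hypothetical helper: print every record found in either BED file,
# sorted by chromosome, start and end, with exact duplicate lines removed.
bed_union() {
    cat "$1" "$2" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n | uniq
}
```

For example, bed_union A.bed B.bed > AB.bed. If overlapping intervals should also be collapsed into single records, the sorted output could be passed on to bedtools merge -i, which expects sorted input.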

↧

Per Base Coverage

April 17, 2014, 6:02 am

Is there a way to obtain per-base coverage for a defined chromosome interval using a BAM file generated from Illumina single-end reads? genomeCoverageBed in bedtools does not seem to have an option for it.

↧

Convert .Txt Into Bed Files

April 17, 2014, 6:02 am

I used paired-end sequence data for a copy number variation study and eventually got .txt files as output. I'm hoping to use bedtools to compare my results with others.

Can I convert .txt files into .bed files? (I don't see an option for this in bedtools.)

If bedtools does not work for this, what software can I use for the comparison?

The lines of my .txt files look like this:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

Thanks.

Edit: I converted the .txt files into bedpe format, which looks like this:

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started to compare two bedpe files, looking for overlapping regions, using a command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see this error:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

My BED file has only three columns, but it seems to require six. What's the problem here? Thanks.
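The original .txt lines can be turned into plain BED with a short awk filter. This is a minimal sketch under stated assumptions: the function name cnv_txt_to_bed is hypothetical, and the coordinates are taken to be 1-based and inclusive because the third column of the sample lines equals end - start + 1 (for example 6173000 - 6169901 + 1 = 3100). The result is a three-coordinate BED record with the event type as the name, which is a different format from BEDPE; BEDPE describes pairs of intervals and needs at least six columns, which is what the error message above refers to.

```shell
# Hypothetical helper: convert lines such as
#   deletion    chr9:6169901-6173000    3100
# into BED records (chrom, 0-based start, end, name).
# Assumption: the input coordinates are 1-based and inclusive.
cnv_txt_to_bed() {
    awk 'BEGIN { OFS = "\t" }
         NF >= 2 {
             n = split($2, p, /[:-]/)   # p[1]=chrom, p[2]=start, p[3]=end
             if (n == 3) print p[1], p[2] - 1, p[3], $1
         }'
}
```

It could be run as cnv_txt_to_bed < calls.txt > calls.bed (file names are examples only), after which two such BED files could be compared with intersectBed rather than pairToPair.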

↧

Calculating Exome Coverage

April 17, 2014, 6:02 am

*// Edit to make the post clearer (mapping done via Bowtie2). My problem is that calculating exome coverage via coverageBed gives different results than via genomeCoverageBed. I'm not sure whether I'm doing something wrong, or which of the two methods is correct.

1) My first step is to build a .bed file of my Illumina paired-end reads, keeping only the positions that fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d | awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?

↧

Bedgraph Not Displayed In Igv

April 17, 2014, 6:02 am

Hi, I am new to this and have run into a problem. I was trying to make a bedgraph file using the bedtools genomecov command. The command was:

bedtools genomecov -ibam filename.sorted.bam -g chromosome sizes.txt > O.bedgraph

I got a bedgraph file that is much smaller than expected: 500kb instead of ~6Mb. When I load that 500kb file into IGV, I see nothing. Please help me out.

↧

Getting Rna Sequences From Gff And Fa Files

April 17, 2014, 6:02 am

Hi. I have a folder full of .fa files, and a .gff. The gff file contains information about which loci look like they code for RNA sequences. The .fa files contain the DNA sequences for a set of human chromosomes. I want to get all the sequences which code for RNA, as defined by the gff file, out of the DNA in the fasta files. I also have a file telling me which RNA types have higher priority (lincRNA is higher priority than miRNA, for example); this tells me which are more important and how I should decide between RNAs for overlapping reads in the gff.

I have been trying to code my own little program in F# that will read these files and give me each RNA read defined in the gff, and its corresponding DNA. However, I am a bit confused about how it works. Do the start and end of each feature in the gff file define a character in the corresponding .fa file? Are they 1 or 0 indexed? Does it matter what strand they are on ('+' or '-') for my purposes?

Ultimately my goal is to get a bunch of RNAs with their corresponding types (miRNA, lincRNA, snRNA... etc) to do some computations on.

My question is this: what is the easiest way to get it out of the data I have?

The data I am using is freely available here: http://wanglab.pcbi.upenn.edu/coral/ under the heading "Annotation packages" if anyone is interested or needs specifics.

Thank you!

↧

Intersectbed Provides An Empty Output

April 17, 2014, 6:02 am

Hi,

I've downloaded the recent Cygwin version 1.7.24 and am trying to run bedtools, but I get an empty file as my output. When I run the same command line and files on a colleague's computer, also through Cygwin, I get a file containing the overlaps I seek. Is the new Cygwin not compatible with bedtools? I've put the command line we used below:

./intersectbed -a Gene_body.bed -b EdgeR1.bed -wao > yyy.temp

Any help would be appreciated.

↧

Getting The Average Coverage From The Coverage Counts At Each Depth.

April 18, 2014, 6:14 am

Hi, I have read quite a few posts here about coverage already, but I still have a few questions. I have a BAM file and I'm trying to find its coverage (typically something like 30X), so I decided to use genomeCoverageBed for my analysis. I used the following command:

genomeCoverageBed -ibam file.bam -g ~/refs/human_g1k_v37.fasta > coverage.txt

As many are aware, the output of the file looks something like this:

genome    0     26849578    100286070    0.26773
genome    1     30938928    100286070    0.308507
genome    2     21764479    100286070    0.217024
genome    3     11775917    100286070    0.117423
genome    4     5346208     100286070    0.0533096
genome    5     2135366     100286070    0.0212927
genome    6     785983      100286070    0.00783741
genome    7     281282      100286070    0.0028048
genome    8     106971      100286070    0.00106666
genome    9     47419       100286070    0.000472837
genome    10    27403       100286070    0.000273248

To find the coverage, I multiplied col2 (depth) by col3 (number of bases in the genome with that depth) and then summed the resulting column. Then, I divided the sum by the genome length to get the coverage. In this case, col2 * col3 is:

And the sum is: 150098740. Since the genome length is 1002860 ...
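The calculation described above (sum of depth times base count, divided by genome length) can be sketched in awk. The function name mean_depth is hypothetical, and the sketch assumes the genome length is read from the fourth column of the "genome" rows of the histogram.

```shell
# Hypothetical helper: mean depth from genomeCoverageBed histogram output.
# Only the whole-genome summary rows (first column "genome") are used:
#   col2 = depth, col3 = bases at that depth, col4 = genome length.
mean_depth() {
    awk '$1 == "genome" { sum += $2 * $3; len = $4 }
         END { if (len > 0) printf "%.4f\n", sum / len }'
}
```

It could be run as mean_depth < coverage.txt. For the eleven rows shown above, the sum of 150098740 divided by the fourth-column value 100286070 comes to roughly 1.50; rows for depths above 10, if the full file has any, would raise that figure.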

↧

How Do You Get The Quality Score And Coverage For Every Single Position Of A Reference Assembly

April 18, 2014, 6:14 am

Hi,

I am trying to extract the coverage and the average quality score for each position of a reference assembly from alignments in BAM/SAM format. I have managed to get the coverage using bedtools:

 genomeCoverageBed -ibam mybamfile.bam -g my_genome -d > my_coverage.txt

but I am at a loss as to how to get some measure of the quality of the base calls at each position. I was thinking that I could use bcftools to get a variant call format file:

samtools mpileup -uf ref.fa mybamfile.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

but this only provides the sites for which there are SNPs. Any advice greatly appreciated.

Joseph

↧

How To Check Whole Genome With Bigwigsummary ?

April 18, 2014, 6:14 am

Hi,

I have a question about the bigWigSummary tool.

I have my start and end positions and my bigWig file, but I want to check the whole genome instead of going chromosome by chromosome. Is there an option to use this tool that way?

I know that for each chromosome I have to use:

bigWigSummary -type=X bigwigfile chrN start end datapoints

I want to check from chr1 to chrX.

Thanks in Advance.

↧

Does Bedops Have A Command Similar To The Bedtools Makewindows?

April 18, 2014, 6:14 am

With bedtools you can make genomic windows from a genome file or a BED file:

input.bed

chr1    1000000 1500000
chr3    500000  900000

[prompt]$ bedtools makewindows -b input.bed -w 250000

chr1    1000000 1250000
chr1    1250000 1500000
chr3    500000  750000
chr3    750000  900000

Does the bedops suite provide a similar way to create genomic windows?
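Independently of either suite, the fixed-width windowing shown above can be sketched in plain awk. The function name make_windows is hypothetical; the sketch assumes a three-column BED input on stdin and takes the window width as its only argument.

```shell
# Hypothetical helper: split each BED interval on stdin into windows of
# width $1; the last window of an interval is truncated at the interval end.
make_windows() {
    awk -v w="$1" 'BEGIN { OFS = "\t" }
         w + 0 > 0 && NF >= 3 {
             for (s = $2 + 0; s < $3 + 0; s += w) {
                 e = (s + w < $3 + 0) ? s + w : $3 + 0
                 print $1, s, e
             }
         }'
}
```

Run as make_windows 250000 < input.bed, this should reproduce the four windows listed above for the two example intervals.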

↧

How To Count Genes In Genomic Regions Using A Gtf/Gff3 And A Bed File Of Regions

April 18, 2014, 6:14 am

I'd like to count the number of unique genes in a gff file that fall within a list of genomic regions. With bedtools I can count the number of gff records within each region, which is almost what I want, but not quite.

bedtools intersect -a regions.bed -b my.gff -c

UPDATE:

I should have made my question a bit more specific. I have a modified Ensembl-style gtf file (not a gff) that has unique transcript IDs. This means that simply selecting unique fields in the 9th column of the gtf file actually counts transcript IDs.

To circumvent this problem I first truncated the gtf file:

cat my.gff | sed -e 's/;.*//' > delete.me.gtf

Then I ran the bedtools map command:

bedtools map -a regions.bed -b delete.me.gtf -c 9 -o count_distinct > counts.genes_in_windows.bed

I almost forgot to delete the intermediate file:

rm delete.me.gtf

There is probably a way to make this a one-liner, without the intermediate file, but I have a dissertation to write!

↧

Bedtools Genomecoveragebed Usage : How To Create A Genome File?

April 18, 2014, 6:14 am

I am using bedtools and the following command to get the coverage file:

$ ./genomeCoverageBed -ibam ~/GG_project/trim/ecoli.bam -g > ~/GG_project/trim/coverage

where ecoli.bam is my sorted BAM file, and coverage is my output file.

Where do I get the genome file from? How do I create a genome file? Specifically, I would need an ecoli.genome file.
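A bedtools genome file is a plain tab-delimited list of sequence names and their lengths, so one way to build it is from a FASTA index. The sketch below assumes the reference has already been indexed with samtools faidx (which writes a .fai file whose first two columns are the sequence name and its length); the function name fai_to_genome and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: derive a bedtools genome file from a FASTA index.
# Assumption: "$1" is a .fai file produced by "samtools faidx".
fai_to_genome() {
    awk 'BEGIN { FS = OFS = "\t" } NF >= 2 { print $1, $2 }' "$1"
}
```

For example, fai_to_genome ecoli.fa.fai > ecoli.genome. As far as the bedtools documentation indicates, genomeCoverageBed reads the chromosome sizes from the BAM header when -ibam is used, so the -g argument may not be needed at all in the command above.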

↧

Get The Idea Of Splicing From Reads Mapped In Rna-Seq

April 18, 2014, 6:14 am

I've got a set of 100 BAM files from a public experiment. I want to get an idea of the splicing in each of them for three exons, without resorting to an in-depth procedure like Cufflinks or DEXSeq.

Let's say that my exons are named 1, 2 and 3, and I want to know in how many samples I have a splicing event involving exon 2. Looking through the threads, I found that using coverageBed with my BED file of the three exons could give me some idea per BAM file:

coverageBed -split -abam my_alignment -b exons_to.bed

Am I correct?

I was also thinking of getting the reads mapped in flanking end positions of read 1 and start of read 3 with samtools.

What do you think about it? Any ideas would be much appreciated.

Thanks in advance!

↧

How To Find The Closest Distance From Bed Files Between Genes And Repeats That Are Upstream

April 18, 2014, 6:14 am

How can I use closestBed from bedtools to find the closest locations between two BED files? The important bit here is that I want them to be upstream and in the correct orientation.

When I use the -s option, it does not report anything (everything is -1).

Then I tried the -D a option. It returns some results, but I'm not sure they are the right ones.

The other thing to mention is that my genes BED file (let's call it gene.bed) is organized as

chr1 123 234 +
chr1 456 789 -

rather than using an end position smaller than the start to indicate the negative strand.

My repeats.bed file, on the other hand, is organized as

chr1 239 456
chr3 456 987

Does bedtools get confused with this?

Which options should I use if I want to find the distance to the nearest repeat that is upstream and in the correct orientation?
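One thing worth checking with the gene.bed layout above is the position of the strand: bedtools reads strand from the sixth column of a BED record, whereas the sample gene lines carry it in the fourth, and the repeats file has no strand column at all. A minimal sketch for moving the strand into place follows; the function name strand_to_bed6, the generated names (gene1, gene2, ...) and the score of 0 are all placeholders invented for illustration.

```shell
# Hypothetical helper: turn "chrom start end strand" lines into 6-column BED
# (chrom, start, end, name, score, strand) so that strand sits in column 6.
# Assumption: name and score are not available, so placeholders are written.
strand_to_bed6() {
    awk 'BEGIN { OFS = "\t" }
         NF == 4 { print $1, $2, $3, "gene" NR, 0, $4 }'
}
```

It could be used as strand_to_bed6 < gene.bed > gene.bed6 before running closestBed with the strand-aware distance option -D a.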

↧

Tutorial: Piping With Samtools, Bwa And Bedtools

April 18, 2014, 6:14 am

In this tutorial I will introduce some concepts related to Unix piping. Piping is a very useful way to avoid creating single-use intermediate files. It is assumed that bedtools, samtools, and bwa are installed. Let's begin with a typical command sequence for paired-end mapping with bwa (./ means look in the current directory only):

#-t 4 is for using 4 threads/cores
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq > ./s1_1.sai
bwa aln -t 4 ./hg19.fasta ./s1_2.fastq > ./s1_2.sai
bwa sampe ./hg19.fasta ./s1_1.sai ./s1_2.sai ./s1_1.fastq ./s1_2.fastq > s1.sam

Suppose we wish to compress SAM to BAM, sort, remove duplicates, and create a BED file:

samtools view -Shu s1.sam > s1.bam
samtools sort s1.bam s1_sorted
samtools rmdup -s s1_sorted.bam s1_sorted_nodup.bam
bamToBed -i s1_sorted_nodup.bam > s1_sorted_nodup.bed

The workflow above creates many files that are only used once (such as s1.bam), and we can use the Unix pipe to reduce the number of intermediate files created. The pipe is the character | and what it does is ta ...

↧

Intersectbed Tool Generating Empty File

April 18, 2014, 6:14 am

I have used the bedtools command intersectBed to check the overlap between two BED files. A is my INDEL file and B is my reference file, but the command produces an empty output file. I thought the problem was that file B is much larger than file A, but I tried changing the file order and it still does not produce any output.

Here is the reference B file (larger):

gff_seqname      0        1395    gene    0    +
gff_seqname      0        1395    exon    0    +
gff_seqname    1397    2498    gene    0    +
gff_seqname    1397    2498    exon    0    +
gff_seqname    2524    3619    gene    0    +

Here is my A file, with just 51 INDELs:

NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    174708    174713    -GCCGG:2/6
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1078686    1078686    +A:105/112
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1229123    1229125    -CT:800/870
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234830    1234830    +AT:134/134
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234833    1234834    -A:134/134

Here is my command:

intersectBed -a SOD_pal_BWA_GMM.PE.sorted.bam.sorted_cleaned_GMM.bam.sorted.hr.bam.raw.bed  -b sodalis_galaxy.bed  -wa -wb  >test13.bed
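Since bedtools only reports overlaps between records whose first column (the sequence name) is identical, one quick check is whether the two files share any names at all. In the sample lines above, file B uses gff_seqname while file A uses NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME, which may be the cause. The sketch below lists the names common to both files; the function name shared_chroms is hypothetical.

```shell
# Hypothetical helper: print the first-column names that occur in both files.
# Empty output means the two files have no sequence name in common.
shared_chroms() {
    awk 'NR == FNR { seen[$1] = 1; next }               # first file: remember names
         ($1 in seen) && !printed[$1]++ { print $1 }    # second file: report matches once
        ' "$1" "$2"
}
```

Called with the two BED files from the command above as arguments, empty output would suggest that the names in one file need to be rewritten to match the other before intersecting.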

↧

Tool: Bedtools: Analyzing Genomic Features

April 18, 2014, 6:14 am

All practicing bioinformaticians will face problems that require them to compare, query and select genomic features across an entire genome. As it happens, efficient interval representation and querying is a surprisingly challenging problem that needs a specialized representation. The BEDTools suite contains a set of programs that support a broad range of interval analyses that involve selecting certain locations in the genome. The name reflects the original intent to process BED files, but the tools operate just as well on GFF formats. The programs are run from the command line and are available for UNIX-type systems: Linux, Mac OSX, and Cygwin (on Windows). The link to the site is: http://code.google.com/p/bedtools/

With BEDTools one can answer questions such as:

how many reads map upstream/downstream of one or more locations in the genome?
how many reads cover a certain base in the genome?
which sections of the genome are not overlapping with target intervals?
what are the sequences specified by the coordinates?
...

The suite consists of multiple tools but for beginners the most important is ...

↧

Converting Sam Files To Bam Files - Reproduce Results Nature Paper: Transcriptome Genetics Using Second Generation Sequencing In A Caucasian Population

April 18, 2014, 6:14 am

I want to reproduce the results reported in the following Nature paper: Transcriptome genetics using second generation sequencing in a Caucasian population http://www.nature.com/nature/journal/vaop/ncurrent/full/nature08903.html

I downloaded their SAM files from the group's website: http://funpopgen.unige.ch/data/ceu60

I downloaded a reference fasta and fai file from: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/

The main problem seems to be that I'm not able to convert these SAM files into proper "working" BAM files, so that I can get BED files, which are the input format for FluxCapacitor (http://flux.sammeth.net/). I tried the following steps (as there is no "proper" header in the SAM files, I have to do some additional steps):

samtools view -bt human_b36_male.fa.gz.fai first.sam> first.bam
samtools sort first.bam first.bam.sorted
samtools index first.bam.sorted
samtools index aln-sorted.bam

When I the ...

↧