Profile Coverage Of Rnaseq Samples?

February 14, 2013, 3:51 pm

Hi all,

I have a quick question:

How can I visualize aligned paired-end reads from RNAseq datasets in the UCSC browser?

I already mapped the reads and assembled the transcripts with Tophat/Cufflinks, but I'm not sure how to proceed to visualize the mappings.

After sorting the BAM files and fixing the mate pairs, I tried to compute the coverage using the following commands:

genomeCoverageBed -bg -split -ibam F.T0.rep2-accepted_hits-fS.bam -g ~/conversion_util/chrom.hg19.sizes > F.T0.rep2-accepted_hits-fS.bg
bedGraphToBigWig F.T0.rep2-accepted_hits-fS.bg ~/conversion_util/chrom.hg19.sizes F.T0.rep2-accepted_hits-fS.bw

But I was not able to visualize the mappings properly. Here is a screenshot of how it looks:

Do you know where the mistake is?

Thanks!

↧

Calculating Exome Coverage

April 3, 2014, 2:00 am

Edit to make the post clearer (mapping done via Bowtie2): my problem is that counting exome coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.

1) My first step is to build a .bed file of my Illumina paired-end reads, returning the positions that only fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d | awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?
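
One possible source of such a gap (an assumption, since the input files are not shown): if some exon intervals overlap each other, a per-exon tally counts the shared bases once per exon, while a per-position tally counts each genomic position only once. The sketch below illustrates that effect with made-up intervals and also reproduces the percentage arithmetic from the post. The function names and example coordinates are hypothetical and are not part of bedtools.

```python
def covered_bases_per_feature(features, reads):
    """Sum, over every feature, the bases of that feature covered by at least one read."""
    total = 0
    for f_start, f_end in features:
        covered = set()
        for r_start, r_end in reads:
            lo, hi = max(f_start, r_start), min(f_end, r_end)
            covered.update(range(lo, hi))  # empty range when there is no overlap
        total += len(covered)
    return total


def covered_positions(features, reads):
    """Count distinct positions that lie in some feature and are covered by some read."""
    in_feature = set()
    for f_start, f_end in features:
        in_feature.update(range(f_start, f_end))
    in_read = set()
    for r_start, r_end in reads:
        in_read.update(range(r_start, r_end))
    return len(in_feature & in_read)


def percent(matched, length):
    """Nucleotides / Length * 100, as in the post."""
    return matched / length * 100


# Made-up example: two exons sharing positions 150-199 (0-based, half-open).
exons = [(100, 200), (150, 250)]
reads = [(120, 220)]

per_feature = covered_bases_per_feature(exons, reads)   # shared bases counted twice
per_position = covered_positions(exons, reads)          # every position counted once

print(per_feature, per_position)
print(percent(10993449, 45326818), percent(10576907, 45326818))
```

If the exon file does contain overlapping entries, merging them before counting would be one way to check whether the two totals then agree.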

↧

Tool: Bedtools: Analyzing Genomic Features

April 24, 2012, 10:54 am

All practicing bioinformaticians will face problems that require them to compare, query and select genomic features across an entire genome. As it happens, efficient interval representation and querying is a surprisingly challenging problem that needs a specialized representation. The BEDTools suite contains a set of programs that support a broad range of interval analyses that involve selecting certain locations in the genome. The name reflects the original intent to process BED files, but the tools operate just as well on GFF formats. The tools are run from the command line and are available for UNIX-type systems: Linux, Mac OSX, and Cygwin (on Windows). The link to the site is: http://code.google.com/p/bedtools/

With BEDTools one can answer questions such as:

how many reads map upstream/downstream of one or more locations in the genome?
how many reads cover a certain base in the genome?
which sections of the genome are not overlapping with target intervals?
what are the sequences specified by the coordinates?
...

The suite consists of multiple tools but for beginners the most important is ...

↧

Bedtools intersect tab and bed files

August 14, 2014, 6:42 am

How can you call bedtools intersect on a tab file and a bed file without getting the typical error:

"Differing number of BED fields encountered at line: #. Exiting..."

My bed file has 15 columns and my tab file has 18.

↧

Intersectbed Overlap

November 23, 2011, 9:20 am

Hi,

I have a question about intersectBed. Is it possible to extract only alignments like this:

chromosome ===============================================================
BED/BAM A               ==============              =================
BED FILE B               ============
RESULT                  ==============

But not alignments like this (even if the read overlaps 100% of the feature, I don't want to extract these reads):

chromosome ===============================================================
BED/BAM A    =========================              =================
BED FILE B               =============
RESULT

So, I only want to extract reads that have 90-95% of their sequence overlapping 90-95% of the feature.

Is it clear?

Thanks,
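
The condition described above is a reciprocal overlap: the shared stretch must cover a minimum fraction of the read and, at the same time, the same minimum fraction of the feature. intersectBed has a -f option (minimum overlap as a fraction of A) and a -r option (require that fraction reciprocally for B), which may be worth trying with -f 0.90 -r. The plain Python sketch below only illustrates the test itself; the function name, the 0.9 threshold and the coordinates are made up for illustration.

```python
def reciprocal_overlap(a, b, min_frac=0.9):
    """Return True when the overlap covers at least min_frac of a AND of b."""
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False  # the intervals do not overlap at all
    return (overlap / (a_end - a_start) >= min_frac
            and overlap / (b_end - b_start) >= min_frac)


# Hypothetical feature and reads (0-based, half-open coordinates).
feature = (1000, 1100)
reads = {
    "read_kept": (995, 1100),     # nearly the same extent as the feature
    "read_dropped": (900, 1150),  # covers the whole feature but is much longer
}

kept = [name for name, interval in reads.items()
        if reciprocal_overlap(interval, feature)]
print(kept)
```

The second read covers 100% of the feature but only 40% of its own length falls on it, so it is rejected, which matches the second diagram in the question.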

↧

Reporting The Bam Reads Overlapping A Set Of Intervals With Bedtools

November 8, 2011, 1:51 am

I am trying to use bedtools to pull out the reads falling directly within a set of BED coordinates. While this command does it successfully:

intersectBed -abam mybam.bam -b intervals.gff -wa -wb -f 1 | coverageBed -abam stdin -b intervals.gff

I find that it loses key information that I need. I'd like to get a listing of the BAM reads -- getting at least their ID -- split by exon. In other words, all the read IDs that fall into the first interval in intervals.gff, all the read IDs that fall into the second interval in intervals.gff... ideally, it would also report the CIGAR string for these reads, but I'd settle for just the ID.

Is there a way to report these reads, such that it's easy to tell from the output which set of reads landed in a given interval in the input BED file?

Thank you.
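
As a rough illustration of the output being asked for, here is a minimal plain Python sketch that groups read IDs (and CIGAR strings) by the interval that fully contains them. It assumes the reads and intervals have already been parsed into tuples; the tuple layouts, the function name and the example records are hypothetical, and the nested loop is only suitable for small inputs.

```python
from collections import defaultdict


def reads_by_interval(intervals, reads):
    """Map each interval name to the (read_id, cigar) pairs lying entirely inside it."""
    grouped = defaultdict(list)
    for chrom, i_start, i_end, i_name in intervals:
        for r_chrom, r_start, r_end, read_id, cigar in reads:
            if r_chrom == chrom and i_start <= r_start and r_end <= i_end:
                grouped[i_name].append((read_id, cigar))
    return dict(grouped)


# Hypothetical intervals and reads (0-based, half-open coordinates).
intervals = [("chr1", 100, 200, "exon1"), ("chr1", 300, 400, "exon2")]
reads = [
    ("chr1", 110, 160, "readA", "50M"),
    ("chr1", 150, 200, "readB", "50M"),
    ("chr1", 190, 240, "readC", "50M"),  # sticks out of exon1, so it is not reported
    ("chr1", 320, 370, "readD", "50M"),
]

per_exon = reads_by_interval(intervals, reads)
for name, members in per_exon.items():
    print(name, members)
```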

↧

Does Windowbed Extend Reads?

October 21, 2013, 10:08 am

I am using WindowBed, part of the BedTools suite, to align reads to a reference file, and I obtained a very interesting result. I am trying to rule out an analysis artifact that could be caused by extending the reads or by aligning read midpoints rather than 5' ends. It is my understanding that WindowBed aligns the 5' end of the read to the reference point, rather than mapping the read midpoint, or extending the 3' end of the read and mapping the midpoint. Am I correct in this assumption, that the 5' end of the read is in fact what is being aligned?

Any help here would be appreciated. The BedTools manual, which is very good, doesn't seem to address this.

Thanks

↧

Creating Bed File For Lncrna Using Gencode Gtf File

May 12, 2013, 9:29 am

Hi all,

I want to get a bed file of lncRNAs based on the GENCODE GTF file.

I downloaded the file "gencode.v16.long_noncoding_RNAs.gtf.gz" and extracted the chr, start and end info from it, then used mergeBed to merge the overlapping lncRNAs. Is this correct? I know we can merge exon genomic positions using this kind of method.

For lncRNAs, though, I am not so sure. Is there any place already offering this kind of bed file?

Actually, we should get 22444 long non-coding RNA loci transcripts; however, there are only 11817 genomic regions after the merging process.

If anyone knows the answer, could you give me some help?
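
A drop in the number of records is what an interval merge does by design: every group of overlapping transcripts collapses into a single region, so the merged count reflects distinct genomic regions rather than transcripts. The plain Python sketch below shows the effect on made-up coordinates; the function name and the example transcripts are hypothetical, and strand is ignored here as an assumption.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend the current region
        else:
            merged.append([chrom, start, end])       # start a new region
    return [tuple(region) for region in merged]


# Hypothetical transcripts: two overlapping ones at one locus, one elsewhere.
transcripts = [
    ("chr1", 100, 500),
    ("chr1", 300, 800),
    ("chr1", 2000, 2500),
]

regions = merge_intervals(transcripts)
print(len(transcripts), "transcripts ->", len(regions), "merged regions")
```

Whether merging is appropriate depends on whether the downstream analysis needs one region per locus or one record per transcript.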

↧

Genbank to bed conversion for bedtools analysis

August 20, 2014, 5:28 pm

Hi,

I need to use bedtools to obtain the coverage across two bam files for comparison. To do this, though, I need a .bed file of the genome features (of the reference genome used to generate the bams). I only have GenBank format, and whatever else I can get from NCBI etc. (not .bed). Is there a tool/script to convert the annotation in GenBank to .bed? I don't want to have to start writing a script when it must have been done already.

Thanks,

Theo

↧

Bedtools: Top N Most Similar Regions When Comparing Two Bed/Wig/Bam Files?

February 13, 2012, 1:51 pm

Is there an easy way of finding, probably with bedtools, given a window size, the top N most correlated regions when comparing two bed/wig files? For example, in comparing two bed/wig/bam files that have PolII data for 2 conditions, to give the top N windows where the wiggle profiles are most similar?
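
To make the question concrete, here is a minimal plain Python sketch of one way to rank windows by similarity, assuming both signals have already been converted to per-base values of equal length and that "similar" means a high Pearson correlation. This is not a bedtools feature; the function names, the non-overlapping window layout and the example values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a flat profile; treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def top_similar_windows(signal_a, signal_b, window, n):
    """Return the n non-overlapping windows with the highest correlation."""
    scored = []
    length = min(len(signal_a), len(signal_b))
    for start in range(0, length - window + 1, window):
        r = pearson(signal_a[start:start + window], signal_b[start:start + window])
        scored.append((start, r))
    scored.sort(key=lambda item: (-item[1], item[0]))  # best correlation first
    return scored[:n]


# Made-up per-base signals for two conditions, three windows of 4 bases each.
condition_1 = [1, 2, 3, 4, 4, 3, 2, 1, 5, 5, 5, 5]
condition_2 = [2, 4, 6, 8, 1, 2, 3, 4, 1, 2, 3, 4]

best = top_similar_windows(condition_1, condition_2, window=4, n=2)
print(best)
```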

↧

Multi Thread Bedtools

December 20, 2011, 7:59 am

Hi,

Is there a multi-threaded version of bedtools? Or is this feature in development?

Thanks,

↧

Converting Sam Files To Bam Files - Reproduce Results Nature Paper: Transcriptome Genetics Using Second Generation Sequencing In A Caucasian Population

February 9, 2012, 9:02 am

I want to reproduce the results reported in the following Nature paper: Transcriptome genetics using second generation sequencing in a Caucasian population http://www.nature.com/nature/journal/vaop/ncurrent/full/nature08903.html

I downloaded their SAM files from the group's website: http://funpopgen.unige.ch/data/ceu60

I downloaded a reference fasta and fai file from: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/

The main problem seems to be that I'm not able to convert these SAM files into proper "working" BAM files so that I can get BED files, which are the input format for FluxCapacitor (http://flux.sammeth.net/). I tried the following steps (as there is no "proper" header in the SAM files, I have to do some additional steps):

samtools view -bt human_b36_male.fa.gz.fai first.sam> first.bam
samtools sort first.bam first.bam.sorted
samtools index first.bam.sorted
samtools index aln-sorted.bam

When I the ...

↧

Intersectbed Provides An Empty Output

August 16, 2013, 10:53 pm

Hi,

I've downloaded the recent Cygwin version 1.7.24 and am trying to run bedTools, but I get an empty file as my output. When I run the same command line and files on a colleague's computer, also through Cygwin, I get a file containing the overlaps I seek. Is the new Cygwin not compatible with BedTools? I've put the command line we used below:

./intersectbed -a Gene_body.bed -b EdgeR1.bed -wao > yyy.temp

Any help would be appreciated.

↧

Bedtools Multicov Need A Bam Index File Specification Option

May 28, 2013, 2:02 am

bedtools multicov (version 2.16.2) is used to compute coverage across multiple samples, given a feature file (GTF or BED).

Format: bedtools multicov -bams aln1.bam aln2.bam .. -bed capturRegion.bed > out.coverage

The official documentation mentions that the input BAM files should be sorted and indexed, but it does not give the details. Suppose the BAM file is named sample1.bam; then the index file should be named sample1.bam.bai (not sample1.bai), otherwise multicov will report an error: indexes not found.

I think it would be better to add an option that allows the user to specify the BAM index files, or the suffix used for these index files.
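
Until such an option exists, one workaround is to make sure an index with the expected name sits next to each BAM file before calling multicov. The plain Python sketch below copies sample1.bai to sample1.bam.bai when only the former is present. The helper name is hypothetical, the demo uses empty placeholder files in a temporary directory rather than real BAM data, and a symlink could be used instead of a copy.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_bam_bai(bam_path):
    """Ensure `<name>.bam.bai` exists, copying `<name>.bai` when only that is present."""
    bam = Path(bam_path)
    wanted = Path(str(bam) + ".bai")       # e.g. sample1.bam.bai
    alternative = bam.with_suffix(".bai")  # e.g. sample1.bai
    if not wanted.exists() and alternative.exists():
        shutil.copyfile(alternative, wanted)
    return wanted.exists()


# Demo with empty placeholder files (not real BAM or index data).
with tempfile.TemporaryDirectory() as workdir:
    indexed = Path(workdir) / "sample1.bam"
    indexed.touch()
    (Path(workdir) / "sample1.bai").touch()
    unindexed = Path(workdir) / "sample2.bam"
    unindexed.touch()

    created = ensure_bam_bai(indexed)    # sample1.bam.bai is created from sample1.bai
    missing = ensure_bam_bai(unindexed)  # no index of either name is available

print(created, missing)
```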

↧

How To Get Annotation For Bed File From Another Bed File

November 23, 2012, 7:52 pm

Hello All,

I have a bed file (with Chr, Start, End, Name, Score and Strand)

Chr1 5678 5680 NA 7  +
Chr1 700  800  NA 8  -
Chr1 900  1200 NA 10 -

and would like to know, how can I get the annotation for the name column from another bed file

Chr1 5500 6000 Gene1 x +
Chr1  500 1000 Gene2 x -

or any standard genome file formats like gbk or .fna files or for that matter another bed file? So my output file will be a bed file with Chr, Start, End, Name, Score and Strand.

Chr1 5678 5680 Gene1 7 +
Chr1 700  800  Gene2 8 -
Chr1 900  1200 Gene2 10 -

Is there an easy and standard way to do this?

Bedtools usually operates on the features, but I am not sure if annotation from one bed file can be extracted into the other based on overlapping features.

Thanks in advance!
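
The transfer described above amounts to an overlap lookup: for every record in the first file, find an overlapping record on the same strand in the second file and take its name. With bedtools, intersectBed with -wa -wb (and -s for same-strand matches) reports such pairs side by side, and the name column can then be picked out of the result. The plain Python sketch below shows the same logic on the records from the post; the function name is hypothetical, it keeps only the first match, and it treats the coordinates as half-open, which is an assumption.

```python
def annotate(queries, annotations):
    """Replace each query name with the name of the first overlapping same-strand annotation."""
    annotated = []
    for chrom, start, end, name, score, strand in queries:
        for a_chrom, a_start, a_end, a_name, _a_score, a_strand in annotations:
            same_place = a_chrom == chrom and a_strand == strand
            if same_place and start < a_end and a_start < end:
                name = a_name
                break  # keep the first match only
        annotated.append((chrom, start, end, name, score, strand))
    return annotated


# Records taken from the post.
queries = [
    ("Chr1", 5678, 5680, "NA", 7, "+"),
    ("Chr1", 700, 800, "NA", 8, "-"),
    ("Chr1", 900, 1200, "NA", 10, "-"),
]
annotations = [
    ("Chr1", 5500, 6000, "Gene1", "x", "+"),
    ("Chr1", 500, 1000, "Gene2", "x", "-"),
]

result = annotate(queries, annotations)
for row in result:
    print(*row, sep="\t")
```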

↧

Counting Features In A Bed File

November 22, 2012, 4:02 am

I have a file in the following BED format

Chr1 1022071 1022105  +
Chr1 1022071 1022105  +
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -

I am trying to get the counts of each feature represented in this file.

mergeBed -i R5_chr.bed -n -s -d 0 > Output/R5_chr_counts.bed

I am interested in the counts of the features and I do not want to merge features by any number of base pairs. Then the output should be as follows

Chr1 1022071 1022105 2 +
Chr1 1022072 1022106 4 -

Any suggestions on how to achieve this using bedtools or in bash or awk? Thanks in advance!
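
Since no merging is wanted, this is a plain tally of identical records rather than an interval operation. A minimal Python sketch, assuming four whitespace-separated columns (chrom, start, end, strand) as in the example above; the function name is hypothetical:

```python
from collections import Counter


def count_features(lines):
    """Count identical (chrom, start, end, strand) records without merging neighbours."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) == 4:  # assumption: chrom, start, end, strand
            chrom, start, end, strand = fields
            counts[(chrom, int(start), int(end), strand)] += 1
    return [(chrom, start, end, n, strand)
            for (chrom, start, end, strand), n in sorted(counts.items())]


# Records taken from the post.
bed_lines = [
    "Chr1 1022071 1022105  +",
    "Chr1 1022071 1022105  +",
    "Chr1 1022072 1022106  -",
    "Chr1 1022072 1022106  -",
    "Chr1 1022072 1022106  -",
    "Chr1 1022072 1022106  -",
]

feature_counts = count_features(bed_lines)
for row in feature_counts:
    print(*row, sep="\t")
```

Because the key includes the strand, identical coordinates on opposite strands are counted separately.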

↧

How To Create A Read Density Profile Within A Interval?

February 22, 2013, 6:06 am

Hi!

I need some help: I have to create a density profile with a specific window of 1 kb (how many times a sequence is detected after an NGS method). I have to use SAM and BEDtools. I think I can use genomeCov in BEDtools, but I don't have a genome reference.

So, if anybody is able to help me...

Thanks
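
For this kind of profile, what is needed is a table of chromosome names and lengths rather than the reference sequence itself, and a SAM file with a header carries that information in its @SQ lines (SN is the name, LN the length). The plain Python sketch below reads those lengths and bins read start positions into 1 kb windows. The function names and example values are hypothetical, and positions are assumed to have been converted to 0-based already.

```python
from collections import Counter


def chrom_sizes_from_sam_header(header_lines):
    """Collect chromosome lengths from the @SQ lines of a SAM header."""
    sizes = {}
    for line in header_lines:
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1)
                        for field in line.rstrip("\n").split("\t")[1:])
            sizes[tags["SN"]] = int(tags["LN"])
    return sizes


def window_counts(read_starts, window=1000):
    """Count reads per fixed-size window, keyed by (chrom, window_start)."""
    counts = Counter()
    for chrom, pos in read_starts:  # assumption: pos is 0-based
        counts[(chrom, (pos // window) * window)] += 1
    return counts


# Made-up header lines and read start positions.
header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:chr1\tLN:5000"]
read_starts = [("chr1", 10), ("chr1", 950), ("chr1", 1001), ("chr1", 4200)]

sizes = chrom_sizes_from_sam_header(header)
density = window_counts(read_starts)
for (chrom, start), n in sorted(density.items()):
    print(chrom, start, min(start + 1000, sizes[chrom]), n, sep="\t")
```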

↧

Simple Redirection, I/O Problem With Bedtools

January 24, 2013, 7:41 am

Hi guys, just a quick question. It's more of a Bash question than a bioinformatics one, with Bedtools involved.

I mostly pipe the bedtools I/O. Here's a general scenario :

sed 1d fileA.bed | intersectBed -a stdin -b peaks.bed | intersectBed -u -a stdin -b fileB.bed

Now, the problem is that fileB also has a header, which is reported as an error by intersectBed (makes sense, non-integer start).

How can I remove the first line (the header) of fileB on the fly in the pipe?

Thanks

↧

bedtools: extracting no coverage regions

April 26, 2014, 10:32 am

Hello,

I am not sure if this has been answered before as I looked and couldn't find a simple answer.

I have a bam file, and all I want is to annotate all regions with 0 coverage in bed format. Is that possible?

Thank you,

Adrian
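
Zero-coverage regions are the complement of the covered intervals within each chromosome. With bedtools, genomeCoverageBed has a -bga option that also reports intervals with zero coverage in BedGraph format, so filtering its output for a depth of 0 may already be enough. The plain Python sketch below shows the underlying complement logic for one chromosome; the function name and the example coordinates are made up.

```python
def zero_coverage_regions(chrom, chrom_length, covered):
    """Return BED-style (chrom, start, end) gaps not covered by any interval."""
    gaps = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(covered):
        if start > cursor:
            gaps.append((chrom, cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((chrom, cursor, chrom_length))
    return gaps


# Hypothetical covered intervals on a 1000-base chromosome (0-based, half-open).
covered = [(100, 300), (250, 400), (700, 900)]

gaps = zero_coverage_regions("chr1", 1000, covered)
for region in gaps:
    print(*region, sep="\t")
```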

↧

How Can I Compare And Merge Bed Files

July 22, 2012, 1:46 pm

I have three bed files with chrNo, start, end position and type. I need to compare each chrNo, start and end position of one file with the 2 other files and write the common ones to a new file. Can anyone suggest how I can do this efficiently? I wrote a simple perl script, but as the files are huge, it takes a lot of time and thus is not feasible. Thanks in advance.

Example files:

file1.bed:

1 20 30

1 100 120

1 200 300

file2.bed:

1 2 5

1 25 34

1 200 300

file3.bed:

1 30 33

1 200 300
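
If "common" means records with identical chrNo, start and end in all three files (an assumption; partial overlaps would need an interval intersection instead), the comparison reduces to a set intersection, which avoids the nested loops that make a naive script slow. A minimal Python sketch using the example rows above; the function name is hypothetical, and with real files the lists would be replaced by open file handles:

```python
def read_regions(lines):
    """Collect (chrom, start, end) keys, skipping blank or short lines."""
    regions = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            regions.add((fields[0], int(fields[1]), int(fields[2])))
    return regions


# Rows from the example files in the post.
file1 = ["1 20 30", "1 100 120", "1 200 300"]
file2 = ["1 2 5", "1 25 34", "1 200 300"]
file3 = ["1 30 33", "1 200 300"]

common = read_regions(file1) & read_regions(file2) & read_regions(file3)
for chrom, start, end in sorted(common):
    print(chrom, start, end)
```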