Profile Coverage Of Rnaseq Samples?

February 14, 2013, 3:51 pm

Hi all,

I have a quick question:

How can I visualize aligned paired-end reads from RNA-seq datasets in the UCSC browser?

I already mapped the reads and assembled the transcripts with TopHat/Cufflinks, but I'm not sure how to proceed to visualize the mappings.

After sorting the BAM files and fixing the mate pairs, I tried to compute the coverage using the following commands:

genomeCoverageBed -bg -split -ibam F.T0.rep2-accepted_hits-fS.bam -g ~/conversion_util/chrom.hg19.sizes > F.T0.rep2-accepted_hits-fS.bg
bedGraphToBigWig F.T0.rep2-accepted_hits-fS.bg ~/conversion_util/chrom.hg19.sizes F.T0.rep2-accepted_hits-fS.bw

But I was not able to visualize the mappings properly. Here is a screenshot of how it looks:

Do you know where the mistake is?

Thanks!

↧

Get The Idea Of Splicing From Reads Mapped In Rna-Seq

January 30, 2014, 6:49 am

I have a set of 100 BAM files from a public experiment, and I want to get an idea of the splicing in each of them for three exons, without going into an in-depth procedure like Cufflinks or DEXSeq.

Let's say my exons are named 1, 2 and 3, and I want to know in how many samples there is a splicing event involving exon 2. Looking through the threads, I found that running coverageBed with my BED file of the three exons could give me some idea per BAM file:

coverageBed -split -abam my_alignment -b exons_to.bed

Am I correct?

I was also thinking of using samtools to get the reads mapped at the flanking positions, i.e. the end of exon 1 and the start of exon 3.

What do you think? Any ideas would be greatly appreciated.

Thanks in advance!

↧

Simple Redirection, I/O Problem With Bedtools

January 24, 2013, 7:41 am

Hi guys, just a quick question. It's more of a Bash question than a bioinformatics one, with bedtools involved.

I mostly pipe the bedtools I/O. Here's a general scenario:

sed 1d fileA.bed | intersectBed -a stdin -b peaks.bed | intersectBed -u -a stdin -b fileB.bed

Now, the problem is that fileB also has a header line, which intersectBed reports as an error (makes sense: non-integer start).

How can I remove the first line (the header) of fileB on the fly in the pipe?

Thanks
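
One common way to do this in bash, offered as a sketch rather than a tested answer for this exact pipeline, is process substitution: `<(sed 1d fileB.bed)` hands the header-stripped stream to a program as if it were a file name. The example below assumes bash (the `<(...)` syntax is not POSIX sh) and uses `paste` as a stand-in for intersectBed so that it runs without bedtools. The file contents are made up.

```shell
# Hypothetical two-line inputs, each with a header row (not from the post).
printf 'chrom\tstart\tend\nchr1\t10\t20\n' > fileA.bed
printf 'chrom\tstart\tend\nchr1\t15\t25\n' > fileB.bed

# <(...) is bash process substitution: the output of the inner command is
# exposed under a file name, so the header is dropped without a temp file.
# paste stands in for intersectBed here; with bedtools the same idea
# would read:  ... | intersectBed -u -a stdin -b <(sed 1d fileB.bed)
both_without_headers() {
  sed 1d fileA.bed | paste - <(sed 1d fileB.bed)
}

both_without_headers
```

Only the data rows of both files reach `paste`; neither header line appears in the output.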

↧

Bedtools To Compare A Vcf File From Samtools Mpileup With Dbsnp?

December 1, 2011, 7:43 pm

Hello,

I have one big VCF file, generated by samtools mpileup, comparing 6 cell lines to see whether there are SNP differences between them.

I would like to use bedtools for the intersection. How can I do it? Do you have any scripts for that?

Thanks

↧

Converting Gff To Bed With Bedtools?

January 20, 2013, 1:51 pm

I use bedtools's sortBed utility to sort BED files for various operations. It takes as input GFF files as well. However, when I feed it a GFF file as in:

sortBed -i myfile.gff

it outputs GFF, not BED. Is there a way to make bedtools sort and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need a different bedtools subcommand to achieve this? Thanks.
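
If no bedtools subcommand does the conversion directly, a small awk step is one option. The sketch below assumes standard nine-column, tab-separated GFF and keeps the feature type as the BED name; that column choice and the sample records are assumptions, not something from the post. GFF starts are 1-based and inclusive while BED starts are 0-based, so the start is shifted by one.

```shell
# Made-up GFF records, deliberately out of order.
printf 'chr1\tsrc\texon\t300\t400\t.\t-\t.\tID=e2\n'  > myfile.gff
printf 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1\n' >> myfile.gff

# Convert GFF (1-based, inclusive) to BED6 (0-based, half-open), then sort
# by chromosome and numeric start. Lines starting with # are skipped.
gff_to_sorted_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       !/^#/ { print $1, $4 - 1, $5, $3, $6, $7 }' "$1" |
    sort -k1,1 -k2,2n
}

gff_to_sorted_bed myfile.gff
```

For the two sample records this prints the exon at 99-200 first and the exon at 299-400 second.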

↧

Correlation Of Fpkm And Length Normalized Transcript Mapped Read Count

August 20, 2012, 10:36 am

Hello, in the process of estimating expression for a 16-tissue human dataset ("Human Body Map 2.0", GSE30611), I used different methods to estimate gene expression. After mapping against the hg19 genome, I used the UCSC-provided RefSeq annotation for hg19 to count mapped reads for ~40,000 human genes in two ways:

1. Counting with Cufflinks, which outputs a Fragments Per Kilobase per Million mapped fragments (FPKM) value for each transcript. The FPKM value basically accounts for library size and for the length of the transcript (all annotated exons), plus an additional likelihood estimator to assign reads (see here).
2. Counting mapped reads with bedtools and dividing each transcript's mapped count by the sum of its exon lengths. This gives a length-normalized expression estimate for comparing genes.

However, the correlation between (1.) and (2.) is always around 0.65 for the same tissue (technically the same experiment). I would expect this correlation to be > 0.9. Below, I plotted (2.) against (1.) for all ~40,000 transcripts. It seems that plain length normalization is simply overestimating some expression. Can someone she ...
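
For reference, the two quantities being compared can be written out in a few lines. FPKM is 10^9 x count / (total mapped fragments x transcript length in bp), while the second method is simply count / length. If both methods used the same counts and lengths, the two values would differ only by the constant factor 10^9 / total. The table, the total of one million fragments, and the function name below are all made up for illustration.

```shell
# Made-up table: transcript name, mapped fragment count, summed exon length (bp).
printf 'txA\t500\t2000\ntxB\t500\t1000\n' > counts.tsv

# Output column 2: FPKM = 1e9 * count / (total_mapped_fragments * length_bp).
# Output column 3: the plain length-normalized count (count / length_bp).
fpkm_table() {
  awk -v total="$2" 'BEGIN { FS = OFS = "\t" }
       { print $1, 1e9 * $2 / (total * $3), $2 / $3 }' "$1"
}

fpkm_table counts.tsv 1000000
```

With these numbers txA gets an FPKM of 250 and txB an FPKM of 500, exactly 1000 times the length-normalized values 0.25 and 0.5.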

↧

Counting Number Of Bam Reads Directly Within Set Of Intervals With Bedtools

September 7, 2011, 1:04 am

How can I count the number of BAM reads falling directly within a set of intervals given in GFF format? Note that I do not want reads that merely overlap the intervals, but ones that fall entirely within them.

I tried the following:

intersectBed -abam reads.bam -b exons.gff -wb -f 1

This has redundancies, so I pipe it into coverageBed as follows:

intersectBed -abam reads.bam -b exons.gff -wb -f 1 | coverageBed -abam stdin -b exons.gff

Is this correct? Thanks.

↧

How To Use Bedtools To Extract Promoters From A Mouse Bed File

February 8, 2012, 12:36 pm

Hello, I would like to know how to use Bedtools to extract promoter sequences (as FASTAs) from the mouse genome (mm9) starting from a BED file.

↧

Picking Random Genomic Positions

July 9, 2012, 5:11 am

I have a set of TF binding coordinates and want to see if there is any significant overlap with an open chromatin annotation.

Example of TF coord:
chr1 19280 19298
chr1 245920 245938
chr2 97290 97308
chr9 752910 752938
...

Example of open chrom. coord. (UCSC track):
chr2 33031543 33032779
chr3 2304169 2304825
chr5 330899 330940
...

I have checked the intersection with bedtools (open chrom. coord. vs TF coord. -/+ 100bp), and now I want to check the intersection between random genomic coordinates and open chrom.

The idea is to:

1. Pick a random genomic position (from the same chromosome as the TF coordinate);
2. Extend it by -/+ 9bp (binding site size);
3. Extend it by -/+ 100bp;
4. Run this simulation 1000 times (TF x 1000);
5. Intersect with bedtools.

Any ideas how I can do this simulation to pick random genomic positions from the same chromosome? I know a little bit of bash and Perl, but won't be able to write the script by myself.
Is it possible to measure the length of every chromosome, then pick the TF chromosome and use its length to draw a random number that represents a genomic position?

Can someone help me with the simulation and the pipeline?
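
The sampling step described above can be sketched with awk alone, assuming a two-column chromosome sizes file (name, length) is available. The file names, sizes, and seed below are hypothetical. For every TF site the function draws one interval of the same length at a uniformly random start on the same chromosome. bedtools also has a `shuffle` subcommand with a `-chrom` option that is meant for this kind of same-chromosome randomization and may be the simpler route.

```shell
# Made-up inputs: chromosome sizes (name, length) and TF sites (BED3).
printf 'chr1\t1000\nchr2\t500\n' > chrom.sizes
printf 'chr1\t100\t118\nchr2\t40\t58\n' > tf.bed

# For each TF site, draw one random interval of the same length on the same
# chromosome. The seed is passed in so that a run can be repeated.
random_matched() {
  awk -v seed="$3" 'BEGIN { OFS = "\t"; srand(seed) }
       NR == FNR { size[$1] = $2; next }      # first file: chromosome sizes
       {
         len = $3 - $2
         room = size[$1] - len                # largest valid 0-based start
         if (room < 0) next                   # site longer than chromosome
         start = int(rand() * (room + 1))
         print $1, start, start + len
       }' "$1" "$2"
}

random_matched chrom.sizes tf.bed 42
```

Calling the function 1000 times with different seeds (for example inside a `for` loop) would give 1000 random sets, each of which can then be extended by 100bp and intersected with the open chromatin track in the same way as the real TF sites.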

↧

How To Extract Scores From Bedgraph File Using Bed Tools

January 23, 2013, 1:49 am

file1

chr1 10 20 name 0 +

file2

chr1 12 14 2.5
chr1 14 15 0.5

How could I extract average scores for file1 using file2, like below? I am trying to extract average phastCons scores (file2) for the intervals in file1.

chr1 10 20 name 0 + 1.5
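
The expected value of 1.5 is the unweighted mean of the two overlapping scores, (2.5 + 0.5) / 2. A minimal awk sketch of that calculation follows, assuming tab-separated inputs small enough for a nested loop; the function name is made up. For real data, `bedtools map -a file1 -b file2 -c 4 -o mean` on sorted inputs is probably the more practical choice.

```shell
# The two inputs from the question, written tab-separated.
printf 'chr1\t10\t20\tname\t0\t+\n' > file1
printf 'chr1\t12\t14\t2.5\nchr1\t14\t15\t0.5\n' > file2

# For each file1 interval, average the scores of all file2 intervals that
# overlap it (unweighted mean, one vote per overlapping interval).
# Intervals with no overlap get "." in the extra column.
mean_scores() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { chr[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; v[NR] = $4 + 0; n = NR; next }
       {
         sum = 0; hits = 0
         for (i = 1; i <= n; i++)
           if (chr[i] == $1 && s[i] < $3 && e[i] > $2) { sum += v[i]; hits++ }
         print $0, (hits ? sum / hits : ".")
       }' "$2" "$1"
}

mean_scores file1 file2
```

Note that this does not weight by the number of bases covered; a per-base average would need the overlap lengths as weights.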

↧

Using Gnu Parallel For Bedtools

February 5, 2014, 4:36 am

I am trying to run GNU parallel on the bedtools multicov function, where the original command is

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to implement the above command using gnu parallel. But when I run the command below

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

each BAM file is taken as a separate argument, so the processes that start look like

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of passing all files as separate arguments to one command. Hence Q1_Counst.bed is overwritten randomly. Could anyone help me get the right command? My server has around 30 cores.

↧

Fastafrombed Problem

August 28, 2011, 10:46 pm

hi,

I tried this tool from BedTools but it doesn't work!

$ cat testgenome404.fa

>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1    5       10

$ ./fastaFromBed -fi testgenome404.fa -bed test.bed  -fo test.fa.out

index file testgenome404.fa.fai not found, generating...

unable to find FASTA index entry for 'chr1'

$ cat testgenome404.fa.fai
chr1    46      7       46      47

What is this file, "testgenome404.fa.fai", and what do these numbers mean? chr1 46 7 46 47

Why do I get this message?

unable to find FASTA index entry for 'chr1'

Thanks in advance for any help, Sara
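
On the meaning of the numbers: a .fai index has five columns per sequence, namely the sequence name, its length in bases, the byte offset of its first base, the number of bases per line, and the number of bytes per line including the line ending. The sketch below recomputes those columns with awk for a made-up FASTA with plain Unix line endings; it is an illustration of the format, not the indexer that bedtools uses.

```shell
# Made-up FASTA with Unix line endings: 12 bases over two lines.
printf '>chr1\nACGTACGT\nACGT\n' > demo.fa

# Recompute the five .fai columns for each record:
#   name, sequence length, byte offset of the first base,
#   bases per line, bytes per line (bases plus the newline).
# Assumes single-byte characters and a trailing "\n" on every line.
fai_sketch() {
  awk 'function emit() { if (name != "") print name, len, offset, linebases, linewidth }
       BEGIN { OFS = "\t"; pos = 0 }
       /^>/ { emit()
              name = substr($1, 2); len = 0; linebases = 0
              pos += length($0) + 1; offset = pos; next }
       { if (linebases == 0) { linebases = length($0); linewidth = linebases + 1 }
         len += length($0); pos += length($0) + 1 }
       END { emit() }' "$1"
}

fai_sketch demo.fa
```

Here the header `>chr1` plus its newline takes 6 bytes, so the first base sits at offset 6. In the index from the question the offset is 7, which would be consistent with one extra byte before the sequence (for example a trailing space or a carriage return on the header line), although the post does not show enough to confirm that.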

↧

Getting Rna Sequences From Gff And Fa Files

August 24, 2013, 7:30 am

Hi. I have a folder full of .fa files, and a .gff. The gff file contains information about which loci look like they code for RNA sequences. The .fa files contain the DNA sequences for a set of human chromosomes. I want to get all the sequences which code for RNA, as defined by the gff file, out of the DNA in the fasta files. I also have a file telling me which RNA types have higher priority (lincRNA is higher priority than miRNA, for example); this tells me which are more important and how I should decide between RNAs for overlapping reads in the gff.

I have been trying to code my own little program in F# that will read these files and give me each RNA read defined in the gff, and its corresponding DNA. However I am a bit confused about how it works. Do the start and end of each feature in the gff file define a character in the corresponding .fa file? Are they 1 or 0 indexed? Does it matter what strand they are ('+' or '-') for my purposes?

Ultimately my goal is to get a bunch of RNAs with their corresponding types (miRNA, lincRNA, snRNA... etc) to do some computations on.

My question is this: what is the easiest way to get it out of the data I have?

The data I am using is freely available here: http://wanglab.pcbi.upenn.edu/coral/ under the heading "Annotation packages" if anyone is interested or needs specifics.

Thank you!

↧

How To Get Annotation For Bed File From Another Bed File

November 23, 2012, 7:52 pm

Hello All,

I have a bed file (with Chr, Start, End, Name, Score and Strand)

Chr1 5678 5680 NA 7  +
Chr1 700  800  NA 8  -
Chr1 900  1200 NA 10 -

and would like to know how I can get the annotation for the name column from another BED file

Chr1 5500 6000 Gene1 x +
Chr1  500 1000 Gene2 x -

or from any standard genome file format such as gbk or .fna files, or for that matter another BED file? So my output file will be a BED file with Chr, Start, End, Name, Score and Strand.

Chr1 5678 5680 Gene1 7 +
Chr1 700  800  Gene2 8 -
Chr1 900  1200 Gene2 10 -

Is there any easy and standard way to do this?

Bedtools usually operates on the features, but I'm not sure whether annotation from one BED file can be transferred to the other based on overlapping features.

Thanks in advance!

↧

Convert Bamtobed Score

February 28, 2012, 6:00 am

Hey,

Just a short question: is there a way to set the score in the BED file to "1" and not to the alignment score? The arguments -tag and -ed only use BAM alignment tags... :/

Cheers!

↧

Bedtools Compare Multiple Bed Files?

October 26, 2011, 5:27 pm

I've been comparing two BED files using the intersectBed -a -b command. I'm just wondering, are there any commands in bedtools which can help us compare multiple BED files?

Say I have 3 BED files (A, B, C). I want to identify the regions where any two of the three (AB, BC, AC) reciprocally overlap by 50%.

thx

edit: I just came across this post again. Maybe I didn't express myself well a couple of months ago. I mean to find the overlaps that span at least 50% of EACH of the multiple BED files. So I don't quite understand: does cat AB BC AC > ABC.common find the overlapping part of all three?

I tried to solve the problem myself like this:

intersectBed -a 2 -b 3 > 23
intersectBed -a 1 -b 3 > 13
intersectBed -a 1 -b 2 > 12

intersectBed -a 1 -b 23 -f 0.50|sort > 23_1
intersectBed -a 2 -b 13 -f 0.50|sort > 13_2
intersectBed -a 3 -b 12 -f 0.50|sort > 12_3

comm -1 -2 23_1 13_2 > test
comm -1 -2 test 12_3 > final_result

I don't know if I'm on the right track. thx

↧

N Closest Genes To A Given Location

September 25, 2012, 1:11 pm

Hi,

This is basically an extension of the following question already asked in biostar (http://biostars.org/post/show/53561/python-finding-gene-closest-to-a-given-location/).

Let us say I have a list of genomic regions (as a bed file), and also a list of genes (as a bed file). For each genomic region I want to find the 5 (or N to be general) closest genes. How would I try to do that? Any suggestions?

Thanks!
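
One way to approach this without extra software, sketched under assumptions: compare every region with every gene on the same chromosome, compute a distance, sort, and keep the first N per region. The file names, the records, and the distance convention (0 for overlaps, gap plus one otherwise, so touching intervals score 1) are choices made for this example. Newer bedtools releases, as far as I know, also offer `bedtools closest -k N`, which would be the faster route on large files.

```shell
# Made-up BED3 regions and BED4 genes.
printf 'chr1\t100\t200\n' > regions.bed
printf 'chr1\t10\t50\tg1\nchr1\t150\t160\tg2\nchr1\t500\t600\tg3\nchr2\t100\t200\tg4\n' > genes.bed

# Print the N nearest genes on the same chromosome for every region.
# Every region is compared with every gene, so this suits small inputs only.
n_closest() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { c[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; g[NR] = $4; n = NR; next }
       { for (i = 1; i <= n; i++) {
           if (c[i] != $1) continue
           d = 0
           if (s[i] >= $3) d = s[i] - $3 + 1
           else if (e[i] <= $2) d = $2 - e[i] + 1
           print $1, $2, $3, g[i], d
         } }' "$2" "$1" |
    sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n -k5,5n |
    awk -v top="$3" 'BEGIN { FS = OFS = "\t" }
         { key = $1 FS $2 FS $3 }
         key != prev { prev = key; seen = 0 }
         ++seen <= top'
}

n_closest regions.bed genes.bed 2
```

For the sample data the two nearest genes to chr1:100-200 are g2 (overlapping, distance 0) and g1 (distance 51); g3 is further away and g4 is on another chromosome.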

↧

Extract Only Paired-End Reads That Map A Specific Interval

August 31, 2012, 1:23 am

Hi,

Is it possible to extract paired-end reads that map to a specific interval (from a BAM file)? I tried with intersectBed:

intersectBed -abam align.bam -b interval.gff3 -wa > result.bam

Here's the result:

[image not preserved]

But I only want reads that map to the feature in bold blue (one of the paired reads is enough). For example, I don't want the reads that map either side of this feature (red arrow).

Is it possible with intersectBed or another program?

Thanks,

↧

Genomecoveragebed - Bedtool For Reporting Per Base Genome Coverage

February 15, 2012, 1:56 pm

Hi everyone, I need some help with the genomeCoverageBed tool. When used for finding per-base genome coverage, this tool takes the -d option. I am interested in finding read counts for each base within a particular intron of a gene. Let me explain in more detail to make myself clear. I used IGV to see how my alignments look and what the coverage of each base within a particular intron is. When I move my cursor in IGV to the area of the coverage track exactly above the base I am interested in, it gives me these details:

Total Count:6
A:0
C:0
G:6
T:0
N:0

This total count is the read count for base G within that intron: 6 reads actually cover this position. When I use the following command to find per-base genome coverage

genomeCoverageBed -i 2-B3-1b-D303A_sorted.bed -g pombe.genome -d

it gives me around 21 as the depth for that base (i.e. G in my example). Looking closely in IGV, I figured out that 21 = 6 + 15, where 6 is the number of reads that actually cover this position and 15 is the number of reads that do not cover the base at that position; since genomeCoverageBed calculates the depth of feature coverage, it also includes all the reads that skip that particular base. I would provide an image to make it clearer. I would like to know how can i ...

↧

Changing Column Order In Bed File

August 31, 2012, 3:27 pm

Here is my data with A, B, C and D columns in my bed file.

   A.     B.     C.     D.
  Chr 1.  1.    12.     +
  Chr 2.  24.   56.     +

How can I move column D to position 1, where column A is right now?
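
A short awk sketch of the reordering, assuming the file is tab-separated with exactly four columns. The rows are modeled on the question and the function name is made up.

```shell
# Rows modeled on the question, written tab-separated (an assumption about the real file).
printf 'Chr1\t1\t12\t+\nChr2\t24\t56\t+\n' > data.bed

# Print column 4 first, then columns 1 to 3, keeping tabs as separators.
move_d_first() {
  awk 'BEGIN { FS = OFS = "\t" } { print $4, $1, $2, $3 }' "$1"
}

move_d_first data.bed
```

The result no longer starts with the chromosome column, so tools that expect BED input may not accept it as-is.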

↧