Problem With Counting Mapped Reads

March 23, 2014, 5:43 pm

Hi, this is my very first experience analysing RNA-seq data. My goal is to do a differential analysis between two strains of a bacterium. So far, I have managed to align the reads and produce SAM and BAM files, but I am having problems annotating and counting my reads. Here are the commands that I used. My reads are from SOLiD and hence in colourspace.

$ nohup solid2fastq.pl 291_01_01 291_01_01-bwa  #Convert .csfasta and .qual to .fastq

$ nohup bwa index -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta

$ nohup bwa aln -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta 291_01_01-bwa.singleF3.fastq 291_01_01-bwa.sai

$ perl -ne 'if($_ !~ m/^\S+?\t4\t/){print $_}' 291_01_01-bwa.sam > 291_01_01-bwa.sam.filtered # Drop reads whose FLAG is exactly 4 (unmapped) from the SAM file

$ samtools sort 291_01_01-bwa.bam 291_01_01-bwa.bam.sorted

$ samtools index 291_01_01-bwa.bam.sorted.bam

To produce the .rpkm file:

$ java -jar ~/bin/bam2rpkm-0.06/bam2rpkm-0.06.jar  -i 291_01_01-bwa.bam.sorted.bam -f Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.RPKM2.out  # I get an error here
$ERROR: Problem encountered whilst reading gtf file. Could not interpret line 'GeneDB|Tb427_01_v4 EuPathDB supercontig 1

So I tried a different method to count:

$ htseq-count -i ID 291_01_01-bwa.sam Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.sam_htseq-count #still error
$Error occured when processing GFF file (line 37060 of file Tbrucei427_Tr ...

↧

Getting Unmapped Reads: Comparing Fastq To Bam

December 4, 2011, 6:02 pm

Given a FASTQ file and a BAM file of aligned reads, is there an efficient way to get all reads that are in the original FASTQ but not in the BAM? Perhaps using bedtools. For example:

unmapped_script original.fastq aligned.bam > unmapped.fastq

This should create an unmapped.fastq file, which is a subset of original.fastq containing only those entries that do not appear in aligned.bam.

Thank you.
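One possible approach, as a rough sketch rather than a tested pipeline: collect the names of the reads present in the BAM, then drop those records from the FASTQ. The helper name `fastq_exclude` and the file names below are made up for illustration. The sketch assumes 4-line FASTQ records, a non-empty name list, and read names that match the FASTQ headers apart from an optional /1 or /2 suffix.

```shell
# Hypothetical helper: print FASTQ records whose read name is NOT in a names file.
# $1 = file with one read name per line, $2 = FASTQ file with 4-line records.
# Assumption: the names file is non-empty (the NR == FNR idiom relies on that).
fastq_exclude() {
  awk 'NR == FNR { seen[$1] = 1; next }   # first file: remember the listed names
       FNR % 4 == 1 {                      # header line of each FASTQ record
         name = substr($1, 2)              # strip the leading @ character
         sub(/\/[12]$/, "", name)          # strip an optional /1 or /2 suffix
         keep = !(name in seen)
       }
       keep' "$1" "$2"
}

# Self-contained demonstration with made-up reads.
tmpdir=$(mktemp -d)
printf 'read1\n' > "$tmpdir/mapped.txt"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$tmpdir/original.fastq"
fastq_exclude "$tmpdir/mapped.txt" "$tmpdir/original.fastq"
rm -r "$tmpdir"
```

With real data, the name list could come from something like `samtools view -F 4 aligned.bam | cut -f1 | sort -u > mapped.txt`, where `-F 4` skips records flagged as unmapped.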

↧

How Can I Include One Bed File In Another Bed File ?

August 5, 2013, 4:34 am

Hello, I have two BED files that share some common features. Let's call the first file A.bed (the bigger file) and the second B.bed (the smaller file). I would like a new BED file that contains everything in A.bed plus everything in B.bed. I don't need the intersection; what I need is more like a merge. I checked the bedtools manual but couldn't find an answer for merging two BED files. Can someone help?
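If "merge" here means a plain union of the records, a minimal sketch could concatenate the two files, sort them, and drop exact duplicates. The helper name `bed_union` is hypothetical, and the sketch assumes both files use the same chromosome names and column layout.

```shell
# Hypothetical helper: union of BED files, sorted by chrom/start/end, exact duplicates removed.
bed_union() {
  cat "$@" | sort -k1,1 -k2,2n -k3,3n | uniq
}

# Self-contained demonstration with made-up intervals.
tmpdir=$(mktemp -d)
printf 'chr1\t10\t20\nchr1\t30\t40\n' > "$tmpdir/A.bed"
printf 'chr1\t30\t40\nchr1\t5\t8\n' > "$tmpdir/B.bed"
bed_union "$tmpdir/A.bed" "$tmpdir/B.bed"
rm -r "$tmpdir"
```

If overlapping features should instead be collapsed into single intervals, the sorted output could be piped into `bedtools merge -i stdin`, which expects input sorted in this way.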

Thanks in advance

↧

Counting Features In A Bed File

November 22, 2012, 4:02 am

I have a file in the following BED format

Chr1 1022071 1022105  +
Chr1 1022071 1022105  +
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -

I am trying to get the counts of each feature represented in this file.

mergeBed -i R5_chr.bed -n -s -d 0 > Output/R5_chr_counts.bed

I am interested in the counts of the features and I do not want to merge features by any number of base pairs. Then the output should be as follows

Chr1 1022071 1022105 2 +
Chr1 1022072 1022106 4 -

Any suggestions on how to achieve this using bedtools or in bash or awk? Thanks in advance!
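A sketch of one way to do this with standard tools only, assuming whitespace-separated fields with the strand in the last column: reduce each line to chrom, start, end and strand, then count identical records with `sort` and `uniq -c`. The helper name `count_features` is made up for illustration.

```shell
# Hypothetical helper: count identical (chrom, start, end, strand) records.
# Assumption: the strand is the last field on each line.
count_features() {
  awk 'BEGIN { OFS = "\t" } NF >= 4 { print $1, $2, $3, $NF }' "$@" |
    sort -k1,1 -k2,2n -k3,3n -k4,4 |
    uniq -c |
    awk 'BEGIN { OFS = "\t" } { print $2, $3, $4, $1, $5 }'
}

# Demonstration on records like the ones in the question.
printf 'Chr1 1022071 1022105  +\nChr1 1022071 1022105  +\nChr1 1022072 1022106  -\n' | count_features
```

No features are merged here: two records are counted together only when chromosome, start, end and strand are all identical.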

↧

How To Check Whole Genome With Bigwigsummary ?

March 30, 2012, 11:33 am

Hi,

I have a question about the bigWigSummary tool.

I have my start and end positions and my bigWig file, but I want to check the whole genome instead of going chromosome by chromosome. Is there an option to use the tool that way?

I know that for each chromosome I have to use:

bigWigSummary -type=X bigwigfile chrN start end datapoints

I want to check from chr1 to chrX.

Thanks in Advance.

↧

How To Count Genes In Genomic Regions Using A Gtf/Gff3 And A Bed File Of Regions

February 27, 2013, 11:13 am

I'd like to count the number of unique genes in a gff file falling within a list of genomic regions. With bedtools I can count the number of regions within the gff which is almost what I want, but not quite.

bedtools intersect -a regions.bed -b my.gff -c

UPDATE:

I should have made my question a bit more specific. I have a modified Ensembl-style GTF file (not a GFF) that has unique transcript IDs. This means that simply selecting unique values in the 9th column of the GTF file actually counts transcript IDs rather than genes.

To circumvent this problem I first truncated the gtf file:

cat my.gff | sed -e 's/;.*//' > delete.me.gtf

Then I ran the bedtools map command:

bedtools map -a regions.bed -b delete.me.gtf -c 9 -o count_distinct > counts.genes_in_windows.bed

I almost forgot to delete the intermediate file:

rm delete.me.gtf

There is probably a way to make this a oneliner, without the intermediate file, but I have a dissertation to write!
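The intermediate file can probably be avoided by piping the truncated GTF straight into bedtools, which accepts `stdin` as a file name. The sketch below is untested against real data. The helper names are hypothetical, it assumes the first attribute in column 9 is the gene ID, and it assumes both inputs are sorted by chromosome and start as `bedtools map` expects.

```shell
# Keep only the first attribute (assumed to be the gene ID) in column 9 of each GTF line.
first_attribute_only() {
  sed -e 's/;.*//' "$@"
}

# Hypothetical wrapper: $1 = regions BED, $2 = GTF. Requires bedtools on PATH.
genes_per_region() {
  first_attribute_only "$2" | bedtools map -a "$1" -b stdin -c 9 -o count_distinct
}

# Demonstration of the truncation step only, on a made-up GTF line.
printf 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' | first_attribute_only
```

A call would then look like `genes_per_region regions.bed my.gff > counts.genes_in_windows.bed`, with nothing left to delete afterwards.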

↧

Running BedTools on Linux Cluster: Permission Denied

August 3, 2014, 10:50 am

I have been having some problems running the BedTools binaries on a Linux cluster. I have the binaries in my own $HOME/bin directory, and when I try to run bedtools I get this error message:

-bash: bedtools: Permission Denied

I followed the instructions here and still got the same error message.

Any clue what to do?
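One common cause of this message is a missing execute bit on the binary, so that is worth checking first. The sketch below uses a hypothetical helper, `make_executable`, and a throwaway file instead of the real bedtools binary. It does not cover other causes, such as a home directory mounted with the noexec option.

```shell
# Hypothetical helper: add the owner execute bit if the file is not executable.
make_executable() {
  [ -x "$1" ] || chmod u+x "$1"
}

# Demonstration on a throwaway script rather than the real binary.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\necho ok\n' > "$tmpdir/tool"
make_executable "$tmpdir/tool"
ls -l "$tmpdir/tool" | cut -c1-10   # permission string, now with x for the owner
rm -r "$tmpdir"
```

For the real binary the equivalent would be `make_executable "$HOME/bin/bedtools"`, or simply `chmod u+x "$HOME/bin/bedtools"`.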

↧

bedtools: extracting no coverage regions

April 26, 2014, 10:32 am

Hello,

I am not sure if this has been answered before as I looked and couldn't find a simple answer.

I have a BAM file, and all I want is to annotate all regions with 0 coverage in BED format. Is that possible?

Thank you,

Adrian
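One way this might be done: `bedtools genomecov` with `-bga` reports coverage in bedGraph form including zero-coverage intervals, and a small filter can keep only those. The helper name `zero_coverage_bed` is made up, and the demonstration runs on a hand-written bedGraph, not on real bedtools output.

```shell
# Hypothetical helper: keep bedGraph intervals whose coverage (column 4) is zero,
# printed as 3-column BED.
zero_coverage_bed() {
  awk 'BEGIN { OFS = "\t" } $4 == 0 { print $1, $2, $3 }' "$@"
}

# Demonstration on a made-up bedGraph.
printf 'chr1\t0\t100\t0\nchr1\t100\t250\t7\nchr1\t250\t300\t0\n' | zero_coverage_bed
```

With a position-sorted BAM the full command would be along the lines of `bedtools genomecov -ibam input.bam -bga | zero_coverage_bed > zero_coverage.bed`.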

↧

How to get the rRNA ratio from a RNAseq dataset

September 30, 2014, 9:25 am

Hello,

I want to know if there is a way to use bedtools and the miRDeep2 output BED file to get the rRNA ratio in my miRNA-seq FASTQ data. Thank you very much!

I have a GTF file, a genome.fa, and a BED file from miRDeep2. Thanks!

↧

compute normal-tumor coverage ratio from exome BAMs

July 2, 2014, 6:22 am

Could someone please suggest a quick way to compute the data ratio of uniquely mapped reads in the normal to uniquely mapped reads in the tumor, as required by VarScan in the command below? I have over 50 exome BAMs.

(normal_unique_mapped_reads/tumor_unique_mapped_reads).

java -jar VarScan.jar copynumber normal-tumor.mpileup output.basename --min-coverage 10 --data-ratio [data_ratio] --min-segment-size 20 --max-segment-size 100
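The ratio itself is a single division once the two counts are known. Below is a minimal sketch with a hypothetical helper, `data_ratio`, and made-up counts. How "uniquely mapped" is defined depends on the aligner, so the samtools filter in the note after the code is only one possible assumption.

```shell
# Hypothetical helper: ratio of two read counts, normal divided by tumor.
# $1 = uniquely mapped reads in the normal, $2 = uniquely mapped reads in the tumor.
data_ratio() {
  awk -v n="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "tumor count is zero" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", n / t
  }'
}

# Demonstration with made-up counts.
data_ratio 41250000 38900000
```

One way to obtain each count might be `samtools view -c -q 1 -F 0x904 sample.bam`, which counts records with MAPQ of at least 1 that are not unmapped, secondary or supplementary. Running that in a loop over the BAM files and feeding each normal/tumor pair to `data_ratio` would cover all 50 samples.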

↧

Given gene ID and genomic coordinates, how can I create a GFF formatted file?

June 19, 2014, 2:39 pm

I have downloaded a list of coordinates of yeast genes from Xu et al., 2009 (see table S3). Unfortunately its current format is not a standard format so it does not appear to be compatible with the programs I would like to use i.e. HOMER, bedops or bedtools. I was wondering if anyone could help me get it into a gff format using unix or R (other languages are also welcome if the code is just copy and paste)? I tried to recreate what I saw at the ensembl website, but said programs were still not recognizing it as gff. Here is the beginning of the file (there are actually ~7K lines):

ID chr strand start end type name commonName endConfidence source
ST0001 1 + 9369 9601 SUTs SUT001 SUT001 bothEndsMapped Manual
ST0002 1 + 30073 30905 CUTs CUT001 CUT001 bothEndsMapped Automatic
ST0003 1 + 31153 32985 ORF-T YAL062W GDH3 bothEndsMapped Manual
ST0004 1 + 33361 34897 ORF-T YAL061W BDH2 bothEndsMapped Manual
ST0005 1 + 35097 36393 ORF-T YAL060W BDH1 bothEndsMapped Manual
ST0006 1 + 36545 37329 ORF-T YAL059W ECM1 bothEndsMapped Manual
ST0007 1 + 37409 39033 ORF-T YAL058W CNE1 bothEndsMapped Manual
ST0008 1 ...
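A sketch of how the table might be rearranged into the nine tab-separated GFF3 columns with awk. It assumes the ten whitespace-separated columns listed in the header line. The source label "Xu2009" and the choice of ID, Name and Alias attributes are illustrative assumptions, and the chromosome names are passed through unchanged, so they may still need renaming to match the reference used by the downstream tools.

```shell
# Hypothetical helper: convert the 10-column table to GFF3.
# Input columns: ID chr strand start end type name commonName endConfidence source
table_to_gff() {
  awk 'BEGIN { OFS = "\t"; print "##gff-version 3" }
       NR > 1 && NF >= 8 {   # skip the header line and incomplete rows
         print $2, "Xu2009", $6, $4, $5, ".", $3, ".", "ID=" $1 ";Name=" $7 ";Alias=" $8
       }' "$@"
}

# Demonstration on the header and two rows from the question.
printf 'ID chr strand start end type name commonName endConfidence source\nST0001 1 + 9369 9601 SUTs SUT001 SUT001 bothEndsMapped Manual\nST0003 1 + 31153 32985 ORF-T YAL062W GDH3 bothEndsMapped Manual\n' | table_to_gff
```

GFF coordinates are 1-based and inclusive, so the sketch also assumes the start and end values in the table follow that convention.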

↧

Intersectbed Tool Generating Empty File

August 28, 2012, 10:13 am

I have used the Bedtools command intersectBed to check the overlap between two bed files. A is my INDEL file and B is my Reference file. But it is producing an empty output file. I thought the problem was that the file B is much larger than file A. But I tried changing the file order and it is still not creating any output.

Here is the reference B file (larger):

gff_seqname      0        1395    gene    0    +
gff_seqname      0        1395    exon    0    +
gff_seqname    1397    2498    gene    0    +
gff_seqname    1397    2498    exon    0    +
gff_seqname    2524    3619    gene    0    +

Here is my A file with just 51 INDELS:

NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    174708    174713    -GCCGG:2/6
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1078686    1078686    +A:105/112
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1229123    1229125    -CT:800/870
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234830    1234830    +AT:134/134
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234833    1234834    -A:134/134

here is my command:

intersectBed -a SOD_pal_BWA_GMM.PE.sorted.bam.sorted_cleaned_GMM.bam.sorted.hr.bam.raw.bed  -b sodalis_galaxy.bed  -wa -wb  >test13.bed

↧

macs and bedtools

July 4, 2014, 2:07 pm

Hello,

I have MACS2 output and am now looking for peaks that are located in introns. I have a BED file with introns from UCSC for my species. Which peak file should I use for the bedtools intersection: the peak summits (.bed) or the narrow peaks (.bed), both from the MACS2 output?

↧

Bedtools Genomecoveragebed Usage : How To Create A Genome File?

May 4, 2013, 11:07 pm

I am using BEDTOOLS and the following command to get the coverage file:

$ ./genomeCoverageBed -ibam ~/GG_project/trim/ecoli.bam -g > ~/GG_project/trim/coverage

where ecoli.bam is my sorted BAM file, and coverage is my output file.

Where do I get the genome file from? How do I create a genome file? Specifically, I would need an ecoli.genome file.
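A bedtools genome file is a two-column, tab-separated list of sequence names and lengths, so it can be derived from the reference FASTA. The helper name `fasta_to_genome` below is hypothetical. The sketch assumes plain Unix line endings and takes the first word of each header line as the sequence name.

```shell
# Hypothetical helper: print "name<TAB>length" for every sequence in a FASTA file.
fasta_to_genome() {
  awk '/^>/ { if (name != "") print name "\t" len
              name = substr($1, 2); len = 0; next }
       { len += length($0) }
       END { if (name != "") print name "\t" len }' "$@"
}

# Demonstration on a made-up two-sequence FASTA.
printf '>chrA example\nACGT\nAC\n>chrB\nGG\n' | fasta_to_genome
```

If the reference has been indexed with `samtools faidx`, the first two columns of the resulting .fai file hold the same information, so `cut -f1,2 ecoli.fa.fai > ecoli.genome` should give an equivalent file. As far as I know, the `-g` option is only needed for BED input; with `-ibam` the chromosome lengths are read from the BAM header.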

↧

Extracting Genomic Coverage Information Across Different Samples

March 21, 2014, 1:39 pm

Hello, I have 3 BAM files that I want to compare against each other. For example, I have a reference file with 10,000 sequences, and paired-end reads sequenced for 3 different samples:

1) Sample 1 is 100% the same as the reference, so we expect all reads to map to it.
2) Sample 2 is 80% similar to the reference, so 20% of the reference sequences won't have any reads.
3) Sample 3 is 60% similar to the reference, so 40% of the reference sequences won't have any reads.

My goal is to identify which reference sequences do not have any reads mapped in Samples 2 and 3, i.e. the 20% of reference sequences for Sample 2 and the 40% for Sample 3. Also, in some cases, for a reference that is approx. 10 kb long, Sample 1 maps to the entire 10 kb, Sample 2 maps to the first 5 kb and Sample 3 maps to the last 3 kb, so I need to identify the partial regions for those reference sequences as well. I have the mapped, sorted BAM files for all three samples. I am looking into using bedtools, but I am not sure which bedtools command will give the answer I need. I have the following commands, which might do something similar, but they output differences at every base.

genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph

genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph

unionBedGraphs -header -i sample1.bedgraph sample2. ...

↧

Counting Number Of Bam Reads Directly Within Set Of Intervals With Bedtools

September 7, 2011, 1:04 am

How can I count the number of BAM reads falling directly within a set of intervals given in GFF format? Note that I do not want reads that merely overlap the intervals, but ones that fall entirely within them.

I tried the following:

intersectBed -abam reads.bam -b exons.gff -wb -f 1

This has redundancies, so I pipe it into coverageBed as follows:

intersectBed -abam reads.bam -b exons.gff -wb -f 1 | coverageBed -abam stdin -b exons.gff

Is this correct? Thanks.

↧

Bed File Of Mapq Sliding Window On A Bam File?

February 27, 2014, 2:01 am

There may already be a recipe for this, so I am asking first before reinventing the wheel: I would like to create a BED file where the score is the average mapQ from the reads of the input.bam file. I think bedtools or bedops are the way to go:

http://bedtools.readthedocs.org/en/latest/content/tools/bamtobed.html
http://bedops.readthedocs.org/en/latest/content/reference/file-management/conversion/bam2bed.html

Other than simply running bamtobed/bam2bed, I would like to be able to define a sliding window size and step for the windows, of say, size=1000 and step=200. I also would like to generate the bam2bed information only from a list of regions in regions.bed. E.g., something like:

mapq_sliding_windows --bam input.bam --wsize 1000 --wstep 200 --regions regions.bed > mapq_sliding_windows.bed

EDITED: Thank you Aaron for your answer. I got it working but it's slow for my 30x WGS BAMs:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" > hg19.genome
bedtools makewindows -g hg19.genome -w 1000 -s 200 > hg19.windows.bed
bedtools map -a hg19.windows.bed -b <(bedtools bamtobed -i input.bam | grep -v chrM) -c 5 -o mean > ...

↧

Creating Bed File For Lncrna Using Gencode Gtf File

May 12, 2013, 9:29 am

Hi all,

I want to get a BED file of lncRNAs based on the GENCODE GTF file.

I downloaded the file "gencode.v16.long_noncoding_RNAs.gtf.gz", extracted the chr, start and end information from it, and then used mergeBed to merge the overlapping lncRNAs. Is that correct? I know that exon genomic positions can be merged with this kind of method,

but for lncRNAs I am not so sure. Is there any place that already offers this kind of BED file?

Actually, we should get 22444 long non-coding RNA loci transcripts, but I get only 11817 genomic regions after the merging process.

If anyone knows the answer, could you give me some help?

↧

Simple Redirection, I/O Problem With Bedtools

January 24, 2013, 7:41 am

Hi guys, just a quick question. It's more of a Bash question than a bioinformatics one, with bedtools involved.

I mostly pipe the bedtools I/O. Here's a general scenario:

sed 1d fileA.bed | intersectBed -a stdin -b peaks.bed | intersectBed -u -a stdin -b fileB.bed

Now, the problem is that fileB also has a header line, which intersectBed reports as an error (makes sense: non-integer start).

How can I remove the first line (the header) of fileB on the fly in the pipe?

Thanks

↧

Extract Only Paired-End Reads That Map A Specific Interval

August 31, 2012, 1:23 am

Hi,

Is it possible to extract the paired-end reads that map to a specific interval (from a BAM file)? I tried with intersectBed:

intersectBed -abam align.bam -b interval.gff3 -wa > result.bam

Here's the result:

[image not preserved]

But I only want reads that map to the feature in bold blue (one read of the pair is enough). For example, I don't want the reads that map on either side of this feature (red arrow).

Is it possible with intersectBed or another program?

Thanks,

↧