Channel: Post Feed
Viewing all 3764 articles
Browse latest View live

Problem With Counting Mapped Reads

Hi, this is my very first experience analysing RNA-seq data. My goal is to do a differential expression analysis between two strains of a bacterium. So far, I have managed to align the reads and produce SAM and BAM files, but I'm having problems annotating and counting my reads. Here are the commands that I used (my reads are from SOLiD and hence in colour space):

$ nohup solid2fastq.pl 291_01_01 291_01_01-bwa   # convert .csfasta and .qual to .fastq
$ nohup bwa index -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta
$ nohup bwa aln -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta 291_01_01-bwa.singleF3.fastq 291_01_01-bwa.sai
$ perl -ne 'if($_ !~ m/^\S+?\t4\t/){print $_}' 291_01_01-bwa.sam > 291_01_01-bwa.sam.filtered
# convert to SAM file
$ samtools sort 291_01_01-bwa.bam 291_01_01-bwa.bam.sorted
$ samtools index 291_01_01-bwa.bam.sorted.bam

To produce the .rpkm file:

$ java -jar ~/bin/bam2rpkm-0.06/bam2rpkm-0.06.jar -i 291_01_01-bwa.bam.sorted.bam -f Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.RPKM2.out

I get an error here:

ERROR: Problem encountered whilst reading gtf file. Could not interpret line 'GeneDB|Tb427_01_v4 EuPathDB supercontig 1

So I tried a different method to count:

$ htseq-count -i ID 291_01_01-bwa.sam Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.sam_htseq-count

Still an error:

Error occured when processing GFF file (line 37060 of file Tbrucei427_Tr ...

Getting Unmapped Reads: Comparing Fastq To Bam


Given a FASTQ file and a BAM file of aligned reads, is there an efficient way to get all reads that are in the original FASTQ but not in the BAM, perhaps using bedtools? I.e.:

unmapped_script original.fastq aligned.bam > unmapped.fastq

should create an unmapped.fastq file, a subset of original.fastq containing only those entries that do not appear in aligned.bam.

Thank you.
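The following is a minimal sketch, not a bedtools feature. It assumes a plain four-lines-per-record FASTQ and a text file of mapped read names created beforehand with `samtools view -F 4 aligned.bam | cut -f 1 | sort -u > mapped_ids.txt` (flag 4 marks unmapped reads, so `-F 4` keeps only mapped ones). The script name `unmapped_reads.py` and the file `mapped_ids.txt` are hypothetical. The script streams the FASTQ and keeps every record whose name is not in the mapped set. Trailing /1 and /2 mate suffixes are stripped on both sides because BAM read names usually omit them.

```python
import itertools
import sys
from typing import Iterable, Iterator, Set


def normalize(name: str) -> str:
    """Drop a trailing /1 or /2 mate suffix so FASTQ and BAM names match."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_id(header: str) -> str:
    """Extract the read ID from a FASTQ header line ('@name optional comment')."""
    fields = header[1:].split()
    return normalize(fields[0]) if fields else ""


def unmapped_records(fastq_lines: Iterable[str], mapped: Set[str]) -> Iterator[str]:
    """Yield FASTQ records (4 lines each) whose read ID is not in `mapped`."""
    lines = iter(fastq_lines)
    while True:
        record = list(itertools.islice(lines, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record: %r" % record[0])
        if read_id(record[0]) not in mapped:
            yield "".join(record)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical): python unmapped_reads.py original.fastq mapped_ids.txt > unmapped.fastq
    with open(sys.argv[2]) as ids:
        mapped = {normalize(line.strip()) for line in ids if line.strip()}
    with open(sys.argv[1]) as fastq:
        sys.stdout.writelines(unmapped_records(fastq, mapped))
```

The set of mapped names is held in memory. For very large runs, sorting both name lists and comparing them on disk may be preferable.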

How Can I Include One Bed File In Another Bed File?


Hello, I have two BED files that share some common features. Let's call the first file A.bed (the bigger file) and the second B.bed (the smaller file). I would like a new BED file that adds everything in B.bed to A.bed. I don't need the intersection; I need something more like a merge. I checked the bedtools manual but couldn't find how to merge two BED files. Can someone help?

Thanks in advance
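One reading of the question is "take the union of A.bed and B.bed and collapse overlapping intervals". On the command line this is usually done with something like `cat A.bed B.bed | sort -k1,1 -k2,2n | bedtools merge -i stdin`. The Python sketch below shows the same logic on (chrom, start, end) tuples. Like `bedtools merge` with its default settings, it also joins book-ended intervals. If you only want the two files stacked without collapsing overlaps, the cat-and-sort step alone is enough.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]


def union_merge(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Combine two interval lists and merge overlapping or book-ended intervals."""
    merged: List[list] = []
    for chrom, start, end in sorted(a + b):  # sort by chrom, then start, then end
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current merged interval
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]
```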

Counting Features In A Bed File


I have a file in the following BED format

Chr1 1022071 1022105  +
Chr1 1022071 1022105  +
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -
Chr1 1022072 1022106  -

I am trying to get the count of each feature represented in this file with:

mergeBed -i R5_chr.bed -n -s -d 0 > Output/R5_chr_counts.bed

I am only interested in the counts of identical features, and I do not want features merged across any distance. The output should then be as follows:

Chr1 1022071 1022105 2 +
Chr1 1022072 1022106 4 -

Any suggestions on how to achieve this using bedtools or in bash or awk? Thanks in advance!
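Because the goal is to count identical records rather than merge nearby ones, grouping on the exact (chrom, start, end, strand) key is enough. In the shell, that is roughly `sort | uniq -c`. The Python sketch below assumes whitespace-separated columns with the strand in the last column, as in the example above.

```python
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def count_features(lines: Iterable[str]) -> List[Tuple[str, int, int, int, str]]:
    """Count identical (chrom, start, end, strand) records, in first-seen order."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[-1]
        counts[(chrom, start, end, strand)] += 1
    # Counter keeps insertion order (Python 3.7+), so output follows the input order
    return [(c, s, e, n, st) for (c, s, e, st), n in counts.items()]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as bed:
        for row in count_features(bed):
            print(*row, sep="\t")
```

For the six example lines, this prints `Chr1 1022071 1022105 2 +` and `Chr1 1022072 1022106 4 -` (tab-separated).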

How To Check The Whole Genome With bigWigSummary?


Hi,

I have a question about the bigWigSummary tool.

I have my start and end positions and my bigWig file, but I want to check the whole genome instead of going chromosome by chromosome. Is there any option to use this tool in that way?

I know that for each chromosome I have to use :

bigWigSummary -type=X bigwigfile chrN start end datapoints

I want to check from chr1 to chrX.

Thanks in Advance.
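bigWigSummary takes one chromosome range per call, so one workaround is to loop over the chromosome sizes. The sketch below builds one command per chromosome from a UCSC-style chrom.sizes file (two columns: name and length). The file names and the `-type=mean` / one-data-point choices are assumptions. If you only need a single genome-wide number, bigWigInfo on the same file may already report overall summary statistics.

```python
import subprocess
import sys
from typing import Iterable, List


def summary_commands(bigwig: str, chrom_sizes: Iterable[str],
                     summary_type: str = "mean", data_points: int = 1) -> List[List[str]]:
    """Build one bigWigSummary call per chromosome, spanning the whole chromosome."""
    commands = []
    for line in chrom_sizes:
        if not line.strip():
            continue
        chrom, size = line.split()[:2]
        # start is 0-based; end is the chromosome length
        commands.append(["bigWigSummary", f"-type={summary_type}", bigwig,
                         chrom, "0", size, str(data_points)])
    return commands


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical): python whole_genome_summary.py file.bw hg19.chrom.sizes
    with open(sys.argv[2]) as sizes:
        for cmd in summary_commands(sys.argv[1], sizes):
            result = subprocess.run(cmd, capture_output=True, text=True)
            # chromosomes without data produce a message instead of a value
            print(cmd[3], result.stdout.strip() or result.stderr.strip(), sep="\t")
```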

How To Count Genes In Genomic Regions Using A Gtf/Gff3 And A Bed File Of Regions


I'd like to count the number of unique genes in a GFF file that fall within each region in a list of genomic regions. With bedtools I can count the number of GFF features overlapping each region, which is almost what I want, but not quite:

bedtools intersect -a regions.bed -b my.gff -c

UPDATE:

I should have made my question more specific. I have a modified Ensembl-style GTF file (not a GFF) that has unique transcript IDs. This means that simply counting unique values in the 9th column of the GTF file actually counts transcript IDs, not genes.

To circumvent this problem, I first truncated the attribute column of the GTF file after its first attribute (gene_id in Ensembl-style GTFs):

cat my.gff | sed -e 's/;.*//' > delete.me.gtf

Then I ran the bedtools map command:

bedtools map -a regions.bed -b delete.me.gtf -c 9 -o count_distinct > counts.genes_in_windows.bed

I almost forgot to delete the intermediate file:

rm delete.me.gtf

There is probably a way to make this a one-liner without the intermediate file, but I have a dissertation to write!
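The intermediate file can probably be avoided with bash process substitution, for example by passing `-b <(sed -e 's/;.*//' my.gff)` to bedtools map (untested here; bedtools map also expects both inputs to be sorted). The Python sketch below computes the same count, distinct gene_id values per region, by parsing `gene_id` directly instead of truncating the attribute column. It assumes Ensembl-style `gene_id "..."` attributes and BED regions given as (chrom, start, end).

```python
import re
import sys
from typing import Iterable, List, Tuple

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines: Iterable[str]) -> List[Tuple[str, int, int, str]]:
    """Parse GTF lines into 0-based half-open (chrom, start, end, gene_id) tuples."""
    spans = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9:
            continue
        match = GENE_ID.search(f[8])
        if match:
            # GTF is 1-based inclusive; shift the start so it compares with BED
            spans.append((f[0], int(f[3]) - 1, int(f[4]), match.group(1)))
    return spans


def genes_per_region(regions, gtf_lines) -> List[Tuple[str, int, int, int]]:
    """Count distinct gene_ids overlapping each BED region (naive O(n*m) scan)."""
    spans = gene_spans(gtf_lines)
    counts = []
    for chrom, start, end in regions:
        genes = {g for c, s, e, g in spans if c == chrom and s < end and e > start}
        counts.append((chrom, start, end, len(genes)))
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical): python genes_per_region.py regions.bed my.gtf > counts.genes_in_windows.bed
    with open(sys.argv[1]) as bed:
        regions = [(f[0], int(f[1]), int(f[2])) for f in (l.split() for l in bed) if len(f) >= 3]
    with open(sys.argv[2]) as gtf:
        for row in genes_per_region(regions, gtf):
            print(*row, sep="\t")
```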

Running BedTools on Linux Cluster: Permission Denied


I have been having some problems running the bedtools binaries on a Linux cluster. I have the binaries in my own $HOME/bin directory, and when I try to run bedtools I get this error message:

-bash: bedtools: Permission Denied

I followed the instructions here and still got the same error message.

Any clue what to do?
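A common cause of this message is a copied binary that lacks the execute bit. The usual shell fix is `chmod u+x $HOME/bin/bedtools`. Another possibility, which this sketch cannot fix, is a home directory mounted with `noexec` on the cluster. The sketch below performs the same check and fix in Python; the script name and path are placeholders.

```python
import os
import stat
import sys


def ensure_user_executable(path: str) -> bool:
    """Add the owner-execute bit to `path` if missing; return True if it is set afterwards."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        os.chmod(path, stat.S_IMODE(mode) | stat.S_IXUSR)
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (hypothetical): python fix_exec.py $HOME/bin/bedtools
    print(ensure_user_executable(sys.argv[1]))
```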

bedtools: extracting no coverage regions


Hello,

I am not sure whether this has been answered before; I looked but couldn't find a simple answer.

I have a BAM file, and all I want is to annotate all regions with zero coverage in BED format. Is that possible?

Thank you,

Adrian
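With bedtools, this is commonly done by running `bedtools genomecov -ibam input.bam -bga`, which also writes zero-coverage intervals, and keeping the rows whose fourth column is 0, e.g. with `awk '$4 == 0'`. The underlying idea is simply the complement of the covered intervals. The Python sketch below assumes read intervals are already available as 0-based half-open (chrom, start, end) tuples (for example from `bedtools bamtobed`) and chromosome lengths come from a dict.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]


def zero_coverage(chrom_sizes: Dict[str, int], reads: List[Interval]) -> List[Interval]:
    """Return intervals with no read coverage (complement of the read intervals)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in reads:
        by_chrom.setdefault(chrom, []).append((start, end))
    gaps = []
    for chrom, size in chrom_sizes.items():
        covered_to = 0  # rightmost position covered so far
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > covered_to:
                gaps.append((chrom, covered_to, start))
            covered_to = max(covered_to, end)
        if covered_to < size:
            gaps.append((chrom, covered_to, size))
    return gaps
```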

 


How to get the rRNA ratio from an RNA-seq dataset


Hello,

 

Is there any way to use bedtools and the BED file output by miRDeep2 to get the rRNA ratio in my miRNA-seq FASTQ data? Thank you very much!

I have a GTF file, genome.fa, and a BED file from miRDeep2. Thanks!
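The sketch below uses one possible definition of the ratio: the fraction of reads overlapping any gene annotated as rRNA in the GTF. It assumes the reads are available as BED intervals (e.g. from `bedtools bamtobed`) and that rRNA genes carry `gene_biotype "rRNA"` (Ensembl style) or `gene_type "rRNA"` (GENCODE style). Check your GTF, since the attribute name varies and mitochondrial rRNA may use a different label. The bedtools equivalent is to count the lines of `bedtools intersect -u -a reads.bed -b rRNA.gtf` and divide by the total number of reads.

```python
import re
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]
RRNA = re.compile(r'\b(?:gene_biotype|gene_type) "rRNA"')


def rrna_intervals(gtf_lines: Iterable[str]) -> List[Interval]:
    """Return 0-based half-open intervals of GTF records annotated as rRNA."""
    out = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) >= 9 and RRNA.search(f[8]):
            out.append((f[0], int(f[3]) - 1, int(f[4])))  # GTF is 1-based inclusive
    return out


def rrna_fraction(reads: List[Interval], rrna: List[Interval]) -> float:
    """Fraction of reads overlapping at least one rRNA interval (naive O(n*m) scan)."""
    if not reads:
        return 0.0
    hits = sum(
        1 for chrom, start, end in reads
        if any(c == chrom and s < end and e > start for c, s, e in rrna)
    )
    return hits / len(reads)
```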

compute normal-tumor coverage ratio from exome BAMs


Could someone please suggest a quick way to compute the data ratio of uniquely mapped reads in the normal to uniquely mapped reads in the tumor, as required by VarScan in the command below? I have over 50 exome BAMs.

(normal_unique_mapped_reads / tumor_unique_mapped_reads)

java -jar VarScan.jar copynumber normal-tumor.mpileup output.basename --min-coverage 10 --data-ratio [data_ratio] --min-segment-size 20 --max-segment-size 100
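The sketch below counts reads per BAM with `samtools view -c` and divides the normal count by the tumor count. What counts as "uniquely mapped" is an assumption here. The filter excludes unmapped, secondary and supplementary alignments (`-F 0x904`) and keeps MAPQ >= 1 (`-q 1`); adjust the MAPQ cutoff to suit your aligner. The script name and BAM names are placeholders.

```python
import subprocess
import sys
from typing import List


def unique_count_cmd(bam: str, min_mapq: int = 1) -> List[str]:
    """samtools command counting primary, mapped alignments with MAPQ >= min_mapq."""
    return ["samtools", "view", "-c", "-F", "0x904", "-q", str(min_mapq), bam]


def count_unique(bam: str, min_mapq: int = 1) -> int:
    """Run the samtools count and parse the single integer it prints."""
    out = subprocess.run(unique_count_cmd(bam, min_mapq),
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip())


def data_ratio(normal_reads: int, tumor_reads: int) -> float:
    """Value for VarScan --data-ratio: normal / tumor read counts."""
    if tumor_reads == 0:
        raise ValueError("tumor BAM has no qualifying reads")
    return normal_reads / tumor_reads


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical): python data_ratio.py normal.bam tumor.bam
    print(f"{data_ratio(count_unique(sys.argv[1]), count_unique(sys.argv[2])):.4f}")
```

Looping this over your 50+ normal/tumor pairs in the shell gives one ratio per pair.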

Given gene ID and genomic coordinates, how can I create a GFF formatted file?

I have downloaded a list of coordinates of yeast genes from Xu et al., 2009 (see Table S3). Unfortunately, it is not in a standard format, so it does not appear to be compatible with the programs I would like to use, i.e. HOMER, BEDOPS or bedtools. Could anyone help me get it into GFF format using Unix tools or R (other languages are also welcome if the code is just copy and paste)? I tried to recreate what I saw on the Ensembl website, but those programs still did not recognize it as GFF. Here is the beginning of the file (there are actually ~7K lines):

ID      chr  strand  start  end    type   name     commonName  endConfidence   source
ST0001  1    +       9369   9601   SUTs   SUT001   SUT001      bothEndsMapped  Manual
ST0002  1    +       30073  30905  CUTs   CUT001   CUT001      bothEndsMapped  Automatic
ST0003  1    +       31153  32985  ORF-T  YAL062W  GDH3        bothEndsMapped  Manual
ST0004  1    +       33361  34897  ORF-T  YAL061W  BDH2        bothEndsMapped  Manual
ST0005  1    +       35097  36393  ORF-T  YAL060W  BDH1        bothEndsMapped  Manual
ST0006  1    +       36545  37329  ORF-T  YAL059W  ECM1        bothEndsMapped  Manual
ST0007  1    +       37409  39033  ORF-T  YAL058W  CNE1        bothEndsMapped  Manual
ST0008  1    ...
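Below is a sketch of a converter to GFF3, the nine tab-separated columns seqid, source, type, start, end, score, strand, phase and attributes. It rests on several assumptions. The table's coordinates are taken as already 1-based inclusive (not checked against the paper). Chromosome names such as "1" are kept as-is and may need renaming (e.g. to chrI) to match your genome. The source label "Xu2009", the GFF type "transcript" and the custom `class` attribute (holding SUTs/CUTs/ORF-T) are illustrative choices.

```python
import sys
from typing import Iterable, List

FIELDS = ["ID", "chr", "strand", "start", "end", "type", "name", "commonName"]


def row_to_gff3(fields: List[str], source: str = "Xu2009") -> str:
    """Turn one table row into a GFF3 line (9 tab-separated columns)."""
    row = dict(zip(FIELDS, fields))
    attributes = ";".join([
        f"ID={row['ID']}",
        f"Name={row['name']}",
        f"Alias={row['commonName']}",
        f"class={row['type']}",  # original SUTs / CUTs / ORF-T label
    ])
    return "\t".join([row["chr"], source, "transcript", row["start"], row["end"],
                      ".", row["strand"], ".", attributes])


def table_to_gff3(lines: Iterable[str]) -> List[str]:
    out = ["##gff-version 3"]
    for line in lines:
        fields = line.split()
        if len(fields) < len(FIELDS) or fields[0] == "ID":
            continue  # skip the header row and short or blank lines
        out.append(row_to_gff3(fields))
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as table:
        print("\n".join(table_to_gff3(table)))
```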

Intersectbed Tool Generating Empty File


I used the bedtools command intersectBed to check the overlap between two BED files: A is my indel file and B is my reference file. However, it produces an empty output file. I thought the problem was that file B is much larger than file A, but swapping the file order still does not produce any output.

Here is the reference B file (larger):

gff_seqname      0        1395    gene    0    +
gff_seqname      0        1395    exon    0    +
gff_seqname    1397    2498    gene    0    +
gff_seqname    1397    2498    exon    0    +
gff_seqname    2524    3619    gene    0    +

Here is my A file with just 51 INDELS:

NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    174708    174713    -GCCGG:2/6
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1078686    1078686    +A:105/112
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1229123    1229125    -CT:800/870
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234830    1234830    +AT:134/134
NC_0077121_SODALIS_GLOSSINIDIUS_STR_MORSITANS_CHROMOSOME    1234833    1234834    -A:134/134

here is my command:

intersectBed -a SOD_pal_BWA_GMM.PE.sorted.bam.sorted_cleaned_GMM.bam.sorted.hr.bam.raw.bed  -b sodalis_galaxy.bed  -wa -wb  >test13.bed
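intersectBed only compares intervals whose first column (the chromosome name) is identical in both files, so it is worth checking which names the two files share. In the excerpts above, A uses `NC_0077121_SODALIS_...` while B uses `gff_seqname`. Below is a small diagnostic sketch (a hypothetical helper, not part of bedtools) that lists the chromosome names of each file and their intersection.

```python
import sys
from typing import Iterable, Set, Tuple


def chrom_names(lines: Iterable[str]) -> Set[str]:
    """Collect first-column names, skipping blank, comment, track and browser lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def compare(a_lines: Iterable[str], b_lines: Iterable[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared names, names only in A, names only in B)."""
    a, b = chrom_names(a_lines), chrom_names(b_lines)
    return a & b, a - b, b - a


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        shared, only_a, only_b = compare(fa, fb)
    print("shared:", sorted(shared))
    print("only in A:", sorted(only_a))
    print("only in B:", sorted(only_b))
```

An empty "shared" list would explain an empty intersection.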

macs and bedtools


Hello

I have MACS2 output and am now looking for peaks located in introns. I have a BED file of introns from UCSC for my species. Which peak file should I use for the bedtools intersection: the peak summits or the narrow peaks (both BED-like files from the MACS2 output)?

Bedtools genomeCoverageBed Usage: How To Create A Genome File?


I am using bedtools and the following command to get the coverage file:

$ ./genomeCoverageBed -ibam ~/GG_project/trim/ecoli.bam -g > ~/GG_project/trim/coverage

where ecoli.bam is my sorted bam file, and coverage is my output file

Where do I get the genome file, and how do I create one? Specifically, I need an ecoli.genome file.
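A bedtools genome file is just two tab-separated columns: chromosome name and length. If you have the reference FASTA the BAM was aligned to, `samtools faidx ref.fa` followed by `cut -f 1,2 ref.fa.fai > ecoli.genome` produces it. The bedtools genomecov documentation also describes `-g` as required only when not using `-ibam`, so with a BAM input it may be possible to drop it entirely. The Python sketch below computes the same two columns from a FASTA; the file and script names are placeholders.

```python
import sys
from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (sequence name, length) for each FASTA record, in file order."""
    records: List[Tuple[str, int]] = []
    name: Optional[str] = None
    length = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            name, length = (line[1:].split() or [""])[0], 0  # first word of the header
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (hypothetical): python make_genome.py ecoli.fa > ecoli.genome
    with open(sys.argv[1]) as fasta:
        for name, length in fasta_lengths(fasta):
            print(f"{name}\t{length}")
```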

Extracting Genomic Coverage Information Across Different Samples

Hello, I have 3 BAM files that I want to compare against each other. I have a reference file with 10,000 sequences and paired-end reads sequenced from 3 different samples:

1) Sample 1 is 100% identical to the reference, so we expect reads to map to all reference sequences.
2) Sample 2 is 80% similar to the reference, so 20% of the reference sequences won't have any reads.
3) Sample 3 is 60% similar to the reference, so 40% of the reference sequences won't have any reads.

My goal is to identify which reference sequences do not have any reads mapped in samples 2 and 3, i.e. the 20% of reference sequences in sample 2 and the 40% in sample 3. In some cases, for a reference sequence approximately 10 kb long, reads from sample 1 cover the entire 10 kb, sample 2 covers only the first 5 kb and sample 3 only the last 3 kb, so I also need to identify these partial regions.

I have the mapped, sorted BAM files for all three samples. I am looking into bedtools but am not sure which tool will give me the answer I need. The following commands might do something similar, but they output differences at every base:

genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph
unionBedGraphs -header -i sample1.bedgraph sample2. ...
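The fine-grained output can be collapsed afterwards. The sketch below reads unionBedGraphs output (chrom, start, end, then one coverage column per sample) and merges consecutive intervals where a chosen sample has zero coverage. One caveat: `-bg` omits zero-coverage intervals, so regions with no reads in any sample never appear in the union. `genomeCoverageBed -bga`, or the unionBedGraphs `-empty` option (which needs a genome file via `-g`), may be needed to report them.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int, List[float]]


def parse_union(lines: Iterable[str]) -> List[Row]:
    """Parse unionBedGraphs output, skipping a header line if present."""
    rows = []
    for line in lines:
        f = line.split()
        if len(f) < 4 or not f[1].isdigit():
            continue  # header ("chrom start end ...") or blank line
        rows.append((f[0], int(f[1]), int(f[2]), [float(x) for x in f[3:]]))
    return rows


def zero_regions(rows: Iterable[Row], sample: int) -> List[Tuple[str, int, int]]:
    """Merge consecutive intervals where the given sample column (0-based) has zero coverage."""
    merged: List[list] = []
    for chrom, start, end, covs in rows:
        if covs[sample] != 0:
            continue
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            merged[-1][2] = end  # adjacent zero-coverage interval: extend
        else:
            merged.append([chrom, start, end])
    return [tuple(r) for r in merged]
```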

Counting Number Of Bam Reads Directly Within Set Of Intervals With Bedtools


How can I count the number of BAM reads falling entirely within a set of intervals given in GFF format? Note that I do not want reads that merely overlap the intervals, only reads that fall completely within them.

I tried the following:

intersectBed -abam reads.bam -b exons.gff -wb -f 1

This output has redundancies, so I pipe it into coverageBed as follows:

intersectBed -abam reads.bam -b exons.gff -wb -f 1 | coverageBed -abam stdin -b exons.gff

Is this correct? Thanks.
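`-f 1` requires 100% of each read (the A feature) to be covered by the GFF feature, i.e. the read must lie entirely inside it. The counting logic can be sketched in Python as below. It assumes reads are 0-based half-open (chrom, start, end) tuples (e.g. from bamtobed) and GFF features use 1-based inclusive coordinates. A read lying inside two overlapping features is counted for both.

```python
from typing import Iterable, List, Tuple


def contained_counts(reads: Iterable[Tuple[str, int, int]],
                     features: Iterable[Tuple[str, int, int]]) -> List[Tuple[str, int, int, int]]:
    """For each GFF feature (chrom, start, end; 1-based inclusive), count reads
    (chrom, start, end; 0-based half-open) lying entirely inside it."""
    reads = list(reads)
    counts = []
    for chrom, gff_start, gff_end in features:
        start0 = gff_start - 1  # 1-based inclusive -> 0-based half-open
        n = sum(1 for c, s, e in reads if c == chrom and s >= start0 and e <= gff_end)
        counts.append((chrom, gff_start, gff_end, n))
    return counts
```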

Bed File Of Mapq Sliding Window On A Bam File?

There may already be a recipe for this, so I'm asking before reinventing the wheel: I would like to create a BED file where the score is the average MAPQ of the reads in input.bam. I think bedtools or BEDOPS is the way to go:

http://bedtools.readthedocs.org/en/latest/content/tools/bamtobed.html
http://bedops.readthedocs.org/en/latest/content/reference/file-management/conversion/bam2bed.html

Beyond simply running bamtobed/bam2bed, I would like to define a sliding window size and step, say size=1000 and step=200. I would also like to generate the bam2bed information only for a list of regions in regions.bed. E.g., something like:

mapq_sliding_windows --bam input.bam --wsize 1000 --wstep 200 --regions regions.bed > mapq_sliding_windows.bed

EDITED: Thank you, Aaron, for your answer. I got it working, but it's slow for my 30x WGS BAMs:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" > hg19.genome
bedtools makewindows -g hg19.genome -w 1000 -s 200 > hg19.windows.bed
bedtools map -a hg19.windows.bed -b <(bedtools bamtobed -i input.bam | grep -v chrM) -c 5 -o mean > ...
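The makewindows + map pipeline above is the straightforward route. The sketch below performs the same computation for a single chromosome: windows of `size` bp every `step` bp, each scored with the mean MAPQ of reads overlapping it by at least 1 bp. Reads are sorted once and located with binary search, so each window only scans nearby reads. Reads are assumed to be (start, end, mapq) tuples, e.g. columns 2, 3 and 5 of bamtobed output. Window placement at the chromosome end may differ slightly from bedtools makewindows.

```python
import bisect
from typing import Iterator, List, Optional, Tuple


def sliding_windows(chrom_size: int, size: int, step: int) -> Iterator[Tuple[int, int]]:
    """Windows of `size` bp every `step` bp; the last ones are clipped at the chromosome end."""
    start = 0
    while start < chrom_size:
        yield start, min(start + size, chrom_size)
        start += step


def mean_mapq_windows(chrom_size: int, reads: List[Tuple[int, int, int]],
                      size: int = 1000, step: int = 200
                      ) -> List[Tuple[int, int, Optional[float]]]:
    """Mean MAPQ of reads (start, end, mapq) overlapping each window; None if no reads."""
    reads = sorted(reads)
    starts = [r[0] for r in reads]
    max_len = max((e - s for s, e, _ in reads), default=0)
    out = []
    for ws, we in sliding_windows(chrom_size, size, step):
        # a read can only overlap the window if it starts in [ws - max_len, we)
        lo = bisect.bisect_left(starts, ws - max_len)
        hi = bisect.bisect_left(starts, we)
        quals = [q for s, e, q in reads[lo:hi] if e > ws]
        out.append((ws, we, sum(quals) / len(quals) if quals else None))
    return out
```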

Creating Bed File For Lncrna Using Gencode Gtf File


Hi all,

I want to get a BED file of lncRNAs based on the GENCODE GTF file.

I downloaded the file "gencode.v16.long_noncoding_RNAs.gtf.gz", extracted the chr, start and end columns, and then used mergeBed to merge overlapping lncRNAs. Is this correct? I know exon genomic positions can be merged this way,

but for lncRNAs I am not so sure. Is there anywhere that already offers such BED files?

I expected 22444 long non-coding RNA loci transcripts, but I get only 11817 genomic regions after merging.

If anyone knows the answer, could you help me?
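It is worth checking which unit is being counted. Merging overlapping lncRNA records collapses all transcripts of a locus (and any overlapping loci) into one region, so fewer merged regions than transcripts would be expected. The sketch below instead writes one BED6 line per `transcript` record, converting the 1-based GTF start to the 0-based BED start. It assumes the standard GENCODE `transcript_id` attribute; the script and output names are placeholders.

```python
import gzip
import re
import sys
from typing import Iterable, List

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed6(lines: Iterable[str], feature: str = "transcript") -> List[str]:
    """One BED6 line per GTF record of the given feature type."""
    bed = []
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(f[8])
        name = match.group(1) if match else "."
        # GTF start is 1-based inclusive, BED start is 0-based; the end is unchanged
        bed.append("\t".join([f[0], str(int(f[3]) - 1), f[4], name, "0", f[6]]))
    return bed


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (hypothetical): python gtf_to_bed.py gencode.v16.long_noncoding_RNAs.gtf.gz > lncRNA.bed
    with gzip.open(sys.argv[1], "rt") as gtf:
        print("\n".join(gtf_to_bed6(gtf)))
```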

Simple Redirection, I/O Problem With Bedtools


Hi guys, just a quick question. It's more of a Bash question than a bioinformatics one, with bedtools involved.

I mostly pipe bedtools input and output. Here's a typical scenario:

sed 1d fileA.bed | intersectBed -a stdin -b peaks.bed | intersectBed -u -a stdin -b fileB.bed

The problem is that fileB also has a header line, which intersectBed reports as an error (which makes sense: the start is not an integer).

How can I remove the first line (the header) of fileB on the fly, within the pipe?

Thanks
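In bash, the usual trick is process substitution, which lets a command read another command's output as if it were a file, e.g. `intersectBed -u -a stdin -b <(sed 1d fileB.bed)`. If you would rather drop any header-like line regardless of its position, a small filter such as the Python sketch below (hypothetical script name `strip_header.py`) keeps only lines whose start and end columns are integers. It assumes tab-separated BED.

```python
import sys
from typing import Iterable, Iterator


def data_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only BED lines whose start and end columns are integers."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 3 and f[1].isdigit() and f[2].isdigit():
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (hypothetical): ... | intersectBed -u -a stdin -b <(python strip_header.py fileB.bed)
    with open(sys.argv[1]) as bed:
        sys.stdout.writelines(data_lines(bed))
```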

Extract Only Paired-End Reads That Map A Specific Interval


Hi,

Is it possible to extract paired-end reads that map to a specific interval (from a BAM file)? I tried intersectBed:

intersectBed -abam align.bam -b interval.gff3 -wa > result.bam

Here's the result:

[Screenshot of the intersectBed result, not available]

But I only want reads that map to the feature in bold blue (one read of the pair overlapping it is enough). For example, I don't want the reads that map on either side of this feature (red arrow).

Is this possible with intersectBed or another program?

Thanks,

N.
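The sketch below interprets "one of the paired reads is enough" as keeping a pair whenever at least one mate overlaps the feature. One approach is two passes: first collect the names of reads overlapping the interval, then keep every alignment with one of those names. The sketch works on simplified (name, chrom, start, end) tuples, a layout assumed here as a stand-in for parsed BAM records. If you only want the overlapping mates themselves, the first function alone gives their names.

```python
from typing import Iterable, List, Set, Tuple

Alignment = Tuple[str, str, int, int]  # (read name, chrom, start, end), 0-based half-open


def pairs_hitting(alignments: Iterable[Alignment], chrom: str, start: int, end: int) -> Set[str]:
    """Names of read pairs with at least one mate overlapping [start, end)."""
    return {name for name, c, s, e in alignments if c == chrom and s < end and e > start}


def select_pairs(alignments: Iterable[Alignment], chrom: str, start: int, end: int) -> List[Alignment]:
    """Keep every alignment (both mates) of pairs that touch the interval."""
    alignments = list(alignments)
    keep = pairs_hitting(alignments, chrom, start, end)
    return [a for a in alignments if a[0] in keep]
```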
