Hi, This is my very first experience analysing RNAseq data. My goal is to do differential analysis between two strains of a bacteria. So far, i managed to align and produce SAM and BAM files. I'm having problems to annotate and count my reads. Here are the commands that I used. My reads are from SOLID and hence in colourspace
$ nohup solid2fastq.pl 291_01_01 291_01_01-bwa #Convert .csfasta and .qual to .fastq
$ nohup bwa index -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta
$ nohup bwa aln -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta 291_01_01-bwa.singleF3.fastq 291_01_01-bwa.sai
$ perl -ne 'if($_ !~ m/^\S+?\t4\t/){print $_}' 291_01_01-bwa.sam > 291_01_01-bwa.sam.filtered #Convert to SAM file
$ samtools sort 291_01_01-bwa.bam 291_01_01-bwa.bam.sorted
$ samtools index 291_01_01-bwa.bam.sorted.bam
to produce .rpkm file
$ java -jar ~/bin/bam2rpkm-0.06/bam2rpkm-0.06.jar -i 291_01_01-bwa.bam.sorted.bam -f Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.RPKM2.out # i get an error here
$ERROR: Problem encountered whilst reading gtf file. Could not interpret line 'GeneDB|Tb427_01_v4 EuPathDB supercontig 1
so i tried different method to count
$ htseq-count -i ID 291_01_01-bwa.sam Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.sam_htseq-count #still error
$Error occured when processing GFF file (line 37060 of file Tbrucei427_TriTrypDB-4.0.gff):
need more than 1 value to unpack
and different method
$ python make_bed_from_fasta.py ~/Downloads/reference/TbruceiTreu927AnnotatedCDS_TriTrypDB-4.0.fasta > 927_reference.bed #this python script converts .fasta into .bed file since the .gff file cannot be processed
$multiBamCov -q 30 -p -bams 291_01_01-bwa.bam.sorted.bam -bed 927_reference.bed > test_counts.txt
now I only get 0 counts for all genes. Does this mean that there is something wrong with my alignment files or something wrong with the counting method . And it seems like my .gff (version 3) file was unable to be read by htseq-count and also the java script . I downloaded the gff file from GeneDB and it seems like in many tutorials .gtf files are used instead. So I'm stuck at counting the read part and I really need some help . Help please .