Quantcast
Channel: Post Feed
Viewing all articles
Browse latest Browse all 3764

Which of the genes are enriched with repeat elements

$
0
0

I would like to know which of my genes are enriched with repeats of LINE/SINE/ERV etc. elements.

I have a bam file and the repeats in bed format.

As far as I know BAM files contains aligned data for each short read sequence from the fastq file. I am trying to figure out what is the best approach to know which genes (+- 1000 bp) have more repeats elements.

I am thinking about two approaches to implement but not sure which one is the best. here are the approaches i was thinking to use

a) Shall I convert the bam file into bed file and then use bedtools merge. So that I can overlap with the repeats file using bedtools window -c -l -r option. And I know how many of the repeats are overlapping or near by the short reads. Then count this number for each gene.

For example,

chr   start  end gene number_of_repeats
chr1 100  200  gene1 70
chr1 190  240  gene1 40
chr1 250  400  gene1 100
chr2 500  600  gene2 150

if i sort and merge them i will get

chr1 100  240  gene1 90
chr1 250  400  gene1 100
chr2 500  600  gene2 150

So gene1 will have 190 (90 + 100) and gene 2 will have 150 number of repeats.

Or

b) shall I count the number of repeats which for each short sequence without any merging? so i will also get some insight into the gene counts vs .number of repeats?

For example using the same example above, i will get

for gene1 210 (70 + 40 +100) and for gene2 150 number of repeats.

Or

Am i on the completely wrong track and should think a better solution?


Viewing all articles
Browse latest Browse all 3764

Trending Articles