I want to do a permutation test: randomly reposition (shuffle) a given set of genomic intervals and measure the intersection between the new coordinates and a specific genomic element.
Example:
- Different sets of genes (protein coding, pseudogenes, ncRNA): these are the intervals that I want to shuffle.
- Genomic repeat L1: its coordinates stay fixed.
- For every gene set: shuffle the intervals, intersect, and measure the overlap with L1 (I am using bedtools shuffle: "reposition each feature in the input BED file on a random chromosome at a random position").
Question: which genomic regions should I exclude from the "genome" (bedtools shuffle -g option) before shuffling the gene intervals?
I was going to exclude gaps in the assembly.
But what about:
- All gene regions. If I am shuffling pseudogene intervals, should I exclude protein-coding and ncRNA coordinates?
- All non-L1 RepeatMasker coordinates. Since Alu, LTR and DNA transposons aren't L1, there won't be any intersection with them?
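
To make the procedure concrete, here is a minimal pure-Python sketch of the test I have in mind. It is only an illustration on a single toy chromosome and is not how bedtools shuffle is implemented; all names (`merge`, `overlap_bp`, `shuffle`, `permutation_test`) and the toy coordinates are hypothetical. The `excluded` argument plays the role of the regions I am asking about (in bedtools they would be passed to shuffle via its -excl option), and any shuffled interval that touches an excluded region is redrawn.

```python
import random


def merge(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_bp(query, target):
    """Sum of bases shared between each query interval and the target set.

    `target` should be merged first so that no base is counted twice
    for the same query interval.
    """
    total = 0
    for qs, qe in query:
        for ts, te in target:
            total += max(0, min(qe, te) - max(qs, ts))
    return total


def shuffle(intervals, chrom_len, excluded, rng, max_tries=1000):
    """Place each interval at a random position, keeping its length.

    A candidate position is rejected if it overlaps any excluded region.
    """
    placed = []
    for start, end in intervals:
        length = end - start
        for _ in range(max_tries):
            new_start = rng.randrange(0, chrom_len - length + 1)
            candidate = (new_start, new_start + length)
            if overlap_bp([candidate], excluded) == 0:
                placed.append(candidate)
                break
        else:
            raise RuntimeError("could not place interval outside excluded regions")
    return placed


def permutation_test(genes, l1, chrom_len, excluded, n_perm=1000, seed=0):
    """Return observed overlap, the null distribution, and an empirical p-value."""
    rng = random.Random(seed)
    l1 = merge(l1)
    excluded = merge(excluded)
    observed = overlap_bp(genes, l1)
    null = [
        overlap_bp(shuffle(genes, chrom_len, excluded, rng), l1)
        for _ in range(n_perm)
    ]
    # One-sided empirical p-value (enrichment), with the usual +1 correction.
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, null, p_value


# Toy data (hypothetical coordinates on one 10 kb "chromosome").
chrom_len = 10_000
genes = [(100, 300), (1000, 1200), (5000, 5400)]   # intervals to shuffle
l1 = [(150, 250), (1100, 1300), (5200, 5300)]      # fixed L1 coordinates
gaps = [(0, 50), (9000, 10_000)]                   # assembly gaps to exclude

observed, null, p_value = permutation_test(
    genes, l1, chrom_len, excluded=gaps, n_perm=200
)
print("observed overlap (bp):", observed)
print("empirical p-value:", p_value)
```

Whatever goes into `excluded` changes the space the null distribution is drawn from, which is exactly why I am unsure whether other genes or non-L1 repeats belong there.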