So, I have a data file containing several hundred variants in the following format:
CHR# STARTPOS ENDPOS VARIANTID
1 100 1000 rs1
1 1200 1400 rs2
I ran this file through ANNOVAR to get the gene each variant is in (or its nearest gene and the distance to it), as well as the region (intronic, exonic, etc.) each variant falls in. The output had the following columns:
GENE/NEARESTGENE REGION CHR# STARTPOS ENDPOS REFALLELE ALTALLELE VARIANTID
SOMEGENE exonic 1 100 1000 A G rs1
SOMEGENE2 intergenic 1 1200 1400 G T rs2
I moved some columns around to make a file with the following columns; let's call it file.txt:
CHR# STARTPOS ENDPOS GENE/NEARESTGENE REGION VARIANTID
1 100 1000 SOMEGENE exonic rs1
1 1200 1400 SOMEGENE2 intergenic rs2
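For reference, the column reordering can be sketched roughly as below. This is only a minimal illustration, assuming the ANNOVAR output is whitespace-delimited with exactly the eight columns listed above; the function names and the input file name annovar_output.txt are hypothetical, not what was actually used.

```python
# 0-based positions in the ANNOVAR-style output, as listed above:
# 0 GENE/NEARESTGENE, 1 REGION, 2 CHR#, 3 STARTPOS, 4 ENDPOS,
# 5 REFALLELE, 6 ALTALLELE, 7 VARIANTID
# Target order: CHR#, STARTPOS, ENDPOS, GENE/NEARESTGENE, REGION, VARIANTID
NEW_ORDER = [2, 3, 4, 0, 1, 7]


def reorder_fields(fields):
    """Pick the six wanted columns from one already-split row."""
    return [fields[i] for i in NEW_ORDER]


def reorder_file(in_path, out_path):
    """Write a tab-separated, reordered copy of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < 8:
                # Assumption: rows with fewer than 8 fields are skipped.
                continue
            dst.write("\t".join(reorder_fields(fields)) + "\n")
```

Calling reorder_file("annovar_output.txt", "file.txt") would then produce a file laid out like the rows above. Note that a header line with eight tokens would be reordered like any other row.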
Now, I have several database files (around 10) of promoters, TSS, enhancers, etc., all in .bed format and looking like the following. Let's call these database1.txt ... database10.txt
database1.bed
CHR# STARTPOS ENDPOS LABEL-FOR-THE-REGION/NAME
1 500 600 ...