Hello,
I have a tab-delimited splice junction file that looks something like this:
chr1 11212 12009 1 1 0 0 2 48
chr1 11672 12009 1 1 0 0 1 31
chr1 11845 12009 1 1 0 0 1 28
chr1 12228 12612 1 1 1 0 1 32
chr1 12722 13220 1 1 1 0 3 9
chr1 14830 14969 2 2 1 0 218 50
chr1 15039 15795 2 2 1 0 98 50
chr1 15948 16606 2 2 1 1 10 48
chr1 16766 16857 2 2 1 0 24 44
chr1 16766 16875 2 2 0 0 2 36
The task is to filter out lines in which column 6 has the value 1, column 7 has the value 1, and column 8 has a value of 10 or greater.
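To make the condition concrete, here is a minimal Python sketch of the filter as I understand it. It is not a bedtools solution. The function names `matches` and `select_junctions` are made up for illustration, and it assumes that every data line has at least 8 whitespace- or tab-separated fields and that columns 7 and 8 are integers. It also reads "filter out" as "pick out" the matching lines; to drop them instead, the test would be inverted.

```python
def matches(fields):
    """True when column 6 == 1, column 7 == 1 and column 8 >= 10 (1-based columns)."""
    return (
        fields[5] == "1"
        and int(fields[6]) == 1
        and int(fields[7]) >= 10
    )


def select_junctions(lines):
    """Yield the split fields of every line that satisfies all three conditions."""
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        if matches(fields):
            yield fields


# The ten example lines from this post.
SAMPLE = """\
chr1 11212 12009 1 1 0 0 2 48
chr1 11672 12009 1 1 0 0 1 31
chr1 11845 12009 1 1 0 0 1 28
chr1 12228 12612 1 1 1 0 1 32
chr1 12722 13220 1 1 1 0 3 9
chr1 14830 14969 2 2 1 0 218 50
chr1 15039 15795 2 2 1 0 98 50
chr1 15948 16606 2 2 1 1 10 48
chr1 16766 16857 2 2 1 0 24 44
chr1 16766 16875 2 2 0 0 2 36
"""

for fields in select_junctions(SAMPLE.splitlines()):
    print("\t".join(fields))
```

On the example lines above, only the junction chr1 15948 16606 meets all three conditions.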
I have been going through the bedtools documentation, but I am not sure how to get started and would appreciate a few pointers. My input file will be in the tab-delimited format shown above, and I also have the GENCODE v19 GTF file for annotation.
Thanks!
*** Edit ***
Column 1: chromosome
Column 2: first base of the intron (1-based)
Column 3: last base of the intron (1-based)
Column 4: strand
Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
Column 7: number of uniquely mapping reads crossing the junction
Column 8: number of multi-mapping reads crossing th ...