
Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?


I want to create a gene data set that is as large as possible, so I am combining several gene annotations. However, the same gene can appear in more than one annotation, so entries from different annotations overlap. To reduce bias, I intersect the annotations and, where genes overlap, keep only one of them.

Question:

To make sure such overlaps are detected, I was thinking of extending the gene coordinates. Is this necessary? If so, how large should the extension be (e.g. 5 bp or 100 bp)?

Example:

I want to create an lncRNA data set (in later steps it will be used to search for genomic features).
Input:

  1. GENCODE lncRNA annotation (version 18 - 04/09/2013);
  2. Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).

Workflow:

  1. Extract GENCODE genes start/end coordinates;
  2. Extract Cabili genes start/end coordinates;
  3. Extend Cabili coordinates by n bp on each side (start - n, end + n);
  4. Use BEDTools intersect;
  5. If genes intersect, keep the GENCODE gene (as it's the newer annotation, though this choice is really subjective).
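
Steps 3 to 5 can be sketched in plain Python to see how the choice of n changes the result. This is only an illustration under stated assumptions: the coordinates and gene names are made up, intervals are treated as 0-based half-open (BED style), strand is ignored, and the pairwise overlap test stands in for what BEDTools intersect would do on real files. The function names (extend, overlaps, merge_annotations) are hypothetical helpers, not part of any library.

```python
from collections import defaultdict


def extend(gene, n):
    """Pad a (chrom, start, end, name) interval by n bp on each side, clamping at 0."""
    chrom, start, end, name = gene
    return (chrom, max(0, start - n), end + n, name)


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def merge_annotations(gencode, cabili, n=0):
    """Keep every GENCODE gene; add a Cabili gene only if, after extending it
    by n bp, it overlaps no GENCODE gene. Strand is ignored (assumption)."""
    by_chrom = defaultdict(list)
    for g in gencode:
        by_chrom[g[0]].append(g)

    merged = list(gencode)
    for c in cabili:
        padded = extend(c, n)
        if not any(overlaps(padded, g) for g in by_chrom[c[0]]):
            merged.append(c)  # store the original, unextended coordinates
    return merged


# Made-up toy data, not real annotation entries.
gencode = [("chr1", 1000, 2000, "GENCODE_A")]
cabili = [
    ("chr1", 1500, 2500, "CABILI_1"),  # overlaps GENCODE_A directly
    ("chr1", 2050, 3000, "CABILI_2"),  # starts 50 bp downstream of GENCODE_A
    ("chr1", 5000, 6000, "CABILI_3"),  # far away from GENCODE_A
]

names_n0 = [g[3] for g in merge_annotations(gencode, cabili, n=0)]
names_n100 = [g[3] for g in merge_annotations(gencode, cabili, n=100)]
print(names_n0)
print(names_n100)
```

In this toy case, CABILI_1 is dropped with or without an extension, while CABILI_2 is dropped only once n exceeds its 50 bp gap to GENCODE_A. The extension therefore matters only for near-adjacent genes, and whether those should count as the same gene is exactly the judgment call raised in the question.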

I realize that the answer to the extension question depends on the situation and on how reliable each annotation is, but I still hope that someone can suggest something.

