All practicing bioinformaticians will face problems that require them to compare, query, and select genomic features across an entire genome. As it happens, representing and querying intervals efficiently is a surprisingly challenging problem that calls for specialized data structures.
The BEDTools suite is a set of programs that supports a broad range of interval analyses, all of which involve selecting particular locations in the genome. The name reflects the original intent of processing BED files, but the tools operate just as well on GFF files. The programs are run from the command line and are available for UNIX-type systems: Linux, Mac OS X, and Cygwin (on Windows).
The link to the site is: http://code.google.com/p/bedtools/
With BEDTools one can answer questions such as:
- how many reads map upstream/downstream of one or more locations in the genome?
- how many reads cover a certain base in the genome?
- which sections of the genome do not overlap the target intervals?
- what are the sequences specified by the coordinates?
- ...
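
To make these questions concrete, here is a minimal pure-Python sketch of three of them: per-base coverage, the parts of a chromosome not covered by target intervals, and sequence extraction by coordinates. It is not how BEDTools is implemented and does not call BEDTools. The function names (`count_covering`, `merge`, `complement`, `extract`) and the toy data are hypothetical, chosen only for illustration. Intervals are assumed to be `(start, end)` pairs on a single chromosome, 0-based and half-open, as in the BED format.

```python
from typing import List, Tuple

# Assumption: an interval is (start, end), 0-based and half-open as in BED,
# and all intervals lie on the same chromosome.
Interval = Tuple[int, int]


def count_covering(reads: List[Interval], pos: int) -> int:
    """Count how many reads cover the base at position pos (naive linear scan)."""
    return sum(1 for start, end in reads if start <= pos < end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or directly adjacent intervals into sorted, disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def complement(targets: List[Interval], chrom_length: int) -> List[Interval]:
    """Return the stretches of the chromosome that no target interval covers."""
    gaps: List[Interval] = []
    cursor = 0  # first position not yet accounted for
    for start, end in merge(targets):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


def extract(sequence: str, interval: Interval) -> str:
    """Return the subsequence specified by a half-open interval."""
    start, end = interval
    return sequence[start:end]


# Toy data, invented for this example.
reads = [(0, 50), (25, 75), (60, 110)]
targets = [(10, 20), (15, 40), (80, 90)]

print(count_covering(reads, 30))   # reads covering base 30
print(complement(targets, 100))    # uncovered stretches of a 100 bp chromosome
print(extract("ACGTACGT", (2, 5)))  # bases 2, 3 and 4
```

The linear scan in `count_covering` is fine for a handful of reads but becomes slow when millions of reads are queried at many positions, which is why dedicated interval tools rely on specialized indexing instead.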