I wanted to use BEDTools to extract genomic sequences (fastaFromBed).
My BED file covers all 24 chromosomes, so I want to use the whole genome (merged from the individual chromosome.fa files).
I tried: fastaFromBed -fi genome.fa -bed all.chromosomes.bed -fo output
but got: Segmentation fault (core dumped)
I then tried each chromosome.fa separately, and that worked: fastaFromBed -fi chromosome${i}.fa -bed all.chromosomes.bed -fo output
Of course, each run prints an annoying warning for the chromosomes that are not in that file: WARNING. chromosome (chr..) was not found in the FASTA file. Skipping.
But it's still better than nothing and really fast.
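A minimal sketch of how the per-chromosome runs could avoid those warnings: split the BED file by its first column, so that each chromosome file is only queried for its own intervals. The function name split_bed_by_chrom and the output directory by_chrom are hypothetical, and the sketch assumes the BED file has no track/browser header lines and that chromosome${i}.fa holds the sequence named chr${i}.

```shell
# Split a BED file into one file per chromosome, keyed on column 1.
# Each record is written unchanged to <outdir>/<chrom>.bed.
split_bed_by_chrom() {
  bed="$1"      # input BED, e.g. all.chromosomes.bed
  outdir="$2"   # hypothetical output directory, e.g. by_chrom
  mkdir -p "$outdir"
  awk -v dir="$outdir" '{ print > (dir "/" $1 ".bed") }' "$bed"
}

# Usage sketch (not run here; assumes fastaFromBed is on PATH):
# split_bed_by_chrom all.chromosomes.bed by_chrom
# for i in $(seq 1 22) X Y; do
#   [ -s "by_chrom/chr${i}.bed" ] || continue
#   fastaFromBed -fi "chromosome${i}.fa" -bed "by_chrom/chr${i}.bed" -fo "output.chr${i}.fa"
# done
```

This only works around the warnings; it does not explain the segmentation fault with the merged genome.fa.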
I prefer to use BEDTools for sequence extraction, so I am wondering whether this segmentation fault can be solved. It seems that BEDTools can't handle a large genome.fa file, as I also tried nucBed and got the same error. Alternatively, it might be a problem with how the genome was merged.
EDITED
This is the BED file I used for intersectBed, closestBed and fastaFromBed ([www.box.com][1]).
There were problems only with fastaFromBed, and only when I tried to use the whole genome.fa (~3.15GB). As I mentioned before, when I used each chromosome separately I got warnings, but there was no segmentation fault and the output was fine.
I am wondering whether it might be a problem with genome.fa (I used cat to merge the chromosomes).
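Two things could be checked on a cat-merged FASTA, sketched below with hypothetical helper names (fasta_names, ends_with_newline). If one of the input files lacked a trailing newline, cat would glue the next header onto the previous sequence line, so that header would be missing from the merged file's list of names.

```shell
# Print the sequence names (header lines without the leading ">") of a FASTA file.
fasta_names() {
  grep '^>' "$1" | sed 's/^>//'
}

# Succeed if the file's last byte is a newline (or the file is empty).
# Command substitution strips a trailing newline, so the result is empty then.
ends_with_newline() {
  [ -z "$(tail -c 1 "$1")" ]
}

# Usage sketch (not run here; file names as in the post):
# for f in chromosome*.fa; do
#   ends_with_newline "$f" || echo "no trailing newline: $f"
# done
# fasta_names genome.fa | wc -l          # 24 names would be expected
# fasta_names genome.fa | sort | uniq -d # duplicated names, if any
```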
EDITED#2
head genome.fa
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
cat genome.fa.fai
chr1 249250621 6 50 51
chr2 243199373 254235646 50 51
chr3 198022430 502299013 50 51
chr4 191154276 704281898 50 51
chr5 180915260 899259266 50 51
chr6 171115067 1083792838 50 51
chr7 159138663 1258330213 50 51
chr8 146364022 1420651656 50 51
chr9 141213431 1569942965 50 51
chr10 135534747 1713980672 50 51
chr11 135006516 1852226121 50 51
chr12 133851895 1989932775 50 51
chr13 115169878 2126461715 50 51
chr14 107349540 2243934998 50 51
chr15 102531392 2353431536 50 51
chr16 90354753 2458013563 50 51
chr17 81195210 2550175419 50 51
chr18 78077248 2632994541 50 51
chr19 59128983 2712633341 50 51
chr20 63025520 2772944911 50 51
chr21 48129895 2837230949 50 51
chr22 51304566 2886323449 50 51
chrX 155270560 2938654113 50 51
chrY 59373566 3097030091 50 51
genome.fa.fai was generated by BEDTools itself (it printed: index file genome.fa.fai not found, generating...).
The segmentation fault occurs right after the index is generated. If BEDTools can scan the whole genome and generate the index file, then maybe the genome is not the problem.
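To test the index itself, here is a small sketch that checks whether consecutive .fai records agree with each other: from a record's length, offset, bases per line and bytes per line one can compute where its sequence ends, and the next record should start one header line later. For example, the chr1 record above predicts an offset of 254235646 for chr2, which is what the index lists. The function names are hypothetical, and the check assumes each header line is exactly ">name" followed by the same line ending as the sequence lines. The second function lists records whose offset is larger than 2147483647 (the largest signed 32-bit integer), which is the case from chr14 onward here; whether that has anything to do with the segmentation fault is only a guess.

```shell
# Check that each .fai record starts where the previous one implies.
# Columns: name, length, offset, bases per line, bytes per line.
# Assumption: header lines are ">name" plus a line ending of
# (bytes per line - bases per line) bytes.
check_fai() {
  awk '
    NR > 1 {
      want = end + 1 + length($1) + eol   # ">" + name + line ending
      if ($3 != want) {
        printf "%s: offset %s, expected %.0f\n", $1, $3, want
        bad = 1
      }
    }
    {
      eol  = $5 - $4                      # width of the line ending
      full = int($2 / $4)                 # number of complete sequence lines
      rest = $2 % $4                      # bases on the last, partial line
      end  = $3 + full * $5 + (rest ? rest + eol : 0)
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# List records whose byte offset does not fit in a signed 32-bit integer.
fai_offsets_over_int32() {
  awk '$3 > 2147483647 { print $1 }' "$1"
}

# Usage sketch (not run here):
# check_fai genome.fa.fai && echo "index is internally consistent"
# fai_offsets_over_int32 genome.fa.fai
```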