Channel: Post Feed
Viewing all 3764 articles

Fastafrombed Problem

hi,

I tried this tool from BEDTools, but it doesn't work!

$ cat testgenome404.fa

>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1    5       10

$ ./fastaFromBed -fi testgenome404.fa -bed test.bed  -fo test.fa.out

index file testgenome404.fa.fai not found, generating...

unable to find FASTA index entry for 'chr1'

$ cat testgenome404.fa.fai
chr1    46      7       46      47

What is this file, "testgenome404.fa.fai", and what do these numbers mean? chr1 46 7 46 47

Why do I get this message?

unable to find FASTA index entry for 'chr1'

Thanks in advance for any help, Sara
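
For reference, the five tab-separated columns of a .fai index (the samtools faidx layout) are the sequence name, its length in bases, the byte offset of its first base, the number of bases per line, and the number of bytes per line including the newline. Below is a minimal Python sketch that labels the line from the post. The class and function names are illustrative only and are not part of bedtools or samtools.

```python
from typing import NamedTuple


class FaiEntry(NamedTuple):
    name: str        # sequence name taken from the ">" header
    length: int      # total number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases on each full sequence line
    line_width: int  # bytes on each full sequence line, newline included


def parse_fai_line(line: str) -> FaiEntry:
    """Split one .fai line on whitespace and label its five columns."""
    name, length, offset, line_bases, line_width = line.split()
    return FaiEntry(name, int(length), int(offset), int(line_bases), int(line_width))


# The index line shown in the post.
entry = parse_fai_line("chr1    46      7       46      47")
print(entry)
```

Read this way, chr1 is 46 bases long, its first base sits at byte 7 of the file, and the sequence is stored on a single line of 46 bases plus one newline byte.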


Problems Extracting Non-Snps From A Vcf File

Hello,

In an SNP analysis, I am trying to extract those editing sites not found in the dbSNP VCF files. I have downloaded a couple of files (All SNPs and Common/Medical SNPs) from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF.

Following this, I have compared my VarScan *.vcf outputs with the SNP.vcf ones using 3 different approaches:

VarScan compare input.vcf SNP.vcf unique1 input-SNPvcf

bedtools intersect -v -a input.vcf -b SNP.vcf > input-SNP.vcf

bedops --not-element-of -1  input-sorted.bed SNP-sorted.bed > inputs-sorted-SNP.bed

In all 3 cases, the SNP-output is identical to the input.vcf/bed.

These command lines do work, however, when I use an alu.bed or a repeat-masker BED file.

Is it just that my analysis contains no known SNPs? I have discarded that for obvious reasons.

Can somebody point to the reason for, or a solution to, this problem?

Thanks, G.

Using Gnu Parallel For Bedtools

I am trying to run GNU parallel on the bedtools multicov function, where the original command is

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to run the above command using GNU parallel. But when I run the command below

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

each BAM file is taken as a separate argument, so the processes that start look like

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of one process taking all the files as arguments. Hence Q1_Counst.bed is overwritten randomly. Could anyone help me get the exact command? My server has around 30 cores.

Bedtools Multicov Need A Bam Index File Specification Option

bedtools multicov (version 2.16.2) computes coverage for multiple samples given a feature file (GTF or BED).

Format: bedtools multicov -bams aln1.bam aln2.bam .. -bed capturRegion.bed >out.coverage

The official documentation mentions that input BAM files should be sorted and indexed, but it does not give the details. Suppose the BAM file is named sample1.bam; then the index file must be named sample1.bam.bai (not sample1.bai), otherwise multicov reports an error: indexes not found.

I think it would be better to add an option which will allow the user to specify the bam index files or the suffix used for these index files.

Simple Redirection, I/O Problem With Bedtools

Hi guys, just a quick question. It's more of a Bash question than a bioinformatics one, with bedtools involved.

I mostly pipe the bedtools I/O. Here's a general scenario:

sed 1d fileA.bed | intersectBed -a stdin -b peaks.bed | intersectBed -u -a stdin -b fileB.bed

Now, the problem is that fileB also has a header line, which intersectBed reports as an error (makes sense: non-integer start).

How can I remove the first line (the header) of fileB on the fly in the pipe?

Thanks

Filtering Bed Files By Using Bedops

Hello everyone,

I have paired-end Illumina reads, R1.fastq and R2.fastq, which I mapped as single-end reads using bowtie2 with default parameters. I performed further downstream analysis with samtools and BEDOPS, and now I have R1.bed and R2.bed. I made two sets: the first has R1_uniquely_mapped.bed and R2_uniquely_mapped.bed, and the second has R1_mapped_more_than_1.bed and R2_mapped_more_than_1.bed.

Because R1 and R2 are paired-end reads and my restriction library has a maximum size of 2 kb, each R1/R2 pair must lie within a 2 kb stretch of the chromosome.

In theory, I am assuming entries like these in R1.bed:

chr1  100   180    @R1_read1______1 .................
chr1   1000  1090 @R1_read2______1................

And in R2.bed:

chr1 2100   2180 @R2_read1_____2............. ## I just added 2 kb with respect to R1.bed
chr1 2500 2590    @R2_read______2......... ## I just added 1.5 kb [1500 nt] with respect to R1.bed, because my library is <= 2 kb.

How can I configure downstream tools such as BEDOPS or bedtools to capture this type of read pair or alignment? How can I filter this kind of information using BEDOPS?

All suggestions and comments are most welcome.

Annotating Genomic Intervals

How can I annotate human genomic intervals (BED file) from a ChIP-seq experiment with information such as whether the interval overlaps with one or more genes? Upstream of a gene? Overlaps with an exon? Intron? 5 kb upstream/downstream of a TSS? Intergenic? Does it overlap with a DNase I hypersensitive site?

Surely bedtools can help me with this, but I'm looking for the best workflow / data sources to use for this that will require the least amount of scripting.

Thanks.

Changing Column Order In Bed File

Here is my data with A, B, C and D columns in my bed file.

   A.     B.     C.     D.
  Chr 1.  1.    12.     +
  Chr 2.  24.   56.     +

How can I move column D to position 1, where column A is right now?
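
One way to picture the reordering is as a list of column positions. The sketch below is a minimal Python illustration, assuming a tab-separated file with the four columns A to D; the function name `reorder_columns` is hypothetical. Note that once the strand column comes first, the file no longer follows the standard BED layout (chromosome, start, end in the first three columns).

```python
from typing import Sequence


def reorder_columns(line: str, order: Sequence[int], sep: str = "\t") -> str:
    """Return the line with its columns rearranged to the given 0-based order."""
    fields = line.rstrip("\n").split(sep)
    return sep.join(fields[i] for i in order)


# Columns A, B, C, D are indices 0, 1, 2, 3; put D first and keep the rest in order.
d_first = [3, 0, 1, 2]
rows = ["chr1\t1\t12\t+", "chr2\t24\t56\t+"]
reordered = [reorder_columns(row, d_first) for row in rows]
for row in reordered:
    print(row)
```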


Intersect Gene Annotation With Specific Position Or Genomic Interval

Hi,

I have several genomic intervals and I want to check whether they overlap known genes. I have a GTF file with the coordinates of gene exons. My idea was to use intersectBed from bedtools, but I have a small problem with short genomic intervals that overlap intron coordinates rather than exons (it does not report the gene in which the interval lies). Is it possible to tell intersectBed to take introns into account? Or is there another tool?

Thanks

N.

Getting All Reads That Align To A Region In Compact Bed Format Using Bedtools?

I'm trying to find all the reads (by name) from a BAM file that align to various regions in a bed file. Right now I can do this with bedtools using intersectBed:

intersectBed -abam reads.bam -wo -f 1 -b regions.bed -bed

From this one can parse all the read ids that land in every interval in regions.bed, but it's not very compact. Is there a way to get bedtools to natively transform this into a more compact format, e.g.

chr1 x y .... read_id1,read_id2,read_id3

where chr1 x y is a given interval in regions.bed and the comma separated read_id1,... is the list of read ids from reads.bam that fall in that interval. In this compact format, the output BED file would have at most as many entries as there are regions in regions.bed, whereas with the -wo option it can be even larger than the number of reads in reads.bam. Thanks.
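
The collapsing step described above can be sketched in Python. This is only an illustration, assuming the intersectBed output has already been reduced to (interval, read id) pairs; which columns of the -wo output hold those values depends on the options used and is not shown here. The function names and the example pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end) of a region in regions.bed


def group_reads_by_interval(
    pairs: Iterable[Tuple[Interval, str]]
) -> Dict[Interval, List[str]]:
    """Collect the read ids reported for each interval, keeping input order."""
    grouped: Dict[Interval, List[str]] = defaultdict(list)
    for interval, read_id in pairs:
        grouped[interval].append(read_id)
    return dict(grouped)


def format_compact(grouped: Dict[Interval, List[str]]) -> List[str]:
    """Emit one tab-separated line per interval with comma-joined read ids."""
    lines = []
    for (chrom, start, end), read_ids in sorted(grouped.items()):
        lines.append(f"{chrom}\t{start}\t{end}\t{','.join(read_ids)}")
    return lines


# Hypothetical pairs standing in for parsed intersectBed output.
pairs = [
    (("chr1", 100, 200), "read_id1"),
    (("chr1", 300, 400), "read_id3"),
    (("chr1", 100, 200), "read_id2"),
]
compact = format_compact(group_reads_by_interval(pairs))
for line in compact:
    print(line)
```

The result has one line per interval that received at least one read, so it can never have more entries than regions.bed.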

Getting Unmapped Reads: Comparing Fastq To Bam

Given a FASTQ file and a BAM file of aligned reads, is there an efficient way to get all reads that are in the original FASTQ but not in the BAM? Perhaps using bedtools? For example:

unmapped_script original.fastq aligned.bam > unmapped.fastq

should create an unmapped.fastq file, which is a subset of original.fastq containing only those entries that do not appear in aligned.bam.

Thank you.
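
The filtering logic can be sketched in plain Python. This is a minimal illustration, assuming the names of the aligned reads have already been collected into a set (for example from the first column of `samtools view aligned.bam` output, which is the read name), and assuming each FASTQ record spans exactly four lines. The function names are hypothetical, and names are compared on the first whitespace-delimited token of the FASTQ header.

```python
from typing import Iterable, Iterator, Set, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, plus line, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records with trailing newlines removed."""
    it = iter(lines)
    # zip over the same iterator four times consumes the input in groups of four lines.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield tuple(part.rstrip("\n") for part in (header, seq, plus, qual))


def fastq_name(header: str) -> str:
    """Read name: the header without '@', up to the first whitespace."""
    return header[1:].split()[0]


def unmapped_records(
    fastq_lines: Iterable[str], aligned_names: Set[str]
) -> Iterator[FastqRecord]:
    """Yield FASTQ records whose read name is absent from the aligned set."""
    for record in read_fastq(fastq_lines):
        if fastq_name(record[0]) not in aligned_names:
            yield record


# Tiny in-memory example; real input would be an open file handle.
demo_fastq = [
    "@r1 extra\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "TTTT\n", "+\n", "IIII\n",
]
kept = list(unmapped_records(demo_fastq, {"r1"}))
for record in kept:
    print("\n".join(record))
```

If the BAM file itself stores unaligned reads, the name set should be built from mapped records only, otherwise those reads would be dropped from the output as well.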

Counting Number Of Bam Reads Directly Within Set Of Intervals With Bedtools

How can I count the number of BAM reads falling directly within a set of intervals, given in GFF format? Note that I do not want reads overlapping the intervals, but ones that fall entirely within them.

I tried the following:

intersectBed -abam reads.bam -b exons.gff -wb -f 1

This has redundancies, so I pipe it into coverageBed as follows:

intersectBed -abam reads.bam -b exons.gff -wb -f 1 | coverageBed -abam stdin -b exons.gff

Is this correct? Thanks.

How To Get Fasta Format Using Fastafrombed Or How To Turn Linearized Fasta To The Same Length Columns

I extracted sequences with fastaFromBed and have no complaints about BEDTools, which is really awesome.

However, the extracted sequences look like this:

>chr19:13985513-13985622
GGAAAATTTGCCAAGGGTTTGGGGGAACATTCAACCTGTCGGTGAGTTTGGGCAGCTCAGGCAAACCATCGACCGTTGAGTGGACCCTGAGGCCTGGAATTGCCATCCT
>chr19:13985689-13985825
TCCCCTCCCCTAGGCCACAGCCGAGGTCACAATCAACATTCATTGTTGTCGGTGGGTTGTGAGGACTGAGGCCAGACCCACCGGGGGATGAATGTCACTGTGGCTGGGCCAGACACG

And my input file looks like this:

>chr19
agtcccagctactcgggaggctaaggcaggagaatcgcttgaacccagga
ggtggaggttgcagggagccgagatcgcaccactgcactccagcctgggc
gacagagcgagattccgtctcaaaaagtaaaataaaataaaataaaaaat
aaaagtttgatatattcagaatcagggaggtctgctgggtgcagttcatt
tgaaaaattcctcagcattttagtGATCTGTATGGTCCCTCtatctgtca
gggtcctagcaggaaattgttgcactctcaaaggattaagcagaaagagt

I was using this:

fastaFromBed -fi input -bed seq.bed -fo output

So shouldn't those sequences be formatted as FASTA (as NCBI says, "It is recommended that all lines of text be shorter than 80 characters in length") or at least have the same line length as my input file?

What am I doing wrong that gives me linearized (FASTA?) output from fastaFromBed?
What is the quickest way to turn those linear sequences into nicely formatted columns using the command line?
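
The re-wrapping itself is simple to sketch in Python: header lines pass through unchanged and each sequence is cut into fixed-width pieces. This is a minimal illustration; the function name `wrap_fasta` is hypothetical, and the default width of 50 is chosen only because the input file in the post uses 50-character lines.

```python
from typing import Iterable, Iterator, List


def wrap_fasta(lines: Iterable[str], width: int = 50) -> Iterator[str]:
    """Re-wrap FASTA sequence lines to `width` characters; headers pass through."""
    chunks: List[str] = []  # sequence pieces collected for the current record

    def flush() -> List[str]:
        # Join what was collected and cut it into fixed-width lines.
        sequence = "".join(chunks)
        chunks.clear()
        return [sequence[i:i + width] for i in range(0, len(sequence), width)]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            yield line
        else:
            chunks.append(line)
    yield from flush()  # finish the last record


# Tiny example with a narrow width so the wrapping is visible.
wrapped = list(wrap_fasta([">seq1", "ACGTACGTAC", ">seq2", "GG", "TT"], width=4))
for line in wrapped:
    print(line)
```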

Bed File Bedpe Format

Hi,

I'm having trouble converting a BAM file to BEDPE format (-bedpe) using bedtools.

Workflow:
samtools sort -n mut.bam mut.Namesorted
bamToBed -i mut.Namesorted.bam -bedpe > dilpMerged_bedpe.bed

After sorting the file by read name (option -n), I run the bamToBed command, but it gives me an error message after processing a few lines:

*ERROR: -bedpe requires BAM to be sorted/grouped by query name.

What am I doing wrong here?

Thanks

A.

How To Extract Scores From Bedgraph File Using Bed Tools

file1

chr1 10 20 name 0 +

file2

chr1 12 14 2.5
chr1 14 15 0.5

How could I extract average scores for the intervals in file1 using file2, as shown below? I am trying to get the average phastCons score (file2) for each interval in file1.

chr1  10 20 name 0 + 1.5
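
The 1.5 in the example is the plain mean of the scores of the file2 records that overlap the file1 interval: (2.5 + 0.5) / 2. A minimal Python sketch of that calculation follows, assuming half-open BED-style coordinates; the function name is hypothetical. A mean weighted by the number of bases each record covers would give a different value, so it is worth deciding which of the two is wanted.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[str, int, int]           # chrom, start, end
ScoreRecord = Tuple[str, int, int, float]  # chrom, start, end, score


def mean_overlapping_score(
    interval: Interval, records: Iterable[ScoreRecord]
) -> Optional[float]:
    """Unweighted mean score of the records overlapping the interval, or None."""
    chrom, start, end = interval
    scores = [
        score
        for rec_chrom, rec_start, rec_end, score in records
        # Half-open overlap test: the two ranges share at least one base.
        if rec_chrom == chrom and rec_start < end and rec_end > start
    ]
    return sum(scores) / len(scores) if scores else None


# The values from the post.
file1_interval = ("chr1", 10, 20)
file2_records = [("chr1", 12, 14, 2.5), ("chr1", 14, 15, 0.5)]
average = mean_overlapping_score(file1_interval, file2_records)
print(average)
```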

Intersectbed - Overlap Analysis Usign Vcf And Bed Files

I am trying to do an overlap analysis between 200 Danish exomes (VCF courtesy of Zev) and 10 different gene regions.
I would like to know what percentage overlaps between my regions of interest (mygenes.bed, 36 lines in total representing the regions) and a VCF file (Danish_*.flt.vcf.gz).

I tried this command and got a result: intersectBed -a Danish1.flt.vcf.gz -b mygenes.bed > D1result.txt

Danish1.flt.vcf.gz: here mygenes.bed: here D1overlapped.txt: here

My assumption is that the output should have no more lines than the total number of lines in the mygenes.bed file. But in many instances I am getting more than 36 lines of output. Maybe I am missing something important, or maybe another tool or option in bedtools can do this task more efficiently. Please let me know your thoughts.

Creating Bed File For Lncrna Using Gencode Gtf File

Hi all,

I want to get a BED file of lncRNAs based on the GENCODE GTF file.

I downloaded the file "gencode.v16.long_noncoding_RNAs.gtf.gz" and extracted the chr, start, and end information from it. Then I used mergeBed to merge the overlapping lncRNAs. Is that correct? I know we can merge exon genomic positions with this kind of method,

but for lncRNAs I am not so sure. Is there any place that already offers this kind of BED file?

Actually, we should get 22444 long non-coding RNA loci transcripts, but there are only 11817 genomic regions after the merging process.

If anyone knows the answer, could you give me some help?

Determining Each Samples Coverage Area

This is my first time working with NGS data. I've got a BAM file with mapped reads for my samples and a BED file with the regions in hg19 that were targeted (I used an Ion Torrent AmpliSeq panel). Are there any tools that can output something similar to this:

Sample        Amplicon           Chromosome           Start_coordinate_of_coverage             End_coordinate_of_coverage
Sample1       amp_001                chr6                 1,000,000                                   1,000,250
Sample2       amp_001                chr6                 1,000,111                                   1,000,255
Sample1       amp_002                chr6                 1,000,200                                   1,000,333

I basically want to know for each gene what coverage we have for each sample.

EDIT: changed column headings, I'm looking for coordinates that have coverage, not depth at each exon.

Which Of The Genes Are Enriched With Repeat Elements

I would like to know which of my genes are enriched with repeat elements (LINE, SINE, ERV, etc.). I have a BAM file and the repeats in BED format. As far as I know, a BAM file contains the alignment for each short read from the FASTQ file. I am trying to figure out the best approach for finding which genes (+/- 1000 bp) have more repeat elements. I am considering two approaches but am not sure which one is best.

a) Should I convert the BAM file to a BED file and then use bedtools merge, so that I can overlap the result with the repeats file using bedtools window with the -c, -l, and -r options? That tells me how many repeats overlap or lie near the short reads. Then I count this number for each gene. For example:

chr   start  end  gene   number_of_repeats
chr1  100    200  gene1  70
chr1  190    240  gene1  40
chr1  250    400  gene1  100
chr2  500    600  gene2  150

If I sort and merge them, I will get:

chr1  100    240  gene1  90
chr1  250    400  gene1  100
chr2  500    600  gene2  150

So gene1 will have 190 (90 + 100) repeats and gene2 will have 150.

b) Or should I count the number of repeats for each short sequence without any merging? ...

Tool For Binning Windowbed Output For K-Means Clustering

I have mapped high resolution ChIP-seq data to transcription start sites using windowBed. I now want to bin the data, in bin sizes of my choosing, relative to TSSs so that I can generate heat maps and do k-means clustering on the data.

What tool/s exist for doing this?

Thanks!
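
The binning step can be sketched without any particular tool. The Python sketch below is only an illustration: it assumes each signal has been reduced to a single genomic position, that the TSS position and strand are known, and that bins are numbered by signed distance from the TSS (bin 0 starts at the TSS, negative bins are upstream). The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bin_index(position: int, tss: int, bin_size: int, strand: str = "+") -> int:
    """Signed bin number of a position relative to a TSS."""
    offset = position - tss
    if strand == "-":
        offset = -offset  # on the minus strand, upstream is at higher coordinates
    return offset // bin_size  # floor division keeps upstream bins negative


def bin_counts(
    positions: Iterable[int], tss: int, bin_size: int, strand: str = "+"
) -> Counter:
    """Count how many positions fall into each bin around one TSS."""
    return Counter(bin_index(p, tss, bin_size, strand) for p in positions)


# Hypothetical positions around a TSS at 1000, using 100 bp bins.
counts = bin_counts([950, 1010, 1050, 1120], tss=1000, bin_size=100)
print(sorted(counts.items()))
```

One row of such counts per TSS, laid out over a fixed range of bin numbers, gives the matrix that a heat map or k-means clustering would take as input.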
