Quantcast
Channel: Post Feed
Viewing all 3764 articles
Browse latest View live

Converting Gff To Bed With Bedtools?

$
0
0

I use bedtools's sortBed utility to sort BED files for various operations. It takes as input GFF files as well. However, when I feed it a GFF file as in:

sortBed -i myfile.gff

it outputs it as GFF, not BED. Is there a way to make bedtools sort and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need to use a different subutility of bedtools to achieve this? thanks.


How To Use Bedtools Windows To Overlap Upstream For Positive Strand Strand

$
0
0

Hi,

I am trying to use bedtools windows. It has been explained in the manual of the bedtools but I am still bit confused and thought a confirmation would be good. And I have no biological background.

I have divided my bedfile into two, based on the strand information(For example, posStrand.bed and negStrand.bed).

I would like to screen overlaps of LINEs within 5000bp upstream of my postStrand.bed file.

  1. In this case shall I use -l or -r option from bedtools window?
  2. since all are on + strand, do I need to use the -sw option?

Splice Junction file intersection with genome annotation

$
0
0
Hello,   I have a tab delimited format Splice Junction file and the file looks something like this: chr1    11212    12009    1    1    0    0    2    48 chr1    11672    12009    1    1    0    0    1    31 chr1    11845    12009    1    1    0    0    1    28 chr1    12228    12612    1    1    1    0    1    32 chr1    12722    13220    1    1    1    0    3    9 chr1    14830    14969    2    2    1    0    218    50 chr1    15039    15795    2    2    1    0    98    50 chr1    15948    16606    2    2    1    1    10    48 chr1    16766    16857    2    2    1    0    24    44 chr1    16766    16875    2    2    0    0    2    36 The task is to filter out lines in which Column 6 has value 1, Column 7 has value 1 and Column 8 has value 10 or greater.    I have been going through the bedtools documentation but I am not quite sure on how to get started, I would appreciate a few pointers on how to get going. My input file is going to be in the tab delimited format and I also have the Gencode V.19 GTF file for annotation.   Thanks! *** Edit *** Column 1: chromosome Column 2: first base of the intron (1-based) Column 3: last base of the intron (1-based) Column 4: strand Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used) Column 7: number of uniquely mapping reads crossing the junction Column 8: number of multi-mapping reads crossing th ...

Bedtools To Compare A Vcf File From Samtools Mpileup With Dbsnp?

$
0
0

Hello,

I have one big vcf file which is genereated by samtools mpileup by comparing 6 cell lines to see whether there are SNP differences between them.

I would like to use bedtools for intersecting. How can I do it? do you have some scripts for that.

Thanks

how to run subtract command in java

$
0
0

I want to run subtract command in java, could somebody tell me how to use.

Thank you very much.

Changing Column Order In Bed File

$
0
0

Here is my data with A, B, C and D columns in my bed file.

   A.     B.     C.     D.
  Chr 1.  1.    12.     +
  Chr 2.  24.   56.     +

How can I move my D column to position 1 where the Column A right now?

Calculating Exome Coverage

$
0
0

*// Edit to make the post more clear (Mapping done via Bowtie2). My problem is that when counting Exome Coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.

1) My first step is to build an .bed file of my Illumina Paired-End reads, returning the positions that only fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?

What Is The Best Way To Run Bedtools In Parallel With Blocking

$
0
0

Say I am working on a server with a shared file system and 4 quad core nodes (I/O is not an issue, 16 cores total). I want to run coverageBed across 20 files. Currently I have a shell script that would do this sequentially. It is possible to just background the command so they run in parallel but I am not sure how to block in BASH. (next step requires counting between the files) Assuming I/O is not a bottleneck, what are ways of leveraging the advantage of multiple nodes/cores when running bedtools (or any other sequential commands for that matter).

From my rudimentary understanding of parallel programming the concept I am trying to get at is how do you 'block' so that that the next command after coverageBed will not be executed until all coverageBed runs are done.

I was thinking of wrapping the shell commands in a python script and having queue of coverageBed commands and a function to feed commands 4 at a time (since quad cores) and the function would only return when queue is empty. Is there a better way of doing this?


bedtools intersect - something wrong with chromosome numbers >= 10?

$
0
0

Hi!

I have an alignment (.bam) of reads to mm9 genome. I sorted it with samtools sort, so that later I can use -sorted key with bedtools. I also created a .bed-file with regions of interest, in which I want to count number of reads, that mapped to them. I tried this: converted .bam to .bed with bedtools bamtobed, and then intersected them counting number of hits (bedtools intersect -a regions_of_interest.bed -b alignment_sorted.bed -c -sorted  > Neg2H_counts.bedgraph). The problem is, it looks fine for all chromosomes with numbers from 0 to 9 (and X), but all counts for all regions of interest of chromosomes with higher number (chr10, chr11, etc) are 0. There is no biological reason for that, in fact the highest signal should be on chr11. What could be wrong here? I am fairly new to all these tools.


UPDATE
I tried to do the same intersection with bedmap and the result is identical... So there probably is something wrong with my files - what could it be?
I also tried sorting the alignment-derived bed-file in the same way, as I did with the files with regions of interest and it doesn't help.

Renaming SNPs or SNP matching

$
0
0
This should be easy to do by now, but... we have SNP data from an Illumina exome array given to us in PLINK format. The BIM file looks like this:1 exm2253575 0 881627 G A 1 exm269 0 881918 A G 1 exm340 0 888659 T C 1 exm348 0 889238 A G 1 exm2264981 0 894573 G A 1 exm773 0 909238 G C 1 exm782 0 909309 C T 1 exm912 0 949608 A G 1 exm991 0 977028 T G 1 exm1024 0 978762 A G And I have all of the SNPs in dbSNP 138  downloaded as a large VCF file: #CHROM POS ID REF ALT QUAL FILTER INFO 1 10019 rs376643643 TA T . . RS=376643643;RSPOS=10020;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000200;WGT=1;VC=DIV;R5;OTHERKG 1 10054 rs373328635 CAA C,CA . . RS=373328635;RSPOS=10055;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000210;WGT=1;VC=DIV;R5;OTHERKG;NOC 1 10109 rs376007522 A T . . RS=376007522;RSPOS=10109;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG 1 10139 rs368469931 A T . . RS=368469931;RSPOS=10139;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020001000002000100;WGT=1;VC=SNV;R5;OTHERKG 1 10144 rs144773400 TA T . . RS=1447734 ...

How Can I Compare And Merge Bed Files

$
0
0

I have three bed files with chrNo, start, end position and type. I need to compare each chrNo, start and end position of one file with 2 other files and write the common one in a new file. Can any one suggest how can I do this efficiently? I wrote the simple perl script, but as the file is huge, it is taking a lot of time, thus is not feasible. Thanks in advance

Example files:

file1.bed:

1 20 30

1 100 120

1 200 300

file2.bed:

1 2 5

1 25 34

1 200 300

file3.bed:

1 30 33

1 200 300

1 500 600

common.bed

1 30 34 --> coordinates with overlapping 5bp is considered as same but outermost coordinates of the 3 is taken in common file

1 200 300

To Group Items In Bed Files

$
0
0

For example, we now have a bed file:

chr1 23455 45678
chr1 23446 45663
chr1 23449 45669
chr1 30000 31000

Is there anyway to group the first three lines, while leaving the last line alone? I know Bedtools have mergeBed function, merging those overlapping span, which, however will include the last line.

This may sound a pure computational question; but I'm just curious if we have available tools already to tackle such questions

thx

Get The Idea Of Splicing From Reads Mapped In Rna-Seq

$
0
0

I've got a set of 100 bam files from a public experiment, I want to have an idea of splicing in each of them regarding three exons,without entering in some kind of depth-level procedure like Cufflinks or DEXSeq,

Lets say that my exons are named 1,2 and 3, and I want to know in how many samples I have a splicing event of the number two, so i was looking in the threads and I found that using coverageBed with my bed file of the three exons I could get some kind of idea per bam file

coverageBed -split -abam my_alignment -b exons_to.bed

Am I correct?

I was also thinking of getting the reads mapped in flanking end positions of read 1 and start of read 3 with samtools

What do you think about it? Any idea will be kindly appreciated

Thanks in advance!

Converting Sam Files To Bam Files - Reproduce Results Nature Paper: Transcriptome Genetics Using Second Generation Sequencing In A Caucasian Population

$
0
0
I want to reproduce the results that people achieved in the following Nature paper: Transcriptome genetics using second generation sequencing in a Caucasian populationhttp://www.nature.com/nature/journal/vaop/ncurrent/full/nature08903.html I downloaded their SAM files from the groups website:http://funpopgen.unige.ch/data/ceu60 I downloaded a reference fasta and fai file from: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ The main problems seem to exist that I'm not able to convert these SAM files into proper "working" BAM files so that I can get BED files that is the input format for FluxCapacitor (http://flux.sammeth.net/). I tried using the following steps (as there is no "proper" header in the SAM files I've to do some additional steps):
  1. samtools view -bt human_b36_male.fa.gz.fai first.sam> first.bam
  2. samtools sort first.bam first.bam.sorted
  3. samtools index first.bam.sorted
  4. samtools index aln-sorted.bam
When I the ...

Profile Coverage Of Rnaseq Samples?

$
0
0

Hi all,

I have a quick question:

How can I visualize aligned paired-end reads from RNAseq datasets in UCSC browser?

I already mapped the reads and assembled the transcripts with Tophat/Cufflinks but I'm not sure how to proceed to visualize the mappings

After sorting the BAM files and fixing the mate pairs, I tried to compute the coverage using the following commands:

genomeCoverageBed -bg -split -ibam F.T0.rep2-accepted_hits-fS.bam -g ~/conversion_util/chrom.hg19.sizes > F.T0.rep2-accepted_hits-fS.bg
bedGraphToBigWig F.T0.rep2-accepted_hits-fS.bg ~/conversion_util/chrom.hg19.sizes F.T0.rep2-accepted_hits-fS.bw

But I was not able to visualize properly the mappings. Here I paste a screenshot of how it looks like:

Do you know where is the mistake?

Thanks!


Extracting Genomic Coverage Information Across Different Samples

$
0
0
Hello, I have 3 bam files that i wanted to compare against each other. For example i have reference file with 10,000 sequences. I have paired end reads sequenced for 3 different samples. 1) Sample 1 is 100% same as reference so we expect all reads to map to it 2) Sample 2 is 80% similar to reference so 20% of reference sequences wont have any reads 3) Sample 3 is 60% similar to reference and 40% of reference wont have any reads. Now my goal is to identify what reference sequences doesnot have any reads mapped in Sample 2 and 3.I need to identify the 20% reference sequences from Sample 2 and 40% from Sample 3. Also in some cases in a reference which is approx 10kb long, sample 1 maps to entire 10kb, sample 2 maps to first 5kb and sample 3 maps to last 3kb. so i need to identify the partial regions for those reference sequences as well. I have the mapped sorted bam files for all these three samples. I am looking in to using bedtools but not sure what in bedtools will give the answer i needed. i have the following commands which might do similar but it ouputs differences at every base. genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph unionBedGraphs -header -i sample1.bedgraph sample2. ...

Convert .Txt Into Bed Files

$
0
0

I used paired-end sequence data for copy number variation study; and eventually get .txt files as output. I'm hoping to use Bedtools to compare my results with others.

Can I convert .txt files into .bed files? (I don't see option in Bedtools)

If Bedtools is not working, what software can I use for data comparison?

my lines of txt is just like:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

thx

edit: I converted the txt files into bedpe format, which looks like

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started to compare two bedpe, looking for overlapping region, using the command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see the errors:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

MY bed file have only three columns, seems it requires 6....What's the problem here? thx

How To Combine Fpkm Values From Cufflinks With Contigs From De Novo Assembly Program Velvet/Oases?

$
0
0

Hi all,

I am working on RNA-seq data analysis. I've finished running Tophat and Cufflinks to get FPKM values for each read from Illumina pair-end sequence. Also, parallely I've run Velvet to get contig sequences through de novo assembly and Gmap to see if the assembled sequences map to reference genome (this reference genome is not complete for now, but somewhat useful). Now, I am trying to combine all information so I can have sequence information for a contig and FPKM value for the corresponding to the contig. Some suggested I can convert Cufflink and Gmap outputs to bedfiles and then use IntersectBed to see if there's any overlap. However, I am not sure how I can have every information saved in the output from Bedtools. IntersectBed default seems to provide me overlapped region with 'A' file as a template, so I couldn't see any information from 'B' file. Is there any solution for me?? Please let me know. I would appreciate for your suggestion!

Which Of The Genes Are Enriched With Repeat Elements

$
0
0
I would like to know which of my genes are enriched with repeats of LINE/SINE/ERV etc. elements. I have a bam file and the repeats in bed format. As far as I know BAM files contains aligned data for each short read sequence from the fastq file. I am trying to figure out what is the best approach to know which genes (+- 1000 bp) have more repeats elements. I am thinking about two approaches to implement but not sure which one is the best. here are the approaches i was thinking to use a) Shall I convert the bam file into bed file and then use bedtools merge. So that I can overlap with the repeats file using bedtools window -c -l -r option. And I know how many of the repeats are overlapping or near by the short reads. Then count this number for each gene. For example, chr start end gene number_of_repeats chr1 100 200 gene1 70 chr1 190 240 gene1 40 chr1 250 400 gene1 100 chr2 500 600 gene2 150 if i sort and merge them i will get chr1 100 240 gene1 90 chr1 250 400 gene1 100 chr2 500 600 gene2 150 So gene1 will have 190 (90 + 100) and gene 2 will have 150 number of repeats. Or b) shall I count the number of repeats which for each short sequence without any merging? ...

How To Rearrange Paired End Bam File?

$
0
0

Hello all,

I have a paired end bam file and I want to use bedtools for them. After merging, the paired end read alignments are not lying next to each other. It is making problems in the bedtools process. Is there any tool available to rearrange the paired end read alignments in bam file?

Thanks, Deeps

Viewing all 3764 articles
Browse latest View live