Channel: Post Feed
Viewing all 3764 articles
Browse latest View live

How To Get Annotation For Bed File From Another Bed File

Hello All,

I have a bed file (with Chr, Start, End, Name, Score and Strand)

Chr1 5678 5680 NA 7  +
Chr1 700  800  NA 8  -
Chr1 900  1200 NA 10 -

and would like to know how I can get the annotation for the Name column from another bed file

Chr1 5500 6000 Gene1 x +
Chr1  500 1000 Gene2 x -

or from any standard genome file format, like .gbk or .fna files (or, for that matter, another bed file). My output file would then be a bed file with Chr, Start, End, Name, Score and Strand:

Chr1 5678 5680 Gene1 7 +
Chr1 700  800  Gene2 8 -
Chr1 900  1200 Gene2 10 -

Is there an easy, standard way to do this?

Bedtools usually operates on the features themselves, and I'm not sure whether annotation from one bed file can be transferred to the other based on overlapping features.

Thanks in advance!
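
Overlap-based transfer is what `bedtools intersect -a query.bed -b genes.bed -wa -wb` does: it prints each A record followed by the overlapping B record, so the gene name can be cut from the B columns (adding `-s` restricts matches to the same strand). The pure-Python sketch below only illustrates that logic on the example above; `transfer_names` is a hypothetical helper, and its nested loop is meant for small inputs, not genome-scale files.

```python
from typing import List, Tuple

# A BED6 record: (chrom, start, end, name, score, strand), 0-based half-open.
Bed6 = Tuple[str, int, int, str, str, str]


def transfer_names(queries: List[Bed6], genes: List[Bed6],
                   same_strand: bool = False) -> List[Bed6]:
    """Copy the Name of every overlapping gene onto each query interval.

    A query overlapping several genes is reported once per gene;
    a query overlapping none is dropped.
    """
    annotated = []
    for chrom, start, end, _name, score, strand in queries:
        for g_chrom, g_start, g_end, g_name, _g_score, g_strand in genes:
            if chrom != g_chrom:
                continue
            if same_strand and strand != g_strand:
                continue
            # Half-open intervals overlap when each starts before the other ends.
            if start < g_end and g_start < end:
                annotated.append((chrom, start, end, g_name, score, strand))
    return annotated


if __name__ == "__main__":
    queries = [("Chr1", 5678, 5680, "NA", "7", "+"),
               ("Chr1", 700, 800, "NA", "8", "-"),
               ("Chr1", 900, 1200, "NA", "10", "-")]
    genes = [("Chr1", 5500, 6000, "Gene1", "x", "+"),
             ("Chr1", 500, 1000, "Gene2", "x", "-")]
    for record in transfer_names(queries, genes):
        print(*record, sep="\t")
```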


Does Bedtools Intersect -V Consider Unmapped Reads "As Not In B"

bedtools intersect -v -abam my.bam -b myregions.gff > notinmyregions.bam

Would we see reads with 4 in the FLAG field (i.e. unmapped reads) in notinmyregions.bam?
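
In the SAM format, bit 0x4 of FLAG marks an unmapped segment, so one way to answer this empirically is to count such records in the output, e.g. `samtools view -c -f 4 notinmyregions.bam`. The sketch below shows the same bit test in Python on hypothetical FLAG values; it does not claim anything about how `intersect -v` treats unmapped reads.

```python
SAM_FLAG_UNMAPPED = 0x4  # "segment unmapped" bit from the SAM specification


def is_unmapped(flag: int) -> bool:
    """Return True when the FLAG field has the unmapped bit set."""
    return bool(flag & SAM_FLAG_UNMAPPED)


if __name__ == "__main__":
    # Hypothetical FLAG values: 0 = mapped forward, 4 = unmapped,
    # 77 = paired + unmapped + mate unmapped + first in pair, 16 = reverse strand.
    for flag in (0, 4, 77, 16):
        print(flag, is_unmapped(flag))
```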

Bedtools To Compare A Vcf File From Samtools Mpileup With Dbsnp?

Hello,

I have one big vcf file, generated by samtools mpileup from 6 cell lines, to see whether there are SNP differences between them.

I would like to use bedtools to intersect it with dbSNP. How can I do this? Do you have a script for it?

Thanks
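
With bedtools, `bedtools intersect -a my.vcf -b dbsnp.vcf -u` keeps records that overlap dbSNP and `-v` keeps those that do not. The sketch below spells out the same split in Python, matching on (CHROM, POS) only; the helper names are hypothetical, and alleles are deliberately ignored, so a different ALT at a known position still counts as known.

```python
from typing import Iterable, List, Set, Tuple


def vcf_positions(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) keys from VCF lines, skipping header lines."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        keys.add((fields[0], int(fields[1])))
    return keys


def split_known_novel(sample_lines: Iterable[str], dbsnp_lines: Iterable[str]
                      ) -> Tuple[List[str], List[str]]:
    """Partition sample VCF records into (in dbSNP, not in dbSNP) by position."""
    known_positions = vcf_positions(dbsnp_lines)
    known, novel = [], []
    for line in sample_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))
        (known if key in known_positions else novel).append(line)
    return known, novel
```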

Getting Number Of Reads In Intervals With Bedtools

What is the correct way to get the total number of reads strictly contained in each interval in a GFF from a BAM file while enforcing strandedness? What I am looking for is very close to this intersectBed feature:

-c    For each entry in A, report the number of overlaps with B.
    - Reports 0 for A entries that have no overlap with B.
    - Overlaps restricted by -f and -r.

Except that I'd like the number of overlaps in A for each entry in B (i.e. the other way around). If I do:

intersectBed -abam mybam.bam -b mygff.gff -s -f 1 -wb

Then my understanding is that this will report the entry in B for each overlap with A. But I'd like each entry in B to be outputted exactly once, with the number of reads from A that are contained strictly within it. I'm not sure how to enforce strict containment here.

Is coverageBed the solution to this? Or multicov? I'm not sure how to enforce strict containment using coverageBed; the docs don't make clear to me whether that is the default. Thanks.
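
The target semantics (every GFF feature reported exactly once, counting only same-strand reads that lie wholly inside it) can be pinned down with a small pure-Python sketch. It is meant as a reference for checking whichever bedtools command you settle on against a few hand-made reads, not as a replacement; `count_contained` is a hypothetical helper.

```python
from typing import List, Tuple

# (chrom, start, end, strand) with 0-based half-open coordinates.
Interval = Tuple[str, int, int, str]


def count_contained(reads: List[Interval], features: List[Interval]
                    ) -> List[Tuple[str, int, int, str, int]]:
    """For each feature, count same-strand reads lying entirely inside it.

    Every feature is reported exactly once, with 0 when no read qualifies.
    """
    results = []
    for chrom, start, end, strand in features:
        n = sum(1 for r_chrom, r_start, r_end, r_strand in reads
                if r_chrom == chrom and r_strand == strand
                and start <= r_start and r_end <= end)
        results.append((chrom, start, end, strand, n))
    return results
```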

Intersectbed/Coveragebed -Split Purify Exon?

The file all.reads.bam contains mapped RNA-seq reads, including:
  1. exon:exon junction
  2. exon body
  3. intron body
  4. exon:intron junction
Q1: When calculating RPKM for a given RefSeq gene, counting reads at all positions, will the following command count only exon:exon junction reads and ignore all other reads?

coverageBed -abam all.reads.bam -b refseq.genes.BED12.bed -s -split > coverage.bed

I'm confused by the manual (page 62):
When dealing with RNA-seq reads, for example, one typically wants to only tabulate coverage for the portions of the reads that come from exons (and ignore the interstitial intron sequence). The -split command allows for such coverage to be performed.
If -split is set and an exon:exon read (for example, 30M3000N46M) exists in the -abam BAM file, the 3000N will NOT be wrongly intersected when running intersectBed. But what about coverageBed? I hope the 3000N will not be counted, which makes sense, and I also hope that intron-body reads and the other reads will NOT be ignored.

Q2: If one just wants to calculate exons' RPKM, does that mean one should prepare a -b file listing all the exons, and run something like this: coverageBed -abam all.reads.bam -b ...
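
What `-split` relies on is that an `N` operation in a CIGAR string (as in `30M3000N46M`) marks skipped reference sequence, so the read can be treated as several aligned blocks instead of one long span. The sketch below turns a 0-based alignment start plus CIGAR into those blocks, under the assumption that only `N` breaks a read into blocks (deletions stay inside a block); `cigar_blocks` is a hypothetical helper, not part of bedtools.

```python
import re
from typing import List, Tuple


def cigar_blocks(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 0-based half-open reference blocks covered by an alignment."""
    blocks = []
    block_start = ref = pos
    for length_str, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length_str)
        if op == "N":
            # Skipped region (e.g. an intron): close the current block.
            if ref > block_start:
                blocks.append((block_start, ref))
            ref += length
            block_start = ref
        elif op in "MD=X":
            # These operations consume reference bases within a block.
            ref += length
        # I, S, H and P do not consume reference bases.
    if ref > block_start:
        blocks.append((block_start, ref))
    return blocks


if __name__ == "__main__":
    print(cigar_blocks(1000, "30M3000N46M"))  # [(1000, 1030), (4030, 4076)]
```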

Raw Counts From Cufflinks Output

Hi, I want to ask how to get the raw counts from the output of Cufflinks. One way to do this is to back-calculate them from the FPKM:

raw counts = FPKM * (length of that transcript/1000) * (# of mapped reads / 1e6)

The FPKM and length of transcript are in the cufflinks FPKM Tracking Files. But how about the # of mapped reads?

For instance, given foo.bam, samtools view -c (-f|-F) flag foo.bam can do this job, but I am not quite sure which flag I should set for single-end versus paired-end data.

Thanks!
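
The formula above is simply the FPKM definition solved for the count. A small sketch makes the arithmetic explicit; the inputs are hypothetical, and the result is only as good as the length and mapped-read total plugged in, since Cufflinks' own normalization may differ from this plain definition.

```python
def raw_count_from_fpkm(fpkm: float, transcript_length_bp: float,
                        mapped_reads: float) -> float:
    """Invert FPKM = count * 1e9 / (length_bp * mapped_reads)."""
    return fpkm * (transcript_length_bp / 1_000) * (mapped_reads / 1_000_000)


if __name__ == "__main__":
    # Hypothetical values: FPKM 10, a 2 kb transcript, 20 million mapped reads.
    print(raw_count_from_fpkm(10, 2_000, 20_000_000))  # 400.0
```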

Is It Possible To Filter Only Bookend Reads From A Bed File?

I have a bed file with many fragments: some overlapping, some on their own and some adjacent to each other (book-ended).

I know I can group overlapping and book-ended features using bedtools, like

bedtools cluster -i fragments.bed

However, I was wondering if anyone knew of a way to obtain from the input file only the fragments that are book-ended to an adjacent fragment.

Any ideas?

Best regards
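
In BED's half-open coordinates, two fragments are book-ended when one's end equals the other's start on the same chromosome. The sketch below keeps exactly those fragments; `bookended_fragments` is a hypothetical helper, it assumes no zero-length fragments, and a fragment that also overlaps something is still kept if it touches a neighbour.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def bookended_fragments(fragments: Iterable[Interval]) -> List[Interval]:
    """Keep fragments whose start or end exactly touches another fragment.

    Assumes no zero-length fragments (start == end would match itself).
    """
    fragments = list(fragments)
    starts = defaultdict(set)
    ends = defaultdict(set)
    for chrom, start, end in fragments:
        starts[chrom].add(start)
        ends[chrom].add(end)
    return [(chrom, start, end) for chrom, start, end in fragments
            if start in ends[chrom] or end in starts[chrom]]
```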

Error In Bedtools Getfasta: Chromosome Not Found

Hi, I am trying to use BEDtools to get some sequences from genomic coordinates, but I get an error saying "WARNING. chromosome (chr12) was not found in the FASTA file. Skipping." for each read in my bed file. Here are some details about what I am doing.

I just downloaded the latest version of BEDtools (I think), bedtools-2.17.0. I have 2 different files (much longer than the small parts shown here).

A FASTA file with the sequences of all chromosomes:

>chr01
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

A BED file with my genomic coordinates (already sorted):

chr01 187814 190840
chr01 307073 310104
chr01 701047 704068
chr01 702941 705962
chr01 702952 705972
chr01 867716 870740
chr01 914064 917087
chr01 991080 994104
chr01 1039795 1042815
chr01 1058713 1061736

Then I run the command:

bedtools getfasta -fi all.con -bed 1-13sorted2.bed -fo NewCandidates/Genomiccoordinates/1-13_1500.fa

The only thing that I get is "WARNING. chromosome (chr01) was not found in the FASTA file. Skipping.", thousands of tim ...
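
Since the warning names chromosomes that look present in both files, one quick diagnostic is to compare the exact chromosome strings. The sketch below (hypothetical helper `missing_chroms`) lists BED chromosome names with no exactly matching FASTA header line; it deliberately compares the full header text, so hidden differences such as trailing spaces or Windows carriage returns show up, as does any description after the name (which may or may not matter to getfasta). Whether any of these is the cause here is an assumption to check, not a diagnosis.

```python
from typing import Iterable, List


def missing_chroms(fasta_lines: Iterable[str], bed_lines: Iterable[str]) -> List[str]:
    """Return BED chromosome names that have no exactly matching FASTA header.

    Header names keep everything after '>' except the trailing newline, so
    stray spaces or carriage returns remain visible instead of being stripped.
    """
    fasta_names = {line[1:].rstrip("\n") for line in fasta_lines
                   if line.startswith(">")}
    bed_chroms = {line.split()[0] for line in bed_lines
                  if line.strip() and not line.startswith(("#", "track", "browser"))}
    return sorted(bed_chroms - fasta_names)


if __name__ == "__main__":
    # Hypothetical FASTA saved with Windows line endings.
    fasta = [">chr01\r\n", "NNNNNNNNNN\r\n"]
    bed = ["chr01\t187814\t190840\n"]
    print(missing_chroms(fasta, bed))  # ['chr01']
```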

Convert Bamtobed Score

Hey,

Just a short question: is there a way to set the score in the bed file to "1" and not to the alignment score? The -tag and -ed arguments only use BAM alignment tags... :/

Cheers!
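
Independently of bamToBed's own options, the score column can be overwritten afterwards; an awk one-liner such as `awk 'BEGIN{OFS="\t"} {$5=1; print}'` should do it. The Python sketch below does the same on tab-separated BED lines; `set_score` is a hypothetical helper, and lines with fewer than five columns are passed through unchanged.

```python
from typing import Iterable, Iterator


def set_score(bed_lines: Iterable[str], score: str = "1") -> Iterator[str]:
    """Replace column 5 (score) of each tab-separated BED line."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5:
            fields[4] = score
        yield "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical bamToBed record with mapping quality 37 in the score column.
    for out in set_score(["chr1\t10\t20\tread1\t37\t+\n"]):
        print(out)
```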

Coveragebed, Depth/Breadth Of Coverage

I'm using coverageBed to calculate the depth and breadth of coverage, but I'm not sure I'm doing this right. I want to calculate the two values for each human chromosome.

For example, I've created a bed file with 1 chromosome. When I input my BAM file and the BED file, I get the following output:

chr1    0       249250621       103718897       224950839       249250621       0.9025086

I know the first 3 fields are from my chr BED file, the 4th field is the # of reads, 5th is # of bases covered, 6th is length of chromosome (redundant to field 3), and the last column is the fraction of bases covered (5th field/6th field).

So the 7th/last field gives the breadth of coverage, but I don't see a depth of coverage value. How do I get a depth of coverage?
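
One way to get a mean depth is from a per-base depth histogram such as the one genomeCoverageBed prints by default (per chromosome: depth, number of bases at that depth, chromosome size, fraction). Mean depth is then the sum of depth × bases divided by the chromosome size, and breadth is the fraction of bases with depth > 0. The sketch below does that arithmetic on hypothetical numbers; `depth_and_breadth` is an illustrative helper.

```python
from typing import Iterable, Tuple


def depth_and_breadth(histogram: Iterable[Tuple[int, int]], chrom_size: int
                      ) -> Tuple[float, float]:
    """Return (mean depth, breadth) from (depth, number_of_bases) rows."""
    rows = list(histogram)
    total_aligned_bases = sum(depth * n_bases for depth, n_bases in rows)
    covered_bases = sum(n_bases for depth, n_bases in rows if depth > 0)
    return total_aligned_bases / chrom_size, covered_bases / chrom_size


if __name__ == "__main__":
    # Hypothetical 40 bp chromosome: 10 bp at depth 0, 20 bp at depth 1, 10 bp at depth 3.
    mean_depth, breadth = depth_and_breadth([(0, 10), (1, 20), (3, 10)], 40)
    print(mean_depth, breadth)  # 1.25 0.75
```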

Extracting Genomic Coverage Information Across Different Samples

Hello, I have 3 bam files that I want to compare against each other. For example, I have a reference file with 10,000 sequences, and paired-end reads sequenced for 3 different samples:

1) Sample 1 is 100% identical to the reference, so we expect all reads to map to it.
2) Sample 2 is 80% similar to the reference, so 20% of the reference sequences won't have any reads.
3) Sample 3 is 60% similar to the reference, so 40% of the reference won't have any reads.

My goal is to identify which reference sequences do not have any reads mapped in Samples 2 and 3, i.e. the 20% of reference sequences for Sample 2 and the 40% for Sample 3. Also, in some cases, for a reference sequence approximately 10 kb long, sample 1 maps to the entire 10 kb, sample 2 maps to the first 5 kb and sample 3 maps to the last 3 kb, so I need to identify these partially covered regions as well. I have the mapped, sorted bam files for all three samples.

I am looking into using bedtools but am not sure which bedtools tool will give me the answer I need. The following commands might do something similar, but they output differences at every base:

genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph
unionBedGraphs -header -i sample1.bedgraph sample2. ...
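
genomeCoverageBed's `-bga` option also reports zero-coverage stretches, which is closer to this goal than `-bg`. The sketch below expresses the idea in plain Python: given reference lengths (hypothetically taken from the reference FASTA or the BAM header) and one sample's bedGraph rows, it returns the regions with no reads, including whole references that never appear. Running it once per sample gives both fully and partially uncovered references; `uncovered_regions` is an illustrative helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def uncovered_regions(ref_lengths: Dict[str, int],
                      bedgraph: Iterable[Tuple[str, int, int, float]]
                      ) -> List[Tuple[str, int, int]]:
    """Return (ref, start, end) stretches with no coverage in a bedGraph.

    Rows with depth 0 are ignored, so the result is the same whether the
    bedGraph came from -bg (no zero rows) or -bga (zero rows included).
    References absent from the bedGraph are reported in full.
    """
    covered = defaultdict(list)
    for ref, start, end, depth in bedgraph:
        if depth > 0:
            covered[ref].append((start, end))
    gaps = []
    for ref, length in ref_lengths.items():
        pos = 0
        for start, end in sorted(covered[ref]):
            if start > pos:
                gaps.append((ref, pos, start))
            pos = max(pos, end)
        if pos < length:
            gaps.append((ref, pos, length))
    return gaps
```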

Convert .Txt Into Bed Files

I used paired-end sequence data for a copy number variation study and eventually got .txt files as output. I'm hoping to use Bedtools to compare my results with others'.

Can I convert .txt files into .bed files? (I don't see option in Bedtools)

If Bedtools is not working, what software can I use for data comparison?

My txt lines look like this:

deletion    chr9:6169901-6173000    3100
deletion    chr9:7657401-7658800    1400
deletion    chr9:8847501-8848600    1100
deletion    chr9:10010201-10011600    1400
deletion    chr9:10126601-10127700    1100

thx

edit: I converted the txt files into bedpe format, which looks like

chr21    18542801    18543500
chr21    18545701    18545900
chr21    19039901    19040600
chr21    19164301    19169400
chr21    19366001    19370200
chr21    19639601    19640300
chr21    20493701    20495700
chr21    20581401    20583000
chr21    20880901    20882700
chr21    21558601    21559700

Then I started comparing the two bedpe files, looking for overlapping regions, using a command like:

pairToPair -a 1.bedpe -b 2.bedpe > share.bedpe

Then I see the errors:

It looks as though you have less than 6 columns.  Are you sure your files are tab-delimited?

My bed files have only three columns, but it seems to require 6. What's the problem here? thx
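
Three-column files are plain BED rather than BEDPE; pairToPair expects BEDPE records with two ends per line (chrom1, start1, end1, chrom2, start2, end2), hence the 6-column error, whereas two BED files can be compared with `intersectBed -a 1.bed -b 2.bed`. For the conversion itself, a minimal sketch is below. It assumes the txt coordinates are 1-based and inclusive (the length column agrees: 6173000 - 6169901 + 1 = 3100) and keeps the CNV type as the BED name column; `cnv_txt_to_bed` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator

# Matches lines such as "deletion    chr9:6169901-6173000    3100".
CNV_LINE = re.compile(r"^(\S+)\s+(\S+):(\d+)-(\d+)\s+(\d+)\s*$")


def cnv_txt_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert CNV lines to BED (chrom, start, end, type), 0-based half-open.

    Assumes 1-based, inclusive region coordinates in the input.
    """
    for line in lines:
        match = CNV_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        cnv_type, chrom, start, end, _length = match.groups()
        yield f"{chrom}\t{int(start) - 1}\t{end}\t{cnv_type}"


if __name__ == "__main__":
    for bed_line in cnv_txt_to_bed(["deletion    chr9:6169901-6173000    3100\n"]):
        print(bed_line)
```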

Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?

I want to create a gene dataset (as big as possible), so I am using several gene annotations. However, genes in different annotations overlap (they are the same gene). To reduce bias, I intersect the different annotations and, where genes overlap, keep only one gene.

Question:

To make sure such overlaps are detected, I was thinking of extending the gene coordinates. Is this necessary? If so, how large should the extension be (5 bp/100 bp)?

Example:

I want to create an lncRNA dataset (in later steps it will be used to search for genomic features).
Input:

  1. GENCODE lncRNA annotation (version 18 - 04/09/2013);
  2. Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).

Workflow:

  1. Extract GENCODE genes start/end coordinates;
  2. Extract Cabili genes start/end coordinates;
  3. Extend Cabili coordinates ( -/+ nbp );
  4. Use BedTools intersect;
  5. If genes intersect, keep the GENCODE gene (as it is the newer annotation, though this step is really subjective).

I do realize that the answer depends on the situation and on how reliable each annotation is, but I still hope someone can suggest something.
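
With bedtools, steps 3 to 5 map onto `bedtools slop` (padding, which needs a genome file via `-g`) followed by `bedtools intersect -v`. The sketch below expresses the same workflow in plain Python so that different extension values can be compared on the resulting dataset; it does not suggest which value is right. `merge_annotations` is a hypothetical helper, "primary" stands for GENCODE and "secondary" for Cabili, and padded ends are not clipped at chromosome ends.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open


def merge_annotations(primary: List[Gene], secondary: List[Gene],
                      extend_bp: int) -> List[Gene]:
    """Keep every primary gene plus the secondary genes that do not overlap one.

    Secondary genes are padded by `extend_bp` on both sides only for the
    overlap test; genes that are kept retain their original coordinates.
    """
    merged = list(primary)
    for chrom, start, end, name in secondary:
        padded_start, padded_end = max(0, start - extend_bp), end + extend_bp
        overlaps_primary = any(
            p_chrom == chrom and padded_start < p_end and p_start < padded_end
            for p_chrom, p_start, p_end, _ in primary)
        if not overlaps_primary:
            merged.append((chrom, start, end, name))
    return merged
```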

GTF2/GFF3 "feature" types and expression analysis

Hi, I aligned a few samples using STAR to the genome provided in the Illumina iGenomes UCSC hg19 bundle (here), using the provided gene feature (gtf2) file as is. Now my aim is to calculate gene and isoform expression levels using bedtools multicov (at the same time). Using the gtf2 file produces a file containing read counts per exon. I want to compute gene and isoform read counts too, so I converted the gtf2 file to a gff3 file using the gtf2gff3 script from SO/GAL (here). My first question: is it OK if the alignment is performed with the gtf2 file but the reads are counted using the gff3 file, keeping in mind that the gff3 file was converted from the gtf2 file? My second question: I have read both of these resources (here and here) but do not understand the differences between:
  • exon vs CDS
  • transcript vs mRNA
I know that, with the process I described, it is possible to retrieve gene read counts by selecting only the lines where feature=gene from the bedtools multicov output. What must I do for isoforms? I am confused by the semantics. Thanks ahead of time, and let me know if my post was not clear enough. ...
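
In a GTF2 file, isoforms are distinguished by the `transcript_id` attribute carried on each exon line, so one way to get isoform-level numbers from exon-level multicov output is to group exon counts by that attribute. The sketch below (hypothetical helpers) does the grouping; as noted in its docstring, the sums are approximations, because a read overlapping several exons is counted once per exon.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# GTF2 attributes look like: gene_id "X"; transcript_id "Y";
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(attributes: str) -> Dict[str, str]:
    """Parse a GTF2 attribute column into a dict."""
    return dict(ATTRIBUTE.findall(attributes))


def counts_by(key: str, exon_counts: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum per-exon counts by an attribute such as gene_id or transcript_id.

    Caveat: a read overlapping several exons is counted once per exon, and an
    exon listed under several transcripts is summed once per listing.
    """
    totals: Dict[str, int] = defaultdict(int)
    for attributes, count in exon_counts:
        totals[parse_attributes(attributes)[key]] += count
    return dict(totals)
```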

Problems Extracting Non-Snps From A Vcf File

Hello,

In an SNP analysis, I am trying to extract the editing sites not found in dbSNP. I have downloaded a couple of VCF files (All SNPs and Common/Medical SNPs) from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF.

Following this, I have compared my VarScan *.vcf outputs with the SNP.vcf ones using 3 different approaches:

VarScan compare input.vcf SNP.vcf unique1 input-SNPvcf

bedtools intersect -v -a input.vcf -b SNP.vcf > input-SNP.vcf

bedops --not-element-of -1  input-sorted.bed SNP-sorted.bed > inputs-sorted-SNP.bed

In all 3 cases, the SNP-output is identical to the input.vcf/bed.

These command lines do work, however, when I use an alu.bed or a repeat-masker bed file.

Is it just that my analysis contains no known SNPs? I have ruled that out for obvious reasons.

Can somebody point me to the reason for, or a solution to, this problem?

Thanks, G.
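
When three different tools all report zero overlap, yet the same commands work against other annotation files, one thing worth checking (an assumption, not a diagnosis) is whether the chromosome names match between the two files, e.g. `chr1` in one and `1` in the other. The sketch below compares the name sets before and after stripping a leading `chr`; the helpers are hypothetical, and cases such as `chrM` versus `MT` are not handled.

```python
from typing import Iterable, Set, Tuple


def normalize_chrom(name: str) -> str:
    """Drop a leading 'chr' (any case) so 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def shared_chroms(names_a: Iterable[str], names_b: Iterable[str]
                  ) -> Tuple[Set[str], Set[str]]:
    """Return (names shared as-is, names shared after normalization)."""
    a, b = set(names_a), set(names_b)
    as_is = a & b
    normalized = {normalize_chrom(n) for n in a} & {normalize_chrom(n) for n in b}
    return as_is, normalized
```

If the first set is empty but the second is not, renaming chromosomes in one file before intersecting would be the next thing to try.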


Simple Redirection, I/O Problem With Bedtools

Hi Guys, just a quick question. It's more of a Bash question than a bioinformatics one, with bedtools involved.

I mostly pipe the bedtools I/O. Here's a general scenario :

sed 1d fileA.bed | intersectBed -a stdin -b peaks.bed | intersectBed -u -a stdin -b fileB.bed

Now, the problem is that fileB also has a header line, which intersectBed reports as an error (which makes sense: non-integer start).

How can I remove the first line (the header) of fileB on the fly, within the pipe?

Thanks
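
In bash, process substitution lets a header-stripped stream stand in for a filename, e.g. `intersectBed -u -a stdin -b <(sed 1d fileB.bed)`; this should work as long as the tool reads the file front to back (an assumption, not checked against bedtools internals). A more tolerant alternative to `sed 1d` is to drop every leading line that does not look like a BED record; the Python sketch below does that, with `strip_header` as a hypothetical helper.

```python
from typing import Iterable, Iterator


def _looks_like_bed(line: str) -> bool:
    """True when columns 2 and 3 of a tab-separated line are integers."""
    fields = line.split("\t")
    return len(fields) >= 3 and fields[1].isdigit() and fields[2].strip().isdigit()


def strip_header(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines after dropping leading ones that are not BED records."""
    in_header = True
    for line in lines:
        if in_header and not _looks_like_bed(line):
            continue
        in_header = False
        yield line
```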

Does Bedops Have A Command Similar To The Bedtools Makewindows?

With bedtools you can make genomic windows from a genome file or a bed file

input.bed

chr1    1000000 1500000
chr3    500000  900000

[prompt]$ bedtools makewindows -b input.bed -w 250000

chr1    1000000 1250000
chr1    1250000 1500000
chr3    500000  750000
chr3    750000  900000

Does the bedops suite provide a similar way to create genomic windows?
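
Whichever suite is used, the operation itself is simple. The sketch below reproduces the makewindows example above in plain Python (fixed width, last window truncated at the interval end); it illustrates only the windowing and says nothing about bedops' own tools. `make_windows` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple


def make_windows(intervals: Iterable[Tuple[str, int, int]], width: int
                 ) -> Iterator[Tuple[str, int, int]]:
    """Split each interval into consecutive windows of `width` bp.

    The last window of an interval is truncated at the interval's end.
    """
    for chrom, start, end in intervals:
        for win_start in range(start, end, width):
            yield chrom, win_start, min(win_start + width, end)


if __name__ == "__main__":
    regions = [("chr1", 1000000, 1500000), ("chr3", 500000, 900000)]
    for window in make_windows(regions, 250000):
        print(*window, sep="\t")
```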

Bedtools Genomecoveragebed Usage : How To Create A Genome File?

I am using BEDTOOLS and the following command to get the coverage file:

$ ./genomeCoverageBed -ibam ~/GG_project/trim/ecoli.bam -g > ~/GG_project/trim/coverage

where ecoli.bam is my sorted bam file, and coverage is my output file

Where do I get the genome file, and how do I create one? Specifically, I need an ecoli.genome file.
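
A bedtools genome file is plain text with one `name<TAB>length` line per sequence. One source is the BAM header: `samtools view -H ecoli.bam` prints it, and each `@SQ` line carries the name (`SN:`) and length (`LN:`). Another is `cut -f1,2` on the `.fai` index that `samtools faidx` builds for the reference FASTA. The sketch below does the header route in Python; `genome_file_from_sam_header` is a hypothetical helper, and the header line in the usage example is made up.

```python
from typing import Iterable, Iterator


def genome_file_from_sam_header(header_lines: Iterable[str]) -> Iterator[str]:
    """Turn @SQ header lines into 'name<TAB>length' genome-file lines."""
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1)
                    for field in line.rstrip("\n").split("\t")[1:])
        yield f"{tags['SN']}\t{tags['LN']}"


if __name__ == "__main__":
    # Hypothetical header, as printed by `samtools view -H ecoli.bam`.
    header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:ecoli\tLN:4641652\n"]
    for genome_line in genome_file_from_sam_header(header):
        print(genome_line)
```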

Question about number of reads within intervals

Hi there, this question is very basic, but I need to make sure I'm on the right track. I need to calculate the number of reads falling inside my bed intervals and the number of reads falling outside them. After reading this thread (https://www.biostars.org/p/11832/), I decided to try this command:

intersectBed -abam my_file.bam -b my_file.bed -wa -f 1 | coverageBed -abam stdin -b my_file.bed

I would like to know the difference between the previous command and using only its second part:

coverageBed -abam my_file.bam -b my_file.bed

The output is quite different for some hits.

First command output:

1 50331576 50331667 (..gene names..) 0 0 91 0.0000000
1 39845848 39846030 (..gene names..) 70 178 182 0.9780220

Second command output:

1 50331576 50331667 (..gene names..) 47 91 91 1.0000000
1 39845848 39846030 (..gene names..) 143 182 182 1.0000000

I think that the first command counts only reads falling strictly within an interval, while the second also includes reads that partially cover the intervals. Is this true?

On the other hand, I would also like to get the number of reads falling outside the intervals. I can make a new bed file using bedtools complement, but would it be OK to use the -v option of bedtools intersect instead? Like this: intersectBed -v -abam my_file.bam -b my_file.bed -wa -f 1 | c ...
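
In intersectBed, `-f 1` requires the whole of each A feature (here, each read) to be covered by B, which fits the guess that the first pipeline keeps only reads lying entirely inside an interval. The three categories involved can be made explicit with a small pure-Python sketch for checking counts on toy data (`classify_reads` is a hypothetical helper; coordinates are 0-based half-open). As I read the intersectBed help, `-v -f 1` would report reads that are not wholly contained, which includes the partial ones, so it is not the same set as reads with no overlap at all.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def classify_reads(reads: Iterable[Interval], targets: List[Interval]) -> Dict[str, int]:
    """Count reads fully inside a target, partially overlapping one, or outside all."""
    counts = {"inside": 0, "partial": 0, "outside": 0}
    for chrom, start, end in reads:
        overlapping = [(t_start, t_end) for t_chrom, t_start, t_end in targets
                       if t_chrom == chrom and start < t_end and t_start < end]
        if any(t_start <= start and end <= t_end for t_start, t_end in overlapping):
            counts["inside"] += 1
        elif overlapping:
            counts["partial"] += 1
        else:
            counts["outside"] += 1
    return counts
```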

How To Create A Read Density Profile Within A Interval?

HI!

I need some help: I have to create a density profile using 1 kb windows (how many times a sequence is detected by the NGS method). I have to use SAM and BEDtools. I think I can use genomeCov in BEDtools, but I don't have a reference genome.

So, if anybody is able to help me...

Thanks
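
A read-density profile in fixed windows is, at its simplest, a count of reads per (sequence, window) bin, and the sequence names can come from the reads themselves rather than from a reference genome. The sketch below bins 0-based read start positions (SAM POS minus 1, an assumption about how you extract them) into 1 kb windows and reports only non-empty windows; `read_density` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def read_density(read_starts: Iterable[Tuple[str, int]], window: int = 1_000
                 ) -> List[Tuple[str, int, int, int]]:
    """Count reads per fixed-size window, keyed on each read's 0-based start.

    Returns (sequence, window_start, window_end, count) for non-empty windows.
    """
    counts = Counter((chrom, pos // window) for chrom, pos in read_starts)
    return [(chrom, idx * window, (idx + 1) * window, n)
            for (chrom, idx), n in sorted(counts.items())]
```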
