Channel: Post Feed
Viewing all 3764 articles

Using Gnu Parallel For Bedtools

I am trying to run GNU parallel with the bedtools multicov function, where the original command is

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to implement the above command using gnu parallel. But when I run the command below

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

Each BAM file is taken as a separate argument, so the processes that start look like

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of all the BAM files being passed to a single command. Hence Q1_Counst.bed is overwritten in an unpredictable order. Could anyone help me get the correct command? My server has around 30 cores.
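One possible way around this, as a minimal sketch rather than a definitive answer: multicov reports one count column per BAM file, so every job needs the full BAM list, and the parallelism can come from splitting the BED file into chunks instead. The helper name make_multicov_jobs, the chunk file names and the .counts suffix are all made up for illustration.

```shell
# Hypothetical helper: print one multicov command per BED chunk.
# $1 = space-separated list of BAM files; remaining args = BED chunk files.
# Every command receives ALL BAM files and writes to its own output file,
# so no job overwrites another job's result.
make_multicov_jobs() {
    bams=$1
    shift
    for chunk in "$@"; do
        printf 'bedtools multicov -bams %s -bed %s > %s.counts\n' \
            "$bams" "$chunk" "$chunk"
    done
}
```

Assuming the BED file was first split with something like split -l 5000 genes_exon_2.bed chunk_, the printed lines can be piped into parallel -j 25 (GNU parallel runs each input line as a command when no command is given), and the per-chunk .counts files can then be concatenated in chunk order into Q1_Counst.bed.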


Can Bedtools/Bedops Be Used To Extract Regions Where Scores Are Higher Than A Given Value?

I have a very basic question about bedtools and bedops. Can I use these tools to filter all the regions where the score is higher (or lower) than a given value? For example, let's say that I have a BED file like the following:

chr7 127471196 127472363 Pos1 12 + 127471196 127472363 255,0,0
chr7 127472363 127473530 Pos2 200 + 127472363 127473530 255,0,0
chr7 127473530 127474697 Pos3 120 + 127473530 127474697 255,0,0
chr7 127474697 127475864 Pos4 54 + 127474697 127475864 255,0,0
chr7 127475864 127477031 Neg1 2 - 127475864 127477031 0,0,255
chr7 127477031 127478198 Neg2 15 - 127477031 127478198 0,0,255
chr7 127478198 127479365 Neg3 25 - 127478198 127479365 0,0,255
chr7 127479365 127480532 Pos5 2 + 127479365 127480532 255,0,0
chr7 127480532 127481699 Neg4 9 - 127480532 127481699 0,0,255

According to the BED format's specs, the fifth column contains a score between 0 and 1000 (alternatively, in the bedGraph format the score is in the 4th column). If I want to get all the regions that have a score higher than 20, for example, I can do an awk search:

awk '$5 > 20 {print}' mybedfile.bed

However, in order to use awk, I have to keep the BED file in an uncompressed format. It would be much better if I could use the .starch format in Bedops, or if I could combine any Bedops/Bedtools operation with th ...
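A small sketch of one way to keep the file compressed: awk can read from a pipe, so the filter only needs to sit behind a decompressor. The function name filter_by_score is hypothetical, and the threshold is passed as its first argument.

```shell
# Hypothetical helper: keep BED records whose score (column 5) is strictly
# above the threshold given as $1. Reads BED from stdin, so it can be placed
# behind any tool that writes uncompressed BED to stdout.
filter_by_score() {
    awk -v min="$1" '$5 + 0 > min + 0'
}
```

For example, gunzip -c mybedfile.bed.gz | filter_by_score 20 would work for a gzipped file, and unstarch mybedfile.starch | filter_by_score 20 should do the same for a BEDOPS starch archive, since unstarch writes BED to standard output.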

Tool: Bedtools: Analyzing Genomic Features

All practicing bioinformaticians will face problems that require them to compare, query and select genomic features across an entire genome. As it happens, efficient interval representation and query is a surprisingly challenging problem that needs a specialized representation. The BEDTools suite contains a set of programs that support a broad range of interval analyses that involve selecting certain locations in the genome. The name reflects the original intent to process BED files, but the tools operate just as well on GFF formats. The programs are run from the command line and are available for UNIX-type systems: Linux, Mac OSX, and Cygwin (on Windows).

The link to the site is: http://code.google.com/p/bedtools/

With BEDTools one can answer questions such as:
  • how many reads map upstream/downstream of one or more locations in the genome?
  • how many reads cover a certain base in the genome?
  • which sections of the genome are not overlapping with target intervals?
  • what are the sequences specified by the coordinates?
  • ...
The suite consists of multiple tools but for beginners the most important is ...

Getting Number Of Reads In Intervals With Bedtools

What is the correct way to get the total number of reads strictly contained in each interval in a GFF from a BAM file while enforcing strandedness? What I am looking for is very close to this intersectBed feature:

-c    For each entry in A, report the number of overlaps with B.
    - Reports 0 for A entries that have no overlap with B.
    - Overlaps restricted by -f and -r.

Except that I'd like the number of overlaps in A for each entry in B (i.e. the other way around). If I do:

intersectBed -abam mybam.bam -b mygff.gff -s -f 1 -wb

Then my understanding is that this will report the entry in B for each overlap with A. But I'd like each entry in B to be outputted exactly once, with the number of reads from A that are contained strictly within it. I'm not sure how to enforce strict containment here.

Is coverageBed the solution to this? Or multicov? I'm not sure how to enforce strict containment using coverageBed - it's not clear to me if that's the default from the docs. Thanks.

Correlation Of Fpkm And Length Normalized Transcript Mapped Read Count

Hello, in the process of estimating expression for a 16 human tissue dataset ("Human Body Map 2.0 GSE30611") I used different methods to estimate the expression of the genes. After mapping against hg19 genome version, I used the UCSC provided refseq annotation for hg19 to count mapped reads for ~40,000 human genes in two ways:
  1. Counting with cufflinks outputs a Fragments Per Kilobase Per Million mapped fragments value (FPKM) for each transcript. The FPKM value basically accounts for library size and also the length of the transcript comprising all the annotated exons + some additional likelihood estimator to assign reads (see here).
  2. Counting mapped reads with bedtools and dividing a transcript's mapped count by the sum of all its exon lengths. This gives a length-normalized expression estimate to compare between genes.
However, the correlation of (1.) and (2.) is always around ~0.65 between same tissues (technically the same experiment). I would expect this correlation to be > 0.9. Below, I plotted (2.) against (1.) for all ~40,000 transcripts. It seems like normal length normalization is simply overestimating some expression. Can someone she ...

Bed File Bedpe Format

Hi,

I'm having trouble converting a BAM file to BEDPE format (-bedpe) using bedtools.

Workflow:
samtools sort -n mut.bam mut.Namesorted
bamToBed -i mut.Namesorted.bam -bedpe > dilpMerged_bedpe.bed

After sorting the file by read name (option -n) I run the bamToBed command, but it gives me an error message after processing a few lines:

*ERROR: -bedpe requires BAM to be sorted/grouped by query name.

What am I doing wrong here?

Thanks

A.

Bedtools subtract not dealing well with large datasets

I am using bedtools subtract with large datasets and it keeps crashing, giving the following error

    terminate called after throwing an instance of 'std::bad_alloc'

Is there a way to get around this problem in bedtools?

Alternatively, is there any other way to find non-overlapping regions for two BED files?

thanks
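One workaround that may lower peak memory, sketched under the assumption that the crash comes from loading a whole genome-wide file at once: split both inputs by chromosome and run the subtraction one chromosome at a time. The helper name split_by_chrom and the output naming scheme are hypothetical.

```shell
# Hypothetical helper: write one BED file per chromosome.
# $1 = input BED file, $2 = output prefix; records for chr1 go to <prefix>chr1.bed.
# Output files are appended to and closed after every record (slow but safe
# against open-file limits), so start with a clean output directory.
split_by_chrom() {
    awk -v prefix="$2" '{ out = prefix $1 ".bed"; print >> out; close(out) }' "$1"
}
```

After split_by_chrom a.bed a_ and split_by_chrom b.bed b_, a loop over the chromosomes could call bedtools subtract -a a_chr1.bed -b b_chr1.bed and so on, concatenating the per-chromosome results at the end.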

Genomecoveragebed - Bedtool For Reporting Per Base Genome Coverage

Hi everyone, I need some help with the genomeCoverageBed tool. When used to find per-base genome coverage, this tool takes the -d option. I am actually interested in finding read counts for each base within a particular intron of a gene. Let me explain in more detail to make myself clear.

I used IGV to see how my alignments look and what the coverage of each base within a particular intron is. When I move my cursor in IGV to the area of the coverage track exactly above the base I am interested in, it gives me these details:

Total Count:6 A:0 C:0 G:6 T:0 N:0

This total count is the read count for the base G within that intron: 6 reads actually cover this base position. Now, when I use this command to find per-base genome coverage

genomeCoverageBed -i 2-B3-1b-D303A_sorted.bed -g pombe.genome -d

it gives me around 21 as the depth for that base (G in my example). Looking closely in IGV, I figured out that 21 = 6 + 15, where 6 is the number of reads that actually cover this base position and 15 is the number of reads that do not cover the base at that position. Since genomeCoverageBed calculates the depth of feature coverage, it also includes all the reads that skip that particular base. I would provide you with an image to make it more clear I would like to know how can i ...

Intersectbed Overlap

Hi,

I have a question about intersectBed. Is it possible to extract only alignments like this:

chromosome ===============================================================
BED/BAM A               ==============              =================
BED FILE B               ============
RESULT                  ==============

But not alignments like this (even if the read overlaps 100% of the feature, I don't want to extract these reads):

chromosome ===============================================================
BED/BAM A    =========================              =================
BED FILE B               =============
RESULT

So, I only want to extract reads that have 90-95% of their sequence overlapping 90-95% of the feature.

Is it clear?

Thanks,

N.
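The condition described above is a reciprocal overlap: the shared region must cover a given fraction of the read and the same fraction of the feature. In intersectBed this appears to correspond to combining -f (minimum overlap as a fraction of A) with -r (require the same fraction of B), for example -f 0.90 -r. The sketch below only illustrates the arithmetic; the function name reciprocal_overlap is hypothetical and coordinates are assumed to be 0-based, half-open as in BED.

```shell
# Hypothetical helper: exit status 0 when interval A [a1, a2) and
# interval B [b1, b2) overlap by at least `frac` of A's length AND
# at least `frac` of B's length; exit status 1 otherwise.
# Usage: reciprocal_overlap a1 a2 b1 b2 frac
reciprocal_overlap() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" -v frac="$5" 'BEGIN {
        a1 += 0; a2 += 0; b1 += 0; b2 += 0; frac += 0
        s = (a1 > b1) ? a1 : b1          # start of the shared region
        e = (a2 < b2) ? a2 : b2          # end of the shared region
        ov = (e > s) ? e - s : 0         # overlap length in bases
        ok = (ov >= frac * (a2 - a1) && ov >= frac * (b2 - b1))
        exit(ok ? 0 : 1)
    }'
}
```

With frac set to 0.9, the first diagram (read and feature of nearly equal extent) passes, while the second (a long read fully covering a short feature) fails because the overlap is only a small fraction of the read.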

How To Explain Uneven Coverage Of A Dna Segment Obtained Via Pcr Amplification.

Experiment: deep sequencing for mutants in a 700 nt fragment.

The DNA fragment was preamplified using primers flanking the fragment, followed by HiSeq sequencing.

Per-base coverage was calculated by coverageBed -d -abam in.bam -b ref.bed > out.cov

Observation: two distinct peaks in coverage at the ends, as in the plot below (coverage vs. position).

[Figure: coverage vs. position]

The peaks are made up of reads containing part of the primers, which therefore also show soft clipping at their ends.

There is a huge difference in the calculations depending on whether I include or exclude such reads.

Question: is there anyone who knows how to handle such a situation?

How To Rearrange Paired End Bam File?

Hello all,

I have a paired-end BAM file and I want to use bedtools on it. After merging, the paired-end read alignments are not lying next to each other, which causes problems in bedtools. Is there any tool available to rearrange the paired-end read alignments in a BAM file?

Thanks, Deeps

Counting The Whole Insert Size From Paired-End Reads As Coverage

We have updated our workflows for per-base sequence coverage to use genomeCoverageBed from BAM files. However, for paired-end data it seems as though the regions between the paired-end reads are not counted.

To be clear, I am not talking about using -split to avoid counting introns in a single read of a pair; instead I am looking to count the probable whole insert when the insert size is greater than the combined read length of the paired reads.

We've looked at using IRanges from Bioconductor as well but cannot tell if this would do what we want.

Is there a hidden flag in genomeCoverageBed to count the whole insert as coverage, not just the sequenced ends? Is there another program out there that would work on BAM files?

I know I can alter the SAM file before BAM conversion but this seems like something that should be coded somewhere already.
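One route that avoids editing the SAM file, offered as a sketch: convert the name-sorted BAM to BEDPE with bamToBed -bedpe, collapse each pair into a single interval spanning both mates, and compute coverage on those intervals. The function name bedpe_to_fragments is hypothetical; the column layout assumed is the standard BEDPE order (chrom1, start1, end1, chrom2, start2, end2, name, ...).

```shell
# Hypothetical helper: turn BEDPE records (stdin) into one BED interval per
# pair, spanning from the leftmost start to the rightmost end of the two mates.
# Pairs on different chromosomes or with an unmapped mate (chrom ".") are skipped.
bedpe_to_fragments() {
    awk 'BEGIN { OFS = "\t" }
         $1 == $4 && $1 != "." {
             s = ($2 < $5) ? $2 : $5
             e = ($3 > $6) ? $3 : $6
             print $1, s, e, $7
         }'
}
```

The resulting intervals could then be sorted by chromosome and start and passed to genomeCoverageBed -i stdin -g with a genome file, so that the unsequenced middle of each insert is counted as covered.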

Remove Intronic Regions in .BAM

Hi

I have a .BAM file which contains discordantly and concordantly mapped mate-pairs. I used bedtools pairToBed to extract the mate-pairs in which both mates overlap the targeted regions (Illumina target .bed file). Is it somehow possible to remove the parts of the mate-pairs that do not overlap? I couldn't find it in the bedtools manual... can I just use intersectBed on each read for this?

Thanks!

Per Base Coverage

Is there a way to obtain per-base coverage for a defined chromosome interval using a BAM file generated from Illumina single-end reads? genomeCoverageBed in Bedtools does not seem to have an option for it.
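A simple sketch of one option: genomeCoverageBed -d prints one line per base (chromosome, 1-based position, depth), so the interval can be selected afterwards with awk. The function name depth_in_region is hypothetical; start and end are assumed to be 1-based and inclusive.

```shell
# Hypothetical helper: keep per-base depth lines (chrom, position, depth)
# that fall inside chrom:start-end. Reads from stdin.
# Usage: depth_in_region chrom start end
depth_in_region() {
    awk -v chrom="$1" -v start="$2" -v end="$3" \
        '$1 == chrom && $2 + 0 >= start + 0 && $2 + 0 <= end + 0'
}
```

For example, genomeCoverageBed -ibam in.bam -d | depth_in_region chr1 1000 2000 (older releases may also need -g with a genome file). samtools depth -r chr1:1000-2000 in.bam is another way to get per-base depth for a region directly from a sorted, indexed BAM file.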

Splice Junction file intersection with genome annotation

Hello,

I have a tab-delimited splice junction file that looks something like this:

chr1    11212    12009    1    1    0    0    2    48
chr1    11672    12009    1    1    0    0    1    31
chr1    11845    12009    1    1    0    0    1    28
chr1    12228    12612    1    1    1    0    1    32
chr1    12722    13220    1    1    1    0    3    9
chr1    14830    14969    2    2    1    0    218    50
chr1    15039    15795    2    2    1    0    98    50
chr1    15948    16606    2    2    1    1    10    48
chr1    16766    16857    2    2    1    0    24    44
chr1    16766    16875    2    2    0    0    2    36

The task is to filter out lines in which Column 6 has value 1, Column 7 has value 1 and Column 8 has value 10 or greater.

I have been going through the bedtools documentation but I am not quite sure how to get started; I would appreciate a few pointers on how to get going. My input file is going to be in the tab-delimited format and I also have the Gencode V.19 GTF file for annotation.

Thanks!

*** Edit ***

Column 1: chromosome
Column 2: first base of the intron (1-based)
Column 3: last base of the intron (1-based)
Column 4: strand
Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
Column 7: number of uniquely mapping reads crossing the junction
Column 8: number of multi-mapping reads crossing th ...
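The column test itself does not need bedtools; a plain awk filter may be enough as a first step. The sketch below takes the three conditions literally as stated in the post and keeps the matching lines; the function name filter_junctions is hypothetical.

```shell
# Hypothetical helper: keep junction lines where column 6 is 1, column 7 is 1
# and column 8 is 10 or greater. Reads the files given as arguments, or stdin.
filter_junctions() {
    awk '$6 == 1 && $7 == 1 && $8 >= 10' "$@"
}
```

If "filter out" means the matching lines should be discarded instead, the condition can be negated by wrapping it as !( ... ). The surviving junctions could then be intersected with the GTF annotation in a later step.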

macs and bedtools

Hello

I have MACS2 output and am now looking for peaks that are located in introns. I have a BED file with introns from UCSC for my species. Which peak file should I use for the bedtools intersection: the peak summits (.bed) or the narrow peaks (.bed), both from the MACS2 output?

"mask" values in a bedgraph

I am trying to plot average conservation in a list of genomic features, and so far managed to do it successfully using a combination of the phastCons bigwig files (hg19.100way.phastCons.bw) and deepTools. However, as extra step, I want to re-do my analysis but this time by removing, or masking, the conservation values in the exons.

My first step, and the easiest, was to remove all features that overlap with exons, using bedtools intersect. This worked, but it seems like a crude way of doing it. So I am now trying to convert all phastCons values in exons to zero.

The question is: how to do it? Consider that I want a nice bigwig at the end to input to deepTools. Initially I converted the phastCons bigwig to bedGraph, because I thought map from bedtools would work. It did not, so I am a bit out of ideas now.

Bedtools Intersectbed

Apologies if this is blatantly obvious!

I would like to compare coordinates in setA with those of setB. The output should have the same number of coordinates as setA and tell me how many nucleotides of each setA coordinate are overlapped by any coordinate in setB.

For example, a large coordinate in setA may be overlapped by two setB coordinates, but I want to know how many nucleotides of the setA coordinate are covered by both setB coordinates in total.

I know how to do this on GALAXY as there is the handy 'Coverage' tool in 'Operate on Genomic Intervals'. However, I want to do this on the command line. I have been trying to get BEDTools to do this using 'intersectBed', but I can only seem to get just the overlapping setA coords (using -u), or get the nucleotide overlap for multiple setB coordinates on separate lines (using -wao), or a count of how many setB coordinates overlap each setA coordinate (using -c).

SetB coordinates are non-overlapping themselves, so I guess I could tally up those setB coordinates that overlap the same setA coordinate.

Can BEDTools do what I want, or is there another command line way of doing what I want?

Thank you!

PS I have also sent this to the BEDTools discussion list, so apologies for any double postings!
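The tallying idea from the post can be sketched with awk on top of the -wao output, where the last column is the number of overlapping bases and setA entries without any overlap are reported with 0. The function name sum_overlap_per_a is hypothetical, and its argument is the number of columns in setA (3 for a minimal BED file).

```shell
# Hypothetical helper: sum the last column of intersectBed -wao output
# per setA record, printing each setA record once in input order.
# $1 = number of columns in the setA file. Identical setA records are merged.
sum_overlap_per_a() {
    awk -v n="$1" 'BEGIN { OFS = "\t" }
    {
        key = $1
        for (i = 2; i <= n; i++) key = key OFS $i
        if (!(key in total)) order[++count] = key   # remember first-seen order
        total[key] += $NF                           # add overlapping bases
    }
    END { for (i = 1; i <= count; i++) print order[i], total[order[i]] }'
}
```

Something like intersectBed -a setA.bed -b setB.bed -wao | sum_overlap_per_a 3 would then give one line per setA coordinate with the total covered nucleotides, which is only a correct coverage figure because the setB coordinates do not overlap each other. coverageBed may report the same number directly, but its -a/-b roles differ between bedtools versions, so check the help text of the installed release.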

how to run subtract command in java

I want to run the bedtools subtract command from Java. Could somebody tell me how to do it?

Thank you very much.

Intersectbed Provides An Empty Output

Hi,

I've downloaded the recent Cygwin version 1.7.24 and am trying to run BEDTools, but I get an empty file as my output. When I run the same command line and files on a colleague's computer, also through Cygwin, I get a file containing the overlaps I seek. Is the new Cygwin not compatible with BEDTools? I've put the command line we used below:

./intersectbed -a Gene_body.bed -b EdgeR1.bed -wao > yyy.temp

Any help would be appreciated.
