[Bioperl-l] fetching all alignments from a sam/bam by read header in perl
abhishek.vit at gmail.com
Sun Feb 26 01:24:14 EST 2012
Reading the documentation for Bio::DB::Sam, I see there is a way to fetch
reads by name (read ID), but the documentation also says this is slow
(copied below). I need to do about 300-500 million lookups, and if each one
is costly I would like to know whether there is another, lower-level way. In
my application I will not have the feature location, just the read name.
  -name   Filter on reads with the designated name. Note that
          this can be a slow operation unless accompanied by
          the feature location as well.
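
One possible way to avoid per-read lookups altogether is sketched below. It is not from the Bio::DB::Sam documentation: it assumes the alignments are available as SAM text in which all records for a read are adjacent (for example, name-sorted input), and it streams them one read at a time instead of holding everything in a hash. The helper name for_each_read_group and the sample records are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: stream SAM text from a filehandle and pass each
# read's alignments to a callback. Assumes records sharing a QNAME are
# adjacent, so memory use is bounded by the alignments of a single read.
sub for_each_read_group {
    my ($fh, $callback) = @_;
    my ($current, @group);
    while (my $line = <$fh>) {
        next if $line =~ /^\@/;                # skip SAM header lines
        chomp $line;
        next unless length $line;              # skip blank lines
        my ($qname) = split /\t/, $line, 2;    # QNAME is the first column
        if (defined $current && $qname ne $current) {
            $callback->($current, [@group]);   # previous read is complete
            @group = ();
        }
        $current = $qname;
        push @group, $line;
    }
    $callback->($current, [@group]) if defined $current;   # flush last read
    return;
}

# Tiny made-up SAM fragment: read r1 aligns twice, read r2 once.
my $sam_text = join "\n",
    "\@HD\tVN:1.4\tSO:queryname",
    join("\t", qw(r1 0 chr1 100 30 4M * 0 0 ACGT IIII)),
    join("\t", qw(r1 256 chr2 500 0 4M * 0 0 ACGT IIII)),
    join("\t", qw(r2 16 chr1 900 30 4M * 0 0 TTTT IIII)),
    "";

# Read from an in-memory filehandle so the example is self-contained.
open my $fh, '<', \$sam_text or die "cannot open in-memory SAM: $!";
my %count;
for_each_read_group($fh, sub {
    my ($name, $alignments) = @_;
    $count{$name} = scalar @$alignments;       # alignments seen for this read
});
close $fh;

print "$_\t$count{$_}\n" for sort keys %count;
```

With a real BAM file the filehandle could instead be a pipe from samtools view on a file previously sorted with samtools sort -n. That workflow is an assumption, and it only helps if the lookups can be processed in the same name order as the file.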
On Fri, Feb 24, 2012 at 6:58 AM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
> Hi Peter
> You got it right.
> Here is the link :
> On Fri, Feb 24, 2012 at 1:24 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Fri, Feb 24, 2012 at 12:55 AM, Abhishek Pratap
> > <abhishek.vit at gmail.com> wrote:
> >> I am wondering if there is a slick way to access all the possible
> >> alignments for a read present in a SAM or BAM file, given the read
> >> header. Since the existing codebase is in Perl, I would prefer
> >> something which can be done in/via Perl.
> >> By default BAM files are indexed by location, so the built-in samtools
> >> indexing won't work, I guess.
> >> I should also say the input BAM file will have on the order of 500
> >> million total alignments, and many reads are expected to be aligned to
> >> more than one place in the genome. Given the size of the data, loading
> >> it all into one big hash is not turning out to be memory friendly.
> > Are you asking for SAM/BAM read lookup by read name?
> >> PS: I also posted this earlier on Biostar.
> > Link?
> > Peter