[Bioperl-l] CONTIG dealing
n.appleby at uq.edu.au
Wed Oct 18 17:58:06 EDT 2006
I have just entered the wonderful new world of BioPerl, so the answer to my
question may be obvious to any of the gurus reading this.
I need to collect sequence features and ontology annotations. Here goes.
I am retrieving sequences from SwissProt via Bio::DB::SwissProt and
get_Seq_by_id, for this example Q8RZV7. Once I have parsed it into an RDBMS
format that I am happy with, I can get at the xref ids. In this case, they are:
AP003451; BAB86144.1; -; Genomic_DNA.
AP008207; BAF07116.1; -; Genomic_DNA.
AB103395; BAC81207.1; -; mRNA.
I can happily go off and fetch those from Bio::DB::GenBank (first column),
and Bio::DB::GenPept (second). All good, except...
AP008207 is a contig. I don't want to get all of the features for the entire
thing, just the single contig that actually matches the original sequence.
It takes a couple of hours to get at it and then it gives me way too much.
I will come across this problem with other sequences. How do I (a) find out
whether a record is a contig without downloading it in its entirety, and
(b) extract the list of sequences that are about to be contigged together?
I have searched the web for answers, including this list, but see nothing.
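
For concreteness, splitting the cross-reference lines quoted above into the id
that goes to Bio::DB::GenBank (first column) and the id that goes to
Bio::DB::GenPept (second column) could look roughly like the sketch below.
This is plain Perl with no BioPerl calls; the helper name parse_xref_line and
the hash keys are made up for illustration, and the field layout is assumed
from the three example lines only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split one cross-reference
# line of the form "AP003451; BAB86144.1; -; Genomic_DNA." into fields.
sub parse_xref_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    $line =~ s/\.$//;           # drop the trailing period
    my ($nuc, $prot, $status, $type) = split /\s*;\s*/, $line;
    return {
        nucleotide => $nuc,     # id to look up via Bio::DB::GenBank
        protein    => $prot,    # id to look up via Bio::DB::GenPept
        status     => $status,  # "-" in all three example lines
        mol_type   => $type,    # Genomic_DNA or mRNA
    };
}

# The three lines from the Q8RZV7 example.
my @lines = (
    'AP003451; BAB86144.1; -; Genomic_DNA.',
    'AP008207; BAF07116.1; -; Genomic_DNA.',
    'AB103395; BAC81207.1; -; mRNA.',
);

for my $line (@lines) {
    my $x = parse_xref_line($line);
    print "GenBank: $x->{nucleotide}  GenPept: $x->{protein}  ($x->{mol_type})\n";
}
```

Nothing in these fields says whether a nucleotide record such as AP008207 is a
contig, which is why question (a) needs an answer from the database side.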