[Bioperl-l] how to get specific sub-sequence from GenBank file with Bio::DB::GenBank

Roy Chaudhuri roy.chaudhuri at gmail.com
Mon Apr 16 07:16:57 EDT 2012


Hi Juan,

If you know the LTR coordinates in advance, then you can download a 
specific subsequence using Bio::DB::GenBank as shown here:
http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object

If you don't, then you will need to download the whole sequence as you 
are doing, but add in some code to print out just the sequence 
associated with the LTR feature. Something like (untested):

for my $feat ($seq->get_SeqFeatures) {
   $seqs_out->write_seq($feat->spliced_seq) if $feat->primary_tag eq 'LTR';
}

Cheers,
Roy.


On 15/04/2012 04:27, Juan Jovel wrote:
>
>
> Hello All, I want to get some subsequences from provirus sequences in
> the GenBank, I got the whole sequences with the script below.
> However, I want to get a specific sub-sequence, which appears in the
> GenBank files in the line:  LTR             9091..9723 how can I
> modify my script to get only nts 9091-9723 (in this example), instead
> of the whole sequence. Thanks a lot in
> advance!________________________HERE THE SCRIPT: #!/usr/bin/perl
> -wuse strict;use Bio::DB::GenBank;use Bio::SeqIO; chomp(my $infile =
> $ARGV[0]);open(IN, "$infile") or die "$!";my @ids =<IN>; chomp(my
> $outfile = $ARGV[1]); my $seqs_out = Bio::SeqIO ->  new(-file =>
> ">$outfile",			     -format =>  "fasta"); foreach my $entry(@ids){
> print "$entry";	my $db = Bio::DB::GenBank->new;	my $seq =
> $db->get_Seq_by_acc($entry);	$seqs_out->write_seq($seq);	}exit;
>
>
>
>
>
>
>
>  _______________________________________________ Bioperl-l mailing
> list Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



More information about the Bioperl-l mailing list