From joel.klein at wur.nl Wed Feb 1 09:32:19 2012
From: joel.klein at wur.nl (Bradyjoel)
Date: Wed, 1 Feb 2012 06:32:19 -0800 (PST)
Subject: [Bioperl-l] Running into problems
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz>
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz>
Message-ID: <33243492.post@talk.nabble.com>

Thank you for your help! I think I'm almost there, but I still get 1 more
error when I run the script:

$ perl blast2.pl
Searching gi|146341649|ref|YP_001206697.1|
Undefined subroutine &main::hsp_filter called at
/usr/share/perl5/Bio/SearchIO/Writer/TextResultWriter.pm line 298, line 71.

Smithies, Russell wrote:
>
> I'd probably cheat a bit and optimise my blast parameters so there's less
> output to process.
> Also, are you sure an e-value of 100 is what you're after? I'd be aiming
> much lower - probably 1e-6.
> It also pays to mask repeats if you're blasting against a whole genome to
> cut down on the number of rubbish hits.
>
> --Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bradyjoel
>> Sent: Tuesday, 31 January 2012 12:21 a.m.
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Running into problems
>>
>> Hi all,
>>
>> I'm quite new to bioperl and tried to write a script that creates a
>> database from a newly sequenced genome, then performs a tblastn against a
>> multiple protein fasta file, and then creates a blast report that keeps
>> only the results with identity scores above 98%. However my script keeps
>> returning numerous errors and problems, and since I have only a little
>> experience I cannot determine where I went wrong. I include the code that
>> I got so far in the attachment. Hope someone can help.
>> Regards Joel
>>
>> http://old.nabble.com/file/p33228400/blast1.pl blast1.pl
>> --
>> View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33228400.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33243492.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From vivekkrishnakumar at gmail.com Wed Feb 1 10:14:05 2012
From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar)
Date: Wed, 1 Feb 2012 10:14:05 -0500
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
Message-ID:

Hello,

I would like to state my problem: Previously, I was working with a
Bio::DB::GFF backend database and was able to retrieve features from the
database by feature name using the function 'get_feature_by_name()'. But
recently, when I made the switch to Bio::DB::SeqFeature::Store to store the
exact same data points as before in the new db, invoking the
'get_feature_by_name()' function returns nothing.

Please refer to the following code snippet which is supposed to retrieve a
certain gene "locus" feature from the backend database and use that to get
the 'start' and 'end' coordinate of the feature based on the 'strand' on
which it is present.

my ($locus_obj, $gene_models) = get_annotation_db_features($locus, $gff_dbh);

sub get_annotation_db_features {
    my ($locus, $gff_dbh) = @_;

    my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => '$locus');

    my ($end5, $end3) = $locus_obj->strand == 1 ?
        ($locus_obj->start, $locus_obj->end) : ($locus_obj->end, $locus_obj->start);

    my $segment = $gff_dbh->segment($locus_obj->refseq, $end5, $end3);
    my @gene_models = $segment->features('processed_transcript:working_models',
                                         -attributes => { 'Gene' => $locus });

    #will have to sort the gene models
    return ($locus_obj, \@gene_models);
}

Also, here is a snippet of the GFF3 file that is used to populate the
backend database:

##gff-version 3
chr2	working_models	gene	30427563	30429139	.	-	.	ID=gene_35804;Note=Zinc transporter;Name=Medtr2g097580
chr2	working_models	mRNA	30427563	30429139	.	-	.	ID=mrna_36255;Parent=gene_35804;Name=Medtr2g097580.1;conf_class=F
chr2	working_models	exon	30428491	30429139	.	-	.	ID=exon_120028;Parent=mrna_36255
chr2	working_models	exon	30427563	30428147	.	-	.	ID=exon_120029;Parent=mrna_36255
chr2	working_models	CDS	30428491	30429109	.	-	0	ID=cds_120028;Parent=mrna_36255
chr2	working_models	CDS	30427756	30428147	.	-	2	ID=cds_120029;Parent=mrna_36255

Considering that this gene locus is unique in the entire GFF file, if the
above GFF is loaded into the SeqFeature::Store database, you would expect
that running the following query should yield a count of "1":

SELECT count(f.id) FROM feature as f, name as n
 WHERE (n.id=f.id AND n.name = 'Medtr2g097580' AND n.display_name > 0);

And in my case, it does yield "1".

But if I use the function *get_feature_by_name*, retrieve the locus object
and try to get the strand of this object, I get the following error:

[error] Can't call method "strand" on an undefined value at get_db_features.pl line 910.

As specified earlier, I do not have any problems if the backend is
Bio::DB::GFF.

From the SeqFeature::Store documentation, we know that
'get_feature_by_name()' is in place for backward compatibility. I even
tried modifying the code snippet to call 'get_features_by_name()' instead,
but no matter what, I do not get any locus object back from this
subroutine!
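
A side note on the error above, offered only as a sketch: `Can't call method "strand" on an undefined value` means the lookup came back empty, so $locus_obj was undef by the time ->strand was called. The pure-Perl example below checks the number of hits before using the result. MockStore and fetch_unique_feature() are hypothetical stand-ins invented for this illustration; they are not part of BioPerl and only mimic "look up by name, get back a list". The last call also shows what the single-quoted '$locus' in the posted snippet would search for.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature store (NOT the BioPerl API).
# It only mimics a by-name lookup that returns a list of matches.
package MockStore;

sub new {
    my ($class, %features) = @_;
    return bless { %features }, $class;
}

sub get_features_by_name {
    my ($self, $name) = @_;
    return exists $self->{$name} ? ($self->{$name}) : ();
}

package main;

# Hypothetical helper: return the single feature called $name, or die
# with a message that says which identifier failed and why.
sub fetch_unique_feature {
    my ($store, $name) = @_;
    my @hits = $store->get_features_by_name($name);
    die "no feature named '$name'\n" if @hits == 0;
    die "ambiguous: " . scalar(@hits) . " features named '$name'\n" if @hits > 1;
    return $hits[0];
}

# Coordinates taken from the gene line of the GFF3 snippet above.
my $store = MockStore->new(
    Medtr2g097580 => { strand => -1, start => 30427563, end => 30429139 },
);
my $locus = 'Medtr2g097580';

# Safe to dereference: fetch_unique_feature() never returns undef.
my $feature = fetch_unique_feature($store, $locus);
my ($end5, $end3) = $feature->{strand} == 1
    ? ($feature->{start}, $feature->{end})
    : ($feature->{end},   $feature->{start});
print "5' end: $end5, 3' end: $end3\n";

# Single quotes suppress interpolation, so this searches for the
# literal string '$locus' and fails with a readable message.
my $err = eval { fetch_unique_feature($store, '$locus'); 1 } ? '' : $@;
print "lookup failed: $err";
```

With a guard like this, a failed lookup reports the identifier that was searched for instead of dying later on an unrelated method call.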

Could someone please guide me in the right direction and let me know if I
am making any mistakes here when migrating from GFF2 to GFF3?

Thanks in advance.
~ Vivek

From lincoln.stein at gmail.com Wed Feb 1 10:20:12 2012
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Wed, 1 Feb 2012 10:20:12 -0500
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To:
References:
Message-ID:

Try not putting single quotation marks around $locus, as you are searching
for a gene named (literally) "$locus". If this doesn't work, use
$gff_dbh->get_features_by_name($locus) instead, as I'm not sure whether the
Class=>Identifier syntax is accepted by Bio::DB::SeqFeature::Store.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From cjfields at illinois.edu Wed Feb 1 10:37:06 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 1 Feb 2012 15:37:06 +0000
Subject: [Bioperl-l] Running into problems
In-Reply-To: <33243496.post@talk.nabble.com>
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33243496.post@talk.nabble.com>
Message-ID: <23334B19-0139-4279-A9ED-B442EED34052@illinois.edu>

That looks like a bug to me, but it's hard to say w/o having more
information, such as the actual code you are using, the version of
BioPerl, etc. See the BioPerl FAQ to get the version number.

chris

From vivekkrishnakumar at gmail.com Wed Feb 1 10:50:43 2012
From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar)
Date: Wed, 1 Feb 2012 07:50:43 -0800 (PST)
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To:
References:
Message-ID: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10>

Hi Lincoln,

Thanks very much for your suggestions. Not sure how the single quotation
marks appeared around the $locus variable, but it looks like it was only in
the email. Fortunately I did not have quotes around the variable in my
original code.

Now, when I switch over to 'get_features_by_name()', my script does not run
to completion.

I want to mention that this snippet of code is part of a larger CGI script
that interfaces with the SeqFeature backend DB. When I modify the function
call to $gff_dbh->get_features_by_name($locus), the script just runs
indefinitely and returns absolutely nothing.
I did put in a warn statement
to see if the correct locus ID is being passed to the function (I am able
to see the warning message in my apache error log), which seems to be fine.
But the moment it reaches the function call step, the CGI script freezes up
and I am unable to do anything. It just ends up as a rogue process owned by
the 'daemon' user and continues to use up a lot of memory.

I am using the following BioPerl modules in this CGI script:
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::SeqFeature::Store;
use Bio::SeqFeature::Generic;
use Bio::Graphics;
use Bio::Graphics::Feature;

Could any of these be interfering with get_features_by_name()?

Thank you.
Vivek

From scott at scottcain.net Wed Feb 1 11:26:36 2012
From: scott at scottcain.net (Scott Cain)
Date: Wed, 1 Feb 2012 11:26:36 -0500
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10>
References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10>
Message-ID:

Hi Vivek,

I responded to your original email and I suspect you may have missed
it. I'll copy it below. Another few things: how does $locus get
populated? Are you sure what you expect to be there is?

Also, to answer your question about the other bioperl modules you're
using: no, I don't think that's interfering.

Scott

------------------------------------------------
Hello Vivek,

In your GFF3, you don't have any features of class "Gene". In GFF2, the
class was the text string that started the ninth column, like this:

chr2 . gene 30427563 30429139 . - . Gene 35804

where the class would be Gene and the name (also called group) would be
35804. Class is not a particularly well defined concept in GFF3, so the
easiest way to restore functionality to your script is to change the call
from this:

my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus);

to this:

my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => 'gene');

I believe (though haven't tested it myself in a very long time) that you
can embed class in the name of the feature, like this:

chr2 . gene 30427563 30429139 . - .
Name=Gene:Medtr2g097580

which may or may not be easier, depending on your data and your code base.

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From joel.klein at wur.nl Wed Feb 1 11:47:14 2012
From: joel.klein at wur.nl (Bradyjoel)
Date: Wed, 1 Feb 2012 08:47:14 -0800 (PST)
Subject: [Bioperl-l] Running into problems
In-Reply-To: <23334B19-0139-4279-A9ED-B442EED34052@illinois.edu>
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33243496.post@talk.nabble.com> <23334B19-0139-4279-A9ED-B442EED34052@illinois.edu>
Message-ID: <33244172.post@talk.nabble.com>

The BioPerl version that I found with this command is:

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.006901
$ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
49.46.48.48.54.57.48.49

And I put the Perl code that I'm currently using in the attachment.

--
View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33244172.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From vivekkrishnakumar at gmail.com Wed Feb 1 11:51:46 2012
From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar)
Date: Wed, 1 Feb 2012 11:51:46 -0500
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To:
References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10>
Message-ID:

Hi Scott,

Thanks very much for your suggestions. Looks like I did miss it somehow
(confusion was caused because I was using both bioperl-l at googlegroups and
bioperl-l at open-bio).

Anyway, I had modified my function exactly like your suggestion:

my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => 'gene');

But doing so just returns the following error:

-------------------- EXCEPTION --------------------
MSG: segment() called in a scalar context but multiple features match.
Either call in a list context or narrow your search using the -types or -class arguments STACK Bio::DB::SeqFeature::Store::segment /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/DB/SeqFeature/Store.pm:1322 STACK main::get_annotation_db_features /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:899 STACK main::structural_annotation /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:660 STACK toplevel /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:119 ------------------------------------------- which would suggest to oneself that there are several such features with the same ID. But in fact, I was able to verify by querying the database that I have only one such locus. As for your question regarding how $locus is populated, it is populated from a CGI parameter passed to the script. I know that I am only passing it one locus ID. And as I mentioned earlier in this thread, the warning statement I inserted before making the function call shows me that there is only one ID in the $locus variable. My last resort now is to try as you suggested and modify my GFF3 file and embed the -class => 'Gene' into the "Name" attribute. While doing so, should I also embed the 'mRNA' class into the "Name" attribute of the mRNA feature like so: chr2 working_models mRNA 30427563 30429139 . - . ID=mrna_36255;Parent=gene_35804;Name=mRNA:Medtr2g097580.1;conf_class=F Subsequently, should I modify the function call to include the 'class': my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => 'gene', -class => 'Gene'); Thank you. Vivek On Wed, Feb 1, 2012 at 11:26 AM, Scott Cain wrote: > Hi Vivek, > > I responded to your original email and I suspect you may have missed > it. I'll copy it below. Another few things: how does $locus get > populated? Are you sure what you expect to be there is? > > Also, to answer your question about the other bioperl modules you're > using: no, I don't think that's interfering. 
> > Scott > > ------------------------------------------------ > Hello Vivek, > > In your GFF3, you don't have any features of class "Gene". In GFF2, > the class was the text string that started the ninth column, like > this: > > chr2 . gene 30427563 30429139 . - . Gene 35804 > > where the class would be Gene and the name (also called group) would > be 35804. Class is not a particularly well defined concept in GFF3, > so the easiest way to restore functionality to your script is to > change the call from this: > > my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus); > > to this: > > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, > -type => 'gene'); > > I believe (though haven't tested it myself in a very long time) that > you can embed class in the name of the feature, like this: > > chr2 . gene 30427563 30429139 . - . Name=Gene:Medtr2g097580 > > which may or may not be easier, depending on your data and your code base. > > Scott > > > On Wed, Feb 1, 2012 at 10:50 AM, Vivek Krishnakumar > wrote: > > Hi Lincoln, > > > > Thanks very much for your suggestions. Not sure how the single quotation > > marks appeared around the $locus variable. But looks like it was only in > > the email. Fortunately did not have quotes around the variable in my > > original code. > > > > Now, when I switch over to 'get_features_by_name()', my script does not > run > > to completion. > > > > I want to mention that this snippet of code is part of a larger CGI > script > > that interfaces with the SeqFeature backend DB. When I modify the > function > > call to $gff_dbh->get_features_by_name($locus), the script just runs > > indefinitely and returns absolutely nothing. I did put in a warn > statement > > to see if the correct locus ID is being passed to the function (I am able > > to see the warning message in my apache error log), which seems to be > fine. 
> > But the moment it reaches the function call step, the CGI script freezes > up > > and I am unable to do anything. It just ends up as a rogue process owned > by > > the 'daemon' user and continues to use up a lot of memory. > > > > I am using the following BioPerl modules in this CGI script: > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::DB::SeqFeature::Store; > > use Bio::SeqFeature::Generic; > > use Bio::Graphics; > > use Bio::Graphics::Feature; > > > > Could any of these be interfering with get_features_by_name()? > > > > Thank you. > > Vivek > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From cjfields at illinois.edu Wed Feb 1 11:55:11 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 1 Feb 2012 16:55:11 +0000 Subject: [Bioperl-l] Running into problems In-Reply-To: <33244172.post@talk.nabble.com> References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33243496.post@talk.nabble.com> <23334B19-0139-4279-A9ED-B442EED34052@illinois.edu> <33244172.post@talk.nabble.com> Message-ID: <7622D0FC-5E84-461A-9966-2D6C713BC273@illinois.edu> It's possible the attachment was scrubbed, can you send this to me directly? chris On Feb 1, 2012, at 10:47 AM, Bradyjoel wrote: > > the bioperl version that I have found with this command is: > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.006901 > $ perl -MBio::Root::Version -e 'printf "%vd\n", > $Bio::Root::Version::VERSION' > 49.46.48.48.54.57.48.49 > > And I put the perl code that I'm currently using in the attachment. 
> > > > Fields, Christopher J wrote: >> >> That looks like a bug to me, but it's hard to say w/o having more >> information, such as the actual code you are using, the version of >> BioPerl, etc. See the BioPerl FAQ to get the version number. >> >> chris >> >> On Feb 1, 2012, at 8:33 AM, Bradyjoel wrote: >> >>> >>> Thank you for your help! I think I'm almost there, but I still get 1 more >>> error when I run the script: >>> >>> $ perl blast2.pl >>> Searching gi|146341649|ref|YP_001206697.1| >>> Undefined subroutine &main::hsp_filter called at >>> /usr/share/perl5/Bio/SearchIO/Writer/TextResultWriter.pm line 298, >>> line 71. >>> >>> >>> Smithies, Russell wrote: >>>> >>>> I'd probably cheat a bit and optimise my blast parameters so there's >>>> less >>>> output to process. >>>> Also, are you sure an e-value of 100 is what you're after? I'd be aiming >>>> much lower - probably 1e-6. >>>> It also pays to mask repeats if you're blasting against a whole genome >>>> to >>>> cut down on the number of rubbish hits. >>>> >>>> --Russell >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Bradyjoel >>>>> Sent: Tuesday, 31 January 2012 12:21 a.m. >>>>> To: Bioperl-l at lists.open-bio.org >>>>> Subject: [Bioperl-l] Running into problems >>>>> >>>>> >>>>> HI all, >>>>> >>>>> I'm quite new to bioperl and tried to write a script that creates a >>>>> database >>>>> from a newly sequenced genome and then preforms a tblastn against a >>>>> multiple protein fasta file and then creates a blast report were only >>>>> the >>>>> results that only preservers identity scores above 98%. However my >>>>> script >>>>> keeps returning numerous errors and problems and since I have only a >>>>> little >>>>> experience I cannot determine were I went wrong. I include the code >>>>> that >>>>> I >>>>> got so far in the attachment. Hope someone can help. 
>>>>> Regards Joel >>>>> >>>>> http://old.nabble.com/file/p33228400/blast1.pl blast1.pl >>>>> -- >>>>> View this message in context: http://old.nabble.com/Running-into- >>>>> problems-tp33228400p33228400.html >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> ======================================================================= >>>> Attention: The information contained in this message and/or attachments >>>> from AgResearch Limited is intended only for the persons or entities >>>> to which it is addressed and may contain confidential and/or privileged >>>> material. Any review, retransmission, dissemination or other use of, or >>>> taking of any action in reliance upon, this information by persons or >>>> entities other than the intended recipients is prohibited by AgResearch >>>> Limited. If you have received this message in error, please notify the >>>> sender immediately. >>>> ======================================================================= >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/Running-into-problems-tp33228400p33243496.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33244172.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From vivekkrishnakumar at gmail.com Wed Feb 1 12:37:47 2012 From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar) Date: Wed, 1 Feb 2012 12:37:47 -0500 Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend In-Reply-To: References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10> Message-ID: Hi Scott and Lincoln, Thanks for all your suggestions. I figured out the mistake I was making. Apart from switching to get_features_by_name(), I also need to switch from $locus_obj->refseq to $locus_obj->seq_id to get the reference sequence identifier. This was interfering with the $gff_dbh->segment() call. Also, it looks like I do not need to embed the 'Gene' class in the GFF file anymore. Thanks again. Regards, ~Vivek On Wed, Feb 1, 2012 at 11:51 AM, Vivek Krishnakumar < vivekkrishnakumar at gmail.com> wrote: > Hi Scott, > > Thanks very much for your suggestions. 
Looks I did miss it somehow > (confusion was caused because I was using both bioperl-l at googlegroups and > bioperl-l at open-bio) > > Anyway, I had modified my function exactly like your suggestion: > > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => > 'gene'); > > But doing so just returns the following error: > > -------------------- EXCEPTION -------------------- > MSG: segment() called in a scalar context but multiple features match. > Either call in a list context or narrow your search using the -types or -class arguments > > STACK Bio::DB::SeqFeature::Store::segment /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/DB/SeqFeature/Store.pm:1322 > STACK main::get_annotation_db_features /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:899 > STACK main::structural_annotation /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:660 > STACK toplevel /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:119 > ------------------------------------------- > > which would suggest to oneself that there are several such features with > the same ID. But in fact, I was able to verify by querying the database > that I have only one such locus. > > As for your question regarding how $locus is populated, it is populated > from a CGI parameter passed to the script. I know that I am only passing it > one locus ID. And as I mentioned earlier in this thread, the warning > statement I inserted before making the function call shows me that there is > only one ID in the $locus variable. > > My last resort now is to try as you suggested and modify my GFF3 file and > embed the -class => 'Gene' into the "Name" attribute. While doing so, > should I also embed the 'mRNA' class into the "Name" attribute of the mRNA > feature like so: > > chr2 working_models mRNA 30427563 30429139 . - . 
> ID=mrna_36255;Parent=gene_35804;Name=mRNA:Medtr2g097580.1;conf_class=F > > Subsequently, should I modify the function call to include the 'class': > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => > 'gene', -class => 'Gene'); > > Thank you. > Vivek > > > On Wed, Feb 1, 2012 at 11:26 AM, Scott Cain wrote: > >> Hi Vivek, >> >> I responded to your original email and I suspect you may have missed >> it. I'll copy it below. Another few things: how does $locus get >> populated? Are you sure what you expect to be there is? >> >> Also, to answer your question about the other bioperl modules you're >> using: no, I don't think that's interfering. >> >> Scott >> >> ------------------------------------------------ >> Hello Vivek, >> >> In your GFF3, you don't have any features of class "Gene". In GFF2, >> the class was the text string that started the ninth column, like >> this: >> >> chr2 . gene 30427563 30429139 . - . Gene 35804 >> >> where the class would be Gene and the name (also called group) would >> be 35804. Class is not a particularly well defined concept in GFF3, >> so the easiest way to restore functionality to your script is to >> change the call from this: >> >> my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus); >> >> to this: >> >> my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, >> -type => 'gene'); >> >> I believe (though haven't tested it myself in a very long time) that >> you can embed class in the name of the feature, like this: >> >> chr2 . gene 30427563 30429139 . - . >> Name=Gene:Medtr2g097580 >> >> which may or may not be easier, depending on your data and your code base. >> >> Scott >> >> >> On Wed, Feb 1, 2012 at 10:50 AM, Vivek Krishnakumar >> wrote: >> > Hi Lincoln, >> > >> > Thanks very much for your suggestions. Not sure how the single quotation >> > marks appeared around the $locus variable. But looks like it was only in >> > the email. 
Fortunately did not have quotes around the variable in my >> > original code. >> > >> > Now, when I switch over to 'get_features_by_name()', my script does not >> run >> > to completion. >> > >> > I want to mention that this snippet of code is part of a larger CGI >> script >> > that interfaces with the SeqFeature backend DB. When I modify the >> function >> > call to $gff_dbh->get_features_by_name($locus), the script just runs >> > indefinitely and returns absolutely nothing. I did put in a warn >> statement >> > to see if the correct locus ID is being passed to the function (I am >> able >> > to see the warning message in my apache error log), which seems to be >> fine. >> > But the moment it reaches the function call step, the CGI script >> freezes up >> > and I am unable to do anything. It just ends up as a rogue process >> owned by >> > the 'daemon' user and continues to use up a lot of memory. >> > >> > I am using the following BioPerl modules in this CGI script: >> > use Bio::SeqIO; >> > use Bio::SearchIO; >> > use Bio::DB::SeqFeature::Store; >> > use Bio::SeqFeature::Generic; >> > use Bio::Graphics; >> > use Bio::Graphics::Feature; >> > >> > Could any of these be interfering with get_features_by_name()? >> > >> > Thank you. >> > Vivek >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> > > From lincoln.stein at gmail.com Wed Feb 1 12:57:33 2012 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Wed, 1 Feb 2012 12:57:33 -0500 Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend In-Reply-To: References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10> Message-ID: Is this exception appearing at the get_features_by_name() call, or later on in the script? I don't see how it can be generated in response to get_features_by_name(). Lincoln On Wed, Feb 1, 2012 at 11:51 AM, Vivek Krishnakumar < vivekkrishnakumar at gmail.com> wrote: > Hi Scott, > > Thanks very much for your suggestions. Looks I did miss it somehow > (confusion was caused because I was using both bioperl-l at googlegroups and > bioperl-l at open-bio) > > Anyway, I had modified my function exactly like your suggestion: > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => > 'gene'); > > But doing so just returns the following error: > > -------------------- EXCEPTION -------------------- > MSG: segment() called in a scalar context but multiple features match. > Either call in a list context or narrow your search using the -types > or -class arguments > > STACK Bio::DB::SeqFeature::Store::segment > /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/DB/SeqFeature/Store.pm:1322 > STACK main::get_annotation_db_features > /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:899 > STACK main::structural_annotation > /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:660 > STACK toplevel /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:119 > ------------------------------------------- > > which would suggest to oneself that there are several such features with > the same ID. 
But in fact, I was able to verify by querying the database > that I have only one such locus. > > As for your question regarding how $locus is populated, it is populated > from a CGI parameter passed to the script. I know that I am only passing it > one locus ID. And as I mentioned earlier in this thread, the warning > statement I inserted before making the function call shows me that there is > only one ID in the $locus variable. > > My last resort now is to try as you suggested and modify my GFF3 file and > embed the -class => 'Gene' into the "Name" attribute. While doing so, > should I also embed the 'mRNA' class into the "Name" attribute of the mRNA > feature like so: > > chr2 working_models mRNA 30427563 30429139 . - . > ID=mrna_36255;Parent=gene_35804;Name=mRNA:Medtr2g097580.1;conf_class=F > > Subsequently, should I modify the function call to include the 'class': > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => > 'gene', -class => 'Gene'); > > Thank you. > Vivek > > On Wed, Feb 1, 2012 at 11:26 AM, Scott Cain wrote: > > > Hi Vivek, > > > > I responded to your original email and I suspect you may have missed > > it. I'll copy it below. Another few things: how does $locus get > > populated? Are you sure what you expect to be there is? > > > > Also, to answer your question about the other bioperl modules you're > > using: no, I don't think that's interfering. > > > > Scott > > > > ------------------------------------------------ > > Hello Vivek, > > > > In your GFF3, you don't have any features of class "Gene". In GFF2, > > the class was the text string that started the ninth column, like > > this: > > > > chr2 . gene 30427563 30429139 . - . Gene 35804 > > > > where the class would be Gene and the name (also called group) would > > be 35804. 
Class is not a particularly well defined concept in GFF3, > > so the easiest way to restore functionality to your script is to > > change the call from this: > > > > my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus); > > > > to this: > > > > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, > > -type => 'gene'); > > > > I believe (though haven't tested it myself in a very long time) that > > you can embed class in the name of the feature, like this: > > > > chr2 . gene 30427563 30429139 . - . > Name=Gene:Medtr2g097580 > > > > which may or may not be easier, depending on your data and your code > base. > > > > Scott > > > > > > On Wed, Feb 1, 2012 at 10:50 AM, Vivek Krishnakumar > > wrote: > > > Hi Lincoln, > > > > > > Thanks very much for your suggestions. Not sure how the single > quotation > > > marks appeared around the $locus variable. But looks like it was only > in > > > the email. Fortunately did not have quotes around the variable in my > > > original code. > > > > > > Now, when I switch over to 'get_features_by_name()', my script does not > > run > > > to completion. > > > > > > I want to mention that this snippet of code is part of a larger CGI > > script > > > that interfaces with the SeqFeature backend DB. When I modify the > > function > > > call to $gff_dbh->get_features_by_name($locus), the script just runs > > > indefinitely and returns absolutely nothing. I did put in a warn > > statement > > > to see if the correct locus ID is being passed to the function (I am > able > > > to see the warning message in my apache error log), which seems to be > > fine. > > > But the moment it reaches the function call step, the CGI script > freezes > > up > > > and I am unable to do anything. It just ends up as a rogue process > owned > > by > > > the 'daemon' user and continues to use up a lot of memory. 
> > > > > > I am using the following BioPerl modules in this CGI script: > > > use Bio::SeqIO; > > > use Bio::SearchIO; > > > use Bio::DB::SeqFeature::Store; > > > use Bio::SeqFeature::Generic; > > > use Bio::Graphics; > > > use Bio::Graphics::Feature; > > > > > > Could any of these be interfering with get_features_by_name()? > > > > > > Thank you. > > > Vivek > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From vivekkrishnakumar at gmail.com Wed Feb 1 13:15:17 2012 From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar) Date: Wed, 1 Feb 2012 10:15:17 -0800 (PST) Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend In-Reply-To: References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10> Message-ID: <26948128.3288.1328120117492.JavaMail.geo-discussion-forums@vbto23> Hi Lincoln, This exception was NOT generated by the $gff_dbh->get_features_by_name() call. It was later on in the script by the $gff_dbh->segment() call. And the reason was because I was trying to get the reference seq_id by using $locus_obj->refseq, which seems to be deprecated in DB::SeqFeature. When I switched to $locus_obj->seq_id, everything worked fine. 
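For anyone following along, the fix described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual eucap.pl code: the temporary GFF3 file, the chromosome feature and its length, the gene's Name value, and the in-memory adaptor are all made up here so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Bio::DB::SeqFeature::Store;

# Assumption: a tiny hypothetical GFF3 file with one chromosome and one gene.
# The gene coordinates come from the thread; the chromosome length is invented.
my ($fh, $gff_file) = tempfile(SUFFIX => '.gff3', UNLINK => 1);
print {$fh} "##gff-version 3\n";
print {$fh} join("\t", 'chr2', '.', 'chromosome', 1, 45000000,
                 '.', '.', '.', 'ID=chr2;Name=chr2'), "\n";
print {$fh} join("\t", 'chr2', 'working_models', 'gene', 30427563, 30429139,
                 '.', '-', '.', 'ID=gene_35804;Name=Medtr2g097580'), "\n";
close $fh;

# Assumption: the in-memory adaptor stands in for the real database backend.
my $gff_dbh = Bio::DB::SeqFeature::Store->new(
    -adaptor => 'memory',
    -dsn     => $gff_file,
);

my $locus = 'Medtr2g097580';

# Look the gene up by name, in list context, restricted to 'gene' features.
my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus,
                                                 -type => 'gene');
die "No gene feature named $locus\n" unless $locus_obj;

# Use seq_id (not refseq) to get the reference sequence identifier,
# then ask for the segment in list context as the exception text advises.
my ($segment) = $gff_dbh->segment($locus_obj->seq_id,
                                  $locus_obj->start,
                                  $locus_obj->end);
warn "No segment returned for ", $locus_obj->seq_id, "\n" unless $segment;
```

Calling segment() in list context avoids the "called in a scalar context" exception even if more than one feature happens to match the name passed in.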
Thanks again and sorry for the confusion. Vivek From scott at scottcain.net Wed Feb 1 14:43:56 2012 From: scott at scottcain.net (Scott Cain) Date: Wed, 1 Feb 2012 14:43:56 -0500 Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend In-Reply-To: References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10> Message-ID: Hi Vivek, What I don't understand from the error stack that you copied in your email is that it doesn't mention the get_features_by_name method, rather it mentions the segment method--are you sure that the line number reported in your get_annotation_db_features method corresponds to the change you made in the get_features_by_name call? I get the feeling that it is dying somewhere else now. I would say that resorting to adding class to the feature names is unlikely to help. Scott On Wed, Feb 1, 2012 at 11:51 AM, Vivek Krishnakumar wrote: > Hi Scott, > > Thanks very much for your suggestions.
Looks I did miss it somehow > (confusion was caused because I was using both bioperl-l at googlegroups and > bioperl-l at open-bio) > > Anyway, I had modified my function exactly like your suggestion: > > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => > 'gene'); > > But doing so just returns the following error: > > -------------------- EXCEPTION -------------------- > MSG: segment() called in a scalar context but multiple features match. > Either call in a list context or narrow your search using the -types or > -class arguments > > STACK Bio::DB::SeqFeature::Store::segment > /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/DB/SeqFeature/Store.pm:1322 > STACK main::get_annotation_db_features > /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:899 > STACK main::structural_annotation > /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:660 > STACK toplevel /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:119 > ------------------------------------------- > > which would suggest to oneself that there are several such features with the > same ID. But in fact, I was able to verify by querying the database that I > have only one such locus. > > As for your question regarding how $locus is populated, it is populated from > a CGI parameter passed to the script. I know that I am only passing it one > locus ID. And as I mentioned earlier in this thread, the warning statement I > inserted before making the function call shows me that there is only one ID > in the $locus variable. > > My last resort now is to try as you suggested and modify my GFF3 file and > embed the -class => 'Gene' into the "Name" attribute. While doing so, should > I also embed the 'mRNA' class into the "Name" attribute of the mRNA feature > like so: > > chr2 working_models mRNA 30427563 30429139 . - .
> ID=mrna_36255;Parent=gene_35804;Name=mRNA:Medtr2g097580.1;conf_class=F > > Subsequently, should I modify the function call to include the 'class': > my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => > 'gene', -class => 'Gene'); > > Thank you. > Vivek > > > On Wed, Feb 1, 2012 at 11:26 AM, Scott Cain wrote: >> >> Hi Vivek, >> >> I responded to your original email and I suspect you may have missed >> it. I'll copy it below. Another few things: how does $locus get >> populated? Are you sure what you expect to be there is? >> >> Also, to answer your question about the other bioperl modules you're >> using: no, I don't think that's interfering. >> >> Scott >> >> ------------------------------------------------ >> Hello Vivek, >> >> In your GFF3, you don't have any features of class "Gene". In GFF2, >> the class was the text string that started the ninth column, like >> this: >> >> chr2 . gene 30427563 30429139 . - . Gene 35804 >> >> where the class would be Gene and the name (also called group) would >> be 35804. Class is not a particularly well defined concept in GFF3, >> so the easiest way to restore functionality to your script is to >> change the call from this: >> >> my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus); >> >> to this: >> >> my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, >> -type => 'gene'); >> >> I believe (though haven't tested it myself in a very long time) that >> you can embed class in the name of the feature, like this: >> >> chr2 . gene 30427563 30429139 . - . >> Name=Gene:Medtr2g097580 >> >> which may or may not be easier, depending on your data and your code base. >> >> Scott >> >> >> On Wed, Feb 1, 2012 at 10:50 AM, Vivek Krishnakumar >> wrote: >> > Hi Lincoln, >> > >> > Thanks very much for your suggestions. Not sure how the single quotation >> > marks appeared around the $locus variable. But looks like it was only in >> > the email.
Fortunately did not have quotes around the variable in my >> > original code. >> > >> > Now, when I switch over to 'get_features_by_name()', my script does not >> > run >> > to completion. >> > >> > I want to mention that this snippet of code is part of a larger CGI >> > script >> > that interfaces with the SeqFeature backend DB. When I modify the >> > function >> > call to $gff_dbh->get_features_by_name($locus), the script just runs >> > indefinitely and returns absolutely nothing. I did put in a warn >> > statement >> > to see if the correct locus ID is being passed to the function (I am >> > able >> > to see the warning message in my apache error log), which seems to be >> > fine. >> > But the moment it reaches the function call step, the CGI script freezes >> > up >> > and I am unable to do anything. It just ends up as a rogue process owned >> > by >> > the 'daemon' user and continues to use up a lot of memory. >> > >> > I am using the following BioPerl modules in this CGI script: >> > use Bio::SeqIO; >> > use Bio::SearchIO; >> > use Bio::DB::SeqFeature::Store; >> > use Bio::SeqFeature::Generic; >> > use Bio::Graphics; >> > use Bio::Graphics::Feature; >> > >> > Could any of these be interfering with get_features_by_name()? >> > >> > Thank you. >> > Vivek >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Wed Feb 1 14:44:56 2012
From: scott at scottcain.net (Scott Cain)
Date: Wed, 1 Feb 2012 14:44:56 -0500
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To:
References: <10934743.3252.1328111443141.JavaMail.geo-discussion-forums@vbzs10>
Message-ID:

/scott realizes he should refresh his mail client before sending helpful emails :-)

On Wed, Feb 1, 2012 at 2:43 PM, Scott Cain wrote:
> Hi Vivek,
>
> What I don't understand from the error stack that you copied in your
> email is that it doesn't mention the get_features_by_name method,
> rather it mentions the segment method--are you sure that the line
> number reported in your get_annotation_db_features method corresponds
> to the change you made in the get_features_by_name call? I get the
> feeling that it is dying somewhere else now. I would say that
> resorting to adding class to the feature names is unlikely to help.
>
> Scott
>
>
> On Wed, Feb 1, 2012 at 11:51 AM, Vivek Krishnakumar
> wrote:
>> Hi Scott,
>>
>> Thanks very much for your suggestions. Looks I did miss it somehow
>> (confusion was caused because I was using both bioperl-l at googlegroups and
>> bioperl-l at open-bio)
>>
>> Anyway, I had modified my function exactly like your suggestion:
>>
>> my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type =>
>> 'gene');
>>
>> But doing so just returns the following error:
>>
>> -------------------- EXCEPTION --------------------
>> MSG: segment() called in a scalar context but multiple features match.
>> Either call in a list context or narrow your search using the -types or >> -class arguments >> >> STACK Bio::DB::SeqFeature::Store::segment >> /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/DB/SeqFeature/Store.pm:1322 >> STACK main::get_annotation_db_features >> /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:899 >> STACK main::structural_annotation >> /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:660 >> STACK toplevel /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:119 >> ------------------------------------------- >> >> which would suggest to oneself that there are several such features with the >> same ID. But in fact, I was able to verify by querying the database that I >> have only one such locus. >> >> As for your question regarding how $locus is populated, it is populated from >> a CGI parameter passed to the script. I know that I am only passing it one >> locus ID. And as I mentioned earlier in this thread, the warning statement I >> inserted before making the function call shows me that there is only one ID >> in the $locus variable. >> >> My last resort now is to try as you suggested and modify my GFF3 file and >> embed the -class => 'Gene' into the "Name" attribute. While doing so, should >> I also embed the 'mRNA' class into the "Name" attribute of the mRNA feature >> like so: >> >> chr2 ? ?working_models ?mRNA ? ?30427563 ? ?30429139 ? ?. ? - ? . >> ID=mrna_36255;Parent=gene_35804;Name=mRNA:Medtr2g097580.1;conf_class=F >> >> Subsequently, should I modify the function call to include the 'class': >> my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type => >> 'gene', -class => 'Gene'); >> >> Thank you. >> Vivek >> >> >> On Wed, Feb 1, 2012 at 11:26 AM, Scott Cain wrote: >>> >>> Hi Vivek, >>> >>> I responded to your original email and I suspect you may have missed >>> it. ?I'll copy it below. ?Another few things: how does $locus get >>> populated? ?Are you sure what you expect to be there is? 
>>> >>> Also, to answer your question about the other bioperl modules you're >>> using: no, I don't think that's interfering. >>> >>> Scott >>> >>> ------------------------------------------------ >>> Hello Vivek, >>> >>> In your GFF3, you don't have any features of class "Gene". ?In GFF2, >>> the class was the text string that started the ninth column, like >>> this: >>> >>> chr2 ? . ?gene ? ?30427563 ? ?30429139 ? ?. ? - ? . ?Gene 35804 >>> >>> where the class would be Gene and the name (also called group) would >>> be 35804. ?Class is not a particularly well defined concept in GFF3, >>> so the easiest way to restore functionality to your script is to >>> change the call from this: >>> >>> ?my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus); >>> >>> to this: >>> >>> ?my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, >>> -type => 'gene'); >>> >>> I believe (though haven't tested it myself in a very long time) that >>> you can embed class in the name of the feature, like this: >>> >>> chr2 ?. ?gene ? ?30427563 ? ?30429139 ? ?. ? - ? . >>> ?Name=Gene:Medtr2g097580 >>> >>> which may or may not be easier, depending on your data and your code base. >>> >>> Scott >>> >>> >>> On Wed, Feb 1, 2012 at 10:50 AM, Vivek Krishnakumar >>> wrote: >>> > Hi Lincoln, >>> > >>> > Thanks very much for your suggestions. Not sure how the single quotation >>> > marks appeared around the $locus variable. But looks like it was only in >>> > the email. Fortunately did not have quotes around the variable in my >>> > original code. >>> > >>> > Now, when I switch over to 'get_features_by_name()', my script does not >>> > run >>> > to completion. >>> > >>> > I want to mention that this snippet of code is part of a larger CGI >>> > script >>> > that interfaces with the SeqFeature backend DB. When I modify the >>> > function >>> > call to $gff_dbh->get_features_by_name($locus), the script just runs >>> > indefinitely and returns absolutely nothing. 
I did put in a warn >>> > statement >>> > to see if the correct locus ID is being passed to the function (I am >>> > able >>> > to see the warning message in my apache error log), which seems to be >>> > fine. >>> > But the moment it reaches the function call step, the CGI script freezes >>> > up >>> > and I am unable to do anything. It just ends up as a rogue process owned >>> > by >>> > the 'daemon' user and continues to use up a lot of memory. >>> > >>> > I am using the following BioPerl modules in this CGI script: >>> > use Bio::SeqIO; >>> > use Bio::SearchIO; >>> > use Bio::DB::SeqFeature::Store; >>> > use Bio::SeqFeature::Generic; >>> > use Bio::Graphics; >>> > use Bio::Graphics::Feature; >>> > >>> > Could any of these be interfering with get_features_by_name()? >>> > >>> > Thank you. >>> > Vivek >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >>> dot net >>> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >>> Ontario Institute for Cancer Research >> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net > GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 > Ontario Institute for Cancer Research -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 
216-392-3087 Ontario Institute for Cancer Research From Russell.Smithies at agresearch.co.nz Wed Feb 1 16:31:49 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 2 Feb 2012 10:31:49 +1300 Subject: [Bioperl-l] Running into problems In-Reply-To: <33243496.post@talk.nabble.com> References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33243496.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34BD33ADBC2@exchsth.agresearch.co.nz> Sorry, typo. Make sure you have your hsp_filter sub in there somewhere. --Rusell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bradyjoel > Sent: Thursday, 2 February 2012 3:34 a.m. > To: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Running into problems > > > Thank you for your help! I think I'm almost there, but I still get 1 more error > when I run the script: > > $ perl blast2.pl > Searching gi|146341649|ref|YP_001206697.1| Undefined subroutine > &main::hsp_filter called at > /usr/share/perl5/Bio/SearchIO/Writer/TextResultWriter.pm line 298, > line 71. > > > Smithies, Russell wrote: > > > > I'd probably cheat a bit and optimise my blast parameters so there's > > less output to process. > > Also, are you sure an e-value of 100 is what you're after? I'd be > > aiming much lower - probably 1e-6. > > It also pays to mask repeats if you're blasting against a whole genome > > to cut down on the number of rubbish hits. > > > > --Russell > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Bradyjoel > >> Sent: Tuesday, 31 January 2012 12:21 a.m. 
> >> To: Bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] Running into problems > >> > >> > >> HI all, > >> > >> I'm quite new to bioperl and tried to write a script that creates a > >> database from a newly sequenced genome and then preforms a tblastn > >> against a multiple protein fasta file and then creates a blast report > >> were only the results that only preservers identity scores above 98%. > >> However my script keeps returning numerous errors and problems and > >> since I have only a little experience I cannot determine were I went > >> wrong. I include the code that I got so far in the attachment. Hope > >> someone can help. > >> Regards Joel > >> > >> http://old.nabble.com/file/p33228400/blast1.pl blast1.pl > >> -- > >> View this message in context: http://old.nabble.com/Running-into- > >> problems-tp33228400p33228400.html > >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ========================================================== > ============ > > = > > Attention: The information contained in this message and/or > > attachments from AgResearch Limited is intended only for the persons > > or entities to which it is addressed and may contain confidential > > and/or privileged material. Any review, retransmission, dissemination > > or other use of, or taking of any action in reliance upon, this > > information by persons or entities other than the intended recipients > > is prohibited by AgResearch Limited. If you have received this message > > in error, please notify the sender immediately. 
> > > ========================================================== > ============ > > = > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > View this message in context: http://old.nabble.com/Running-into- > problems-tp33228400p33243496.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From adlai at refenestration.com Wed Feb 1 16:43:37 2012 From: adlai at refenestration.com (Adlai Burman) Date: Wed, 1 Feb 2012 22:43:37 +0100 Subject: [Bioperl-l] Lineage from GB files In-Reply-To: References: <6CBCE402-E3C8-4FE0-BA57-4A7B82C3CF5F@refenestration.com> Message-ID: Thanks, Surya. I'll give it a shot. Adlai On Jan 31, 2012, at 10:49 PM, Surya Saha wrote: > Hi Adlai, > > It really depends on what items are present the Genbank/EMBL. You can use the NCBI Taxonomy database and Taxonomy modules in CPAN to identify the taxonomic hierarchy of an accession, for e.g., you can map the GI to Taxonomy ID and extract the taxonomy using Bio::LITE::Taxonomy::NCBI. > > Here's a script (not authored by me) on Github that might get you started. > > -Surya > > > On Fri, Jan 27, 2012 at 6:27 AM, Adlai Burman wrote: > Does anyone know if there is a way to batch extract taxa such as class, order in Perl from, e/g/ genbank, EMBL records? I know that genus/species and some of the higher taxa are easy to parse from gb records but the interior are inconsistent strings (e.g. element x sometimes is a subclass and sometimes a family. > Any help would really be appreciated. > > Thanks. 
> Adlai
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From joel.klein at wur.nl Thu Feb 2 05:29:11 2012
From: joel.klein at wur.nl (Bradyjoel)
Date: Thu, 2 Feb 2012 02:29:11 -0800 (PST)
Subject: [Bioperl-l] Running into problems
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34BD33ADBC2@exchsth.agresearch.co.nz>
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33243496.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBC2@exchsth.agresearch.co.nz>
Message-ID: <33249286.post@talk.nabble.com>

That did the trick, it's working now. Thank you all very much!
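
For anyone who lands on the same "Undefined subroutine &main::hsp_filter" error: the writer was apparently handed a reference to a sub that was never defined in the script. Below is a minimal sketch of what such a filter might look like. The 98% identity cutoff is taken from the original question; the MockHSP package is purely hypothetical and exists only so the snippet runs without BioPerl installed; and the wiring in the trailing comment is an assumption based on the Bio::SearchIO::Writer::TextResultWriter documentation, not the actual contents of blast2.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filter: keep an HSP only if its percent identity is at
# least 98 (the cutoff mentioned in the original question).
# Returns 1 to keep the HSP, 0 to drop it.
sub hsp_filter {
    my ($hsp) = @_;
    return $hsp->percent_identity >= 98 ? 1 : 0;
}

# Stand-in for a real HSP object, so this runs without BioPerl.
# It only provides the one method the filter needs.
package MockHSP;
sub new              { my ($class, $pid) = @_; return bless { pid => $pid }, $class }
sub percent_identity { return $_[0]{pid} }

package main;

# Try the filter on a few made-up identity values.
for my $pid (99.2, 98, 75.5) {
    my $verdict = hsp_filter(MockHSP->new($pid)) ? 'keep' : 'drop';
    print "$pid\t$verdict\n";
}

# Assumed wiring in a real script (check the module docs before relying on it):
#   my $writer = Bio::SearchIO::Writer::TextResultWriter->new(
#       -filters => { 'HSP' => \&hsp_filter } );
#   my $out = Bio::SearchIO->new(-writer => $writer, -file => '>filtered.txt');
#   $out->write_result($result);
```

The point is only that the sub named in the filter hash has to exist in the calling script's package (here main) before the writer tries to call it.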
Smithies, Russell wrote: > > Sorry, typo. > Make sure you have your hsp_filter sub in there somewhere. > > --Rusell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Bradyjoel >> Sent: Thursday, 2 February 2012 3:34 a.m. >> To: Bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Running into problems >> >> >> Thank you for your help! I think I'm almost there, but I still get 1 more >> error >> when I run the script: >> >> $ perl blast2.pl >> Searching gi|146341649|ref|YP_001206697.1| Undefined subroutine >> &main::hsp_filter called at >> /usr/share/perl5/Bio/SearchIO/Writer/TextResultWriter.pm line 298, >> line 71. >> >> >> Smithies, Russell wrote: >> > >> > I'd probably cheat a bit and optimise my blast parameters so there's >> > less output to process. >> > Also, are you sure an e-value of 100 is what you're after? I'd be >> > aiming much lower - probably 1e-6. >> > It also pays to mask repeats if you're blasting against a whole genome >> > to cut down on the number of rubbish hits. >> > >> > --Russell >> > >> >> -----Original Message----- >> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> >> bounces at lists.open-bio.org] On Behalf Of Bradyjoel >> >> Sent: Tuesday, 31 January 2012 12:21 a.m. >> >> To: Bioperl-l at lists.open-bio.org >> >> Subject: [Bioperl-l] Running into problems >> >> >> >> >> >> HI all, >> >> >> >> I'm quite new to bioperl and tried to write a script that creates a >> >> database from a newly sequenced genome and then preforms a tblastn >> >> against a multiple protein fasta file and then creates a blast report >> >> were only the results that only preservers identity scores above 98%. >> >> However my script keeps returning numerous errors and problems and >> >> since I have only a little experience I cannot determine were I went >> >> wrong. I include the code that I got so far in the attachment. 
Hope >> >> someone can help. >> >> Regards Joel >> >> >> >> http://old.nabble.com/file/p33228400/blast1.pl blast1.pl >> >> -- >> >> View this message in context: http://old.nabble.com/Running-into- >> >> problems-tp33228400p33228400.html >> >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> ========================================================== >> ============ >> > = >> > Attention: The information contained in this message and/or >> > attachments from AgResearch Limited is intended only for the persons >> > or entities to which it is addressed and may contain confidential >> > and/or privileged material. Any review, retransmission, dissemination >> > or other use of, or taking of any action in reliance upon, this >> > information by persons or entities other than the intended recipients >> > is prohibited by AgResearch Limited. If you have received this message >> > in error, please notify the sender immediately. >> > >> ========================================================== >> ============ >> > = >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> >> -- >> View this message in context: http://old.nabble.com/Running-into- >> problems-tp33228400p33243496.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33249286.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jw12 at sanger.ac.uk Thu Feb 2 04:46:25 2012 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 2 Feb 2012 09:46:25 +0000 Subject: [Bioperl-l] Only 10 days left to register for DAS Workshop 2012 Message-ID: Only 10 days left to register for DAS Workshop 2012 DAS is currently being used to share annotations on genomes, protein alignments, structural and interaction information. If you are interested in sharing biological information the DAS workshop below may be of interest to you. Learn of and contribute to current developments in DAS such as: DAS in the cloud, DAS for Genotype Data, DAS searching, DAS for collaborative annotation projects, DAS alternative formats. Registration is open for the 2012 DAS workshop (27-29 February) at the Genome Campus, Hinxton UK. If you are interested in attending, please find out more by going to http://www.ebi.ac.uk/training/onsite/120227_DAS.html and register via the web link at the bottom of the page. This workshop will cater for novice to expert DAS users as each day is optional. Please register early as places will be limited. Registration closes 10 February 2012 - 12:00. If you are interested in giving a 15 minute talk on the second day please email Jonathan Warren using jonathan.warren at sanger.ac.uk Many thanks The Sanger/EBI DAS team. 
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From dtorrente at javeriana.edu.co Sun Feb 5 14:04:47 2012
From: dtorrente at javeriana.edu.co (DANIEL EDUARDO TORRENTE QUINTERO)
Date: Sun, 5 Feb 2012 19:04:47 +0000
Subject: [Bioperl-l] Bioperl parsing output file
Message-ID:

I need some help with this script:

#!/usr/bin/perl
use Bio::SearchIO;

$report_obj = new Bio::SearchIO(-format => 'blast',
                                -file   => 'C:\blast-2.2.25+\Lib3_consensus_dbAt.xml');
while( $result = $report_obj->next_result ) {
    while( $hit = $result->next_hit ) {
        while( $hsp = $hit->next_hsp ) {
            # report only HSPs with an e-value of 1e-5 or better
            if ( $hsp->evalue <= 0.00001 ) {
                print "Hit\t", $hit->name, "\n",
                      "Length\t", $hsp->length('total'), "\n",
                      "Percent_id\t", $hsp->percent_identity, "\n",
                      $result->query_name(), "\n";
            }
        }
    }
}

I want to export the results of the Perl script to a file, but I don't know how. I tried all the Bio::SearchIO::Writer methods but I can't make them work with this script.

From heikki.lehvaslaiho at gmail.com Mon Feb 6 01:30:45 2012
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 6 Feb 2012 09:30:45 +0300
Subject: [Bioperl-l] Bioperl parsing output file
In-Reply-To:
References:
Message-ID:

Daniel,

You are already printing the output to STDOUT, which goes to the terminal. You can redirect the output to a file using "prog > file" syntax on the command line, or read about the perl open function (perldoc -f open) and print directly to a named file from within the script.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
http://about.me/heikki
cell: +966 545 595 849 office: +966 2 808 2429
Computational Bioscience Research Center (CBRC), Building #2, Office #4337
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 5 February 2012 22:04, DANIEL EDUARDO TORRENTE QUINTERO wrote:
> I need some help with this script:
>
> #!/usr/bin/perl
> use Bio::SearchIO;
>
> $report_obj = new Bio::SearchIO(-format => 'blast',
>                                 -file   => 'C:\blast-2.2.25+\Lib3_consensus_dbAt.xml');
> while( $result = $report_obj->next_result ) {
>     while( $hit = $result->next_hit ) {
>         while( $hsp = $hit->next_hsp ) {
>             if ( $hsp->evalue <= 0.00001 ) {
>                 print "Hit\t", $hit->name, "\n",
>                       "Length\t", $hsp->length('total'), "\n",
>                       "Percent_id\t", $hsp->percent_identity, "\n",
>                       $result->query_name(), "\n";
>             }
>         }
>     }
> }
>
> I want to export the results of the Perl script to a file, but I don't know how. I tried all the Bio::SearchIO::Writer methods but I can't make them work with this script.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From florian.lajus at labri.fr Tue Feb 7 14:49:02 2012
From: florian.lajus at labri.fr (florian lajus)
Date: Tue, 07 Feb 2012 20:49:02 +0100
Subject: [Bioperl-l] Question on seqfeature mapping
In-Reply-To: <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu>
References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu>
Message-ID: <4F31802E.9030002@labri.fr>

Hi,

I have a problem with bio queries: how can I retrieve a seqfeature from the database according to its annotation (tag name and value)? The problem comes from the value, since we have

"value" => "=>{bioentry_qualifier_value,seqfeature_qualifier_value,location_qualifier_value}.value",

in the %slot_attribut_map of the base driver. Do you know a solution?

From kasandrah at gmail.com Tue Feb 7 11:11:57 2012
From: kasandrah at gmail.com (Casandra)
Date: Tue, 7 Feb 2012 17:11:57 +0100
Subject: [Bioperl-l] help!
Message-ID: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com>

Hi,

I'm trying to install Bioperl but I'm a bit lost. I know I have perl installed because I have already written some scripts, but I'm a biologist so... I'm not quite sure what the messages say.

My perl version:
This is perl, v5.8.8 built for darwin-thread-multi-2level
My computer:
Mac OS X Version 10.5.8

I was following these preliminary steps:

--------------

PRELIMINARY PREPARATION

This is optional, but regardless of your subsequent choice of
installation method, it will help to carry out the following steps.
They will increase the likelihood of installation success
(especially of optional dependencies).

* Upgrade CPAN:

 >perl -MCPAN -e shell
 cpan>install Bundle::CPAN
 cpan>q

* Install/upgrade Module::Build, and make it your preferred
  installer:

 >cpan
 cpan>install Module::Build
 cpan>o conf prefer_installer MB
 cpan>o conf commit
 cpan>q

* Install the expat library by whatever method is
  appropriate for your system.

* If your expat library is installed in a non-standard location,
  tell CPAN about it:

 >cpan
 cpan>o conf makepl_arg "EXPATLIBPATH=/non-standard/lib EXPATINCPATH=/non-standard/include"
 cpan>o conf commit

--------------

I think I did the "Upgrade CPAN" step properly, but when I tried the next one it started asking me too many things, and finally it stopped due to "some problems". In the text file you can see the whole process.
What did I do wrong?

After solving these preliminary steps, what should I do? Exactly which .tar or .whatever should I download to install?

I don't see the difference between installing it through "Build.PL" or CPAN. And I don't know if I should do this or that "Fink*" stuff for Mac.

* I went to the Fink webpage and what I expected to see was "hello! download Bioperl simply clicking here!" but far from this, it seems that first I have to download some kind of Fink program before starting with Bioperl... is it something close to this?

I'm sorry, too many questions...
But I really want to learn to use Bioperl, and I have nobody to ask face to face.

Thank you so much,

Casandra
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: terminal.txt
URL:

From limeyloos at gmail.com Mon Feb 6 11:09:18 2012
From: limeyloos at gmail.com (Em Rich)
Date: Mon, 6 Feb 2012 08:09:18 -0800 (PST)
Subject: [Bioperl-l] Embedding hyperlinks in SVG files
Message-ID:

Hi,

I am currently using Bio::Graphics to render sequence images into SVG format. As SVGs can inherently embed hyperlinks using the xlink attribute, I was wondering if there is a straightforward way of adding this attribute type in Bio::Graphics rather than creating an image map.

Thanks,

Emily

From scott at scottcain.net Tue Feb 7 15:26:43 2012
From: scott at scottcain.net (Scott Cain)
Date: Tue, 7 Feb 2012 15:26:43 -0500
Subject: [Bioperl-l] Embedding hyperlinks in SVG files
In-Reply-To:
References:
Message-ID:

Hi Emily,

I'm reasonably sure that it isn't supported. GD::SVG was developed to support creating hi-res images from GBrowse to be used in documentation or publications, so supporting links wasn't on our radar. I imagine it wouldn't be very hard to add if you wanted to send a patch to the author (Todd Harris, cc'ed here, so he can contradict me if he wants).

Scott

On Mon, Feb 6, 2012 at 11:09 AM, Em Rich wrote:
> Hi,
>
> I am currently using Bio::Graphics to render sequence images into SVG
> format. As SVGs can inherently embed hyperlinks using the xlink
> attribute I was wondering if there is a straight forward way of adding
> this attribute type in Bio::Graphics rather than creating an image
> map.
> > Thanks, > > Emily > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Tue Feb 7 15:55:25 2012 From: scott at scottcain.net (Scott Cain) Date: Tue, 7 Feb 2012 15:55:25 -0500 Subject: [Bioperl-l] help! In-Reply-To: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> Message-ID: hi Cassandra, I don't have an answer for you at the moment. It seems to me that using local::lib is a good idea, but I've never found a good tutorial for using it, so I haven't. Perhaps someone else on the list can suggest one. The other thing I just wanted to mention as the admin that approved your message--I came very close to deleting it from the queue without looking at it because it is not unusual for spam messages to have generic subjects like "help!" (just for future reference :-) Scott On Tue, Feb 7, 2012 at 11:11 AM, Casandra wrote: > Hi, > > I'm trying to install Bioperl but I'm a bit lost. I know I have perl > installed becaused I have already write some scripts but I'm biologist so... > not pretty sure about what messages say. > > My perl version: > This is perl, v5.8.8 built for darwin-thread-multi-2level > My computer: > Mac OS X Vesion 10.5.8 > > I was following this preliminary steps: > > -------------- > > PRELIMINARY PREPARATION > > This is optional, but regardless of your subsequent choice of > installation method, it will help to carry out the following steps. > They will increase the likelyhood of installation success > (especially of optional dependencies). 
> > * Upgrade CPAN: > > >perl -MCPAN -e shell > cpan>install Bundle::CPAN > cpan>q > > * Install/upgrade Module::Build, and make it your preferred > installer: > > >cpan > cpan>install Module::Build > cpan>o conf prefer_installer MB > cpan>o conf commit > cpan>q > > * Install the expat library by whatever method is > appropriate for your system. > > * If your expat library is installed in a non-standard location, > tell CPAN about it: > > >cpan > cpan>o conf makepl_arg "EXPATLIBPATH=/non-standard/lib > EXPATINCPATH=/non-standard/include" > cpan>o conf commit > > -------------- > > And I think I did "Upgrade CPAN properly" but when I tried the next one it > started asking too many things to me, and finally it stopped due to "some > problems". In text file you can see the whole process. > What did I do wrong? > > > After solving these preliminary steps, what should I do? What exactly .tar > or .whatever should I download to install? > > I don't see the difference between installing it through "built.PL" or > ?CPAN. And I don't know if I should do this or that "Fink*" stuff for MAC. > > * I went to Fink webpage and what I expected to see was "hello! download > Bioperl simply clicking here!" but far from this, what it seems is that > first I have to download some kinf of Fink-program before starting with > Bioperl... is it something close to this? > > I'm sorry, too many questions... But I really want to learn to use Bioperl > but I have no people to ask it face to face. > > Thank you so much, > > Casandra > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 
216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Tue Feb 7 16:02:50 2012
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Feb 2012 15:02:50 -0600
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com>
Message-ID: <4F31917A.9030804@illinois.edu>

I guess one key question is where these CPAN installation instructions
come from. They're a bit odd, and if this is from the wiki we need to
do some updating.

Re: local::lib, the docs on CPAN are pretty nice if one wants to use a
single perl version.

https://metacpan.org/module/local::lib#The-bootstrapping-technique

In my case I use perlbrew (which is all local by default, and allows
switching between perl versions). Highly recommend using either simple
local::lib or perlbrew in combination with cpanm.

https://metacpan.org/module/perlbrew
https://metacpan.org/module/cpanm

chris

From scott at scottcain.net Tue Feb 7 16:12:19 2012
From: scott at scottcain.net (Scott Cain)
Date: Tue, 7 Feb 2012 16:12:19 -0500
Subject: [Bioperl-l] help!
In-Reply-To: <4F31917A.9030804@illinois.edu>
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID:

Yes, but those docs don't address exactly the problem Casandra is
having: she wants to use local::lib, but some prereqs need to be
installed first, and they can't be because she chose to use
local::lib, and it's not installed. That's all fine if you're not a
newbie and know how to properly install the prereqs before using the
cpan shell, but when following instructions that say "use local::lib",
I find that the instructions are completely insufficient in actually
getting the desired software installed. Thus the need for a good
tutorial.

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Tue Feb 7 16:27:38 2012
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Feb 2012 15:27:38 -0600
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID: <4F31974A.8040804@illinois.edu>

Right, I see. However, Casandra didn't mention using local::lib, just
using CPAN in general (hence my initial question). IIRC local::lib, if
set up correctly, should take care of the installation paths for
CPAN/cpanm/etc. regardless of whether it's bioperl or any other perl
module.

Last I checked this didn't seem to be an issue with bioperl; I
installed Bioperl, as well as any prereqs, Bio::DB::Sam/BigWig, Moose,
DBD::SQLite, etc., on our local SGI using local::lib in a common project
space (with perl 5.10.1) w/o a problem. If there is a problem specific
to BioPerl, we should probably try to fix it and clarify any
workarounds in the meantime.

chris

From guhli007 at umn.edu Tue Feb 7 16:14:05 2012
From: guhli007 at umn.edu (Joseph Guhlin)
Date: Tue, 7 Feb 2012 15:14:05 -0600
Subject: [Bioperl-l] help!
In-Reply-To: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com>
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com>
Message-ID:

I wonder if this would work on your system?

http://www.sysarchitects.com/bioperl

Best,
--Joseph Guhlin

From David.Messina at sbc.su.se Tue Feb 7 16:38:12 2012
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Feb 2012 22:38:12 +0100
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID:

I will take the opportunity to shamelessly pimp my no-install install
instructions (below and http://seqxml.org/xml/BioPerl.html). IMHO if
Casandra is just looking to get started with BioPerl, messing with external
libs and configs is probably overkill.

Best,
Dave

There's a quickie, "zero-install" way to get BioPerl on your system.

1) Okay, click here to download bioperl as a zip file:

    https://github.com/bioperl/bioperl-live/zipball/master

2) when it's done downloading, unzip it if your computer hasn't done it
   automatically. On the command line, you would do:

    unzip bioperl-live-bioperl-release-1-5-1-rc4-4318-g342e587.zip

   or whatever the file is called. You should then have a folder with
   some ugly name like

    bioperl-bioperl-live-558467a

3) rename that to

    bioperl-live

4) move that folder to wherever you want to keep it. I keep mine in a
   directory called src in my home directory.
   So on my computer if I go to the command line and cd to that folder
   and type pwd I get:

    /Users/dave/src/bioperl-live

5) in the terminal, cd to your home directory.

6) see if you have a file named .bash_profile by typing

    ls -l ~/.bash_profile

7) if so, open that file in your favorite editor. if the file doesn't
   exist, just create the file.

8) put this line in your .bash_profile

    export PERL5LIB=/Users/dave/src/bioperl-live

   (obviously replacing my path info with wherever you chose to put
   bioperl)

9) save and close your .bash_profile

10) open a new terminal window so that the change will take effect.

11) on the command line of the new terminal, type

    perl -e "use Bio::SeqIO;"

If that works, then you have "installed" bioperl. Yay!

From mcasandrariera at gmail.com Tue Feb 7 17:05:16 2012
From: mcasandrariera at gmail.com (casandra)
Date: Tue, 7 Feb 2012 23:05:16 +0100
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID:

Ok, this is what happened. I guess this means it worked, didn't it?

Last login: Tue Feb 7 22:47:16 on ttys000
maccasandra:~ mcasandrariera$ perl -e "use Bio::SeqIO;"
Can't locate Bio/SeqIO.pm in @INC (@INC contains: /Users/mcasaandrariera/src/bioperl-live /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12 /Network/Library/Perl/5.12/darwin-thread-multi-2level /Network/Library/Perl/5.12 /Library/Perl/Updates/5.12.3 /System/Library/Perl/5.12/darwin-thread-multi-2level /System/Library/Perl/5.12 /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level /System/Library/Perl/Extras/5.12 .) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
maccasandra:~ mcasandrariera$ cd ./src/bioperl-live/
maccasandra:bioperl-live mcasandrariera$ perl -e "use Bio::SeqIO;"
maccasandra:bioperl-live mcasandrariera$

I took your src name (I hadn't any better suggestion, although I don't know
what src means... :P)

Thank you so much to all of you! I have to say that it was a big relief
reading Dave explain things so simply :D thanks!

Maybe it will be useless to explain it to me, but why isn't this method
"installing" Bioperl? Why didn't I need to do all those preliminary steps,
use Fink, and so on? I mean, if in the end I can do the same thing (use
Bioperl).
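
For what it's worth, the failed run above can be checked mechanically. The
first @INC entry in that output is /Users/mcasaandrariera/src/bioperl-live,
while the shell prompt shows the user as mcasandrariera, so the path given
to PERL5LIB may simply be misspelled. What follows is only a minimal sketch,
not something taken from the BioPerl docs: check_perl5lib is a hypothetical
helper name, and it assumes a POSIX shell such as the bash that reads
.bash_profile. It prints each PERL5LIB entry and says whether that directory
actually exists.

```shell
#!/bin/sh
# check_perl5lib (hypothetical helper): list every directory named in a
# colon-separated search path and report whether it exists.
# With no argument it inspects $PERL5LIB.
# The body runs in a subshell "( ... )" so the IFS and globbing changes
# below do not leak into the calling shell.
check_perl5lib() (
    list=${1-${PERL5LIB-}}
    IFS=:       # split the list on colons
    set -f      # do not expand wildcard characters in the entries
    for dir in $list; do
        [ -n "$dir" ] || continue      # skip empty entries
        if [ -d "$dir" ]; then
            echo "ok $dir"
        else
            echo "MISSING $dir"
        fi
    done
)

# Inspect the current environment.
check_perl5lib
```

Any entry reported as MISSING is a directory perl will search in vain. To see
the complete search list perl really uses, run
perl -e 'print "$_\n" for @INC' which prints one directory per line.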
And related to the previous method I tried, I read that you were discussing "that she wants to use local::lib, but there need to be some prereqs installed, but they can't be because she chose to use local::lib, and it's not installed. " I really didn't "wanted" to use it, I chose it because it was the default option, and, since I didn't know about the alternatives, I thought that the default would be ok... But if what Dave said works, better for me, I didn't really know what I was doing with thosesteps (but I want to learn it soon! ;) ) Thank you all for your time ;) Casandra El 7 de febrero de 2012 22:38, Dave Messina escribi?: > I will take the opportunity to shamelessly pimp my no-install install > instructions (below and http://seqxml.org/xml/BioPerl.html). IMHO if > Casandra is just looking to get started with BioPerl, messing with external > libs and configs is probably overkill. > > Best, > Dave > > > > There?s a quickie, ?zero-install? way to get BioPerl on your system. > > 1) Okay, click here to download bioperl as a zip file: > > https://github.com/bioperl/bioperl-live/zipball/master > > > when it's done downloading, unzip it if your computer hasn?t done it > automatically. On the > command line, you would do: > > unzip bioperl-live-bioperl-release-1-5-1-rc4-4318-g342e587.zip > > or whatever the file is called. You should then have a folder with > some ugly name like > > bioperl-bioperl-live-558467a > > 3) rename that to > > bioperl-live > > 4) move that folder to wherever you want to keep it. I keep mine in a > directory called src in my > home directory. > > So on my computer if I go to the command line and cd to that folder > and type pwd I get: > > /Users/dave/src/bioperl-live > > 5) in the terminal, cd to your home directory. > > 6) see if you have a file named .bash_profile by typing > > ls -l ~/.bash_profile > > 7) if so, open that file in your favorite editor. if the file doesn't > exist, just create the file. 
> > 8) put this line in your .bash_profile > > export PERL5LIB=/Users/dave/src/bioperl-live > > (obviously replacing my path info with wherever you chose to put > bioperl) > > 9) save and close your .bash_profile > > 10) open a new terminal window so that the change will take effect. > > 11) on the command line of the new terminal, type > > perl -e "use Bio::SeqIO;" > > If that works, then you have "installed" bioperl. Yay! > > > > > > > > > > > > > On Tue, Feb 7, 2012 at 22:12, Scott Cain wrote: > >> Yes, but those doc don't address exactly the problem Cassandra is >> having, that she wants to use local::lib, but there need to be some >> prereqs installed, but they can't be because she chose to use >> local::lib, and it's not installed. That's all fine if you're not a >> newbie and know how to properly install the prereqs before using the >> cpan shell, but when following instructions that say "use local::lib", >> I find that the instructions are completely insufficient in actually >> getting the desired software installed. Thus the need for a good >> tutorial. >> >> Scott >> >> >> On Tue, Feb 7, 2012 at 4:02 PM, Chris Fields >> wrote: >> > I guess one key question is where these CPAN installation instructions >> come >> > from. They're a bit odd, and if this is from the wiki we need to do >> some >> > updating. >> > >> > Re: local::lib, the docs on CPAN are pretty nice if one wants to use a >> > single perl version. >> > >> > https://metacpan.org/module/local::lib#The-bootstrapping-technique >> > >> > In my case I use perlbrew (which is all local by default, and allows >> > switching between perl versions). Highly recommend using either simple >> > local::lib or perlbrew in combination with cpanm. >> > >> > https://metacpan.org/module/perlbrew >> > https://metacpan.org/module/cpanm >> > >> > chris >> >> > >> > >> > >> > On 02/07/2012 02:55 PM, Scott Cain wrote: >> >> >> >> hi Cassandra, >> >> >> >> I don't have an answer for you at the moment. 
It seems to me that >> >> using local::lib is a good idea, but I've never found a good tutorial >> >> for using it, so I haven't. Perhaps someone else on the list can >> >> suggest one. >> >> >> >> The other thing I just wanted to mention as the admin that approved >> >> your message--I came very close to deleting it from the queue without >> >> looking at it because it is not unusual for spam messages to have >> >> generic subjects like "help!" (just for future reference :-) >> >> >> >> Scott >> >> >> >> >> >> On Tue, Feb 7, 2012 at 11:11 AM, Casandra wrote: >> >>> >> >>> Hi, >> >>> >> >>> I'm trying to install Bioperl but I'm a bit lost. I know I have perl >> >>> installed becaused I have already write some scripts but I'm biologist >> >>> so... >> >>> not pretty sure about what messages say. >> >>> >> >>> My perl version: >> >>> This is perl, v5.8.8 built for darwin-thread-multi-2level >> >>> My computer: >> >>> Mac OS X Vesion 10.5.8 >> >>> >> >>> I was following this preliminary steps: >> >>> >> >>> -------------- >> >>> >> >>> PRELIMINARY PREPARATION >> >>> >> >>> This is optional, but regardless of your subsequent choice of >> >>> installation method, it will help to carry out the following steps. >> >>> They will increase the likelyhood of installation success >> >>> (especially of optional dependencies). >> >>> >> >>> * Upgrade CPAN: >> >>> >> >>> >perl -MCPAN -e shell >> >>> cpan>install Bundle::CPAN >> >>> cpan>q >> >>> >> >>> * Install/upgrade Module::Build, and make it your preferred >> >>> installer: >> >>> >> >>> >cpan >> >>> cpan>install Module::Build >> >>> cpan>o conf prefer_installer MB >> >>> cpan>o conf commit >> >>> cpan>q >> >>> >> >>> * Install the expat library by whatever method is >> >>> appropriate for your system. 
>> >>> >> >>> * If your expat library is installed in a non-standard location, >> >>> tell CPAN about it: >> >>> >> >>> >cpan >> >>> cpan>o conf makepl_arg "EXPATLIBPATH=/non-standard/lib >> >>> EXPATINCPATH=/non-standard/include" >> >>> cpan>o conf commit >> >>> >> >>> -------------- >> >>> >> >>> And I think I did "Upgrade CPAN properly" but when I tried the next >> one >> >>> it >> >>> started asking too many things to me, and finally it stopped due to >> "some >> >>> problems". In text file you can see the whole process. >> >>> What did I do wrong? >> >>> >> >>> >> >>> After solving these preliminary steps, what should I do? What exactly >> >>> .tar >> >>> or .whatever should I download to install? >> >>> >> >>> I don't see the difference between installing it through "built.PL" or >> >>> CPAN. And I don't know if I should do this or that "Fink*" stuff for >> >>> MAC. >> >>> >> >>> * I went to Fink webpage and what I expected to see was "hello! >> download >> >>> Bioperl simply clicking here!" but far from this, what it seems is >> that >> >>> first I have to download some kinf of Fink-program before starting >> with >> >>> Bioperl... is it something close to this? >> >>> >> >>> I'm sorry, too many questions... But I really want to learn to use >> >>> Bioperl >> >>> but I have no people to ask it face to face. >> >>> >> >>> Thank you so much, >> >>> >> >>> Casandra >> >>> >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> >> >> >> >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Ontario Institute for Cancer Research
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
--
Casandra Riera
+34 629774181
Barcelona, Spain.

mcasandrariera at gmail.com
http://terrainsalo.blogspot.com/

From wkretzsch at gmail.com Tue Feb 7 17:30:37 2012
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Tue, 7 Feb 2012 22:30:37 +0000
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID:

Hi Casandra,

Do make sure you follow steps 6-10 (below), because bioperl is not in your include path yet. Until you add bioperl to your include path, it will only work when you run programs from inside the bioperl-live dir. This is because the current dir is usually added to the perl include list.

6) see if you have a file named .bash_profile by typing

ls -l ~/.bash_profile

7) if so, open that file in your favorite editor. if the file doesn't exist, just create the file.

8) put this line in your .bash_profile

export PERL5LIB=/Users/dave/src/bioperl-live

(obviously replacing my path info with wherever you chose to put bioperl)

9) save and close your .bash_profile

10) open a new terminal window so that the change will take effect.

Warren

--
In God we trust, all others bring data. - William Edwards Deming

On Tue, Feb 7, 2012 at 10:05 PM, casandra wrote:
> Ok, this is what happened. I guess this mean it worked, didn't it?
From David.Messina at sbc.su.se Tue Feb 7 17:39:48 2012
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Feb 2012 23:39:48 +0100
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID:

Hi Casandra,

(Warren already answered this much more succinctly than me, but here is my answer anyway.)

I think you're almost there.

The fact that you get no error message when you type

perl -e "use Bio::SeqIO;"

in the directory src/bioperl-live/, but do get one in your home directory, tells me that Perl probably doesn't "see" where you put BioPerl. That's what the PERL5LIB variable does; it tells Perl that it should look for Perl modules in the directories named in the environment variable PERL5LIB.
If you type

printenv PERL5LIB

do you see

/Users/mcasandrariera/src/bioperl-live

?

If not, then redo steps 5-10 and then try typing

printenv PERL5LIB

again. Make sure that in step 8, instead of

export PERL5LIB=/Users/dave/src/bioperl-live

you type

export PERL5LIB=/Users/mcasandrariera/src/bioperl-live

To answer your other questions:

> although I don't know what src means...

src is just a directory name. It's short for "source", which itself is really short for "source code". src is the name of the directory where I keep all of my source code libraries like BioPerl.

> Maybe it will be useless to explain it to me, but, why this method isn't
> "installing" Bioperl? Why I didn't need to do all those preliminary steps,
> use Fink, and so on? I mean, if I can finally do the same (using Bioperl).

The short answer is that by following the zero-install instructions, you won't be able to use some parts of BioPerl (which I'm betting you won't need right away).

Chris and Scott's advice is correct and the right way to go in the long run. Once you've gotten your feet wet a bit and become more familiar with Perl and BioPerl, you may want to come back to their approach and try it again.

Best,
Dave

On Tue, Feb 7, 2012 at 23:05, casandra wrote:
> Ok, this is what happened. I guess this mean it worked, didn't it?
>
> Last login: Tue Feb 7 22:47:16 on ttys000
> maccasandra:~ mcasandrariera$ perl -e "use Bio::SeqIO;"
> Can't locate Bio/SeqIO.pm in @INC (@INC contains:
> /Users/mcasaandrariera/src/bioperl-live
> /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12
> /Network/Library/Perl/5.12/darwin-thread-multi-2level
> /Network/Library/Perl/5.12 /Library/Perl/Updates/5.12.3
> /System/Library/Perl/5.12/darwin-thread-multi-2level
> /System/Library/Perl/5.12
> /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.12 .) at -e line 1.
> BEGIN failed--compilation aborted at -e line 1.
From mcasandrariera at gmail.com Tue Feb 7 17:46:42 2012
From: mcasandrariera at gmail.com (casandra)
Date: Tue, 7 Feb 2012 23:46:42 +0100
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID:

Thank you both. Before I received Dave's email I was writing the following answer:

Hello Warren,

I've repeated those steps just in case; here is what I did:

maccasandra:~ mcasandrariera$ vi .bash_profile
maccasandra:~ mcasandrariera$ more .bash_profile
export PERL5LIB=/Users/mcasandrariera/src/bioperl-live
maccasandra:~ mcasandrariera$ cd ./src/bioperl-live/
maccasandra:bioperl-live mcasandrariera$ perl -e "use Bio::SeqIO;"
maccasandra:bioperl-live mcasandrariera$

I get the same "result". Am I supposed to get something different?

And now I tried what Dave asked me:

maccasandra:bioperl-live mcasandrariera$ printenv PERL5LIB
/Users/mcasandrariera/src/bioperl-live
maccasandra:bioperl-live mcasandrariera$

I have to say it's very nice of you to help me, especially you, Dave, who translates all this into casandra's-super-dummie language..

Cas.

On 7 February 2012 at 23:39, Dave Messina wrote:

> Hi Casandra,
>
> (Warren already answered this much more succinctly than me, but here is my
> answer anyway.)
>
> I think you're almost there.
>
> The fact that you get no error message when you type
>
> perl -e "use Bio::SeqIO;"
>
> in the directory src/bioperl-live/, but do get one in your home directory,
> tells me that Perl probably doesn't "see" where you put BioPerl. That's what
> the PERL5LIB variable does; it tells Perl that it should look for Perl
> modules in the directories named in the environment variable PERL5LIB.
>
> If you type
>
> printenv PERL5LIB
>
> do you see
>
> /Users/mcasandrariera/src/bioperl-live
>
> ?
>
> If not, then redo steps 5-10 and then try typing
>
> printenv PERL5LIB
>
> again. Make sure that in step 8, instead of
>
> export PERL5LIB=/Users/dave/src/bioperl-live
>
> you type
>
> export PERL5LIB=/Users/mcasandrariera/src/bioperl-live
>
> To answer your other questions:
>
> > although I don't know what src means...
>
> src is just a directory name.
It's short for "source", which itself is > really short for "source code". src the name of the directory where I keep > all of my source code libraries like BioPerl. > > > Maybe it will be useless to explain it to me, but, why this method isn't >> "installing" Bioperl? Why I didn't need to do all those preliminary steps, >> use Fink, and so on? I mean, if I can finally do the same (using Bioperl). >> > > The short answer is that by following the zero-install instructions, you > won't be able use some parts of BioPerl (which I'm betting you won't need > right away). > > Chris and Scott's advice is correct and the right way to go in the long > run. Once you've gotten your feet wet a bit and become more familiar with > Perl and BioPerl, you may want to come back to their approach and try it > again. > > > Best, > Dave > > > > > > > On Tue, Feb 7, 2012 at 23:05, casandra wrote: > >> Ok, this is what happened. I guess this mean it worked, didn't it? >> >> Last login: Tue Feb 7 22:47:16 on ttys000 >> maccasandra:~ mcasandrariera$ perl -e "use Bio::SeqIO;" >> Can't locate Bio/SeqIO.pm in @INC (@INC contains: >> /Users/mcasaandrariera/src/bioperl-live >> /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12 >> /Network/Library/Perl/5.12/darwin-thread-multi-2level >> /Network/Library/Perl/5.12 /Library/Perl/Updates/5.12.3 >> /System/Library/Perl/5.12/darwin-thread-multi-2level >> /System/Library/Perl/5.12 >> /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level >> /System/Library/Perl/Extras/5.12 .) at -e line 1. >> BEGIN failed--compilation aborted at -e line 1. >> maccasandra:~ mcasandrariera$ cd ./src/bioperl-live/ >> maccasandra:bioperl-live mcasandrariera$ perl -e "use Bio::SeqIO;" >> maccasandra:bioperl-live mcasandrariera$ >> >> I took your src name (I hadn't any better suggestion, although I don't >> know what src means... :P) >> >> Thank you so much to all of you! 
I have to say that it was a big relief
>> reading Dave explain things so simply :D Thanks!
>>
>> Maybe it will be useless to explain it to me, but why isn't this method
>> "installing" BioPerl? Why didn't I need to do all those preliminary steps,
>> use Fink, and so on? I mean, if in the end I can do the same (use BioPerl).
>>
>> As for the previous method I tried, I read that you were discussing
>> "that she wants to use local::lib, but there need to be some prereqs
>> installed, but they can't be because she chose to use local::lib, and
>> it's not installed." I didn't really "want" to use it; I chose it because
>> it was the default option and, since I didn't know about the alternatives,
>> I thought that the default would be OK... But if what Dave said works, all
>> the better for me. I didn't really know what I was doing with those steps
>> (but I want to learn soon! ;) )
>>
>> Thank you all for your time ;)
>>
>> Casandra
>>
>> On 7 February 2012 at 22:38, Dave Messina wrote:
>>
>>> I will take the opportunity to shamelessly pimp my no-install install
>>> instructions (below and http://seqxml.org/xml/BioPerl.html). IMHO if
>>> Casandra is just looking to get started with BioPerl, messing with
>>> external libs and configs is probably overkill.
>>>
>>> Best,
>>> Dave
>>>
>>>
>>> There's a quickie, "zero-install" way to get BioPerl on your system.
>>>
>>> 1) Okay, click here to download bioperl as a zip file:
>>>
>>>     https://github.com/bioperl/bioperl-live/zipball/master
>>>
>>> 2) when it's done downloading, unzip it if your computer hasn't done it
>>> automatically. On the command line, you would do:
>>>
>>>     unzip bioperl-live-bioperl-release-1-5-1-rc4-4318-g342e587.zip
>>>
>>> or whatever the file is called. You should then have a folder with
>>> some ugly name like
>>>
>>>     bioperl-bioperl-live-558467a
>>>
>>> 3) rename that to
>>>
>>>     bioperl-live
>>>
>>> 4) move that folder to wherever you want to keep it. I keep mine in
>>> a directory called src in my home directory.
>>>
>>> So on my computer if I go to the command line and cd to that folder
>>> and type pwd I get:
>>>
>>>     /Users/dave/src/bioperl-live
>>>
>>> 5) in the terminal, cd to your home directory.
>>>
>>> 6) see if you have a file named .bash_profile by typing
>>>
>>>     ls -l ~/.bash_profile
>>>
>>> 7) if so, open that file in your favorite editor. if the file
>>> doesn't exist, just create the file.
>>>
>>> 8) put this line in your .bash_profile
>>>
>>>     export PERL5LIB=/Users/dave/src/bioperl-live
>>>
>>> (obviously replacing my path info with wherever you chose to put
>>> bioperl)
>>>
>>> 9) save and close your .bash_profile
>>>
>>> 10) open a new terminal window so that the change will take effect.
>>>
>>> 11) on the command line of the new terminal, type
>>>
>>>     perl -e "use Bio::SeqIO;"
>>>
>>> If that works, then you have "installed" bioperl. Yay!
>>>
>>>
>>> On Tue, Feb 7, 2012 at 22:12, Scott Cain wrote:
>>>
>>>> Yes, but those docs don't address exactly the problem Casandra is
>>>> having, that she wants to use local::lib, but there need to be some
>>>> prereqs installed, but they can't be because she chose to use
>>>> local::lib, and it's not installed. That's all fine if you're not a
>>>> newbie and know how to properly install the prereqs before using the
>>>> cpan shell, but when following instructions that say "use local::lib",
>>>> I find that the instructions are completely insufficient for actually
>>>> getting the desired software installed. Thus the need for a good
>>>> tutorial.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Tue, Feb 7, 2012 at 4:02 PM, Chris Fields wrote:
>>>>> I guess one key question is where these CPAN installation
>>>>> instructions come from. They're a bit odd, and if this is from the
>>>>> wiki we need to do some updating.
>>>>>
>>>>> Re: local::lib, the docs on CPAN are pretty nice if one wants to use a
>>>>> single perl version.
>>>>>
>>>>> https://metacpan.org/module/local::lib#The-bootstrapping-technique
>>>>>
>>>>> In my case I use perlbrew (which is all local by default, and allows
>>>>> switching between perl versions). Highly recommend using either
>>>>> simple local::lib or perlbrew in combination with cpanm.
>>>>>
>>>>> https://metacpan.org/module/perlbrew
>>>>> https://metacpan.org/module/cpanm
>>>>>
>>>>> chris
>>>>>
>>>>> On 02/07/2012 02:55 PM, Scott Cain wrote:
>>>>>>
>>>>>> hi Casandra,
>>>>>>
>>>>>> I don't have an answer for you at the moment. It seems to me that
>>>>>> using local::lib is a good idea, but I've never found a good tutorial
>>>>>> for using it, so I haven't. Perhaps someone else on the list can
>>>>>> suggest one.
>>>>>>
>>>>>> The other thing I just wanted to mention, as the admin that approved
>>>>>> your message: I came very close to deleting it from the queue without
>>>>>> looking at it, because it is not unusual for spam messages to have
>>>>>> generic subjects like "help!" (just for future reference :-)
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 7, 2012 at 11:11 AM, Casandra wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to install BioPerl but I'm a bit lost. I know I have perl
>>>>>>> installed because I have already written some scripts, but I'm a
>>>>>>> biologist, so... I'm not very sure what the messages say.
>>>>>>>
>>>>>>> My perl version:
>>>>>>> This is perl, v5.8.8 built for darwin-thread-multi-2level
>>>>>>> My computer:
>>>>>>> Mac OS X Version 10.5.8
>>>>>>>
>>>>>>> I was following these preliminary steps:
>>>>>>>
>>>>>>> --------------
>>>>>>>
>>>>>>> PRELIMINARY PREPARATION
>>>>>>>
>>>>>>> This is optional, but regardless of your subsequent choice of
>>>>>>> installation method, it will help to carry out the following steps.
>>>>>>> They will increase the likelihood of installation success
>>>>>>> (especially of optional dependencies).
>>>>>>>
>>>>>>> * Upgrade CPAN:
>>>>>>>
>>>>>>>     >perl -MCPAN -e shell
>>>>>>>     cpan>install Bundle::CPAN
>>>>>>>     cpan>q
>>>>>>>
>>>>>>> * Install/upgrade Module::Build, and make it your preferred
>>>>>>>   installer:
>>>>>>>
>>>>>>>     >cpan
>>>>>>>     cpan>install Module::Build
>>>>>>>     cpan>o conf prefer_installer MB
>>>>>>>     cpan>o conf commit
>>>>>>>     cpan>q
>>>>>>>
>>>>>>> * Install the expat library by whatever method is
>>>>>>>   appropriate for your system.
>>>>>>>
>>>>>>> * If your expat library is installed in a non-standard location,
>>>>>>>   tell CPAN about it:
>>>>>>>
>>>>>>>     >cpan
>>>>>>>     cpan>o conf makepl_arg "EXPATLIBPATH=/non-standard/lib EXPATINCPATH=/non-standard/include"
>>>>>>>     cpan>o conf commit
>>>>>>>
>>>>>>> --------------
>>>>>>>
>>>>>>> And I think I did "Upgrade CPAN" properly, but when I tried the next
>>>>>>> one it started asking me too many things, and finally it stopped due
>>>>>>> to "some problems". In the text file you can see the whole process.
>>>>>>> What did I do wrong?
>>>>>>>
>>>>>>> After solving these preliminary steps, what should I do? What exactly
>>>>>>> (.tar or .whatever) should I download to install?
>>>>>>>
>>>>>>> I don't see the difference between installing it through "Build.PL"
>>>>>>> or CPAN. And I don't know if I should do this or that "Fink*" stuff
>>>>>>> for Mac.
>>>>>>>
>>>>>>> * I went to the Fink webpage and what I expected to see was "hello!
>>>>>>> download BioPerl simply by clicking here!", but far from this, it
>>>>>>> seems that first I have to download some kind of Fink program before
>>>>>>> starting with BioPerl... is it something close to this?
>>>>>>>
>>>>>>> I'm sorry, too many questions... I really want to learn to use
>>>>>>> BioPerl but I have no one to ask face to face.
>>>>>>>
>>>>>>> Thank you so much,
>>>>>>>
>>>>>>> Casandra
>>>>
>>>> --
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.                          scott at scottcain dot net
>>>> GMOD Coordinator (http://gmod.org/)         216-392-3087
>>>> Ontario Institute for Cancer Research

-- 
Casandra Riera
+34 629774181
Barcelona, Spain.

mcasandrariera at gmail.com
http://terrainsalo.blogspot.com/

From cjfields at illinois.edu Tue Feb 7 17:50:29 2012
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Feb 2012 16:50:29 -0600
Subject: [Bioperl-l] help!
In-Reply-To:
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu>
Message-ID: <4F31AAB5.5010201@illinois.edu>

The catch with a PERL5LIB approach for a new installation is the very real
possibility that non-core dependencies will not be installed. Down the road
we'll be answering 'why did Bio::Foo fail with the following error', with an
obvious missing dependency being the problem.

We've been discussing this on IRC (scott, leont and I). I tend to think
the best solution for new users is to set them up with both local::lib and
cpanm initially, either by simply documenting this or by getting a script
up and running for them. This will resolve down-stream problems (for
instance, if the distribution is split up).

chris

On 02/07/2012 04:39 PM, Dave Messina wrote:
> Hi Casandra,
>
> (Wayne already answered this much more succinctly than me, but here is
> my answer anyway.)
>
> I think you're almost there.
>
> The fact that you get no error message only when you type
>
>     perl -e "use Bio::SeqIO;"
>
> inside the directory src/bioperl-live/ tells me that Perl probably doesn't
> "see" where you put BioPerl from anywhere else. That's what the PERL5LIB
> variable is for: it tells Perl to look for Perl modules in the directories
> named in the environment variable PERL5LIB.
>
> If you type
>
>     printenv PERL5LIB
>
> do you see
>
>     /Users/mcasandrariera/src/bioperl-live
>
> ?
>
> If not, then redo steps 5-10 and then try typing
>
>     printenv PERL5LIB
>
> again. Make sure that in step 8, instead of
>
>     export PERL5LIB=/Users/dave/src/bioperl-live
>
> you type
>
>     export PERL5LIB=/Users/mcasandrariera/src/bioperl-live
>
> To answer your other questions:
>
>> although I don't know what src means...
>
> src is just a directory name. It's short for "source", which itself is
> really short for "source code". src is the name of the directory where I
> keep all of my source code libraries like BioPerl.
>
>> Maybe it will be useless to explain it to me, but, why this method
>> isn't "installing" Bioperl? Why I didn't need to do all those
>> preliminary steps, use Fink, and so on? I mean, if I can finally do
>> the same (using Bioperl).
>
> The short answer is that by following the zero-install instructions, you
> won't be able to use some parts of BioPerl (which I'm betting you won't
> need right away).
>
> Chris and Scott's advice is correct and the right way to go in the long
> run. Once you've gotten your feet wet a bit and become more familiar
> with Perl and BioPerl, you may want to come back to their approach and
> try it again.
>
> Best,
> Dave
>
>
> On Tue, Feb 7, 2012 at 23:05, casandra wrote:
>
>> Ok, this is what happened. I guess this means it worked, didn't it?
>>
>> Last login: Tue Feb 7 22:47:16 on ttys000
>> maccasandra:~ mcasandrariera$ perl -e "use Bio::SeqIO;"
>> Can't locate Bio/SeqIO.pm in @INC (@INC contains:
>> /Users/mcasaandrariera/src/bioperl-live
>> /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12
>> /Network/Library/Perl/5.12/darwin-thread-multi-2level
>> /Network/Library/Perl/5.12 /Library/Perl/Updates/5.12.3
>> /System/Library/Perl/5.12/darwin-thread-multi-2level
>> /System/Library/Perl/5.12
>> /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level
>> /System/Library/Perl/Extras/5.12 .) at -e line 1.
>> BEGIN failed--compilation aborted at -e line 1.
>> maccasandra:~ mcasandrariera$ cd ./src/bioperl-live/
>> maccasandra:bioperl-live mcasandrariera$ perl -e "use Bio::SeqIO;"
>> maccasandra:bioperl-live mcasandrariera$
>>
>> I took your src name (I hadn't any better suggestion, although I
>> don't know what src means... :P)
>>
>> Thank you so much to all of you! I have to say that it was a big
>> relief reading Dave explain things so simply :D Thanks!
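The PERL5LIB check described above can be done mechanically. What follows is a minimal sketch, not part of Dave's instructions: `check_perl5lib` is a hypothetical helper name, and it only tests that each colon-separated entry is an existing directory. Note that the @INC listing in the terminal output above begins with /Users/mcasaandrariera/src/bioperl-live (double "a"), while the shell prompt shows the user as mcasandrariera, so a misspelled entry of that kind may well be what this check would flag.

```shell
#!/bin/sh
# check_perl5lib (hypothetical helper): list every entry of a
# PERL5LIB-style string and say whether it is an existing directory.
# Returns non-zero if the string is empty or any entry is missing.
check_perl5lib() {
    if [ -z "$1" ]; then
        printf 'PERL5LIB is empty or unset\n'
        return 1
    fi
    rc=0
    old_ifs=$IFS
    IFS=:                       # split the argument on colons
    for dir in $1; do
        if [ -d "$dir" ]; then
            printf 'ok       %s\n' "$dir"
        else
            printf 'MISSING  %s\n' "$dir"
            rc=1
        fi
    done
    IFS=$old_ifs
    return $rc
}

# Check the current environment; never abort the shell on failure.
check_perl5lib "${PERL5LIB:-}" || echo "check the export line in ~/.bash_profile"
```

Run it in a new terminal after editing .bash_profile; any MISSING line points at a path that needs correcting in the export line.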
From David.Messina at sbc.su.se Tue Feb 7 18:33:34 2012
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 8 Feb 2012 00:33:34 +0100
Subject: [Bioperl-l] help!
In-Reply-To: <4F31AAB5.5010201@illinois.edu>
References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu> <4F31AAB5.5010201@illinois.edu>
Message-ID:

Ah right, that's a good point. Well, don't I feel like the asshole. :)

Once local::lib and cpanm are installed, it's possible to run cpanm on the
tarball downloaded from github, right?

i.e.

    cpanm bioperl-live-bioperl-release-1-5-1-rc4-4318-g342e587.zip

And that should take care of dependencies, then, correct?

> either by simply documenting this or by getting a script up and running for
> them. This will resolve down-stream problems (for instance, if the
> distribution is split up).

Agreed, that's a great idea.

Best,
Dave
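A hedged sketch of the local::lib plus cpanm route discussed above. The `cpanm --local-lib=DIR`, `cpanm --installdeps` and `perl -Mlocal::lib=DIR` invocations follow the cpanm and local::lib documentation; the `run` and `install_bioperl_deps` helpers, the DRY_RUN switch, and the ~/perl5 and ~/src/bioperl-live locations are assumptions made for illustration only. By default the script just prints the commands it would run, and whether every optional BioPerl dependency gets picked up this way is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: bootstrap local::lib with cpanm, then install the
# dependencies of an unpacked bioperl-live directory.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# run (hypothetical helper): echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# install_bioperl_deps LIB_DIR SRC_DIR (hypothetical helper)
install_bioperl_deps() {
    lib_dir=$1    # where local::lib should live (assumed: ~/perl5)
    src_dir=$2    # unpacked BioPerl source (assumed: ~/src/bioperl-live)

    # Install local::lib itself into lib_dir.
    run cpanm --local-lib="$lib_dir" local::lib

    # Point the current shell at lib_dir (skipped in dry-run mode).
    if [ "$DRY_RUN" != 1 ]; then
        eval "$(perl -I "$lib_dir/lib/perl5" -Mlocal::lib="$lib_dir")"
    fi

    # Install only the prerequisites declared by the distribution.
    run cpanm --installdeps "$src_dir"
}

install_bioperl_deps "$HOME/perl5" "$HOME/src/bioperl-live"
```

Setting DRY_RUN=0 in the environment would execute the commands for real, which needs cpanm on the PATH and network access.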
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> >> >> Ontario Institute for Cancer Research >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> >> > >> >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> >> >> >> >> >> -- >> Casandra Riera >> +34 629774181 >> Barcelona, Spain. >> >> mcasandrariera at gmail.com >> > >> http://terrainsalo.blogspot.**com/ >> >> >> > From cjfields at illinois.edu Tue Feb 7 21:25:12 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 8 Feb 2012 02:25:12 +0000 Subject: [Bioperl-l] help! In-Reply-To: References: <8DFE8929-F3F7-4D0A-AD9E-A11983020EDF@gmail.com> <4F31917A.9030804@illinois.edu> <4F31AAB5.5010201@illinois.edu> Message-ID: <704AA15E-13F6-4498-9519-DC6986191A09@illinois.edu> On Feb 7, 2012, at 5:33 PM, Dave Messina wrote: > Ah right, that's a good point. Well, don't I feel like the asshole. :) Nah. It's something we've done for quite a while, but we probably should adopt something that works better in the long run. > Once local::lib and cpanm are installed, it's possible to run cpanm on the tarball downloaded from github, right? > > i.e. > cpanm bioperl-live-bioperl-release-1-5-1-rc4-4318-g342e587.zip > > And that should take care of dependencies, then, correct? Yep. One can install from a URL as well: cpanm http://example.org/LDS/CGI.pm-3.20.tar.gz > either by simply documenting this or by getting a script up and running for them. This will resolve down-stream problems (for instance, if the distribution is split up). > > Agreed, that's a great idea. 
> > > Best, > Dave chris From hlapp at drycafe.net Tue Feb 7 23:37:32 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 7 Feb 2012 23:37:32 -0500 Subject: [Bioperl-l] Question on seqfeature mapping In-Reply-To: <4F31802E.9030002@labri.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> <4F31802E.9030002@labri.fr> Message-ID: <33DA4F1D-3575-4925-A123-F6A77EAE753D@drycafe.net> Hi Florian, Are you asking about the Bio::DB::Query::BioQuery interface, or an object persistence adaptor in Bio/DB/BioSQL? Either way, the code resolves which table to use by the association that needs to be queried. How these map to association tables is in the %association_entity_map mapping. You'll see that there is one for term => seqfeature, which is mapped to seqfeature_qualifier_value, for example. So if the association is between seqfeature and term, the adaptor will then use that table to access the value column. Does that make sense? -hilmar On Feb 7, 2012, at 2:49 PM, florian lajus wrote: > Hi, > I have a problem with bio queries: How can I retrieve from datadabse a seqfeature according to its annotation (tagname and value)? > The problem coming for value as we have "value" => "=>{bioentry_qualifier_value,seqfeature_qualifier_value,location_qualifier_value}.value", > in the %slot_attribut_map of the base driver. > Do you know a solution? 
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From adsj at novozymes.com Wed Feb 8 07:28:11 2012
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 08 Feb 2012 13:28:11 +0100
Subject: [Bioperl-l] Memory leak in Bio::SeqIO::staden::read->staden_read_trace() ?
Message-ID: <87wr7xv7ac.fsf@topper.koldfront.dk>

Hi.

I am using Bio::SeqIO(staden::read) to read .ab1 files in a long lived (daemon) process, and I have noticed that memory usage keeps growing somewhat rapidly.

I have thrown together a small script that illustrates what I am seeing:

  #!/usr/bin/perl

  # test.pl - illustrate possible memory leak in Bio::SeqIO::staden::read->staden_read_trace

  # Example .ab1 file found via first hit on Google: http://www.elimbio.com/Forms/pGEM.zip

  use strict;
  use warnings;

  use Bio::SeqIO;

  while (1) {
      print "Reading 100 times\n";
      for my $i (1 .. 100) {
          my $in=Bio::SeqIO->new(-file=>'pGEM_(ABI)_A01.ab1', -format=>'abi');
          my $seq=$in->next_seq(); # This seems to leak memory
      }
      print "Sleeping\n";
      sleep 5;
  }

Running this, and running 'while (( 1 )); do ps fauxww | grep [t]est.pl; sleep 2; done;' I see an increase of memory usage by ~30MB per 100-read cycle; e.g. 108660, 140260, 171860, 203460, 235064, 266676 kB.

I would have expected the number to level out as $seq goes out of scope and memory is returned to Perl's internal pool for reuse.

If I comment out the $self->staden_read_trace() in L104 of Bio::SeqIO::staden::read then memory usage stays constant, which leads me to believe that it might be leaking (rather than some circular reference I'm overlooking).

I am wondering if anyone can reproduce this problem, or if it is a local thing (I had to do some gymnastics to build bioperl-ext way-back-when, so it isn't entirely impossible); if anyone could try it and report back, that would be a great help.
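For anyone trying to reproduce this, the growth can also be logged from inside the process instead of polling ps from a second shell. The following is only a rough sketch under stated assumptions: it reads VmRSS from /proc/self/status, so it is Linux-only, and the helper names rss_kb and rss_after_rounds are made up for illustration and are not part of BioPerl. The workload here is a harmless stand-in; to test the suspected leak one would pass a code reference that does the Bio::SeqIO->new(...) and next_seq() calls from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return this process's resident set size in kB, or undef if it cannot be
# read (assumption: Linux-style /proc/self/status with a VmRSS line).
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Call $code $per_round times in each of $rounds rounds and record the
# RSS after every round (hypothetical helper, for illustration only).
sub rss_after_rounds {
    my ($code, $rounds, $per_round) = @_;
    my @samples;
    for my $round (1 .. $rounds) {
        $code->() for 1 .. $per_round;
        push @samples, scalar rss_kb();    # scalar() keeps undef as a placeholder
    }
    return @samples;
}

# Stand-in workload; swap in the Bio::SeqIO read to test the real case.
my @samples = rss_after_rounds(sub { my @tmp = (1 .. 1000); return }, 5, 100);
for my $round (1 .. scalar @samples) {
    my $kb = $samples[$round - 1];
    printf "after round %d: %s kB\n", $round, defined($kb) ? $kb : 'n/a';
}
```

If the numbers keep climbing round after round with the real workload but stay flat with the stand-in, that would point at the read call rather than at the surrounding loop.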
I am using BioPerl 1.6.1, bioperl-ext 1.5.1 and perl 5.10.1 on Ubuntu 10.04.03 lucid amd64.

Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From flajus at labri.fr Wed Feb 8 03:02:32 2012
From: flajus at labri.fr (Lajus Florian)
Date: Wed, 08 Feb 2012 09:02:32 +0100
Subject: [Bioperl-l] Question on seqfeature mapping
In-Reply-To: <33DA4F1D-3575-4925-A123-F6A77EAE753D@drycafe.net>
References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> <4F31802E.9030002@labri.fr> <33DA4F1D-3575-4925-A123-F6A77EAE753D@drycafe.net>
Message-ID: <4F322C18.7010103@labri.fr>

I'm talking about the interface. I'm far from understanding all the way the association is done but for a query like this:

my $query = Bio::DB::Query::BioQuery->new(
    -datacollections => ["Bio::Annotation::SimpleValue=>Bio::SeqFeatureI qv"],
    -where => ["qv::value = \'$value\'"]);

according to the sql generator I have this:

SELECT * FROM seqfeature, term qv WHERE seqfeature.term_id = qv.term_id AND qv.=>{bioentry_qualifier_value,seqfeature_qualifier_value}.value = 'Samatha Carter'

On 08/02/2012 05:37, Hilmar Lapp wrote:
> Hi Florian,
>
> Are you asking about the Bio::DB::Query::BioQuery interface, or an object persistence adaptor in Bio/DB/BioSQL? Either way, the code resolves which table to use by the association that needs to be queried. How these map to association tables is in the %association_entity_map mapping. You'll see that there is one for term => seqfeature, which is mapped to seqfeature_qualifier_value, for example. So if the association is between seqfeature and term, the adaptor will then use that table to access the value column.
>
> Does that make sense?
>
> -hilmar
>
> On Feb 7, 2012, at 2:49 PM, florian lajus wrote:
>
>> Hi,
>> I have a problem with bio queries: How can I retrieve from the database a seqfeature according to its annotation (tagname and value)?
>> The problem comes from value, as we have "value" => "=>{bioentry_qualifier_value,seqfeature_qualifier_value,location_qualifier_value}.value",
>> in the %slot_attribut_map of the base driver.
>> Do you know a solution?
>>
>

From flajus at labri.fr Wed Feb 8 04:52:04 2012
From: flajus at labri.fr (Lajus Florian)
Date: Wed, 08 Feb 2012 10:52:04 +0100
Subject: [Bioperl-l] Question on seqfeature mapping
In-Reply-To: <4F322C18.7010103@labri.fr>
References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> <4F31802E.9030002@labri.fr> <33DA4F1D-3575-4925-A123-F6A77EAE753D@drycafe.net> <4F322C18.7010103@labri.fr>
Message-ID: <4F3245C4.9040506@labri.fr>

Never mind. It works much better if I write "Bio::Annotation::SimpleValue<=>Bio::SeqFeatureI" instead.

The problem is if we also want to find by primary_tag:

my $query = Bio::DB::Query::BioQuery->new(
    -datacollections => ["Bio::Annotation::SimpleValue<=>Bio::SeqFeatureI qv"],
    -where => ["Bio::Annotation::SimpleValue::tagname = \'$tag\'","qv.value = \'$value\'","Bio::SeqFeatureI.primary_tag = ".$term_primary_key]);

The slot type_term_id is not mapped to column for table seqfeature

On 08/02/2012 09:02, Lajus Florian wrote:
> I'm talking about the interface.
I'm far from understanding all the way
> the association is done but for a query like this: my $query =
> Bio::DB::Query::BioQuery->new( -datacollections =>
> ["Bio::Annotation::SimpleValue=>Bio::SeqFeatureI qv"],
> -where => ["qv::value = \'$value\'"]);
>
> according to the sql generator I have this:
> SELECT * FROM seqfeature, term qv WHERE seqfeature.term_id = qv.term_id
> AND qv.=>{bioentry_qualifier_value,seqfeature_qualifier_value}.value =
> 'Samatha Carter'
>
> On 08/02/2012 05:37, Hilmar Lapp wrote:
>> Hi Florian,
>>
>> Are you asking about the Bio::DB::Query::BioQuery interface, or an
>> object persistence adaptor in Bio/DB/BioSQL? Either way, the code
>> resolves which table to use by the association that needs to be
>> queried. How these map to association tables is in the
>> %association_entity_map mapping. You'll see that there is one for term
>> => seqfeature, which is mapped to seqfeature_qualifier_value, for
>> example. So if the association is between seqfeature and term, the
>> adaptor will then use that table to access the value column.
>>
>> Does that make sense?
>>
>> -hilmar
>>
>> On Feb 7, 2012, at 2:49 PM, florian lajus wrote:
>>
>>> Hi,
>>> I have a problem with bio queries: How can I retrieve from the database
>>> a seqfeature according to its annotation (tagname and value)?
>>> The problem comes from value, as we have "value" =>
>>> "=>{bioentry_qualifier_value,seqfeature_qualifier_value,location_qualifier_value}.value",
>>>
>>> in the %slot_attribut_map of the base driver.
>>> Do you know a solution?
>>>
>>
>

From cjfields at illinois.edu Wed Feb 8 14:24:00 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 8 Feb 2012 19:24:00 +0000
Subject: [Bioperl-l] bioperl-guts filtering
Message-ID: <206492C2-3C71-4CD6-A619-57C25E91D207@illinois.edu>

Just noticed a huge backlog of spam as well as lots of github commits caught up on the bioperl-guts-l listserv.
There is also some buildbot stuff coming in, we need to change the sender to something other than bioperl-guts-l (spoofing the sender to be the same as the recipient is a common spam mechanism).

I am setting the default non-member to simply reject incoming email for that list and to hold member posts, along with a message to post bioperl questions to bioperl-l. Any complaints on that? Seems like we should be keeping the posts to automated stuff anyway?

chris

From David.Messina at sbc.su.se Wed Feb 8 14:45:01 2012
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 8 Feb 2012 20:45:01 +0100
Subject: [Bioperl-l] bioperl-guts filtering
In-Reply-To: <206492C2-3C71-4CD6-A619-57C25E91D207@illinois.edu>
References: <206492C2-3C71-4CD6-A619-57C25E91D207@illinois.edu>
Message-ID:

Sounds good to me.

D

On Wed, Feb 8, 2012 at 20:24, Fields, Christopher J wrote:

> Just noticed a huge backlog of spam as well as lots of github commits
> caught up on the bioperl-guts-l listserv. There is also some buildbot
> stuff coming in, we need to change the sender to something other than
> bioperl-guts-l (spoofing the sender to be the same as the recipient is a
> common spam mechanism).
>
> I am setting the default non-member to simply reject incoming email for
> that list and to hold member posts, along with a message to post bioperl
> questions to bioperl-l. Any complaints on that? Seems like we should be
> keeping the posts to automated stuff anyway?
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From aminmom at hotmail.com Thu Feb 9 16:23:19 2012
From: aminmom at hotmail.com (Amin Momin)
Date: Fri, 10 Feb 2012 02:53:19 +0530
Subject: [Bioperl-l] parsing swissprot using SeqIO::swiss
In-Reply-To:
References:
Message-ID:

Hi

I am trying to parse the swissprot text files to capture the feature (FT) information.
However I can't find any documentation on how to get the information from the functions _print_swissprot_FTHelper and _read_swissprot_FTHelper that capture the feature table. Can someone give any suggestions?

Thanks,
Amin

From bosborne11 at verizon.net Thu Feb 9 16:37:17 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 09 Feb 2012 16:37:17 -0500
Subject: [Bioperl-l] parsing swissprot using SeqIO::swiss
In-Reply-To:
References:
Message-ID: <115DCF0A-A787-44FA-AA87-6B333B13BD73@verizon.net>

Amin,

I'm not sure why you want to use these methods. You should not have to use them if all you want to do is get features from SwissProt files. Have you taken a look at the HOWTO?

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Brian O.

On Feb 9, 2012, at 4:23 PM, Amin Momin wrote:

>
> Hi
>
> I am trying to parse the swissprot text files to capture the feature (FT) information. However I can't find any documentation on how to get the information from the functions _print_swissprot_FTHelper and _read_swissprot_FTHelper that capture the feature table. Can someone give any suggestions?
>
> Thanks,
> Amin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From avilella at gmail.com Fri Feb 10 09:42:18 2012
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 10 Feb 2012 14:42:18 +0000
Subject: [Bioperl-l] blast2sam
In-Reply-To:
References:
Message-ID:

I just created a ticket:

https://redmine.open-bio.org/issues/3324

For this blast2sam request I did a while ago:

http://bioperl.org/pipermail/bioperl-l/2011-June/035229.html

Thanks,

Albert.
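On the SwissProt feature question above: Brian's point is that the FT lines are available through the ordinary sequence feature methods, so the private _read_swissprot_FTHelper routine never needs to be called directly. A minimal sketch of that approach follows, using only the public Bio::SeqIO and Bio::SeqFeatureI calls (get_SeqFeatures, primary_tag, start, end, get_all_tags, get_tag_values). The helper name feature_rows, the script name and the input file name are placeholders chosen for this example, and which tags the swiss parser attaches to each feature is not assumed here, so all of them are printed generically.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect one row per feature: sequence ID, feature key, start, end and
# every tag=value pair found on the feature (hypothetical helper).
sub feature_rows {
    my ($seq) = @_;
    my @rows;
    for my $feat ($seq->get_SeqFeatures) {
        my @tags;
        for my $tag ($feat->get_all_tags) {
            push @tags, map { "$tag=$_" } $feat->get_tag_values($tag);
        }
        push @rows, [ $seq->display_id, $feat->primary_tag,
                      $feat->start, $feat->end, join('; ', @tags) ];
    }
    return @rows;
}

# Usage (placeholder names): perl swiss_features.pl my_entries.dat
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss');
    while (my $seq = $in->next_seq) {
        print join("\t", @$_), "\n" for feature_rows($seq);
    }
}
```

The Feature-Annotation HOWTO linked in Brian's reply covers the same methods in more detail, including features whose locations are fuzzy or split.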

From rbuels at gmail.com Fri Feb 10 12:35:57 2012
From: rbuels at gmail.com (Robert Buels)
Date: Fri, 10 Feb 2012 12:35:57 -0500
Subject: [Bioperl-l] call for project ideas - Google Summer of Code
Message-ID: <4F35557D.6090405@gmail.com>

Hi all,

Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due this week). We need to update our project ideas for prospective Summer of Code interns. The rest of the page also needs updates, changing dates and such.

There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects.

For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2012/faqs

I'm sure you all can think of plenty of ideas! Here's the page:

http://www.bioperl.org/wiki/Google_Summer_of_Code

Please have a look, add your project ideas, and/or delete ones that have already been done, are no longer relevant, or no longer have a mentor available.

Rob

From shalabh.sharma7 at gmail.com Fri Feb 10 13:30:46 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 10 Feb 2012 13:30:46 -0500
Subject: [Bioperl-l] downloading 16s from NCBI
Message-ID:

Hi,
Is it possible to download all 16S sequences from NCBI?
I tried to look at Bio::DB::GenBank, but I am not sure if you can do it for all 16S.

I would really appreciate it if anyone can help me out.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From florent.angly at gmail.com Fri Feb 10 19:26:40 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 11 Feb 2012 10:26:40 +1000
Subject: [Bioperl-l] downloading 16s from NCBI
In-Reply-To:
References:
Message-ID: <4F35B5C0.50704@gmail.com>

Hi Shalabh,
There are specialized databases that contain only 16S rRNA genes, namely GreenGenes, RDP, Silva.
You should have a look.
Regards.
Florent

On 11/02/12 04:30, shalabh sharma wrote:
> Hi,
> Is it possible to download all 16S sequences from NCBI?
> I tried to look at Bio::DB::GenBank, but I am not sure if you can do it for
> all 16S.
>
> I would really appreciate it if anyone can help me out.
>
> Thanks
> Shalabh
>
>

From thomas.sharpton at gmail.com Fri Feb 10 19:42:58 2012
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Fri, 10 Feb 2012 16:42:58 -0800
Subject: [Bioperl-l] downloading 16s from NCBI
In-Reply-To: <4F35B5C0.50704@gmail.com>
References: <4F35B5C0.50704@gmail.com>
Message-ID:

Hi Shalabh,

Florent is correct about the specialized databases - they are great resources. To answer your specific question about all 16S sequences in GenBank, see this webpage, which contains a 16S database download link:

http://www.ncbi.nlm.nih.gov/books/NBK82331/#Nov11.New_BLAST_16S_Prokaryotic_Ribosoma

Best,
Tom

On Feb 10, 2012, at 4:26 PM, Florent Angly wrote:

> Hi Shalabh,
> There are specialized databases that contain only 16S rRNA genes,
> namely GreenGenes, RDP, Silva. You should have a look.
> Regards.
> Florent
>
>
> On 11/02/12 04:30, shalabh sharma wrote:
>> Hi,
>> Is it possible to download all 16S sequences from NCBI?
>> I tried to look at Bio::DB::GenBank, but I am not sure if you can do
>> it for
>> all 16S.
>>
>> I would really appreciate it if anyone can help me out.
>>
>> Thanks
>> Shalabh
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hlapp at drycafe.net Sat Feb 11 17:24:01 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Sat, 11 Feb 2012 17:24:01 -0500
Subject: [Bioperl-l] Question on seqfeature mapping
In-Reply-To: <4F3245C4.9040506@labri.fr>
References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> <4F31802E.9030002@labri.fr> <33DA4F1D-3575-4925-A123-F6A77EAE753D@drycafe.net> <4F322C18.7010103@labri.fr> <4F3245C4.9040506@labri.fr>
Message-ID:

On Feb 8, 2012, at 4:52 AM, Lajus Florian wrote:

> Never mind. It works much better if I write "Bio::Annotation::SimpleValue<=>Bio::SeqFeatureI" instead.

Yes indeed, for n-n associations you need to use "<=>".

> The problem is if we also want to find by primary_tag:
> my $query = Bio::DB::Query::BioQuery->new(
> -datacollections => ["Bio::Annotation::SimpleValue<=>Bio::SeqFeatureI qv"],
> -where => ["Bio::Annotation::SimpleValue::tagname = \'$tag\'","qv.value = \'$value\'","Bio::SeqFeatureI.primary_tag = ".$term_primary_key]);
>
> The slot type_term_id is not mapped to column for table seqfeature

It is actually - both the mapping for term and for seqfeature have the mapping of primary_tag to type_term_id. What error are you getting?

And shouldn't Bio::SeqFeatureI.primary_tag have a double colon instead of the dot? (Also, BTW, you should be able to say "qv.tagname" instead of "Bio::Annotation::SimpleValue::tagname", just as you do for the value column - have you tried that and found it not to work? Either way, what you have for that part looks correct.)

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From dan.bolser at gmail.com Mon Feb 13 07:53:43 2012
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 13 Feb 2012 12:53:43 +0000
Subject: [Bioperl-l] Fwd: Interested in Variation?
In-Reply-To:
References:
Message-ID:

Job at the EBI:

"... the primary responsibility of the post-holder will be the development of pipelines and storage solutions for variation data deriving from whole genome re-sequencing."

http://goo.gl/eQrRu
or
http://ig14.i-grasp.com/fe/tpl_embl01.asp?s=LktVsYDaNlCOtQqCli&jobid=47627,4187528723&key=45520467&c=126152212583&pagestamp=dbnjwyufpgvrmyvkmt

Cheers,
Dan.

From rondonbio at yahoo.com.br Wed Feb 15 13:59:15 2012
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Wed, 15 Feb 2012 10:59:15 -0800 (PST)
Subject: [Bioperl-l] Bio::SearchIO::XML::BlastHandler problems
Message-ID: <1329332355.18626.YahooMailNeo@web130206.mail.mud.yahoo.com>

Hello everybody,

unfortunately I'm having problems with Bio::SearchIO that I wasn't having before. Perl is returning this to me:

"
Can't locate object method "_eventHandler" via package "Bio::SearchIO::XML::BlastHandler" at /usr/local/share/perl/5.10.1/Bio/SearchIO/blastxml.pm line 151.
"

I believe it's happening because I installed a genome assembler that uses a different version of Bio::SearchIO than the one I had before.

thank you,
Rondon

the subroutine that uses this module is below:

sub nucleotide_coverage {
    # Bio::SearchIO dependent
    # This subroutine returns a hash reference and writes a file with the
    # nucleotide coverage for each query in a BLAST alignment XML file.
    # The input is the alignment file and the name of the output (just the index).
    # USAGE::: $ref = nucleotide_coverage("blast.xml", "gene_family");
    my ($alignment_file, $gene_family) = @_;
    my $alignment = Bio::SearchIO->new( -format => 'blastxml',
                                        -file   => $alignment_file );
    print "Parsing the BLAST result\n";
    my %positions;     # query name => per-position counts of identical matches
    my %used_reads;    # hit names that have already been counted
    open my $reads_out, '>', 'reads_per_CDS.txt' or die $!;
    while (my $result = $alignment->next_result) {
        my $query_name = $result->query_name();
        my $tam = $result->query_length();
        $positions{$query_name} = [ (0) x $tam ];
        HIT:
        while (my $hit = $result->next_hit) {
            my $hit_name = $hit->name;
            # skip reads that were already counted
            next HIT if $used_reads{$hit_name}++;
            print {$reads_out} "$query_name\t$hit_name\n";
            while (my $hsp = $hit->next_hsp) {
                # 1-based query positions that are identical in this HSP
                my @pos = $hsp->seq_inds('query', 'identical');
                $positions{$query_name}[$_ - 1]++ for @pos;
            }
        }
    }
    close $reads_out;
    my $outfile = "nucleotide_coverage.txt";
    open my $coverage_out, '>', $outfile or die $!;
    foreach my $key (keys %positions) {
        print {$coverage_out} "$key\t@{$positions{$key}}\n";
    }
    close $coverage_out;
    return \%positions;
}

From rondonbio at yahoo.com.br Wed Feb 15 14:43:48 2012
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Wed, 15 Feb 2012 11:43:48 -0800 (PST)
Subject: [Bioperl-l] Bio::SearchIO::XML::BlastHandler problems
In-Reply-To: <1329335001.11066.YahooMailNeo@web130202.mail.mud.yahoo.com>
References: <1329332355.18626.YahooMailNeo@web130206.mail.mud.yahoo.com> <1329335001.11066.YahooMailNeo@web130202.mail.mud.yahoo.com>
Message-ID: <1329335028.66134.YahooMailNeo@web130206.mail.mud.yahoo.com>

sorry, I didn't ask anything. So.. Can anyone help me to solve it?

thanks again
Rondon

From flajus at labri.fr Wed Feb 15 03:04:24 2012
From: flajus at labri.fr (Lajus Florian)
Date: Wed, 15 Feb 2012 09:04:24 +0100
Subject: [Bioperl-l] Question on seqfeature mapping
In-Reply-To:
References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> <4F31802E.9030002@labri.fr> <33DA4F1D-3575-4925-A123-F6A77EAE753D@drycafe.net> <4F322C18.7010103@labri.fr> <4F3245C4.9040506@labri.fr>
Message-ID: <4F3B6708.9090307@labri.fr>

In fact, the query works fine. I don't understand why it didn't work when I wrote the mail...

On 11/02/2012 23:24, Hilmar Lapp wrote:
>
> On Feb 8, 2012, at 4:52 AM, Lajus Florian wrote:
>
>> Never mind. It works much better if I write "Bio::Annotation::SimpleValue<=>Bio::SeqFeatureI" instead.
>
> Yes indeed, for n-n associations you need to use "<=>".
>
>> The problem is if we also want to find by primary_tag:
>> my $query = Bio::DB::Query::BioQuery->new(
>> -datacollections => ["Bio::Annotation::SimpleValue<=>Bio::SeqFeatureI qv"],
>> -where => ["Bio::Annotation::SimpleValue::tagname = \'$tag\'","qv.value = \'$value\'","Bio::SeqFeatureI.primary_tag = ".$term_primary_key]);
>>
>> The slot type_term_id is not mapped to column for table seqfeature
>
> It is actually - both the mapping for term and for seqfeature have the mapping of primary_tag to type_term_id. What error are you getting?
>
> And shouldn't Bio::SeqFeatureI.primary_tag have a double colon instead of the dot?
(Also, BTW, you should be able to say "qv.tagname" instead of "Bio::Annotation::SimpleValue::tagname", just as you do for the value column - have you tried that and found it not to work? Either way, what you have for that part looks correct.)
>
> -hilmar

From tarakaramji at gmail.com Thu Feb 16 14:17:37 2012
From: tarakaramji at gmail.com (tarakaramji M)
Date: Fri, 17 Feb 2012 00:47:37 +0530
Subject: [Bioperl-l] Query regarding retrieval sequences
Message-ID:

Hi,
I wonder if we can retrieve the sequences of the required genes together at once.
In simpler words: I have a set of gene or EST IDs for which I have to get the sequences from the database together?

--
tarakaramji

From admin at yapcna.org Thu Feb 16 16:45:46 2012
From: admin at yapcna.org (JT Smith)
Date: Thu, 16 Feb 2012 15:45:46 -0600
Subject: [Bioperl-l] YAPC::NA & BioPerl
Message-ID: <9E03C4BE-C93F-4EB8-9BC5-24F9492C23E3@yapcna.org>

I'd like to run a BioPerl track at YAPC::NA 2012. Our theme this year is "Perl in the Wild", which is all about real-world applications of Perl. I can't imagine anything more real-world than what you folks do with Perl.

If you are interested in giving a talk, you can submit one here: http://act.yapcna.org/2012/newtalk

If we get enough submissions, I'll set aside an entire room just for BioPerl talks.

Likewise, if you want to run a mini-workshop to get people bootstrapped on BioPerl, I'd love to have that as well. I just need someone to submit that as a talk. We have our workshops limited to 2 hours, but if you need more time than that, you can submit multiple sessions and I'll schedule them back to back.

If there's anything else I can do to make YAPC::NA 2012 a good destination for BioPerl, please let me know.

For those of you who don't know, YAPC::NA 2012 will be held June 13-15 in Madison, WI. Madison is a big BioTech town, so I'm sure many of you are familiar with it, and even have colleagues who work here.

JT Smith
Director, YAPC::NA 2012
http://www.yapcna.org

From l.m.timmermans at students.uu.nl Fri Feb 17 08:02:21 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Fri, 17 Feb 2012 14:02:21 +0100
Subject: [Bioperl-l] YAPC::NA & BioPerl
In-Reply-To: <9E03C4BE-C93F-4EB8-9BC5-24F9492C23E3@yapcna.org>
References: <9E03C4BE-C93F-4EB8-9BC5-24F9492C23E3@yapcna.org>
Message-ID:

On Thu, Feb 16, 2012 at 10:45 PM, JT Smith wrote:

> I'd like to run a BioPerl track at YAPC::NA 2012. Our theme this year is
> "Perl in the Wild", which is all about real-world applications of Perl. I
> can't imagine anything more real-world than what you folks do with Perl.
>
> If you are interested in giving a talk, you can submit one here:
> http://act.yapcna.org/2012/newtalk
>
> If we get enough submissions, I'll set aside an entire room just for
> BioPerl talks.
>
> Likewise, if you want to run a mini-workshop to get people bootstrapped on
> BioPerl, I'd love to have that as well. I just need someone to submit that
> as a talk. We have our workshops limited to 2 hours, but if you need more
> time than that, you can submit multiple sessions and I'll schedule them
> back to back.
>
> If there's anything else I can do to make YAPC::NA 2012 a good destination
> for BioPerl, please let me know.
>
> For those of you who don't know, YAPC::NA 2012 will be held June 13-15 in
> Madison, WI. Madison is a big BioTech town, so I'm sure many of you are
> familiar with it, and even have colleagues who work here.
>

I think that'd be an excellent idea. Anything that would bring the bioinformatics community closer to the open source community would be a good thing; they often seem rather absent at conferences and the like in my experience.

Leon

From hlapp at drycafe.net Fri Feb 17 09:45:51 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 17 Feb 2012 09:45:51 -0500
Subject: [Bioperl-l] YAPC::NA & BioPerl
In-Reply-To:
References: <9E03C4BE-C93F-4EB8-9BC5-24F9492C23E3@yapcna.org>
Message-ID: <039387B1-114D-47E1-B86B-7057FCF6216C@drycafe.net>

On Feb 17, 2012, at 8:02 AM, Leon Timmermans wrote:

> I think that'd be an excellent idea. Anything that would bring the
> bioinformatics community closer to the open source community would be a
> good thing; they often seem rather absent at conferences and the like in my
> experience.

+1

I agree very much.
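Attending the pure open-source conferences has been a challenge for me, frankly - I've wanted to go to YAPC::NA or OSCON for years, but finding both the time and budget among the science conferences that I more or less have to attend already is real difficult. I'd imagine that others face similar challenges. So any efforts at cross-fertilization are good.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From cjfields at illinois.edu Fri Feb 17 10:30:54 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 17 Feb 2012 15:30:54 +0000
Subject: [Bioperl-l] YAPC::NA & BioPerl
In-Reply-To: <039387B1-114D-47E1-B86B-7057FCF6216C@drycafe.net>
References: <9E03C4BE-C93F-4EB8-9BC5-24F9492C23E3@yapcna.org> <039387B1-114D-47E1-B86B-7057FCF6216C@drycafe.net>
Message-ID: <3C915E17-46A8-4385-87E2-834F2F68CAFF@illinois.edu>

On Feb 17, 2012, at 8:45 AM, Hilmar Lapp wrote:

> On Feb 17, 2012, at 8:02 AM, Leon Timmermans wrote:
>
>> I think that'd be an excellent idea. Anything that would bring the
>> bioinformatics community closer to the open source community would be a
>> good thing; they often seem rather absent at conferences and the like in my
>> experience.
>
> +1
>
> I agree very much. Attending the pure open-source conferences has been a challenge for me, frankly - I've wanted to go to YAPC::NA or OSCON for years, but finding both the time and budget among the science conferences that I more or less have to attend already is real difficult. I'd imagine that others face similar challenges. So any efforts at cross-fertilization are good.
>
> -hilmar

Yes, this is a significant problem for me as well (not to mention the constraints I have in my current job), so committing to anything right now is a bit tricky.

I do think a 'warts-and-all' bioperl talk would be good, mainly to emphasize the point that we do cover a lot of ground in the toolkit but that there are definitely areas that could be improved within the code, the distribution, etc. A mini version of this could also be used at BOSC I suppose.

chris

From cjfields at illinois.edu Fri Feb 17 10:47:29 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 17 Feb 2012 15:47:29 +0000
Subject: [Bioperl-l] YAPC::NA & BioPerl
In-Reply-To: <3C915E17-46A8-4385-87E2-834F2F68CAFF@illinois.edu>
References: <9E03C4BE-C93F-4EB8-9BC5-24F9492C23E3@yapcna.org> <039387B1-114D-47E1-B86B-7057FCF6216C@drycafe.net> <3C915E17-46A8-4385-87E2-834F2F68CAFF@illinois.edu>
Message-ID:

On Feb 17, 2012, at 9:30 AM, Fields, Christopher J wrote:

> On Feb 17, 2012, at 8:45 AM, Hilmar Lapp wrote:
>
>> On Feb 17, 2012, at 8:02 AM, Leon Timmermans wrote:
>>
>>> I think that'd be an excellent idea. Anything that would bring the
>>> bioinformatics community closer to the open source community would be a
>>> good thing; they often seem rather absent at conferences and the like in my
>>> experience.
>>
>> +1
>>
>> I agree very much. Attending the pure open-source conferences has been a challenge for me, frankly - I've wanted to go to YAPC::NA or OSCON for years, but finding both the time and budget among the science conferences that I more or less have to attend already is real difficult.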
I'd imagine that others face similar challenges. So any efforts at cross-fertilization are good. >> >> -hilmar > > Yes, this is a significant problem for me as well (not to mention the constraints I have in my current job), so committing to anything right now is a bit tricky. > > I do think a 'warts-and-all' bioperl talk would be good, mainly to emphasize the point that we do cover a lot of graound in the toolkit but that there are definitely areas that could be improved within the code, the distribution, etc. A mini version of this could also be used at BOSC I suppose. > > chris Reread that and it gives a more positive expectation than I realized. I just want to re-emphasize the bit above about committing so everyone's expectations are tempered; I am really constrained on time commitments (particularly those involving travel) until fall. However, I fully support anyone giving a talk at YAPC, though, and I can help out in whatever way possible, including coding. chris From dcmertens.perl at gmail.com Fri Feb 17 12:20:43 2012 From: dcmertens.perl at gmail.com (David Mertens) Date: Fri, 17 Feb 2012 11:20:43 -0600 Subject: [Bioperl-l] YAPC::NA & BioPerl Message-ID: Hey folks - I would really like to see BioPerl have a presence at YAPC. For what it's worth, I have submitted a couple of talks about PDL at YAPC this year, one of them being an introduction to PDL. Even if the talk isn't accepted, I will be more than happy to meet up with anybody who has questions about PDL. As for BioPerl at YAPC, I know very little about BioPerl but would really like to learn more. I would be *thrilled* if somebody could give a min-workshop! This would make for a very scientific YAPC: an intro to PDL *and* an intro to BioPerl. :-D David -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." 
-- Brian Kernighan From jason.stajich at gmail.com Fri Feb 17 12:42:29 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 17 Feb 2012 09:42:29 -0800 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences References: Message-ID: This should be an easy bug for someone to fix -- I am pretty sure the solution is to ignore gapped columns but I haven't looked deeper and I don't have any time right now to work on bioperl fixes so be great if someone wanted to help out here. The redmine bug info is appended below. Jason Begin forwarded message: > From: redmine at redmine.open-bio.org > Subject: [Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences > Date: February 17, 2012 9:39:42 AM PST > To: bioperl-guts-l at lists.open-bio.org > > > Issue #3328 has been reported by Jason Stajich. > > ---------------------------------------- > Bug #3328: segregating sites calculation fails on gapped sequences > https://redmine.open-bio.org/issues/3328 > > Author: Jason Stajich > Status: New > Priority: Normal > Assignee: Bioperl Guts > Category: Bio::PopGen > Target version: > URL: > > > > I am Cheng-Ruei Lee, a graduate student in Duke Biology. I'm analyzing many DNA alignments of a plant species. > I first used (Bio::PopGen::Utilities -> aln_to_population()) to read in the fasta format alignment, and then use Bio::PopGen::Statistics to calculate some statistics without outgroup. Most gene work fine, but I think a bug happened when it meets alignments like this: > >> Genotype1 > ATGATCGTAGCTGATGCTGTGATCGATCGCTAGCTAGCTCGA >> Genotype2 > ------------GATGCTGTGATCGATCGCTAGCTAGCTCGA >> Genotype3 > ------------GATGCTGTGATCGATCGCTAGCTAGCTCGA >> Genotype4 > ------------GATGCTGTGATCGATCGCTAGCTAGCTCGA > > I get this data set from other people. 
I guess due to the annotation program people used, the definition of coding sequence is much longer in genotype 1 than in other genotypes. This creates a long stretch of gap in the very beginning. Whenever Bio::PopGen meets this kind of genes, the number of singleton counts boost a lot - seems like the long stretch of sites with gap is also counted as singletons. Also, some Fu & Li statistics boosted. The "number of segregation sites" seems not to be affected. (And therefore, there are genes with hundreds of singleton sites but only a few total segregating sites.) > May be a possible bug in Bio::PopGen::Utilities when reading in the data? Or when calculating singletons? > > Sincerely, > Cheng-Ruei Lee > > > -- > You have received this notification because you have either subscribed to it, or are involved in it. > To change your notification preferences, please click here and login: http://redmine.open-bio.org > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From admin at yapcna.org Fri Feb 17 13:51:36 2012 From: admin at yapcna.org (JT Smith) Date: Fri, 17 Feb 2012 12:51:36 -0600 Subject: [Bioperl-l] YAPC::NA & BioPerl In-Reply-To: References: Message-ID: <2B17B64F-70F2-4C40-90CE-35C3A6953DCA@yapcna.org> I'm also going to get a PDL track started at YAPC. I'll be announcing that next week. JT Smith Director, YAPC::NA 2012 http://www.yapcna.org PS Another way you can support Perl is by adding a link to the bottom of your web site, or on your credits / thank you page to http://www.perl.org On Feb 17, 2012, at 11:20 AM, David Mertens wrote: > Hey folks - > > I would really like to see BioPerl have a presence at YAPC. For what it's > worth, I have submitted a couple of talks about PDL at YAPC this year, one > of them being an introduction to PDL. 
Even if the talk isn't accepted, I > will be more than happy to meet up with anybody who has questions about > PDL. As for BioPerl at YAPC, I know very little about BioPerl but would > really like to learn more. I would be *thrilled* if somebody could give a > mini-workshop! This would make for a very scientific YAPC: an intro to PDL > *and* an intro to BioPerl. :-D > > David > > -- > "Debugging is twice as hard as writing the code in the first place. > Therefore, if you write the code as cleverly as possible, you are, > by definition, not smart enough to debug it." -- Brian Kernighan
From cjfields at illinois.edu Fri Feb 17 14:11:48 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 17 Feb 2012 19:11:48 +0000 Subject: [Bioperl-l] Bio::SearchIO::XML::BlastHandler problems In-Reply-To: <1329335028.66134.YahooMailNeo@web130206.mail.mud.yahoo.com> References: <1329332355.18626.YahooMailNeo@web130206.mail.mud.yahoo.com> <1329335001.11066.YahooMailNeo@web130202.mail.mud.yahoo.com> <1329335028.66134.YahooMailNeo@web130206.mail.mud.yahoo.com> Message-ID: <3F5F0A6E-C51E-4DF9-A293-924A5AA5245B@illinois.edu> I'm not sure what you mean. Do you mean the installed assembler is using a different version of BioPerl from the one you have installed elsewhere (e.g. there are two local versions)? If so, it's possible the two versions are somehow getting mixed up (odd that it would happen, though). You can probably add a 'use lib PATH' directive in the script to be more explicit. BLAST XML parsing requires two different parsing mechanisms based upon the type of BLAST run (PSI-BLAST or normal BLAST); versions of bioperl prior to 1.6 lacked this, though it's possible the last release in the 1.5 series did have it, not sure.
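One way to check whether two installs are getting mixed up is to print the file each loaded module actually came from. What follows is only a minimal sketch using core Perl: the helper name loaded_from is made up for illustration, and List::Util stands in for the modules you would really check (Bio::SearchIO and Bio::SearchIO::XML::BlastHandler), so that it runs on a machine without BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report the file a module was loaded from, using the
# core %INC hash (its keys look like 'Bio/SearchIO.pm').
sub loaded_from {
    my ($module) = @_;
    (my $key = $module) =~ s{::}{/}g;
    $key .= '.pm';
    return $INC{$key};    # undef if the module has not been loaded
}

# List::Util is only a stand-in; on a BioPerl machine, load and check
# Bio::SearchIO and Bio::SearchIO::XML::BlastHandler instead.
require List::Util;
for my $module (qw(List::Util Not::Loaded::Module)) {
    my $path = loaded_from($module);
    printf "%-22s %s\n", $module, defined $path ? $path : '(not loaded)';
}
```

If the two modules turn out to come from different directory trees, that would be consistent with the mix-up described above, and a 'use lib' line pointing at the tree you want is the explicit fix being suggested.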
chris On Feb 15, 2012, at 1:43 PM, Rondon Neto wrote: > sorry, I didn't ask anything. So.. > > Can anyone help me to solve it? > > thanks again > > Rondon > > ________________________________ > From: Rondon Neto > To: "Bioperl-l at lists.open-bio.org" > Sent: Wednesday, 15 February 2012 16:59 > Subject: [Bioperl-l] Bio::SearchIO::XML::BlastHandler problems > > Hello everybody, > unfortunately I'm having problems with Bio::SearchIO that I wasn't having before. Perl is returning this to me: > " Can't locate object method "_eventHandler" via package "Bio::SearchIO::XML::BlastHandler" at /usr/local/share/perl/5.10.1/Bio/SearchIO/blastxml.pm line 151. " > > I believe it's happening because I installed a genome assembler that uses a different version of Bio::SearchIO from the one I had before. > > thank you, > > Rondon > > The subroutine that uses this module is below:
>
> sub nucleotide_coverage {
>     # Bio::SearchIO dependent.
>     # This subroutine returns a hash ref and writes a file with the
>     # nucleotide coverage for each query in a BLAST alignment XML file.
>     # The input is the alignment file and the name of the output (just the index).
>     # USAGE::: $ref = nucleotide_coverage("blast.xml", "gene_family");
>
>     my ($alignment_file, $gene_family) = @_;
>
>     my $alignment = Bio::SearchIO->new( -format => 'blastxml',
>                                         -file   => $alignment_file );
>
>     print "Parsing the BLAST result\n";
>     my %positions;
>     my @used_reads;
>     open OUT, ">reads_per_CDS.txt" or die $!;
>     while (my $result = $alignment->next_result) {
>         my $query_name = $result->query_name();
>         my $tam        = $result->query_length();
>
>         # start every query position at zero coverage
>         for (0 .. $tam - 1) { ${ $positions{$query_name} }[$_] = 0 }
>
>         HIT:
>         while (my $hit = $result->next_hit) {
>             my $hit_name = $hit->name;
>             # skip reads that were already counted
>             foreach my $read (@used_reads) {
>                 next HIT if $read eq $hit_name;
>             }
>             print OUT "$query_name\t$hit_name\n";
>             while (my $hsp = $hit->next_hsp) {
>                 # 1-based query positions that are identical in this HSP
>                 my @pos = $hsp->seq_inds('query', 'identical');
>                 foreach my $num (@pos) {
>                     ${ $positions{$query_name} }[$num - 1]++;
>                 }
>             }
>             push(@used_reads, $hit_name);
>         }
>     }
>     close OUT;
>
>     my $outfile = "nucleotide_coverage.txt";
>     open OUT, ">$outfile" or die $!;
>     foreach my $key (keys %positions) {
>         print OUT "$key\t@{$positions{$key}}\n";
>     }
>     close OUT;
>
>     return \%positions;
> }
>
From p.j.a.cock at googlemail.com Fri Feb 17 17:40:29 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 22:40:29 +0000 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utilities Update In-Reply-To: References: Message-ID: Hi all, Just FYI, the following was also changed in this week's Entrez update to EFetch 2.0 (see forwarded email below). This was breaking some Biopython scripts - depending on how they passed in the id parameters.
It turns out we relied on the undocumented and now withdrawn form in one of our examples, so some users had copied this style. Biopython 1.59 will solve this. I know BioJava is looking at the more publicised changes to retmode - I don't know if BioPerl or BioRuby was affected. Regards, Peter ---------- Forwarded message ---------- From: Date: Fri, Feb 17, 2012 at 7:09 PM Subject: [Utilities-announce] NCBI E-Utilities Update To: NLM/NCBI List utilities-announce The most recent NCBI E-Utilities update includes a more stringent check for correct URL parameters. EFetch URLs with multiple IDs must be entered as: id=1,2,3 EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3 Please see the online E-Utilities help for additional information: http://www.ncbi.nlm.nih.gov/books/NBK25500/ EFetch online help: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Thank you. _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From cjfields at illinois.edu Fri Feb 17 21:54:44 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sat, 18 Feb 2012 02:54:44 +0000 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utilities Update In-Reply-To: References: Message-ID: On Feb 17, 2012, at 4:40 PM, Peter Cock wrote: > Hi all, > > Just FYI, the following was also changed in this week's Entrez > update to EFetch 2.0 (see forwarded email below). > > This was breaking some Biopython scripts - depending on how > they passed in the id parameters. It turns out we relied on the > undocumented and now withdrawn form in one of our examples, > so some users had copied this style. Biopython 1.59 will solve > this. > > I know BioJava is looking at the more publicised changes to > retmode - I don't know if BioPerl or BioRuby was affected. No, I checked the BioPerl modules against regression tests today after seeing the announcement. 
Seems everything is fine; the main issue on NCBI's end that seemed to break things was how multiple IDs were joined. > Regards, > > Peter chris From jalevine at email.arizona.edu Fri Feb 17 21:03:14 2012 From: jalevine at email.arizona.edu (Joshua Levine) Date: Fri, 17 Feb 2012 19:03:14 -0700 Subject: [Bioperl-l] Can't get bioperl to install Message-ID: Hello, I used to have bioperl working just fine on my machine (MacBook Pro, running OS X v10.6.8). I tried installing some new perl modules ( DBI & DBD::mysql) and have not gotten them to work yet, but when I went back to run a script that uses bioperl, I got this error message: Can't locate Bio/SeqIO.pm in @INC... Now I just want to get bioperl working again, even if I can't also get DBI & DBD:mysql working. Any help would be greatly appreciated -Josh -- Joshua A. Levine Graduate Student University of Arizona Ecology and Evolutionary Biology BioScience West 121 jalevine at email.arizona.edu From cjfields at illinois.edu Fri Feb 17 22:16:18 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sat, 18 Feb 2012 03:16:18 +0000 Subject: [Bioperl-l] Can't get bioperl to install In-Reply-To: References: Message-ID: <3B75BC6C-B624-4FF2-AF91-8D3146E4259D@illinois.edu> The error indicates that BioPerl isn't in your @INC (paths where perl searches for modules). There are a number of things that could cause this, namely changing the version of perl you are using, upgrades to the OS, changes in local configuration, e.g. missing PERL5LIB. In short, anything that can possibly modify the search path. You could possibly try adding the bioperl directory to PERL5LIB locally. What you *don't* want to do is attempt to add a version-specific path (say, a directory for an older version of perl in the system perl library). 
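To see what perl is actually searching, it can help to dump @INC and look for the module file in each directory. This is just a sketch with core modules: find_in_path is a made-up helper name, and Bio::SeqIO is only the default because it is the module named in the error message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory in the given search path
# that contains the file for $module (Bio::SeqIO -> Bio/SeqIO.pm).
sub find_in_path {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    # @INC may also hold code refs (hooks); skip anything that is not a path
    return grep { !ref $_ && -f File::Spec->catfile($_, @parts) } @dirs;
}

my $module = shift(@ARGV) || 'Bio::SeqIO';
my @hits   = find_in_path($module, @INC);

print "\@INC contains:\n", map { "  $_\n" } @INC;
print @hits
    ? "$module found under: @hits\n"
    : "$module not found in any \@INC directory\n";
```

If the module is not found but you know where the BioPerl directory lives, adding that directory to PERL5LIB, as suggested above, should put it back on the search path.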
See this for more details on @INC: http://stackoverflow.com/questions/2526804/how-is-perls-inc-constructed-aka-what-are-all-the-ways-of-affecting-where-pe chris
From carandraug+dev at gmail.com Mon Feb 20 19:44:16 2012 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 21 Feb 2012 00:44:16 +0000 Subject: [Bioperl-l] Search for sequence inside sequence Message-ID: Hi everyone, is there a method to check if a sequence exists inside another one? I have a sequence object and a small string of a sequence. I'd like to know if it's present. Currently I only have
    say "Found on $file" if $seq->seq =~ /$string/i;
It solves my problem but this doesn't look very smart to me. Is there a better way? Maybe even something that would also accept S to match against [GC] for example? Thanks in advance, Carnë
From Kevin.M.Brown at asu.edu Mon Feb 20 20:12:33 2012 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 20 Feb 2012 18:12:33 -0700 Subject: [Bioperl-l] Search for sequence inside sequence In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B408219A7A@EX02.asurite.ad.asu.edu> Build up the regexp outside of that line and then use it in there.
    $string = "[GC]TAAGGACAA[AC]...";
    print "Found in $file" if $seq->seq =~ /$string/i;
If you want your code to do substitution for the extra characters, then you'll need to interpret them into their regex equivalents, such as reading in the string desired and doing s/s/[gc]/gi, etc...
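The translation of ambiguity codes into regex equivalents could look roughly like this. It is only a sketch in plain Perl, not BioPerl's own method: iupac_to_regex is a made-up name, and the table is the standard IUPAC nucleotide alphabet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',
    B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical helper: turn a pattern such as "SSTAAR" into a compiled,
# case-insensitive regex. Dies on characters outside the table above.
sub iupac_to_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $code (split //, uc $pattern) {
        die "Not an IUPAC nucleotide code: '$code'\n"
            unless exists $IUPAC{$code};
        $re .= $IUPAC{$code};
    }
    return qr/$re/i;
}

my $regex = iupac_to_regex('SSTAAR');
for my $dna (qw(aagctaagtt AATTTAAGTT)) {
    print "$dna: ", ($dna =~ $regex ? 'match' : 'no match'), "\n";
}
```

With a sequence object the test would be along the lines of $seq->seq =~ iupac_to_regex($string). This only looks at the strand as given; the reverse complement would need a second pattern.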
From cjfields at illinois.edu Mon Feb 20 21:28:03 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 21 Feb 2012 02:28:03 +0000 Subject: [Bioperl-l] Search for sequence inside sequence In-Reply-To: <1A4207F8295607498283FE9E93B775B408219A7A@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B408219A7A@EX02.asurite.ad.asu.edu> Message-ID: <16F7B63A-5F72-41F3-94AE-63B7E53FD50A@illinois.edu> There is a BioPerl-ish way of doing this, namely Bio::Tools::SeqPattern. Might be worth a look (though a simple regex should also suffice). chris
From carandraug+dev at gmail.com Tue Feb 21 05:18:37 2012 From: carandraug+dev at gmail.com (Carnë Draug) Date: Tue, 21 Feb 2012 10:18:37 +0000 Subject: [Bioperl-l] Search for sequence inside sequence In-Reply-To: <16F7B63A-5F72-41F3-94AE-63B7E53FD50A@illinois.edu> References: <1A4207F8295607498283FE9E93B775B408219A7A@EX02.asurite.ad.asu.edu> <16F7B63A-5F72-41F3-94AE-63B7E53FD50A@illinois.edu> Message-ID: On 21 February 2012 02:28, Fields, Christopher J wrote: > There is a BioPerl-ish way of doing this, namely Bio::Tools::SeqPattern. Might be worth a look (though a simple regex should also suffice). Thank you both. I looked into this class and into the code in examples/tools/seq_pattern.pl. It seems this module is meant more for creating a regexp from another string, which can then be used as a normal regexp.
    $pattern = "SS";
    $regex = Bio::Tools::SeqPattern->new(-seq => $pattern, -type => 'Dna');
    # method calls are not interpolated inside /.../, so call them first
    my $fwd = $regex->expand;
    my $rev = $regex->revcom(1)->str;   ## to also search on the revcom
    print "Found in $file" if $seq->seq =~ /$fwd/i;
    print "Found in $file" if $seq->seq =~ /$rev/i;
Do you think it is acceptable to add a method that would allow for:
    say "Found in $file" if $seq->match(-seq => $pattern);
maybe with an extra option that will also check the revcom:
    say "Found in $file" if $seq->match(-seq => $pattern, -revcom => 1);
Carnë
From arlett.wrona at c-lecta.de Tue Feb 21 06:14:14 2012 From: arlett.wrona at c-lecta.de (Enzyme) Date: Tue, 21 Feb 2012 03:14:14 -0800 (PST) Subject: [Bioperl-l] BLAST example doesn't work Message-ID: <33362540.post@talk.nabble.com> Hi, I'm new to BioPerl and want to use it for BLAST. But the example doesn't work.
I tried some possible solutions and at the moment I don't know if the code is wrong or my BioPerl package doesn't work how it should:
    use Bio::PrimarySeq;
    use Bio::Tools::Run::StandAloneBlast;
    my $factory = Bio::Tools::Run::StandAloneBlast->new(p => 'blastn', d => 'nr', e => '1e-5');
    my $seq = Bio::PrimarySeq->new(-id => 'test1', -seq => 'AGATCAGTAGATGATAGGGGTAGA');
    my $report = $factory->blastall($seq);   # get back a Bio::SearchIO report
I get this error:
    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Command 'run' not registered
    STACK: Error::throw
    STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:472
    STACK: Bio::Tools::Run::WrapperBase::set_parameters C:/Perl/site/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1203
    STACK: Bio::Tools::Run::WrapperBase::new C:/Perl/site/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:505
    STACK: Bio::Tools::Run::StandAloneBlast::new C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:366
    STACK: Bio::Tools::Run::StandAloneNCBIBlast::new C:/Perl/site/lib/Bio\Tools\Run\StandAloneNCBIBlast.pm:166
    STACK: C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:382
Any ideas? I downloaded the database "nr" correctly. The environment paths were created. Thanks! Enzyme
From fossandonc at hotmail.com Wed Feb 22 13:55:23 2012 From: fossandonc at hotmail.com (Francisco J. Ossandón) Date: Wed, 22 Feb 2012 15:55:23 -0300 Subject: [Bioperl-l] Fasta sequence width question Message-ID: Hello, I have a question about the width used by Bioperl in the Fasta format. This format recommends that lines of text be shorter than 80 characters, but there is no really fixed length for the sequence lines.
http://en.wikipedia.org/wiki/FASTA_format http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml I usually download sequences from NCBI and their Fasta sequences always use a sequence line width of 70 characters (like http://www.ncbi.nlm.nih.gov/protein/50842445?report=fasta ), while Bioperl uses a default of 60 to write in Fasta format using Bio::SeqIO::fasta ("BEGIN { $WIDTH = 60}"). Is there a particular reason to set the width at 60 characters? Or maybe it could be changed to 70 to match NCBI? Cheers, Francisco J. Ossandon
From bosborne11 at verizon.net Wed Feb 22 14:28:12 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 22 Feb 2012 14:28:12 -0500 Subject: [Bioperl-l] Fasta sequence width question In-Reply-To: References: Message-ID: <871478E1-15CD-4F3D-B95E-71193723D593@verizon.net> Francis, You can set this yourself. http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/SeqIO/fasta.pm#width Brian O.
From fossandonc at hotmail.com Wed Feb 22 16:37:42 2012 From: fossandonc at hotmail.com (Francisco J. Ossandón) Date: Wed, 22 Feb 2012 18:37:42 -0300 Subject: [Bioperl-l] Fasta sequence width question In-Reply-To: <871478E1-15CD-4F3D-B95E-71193723D593@verizon.net> References: <871478E1-15CD-4F3D-B95E-71193723D593@verizon.net> Message-ID: Yes, I know about the method to change the width myself, thanks. =) My question was more about whether there was a specific reason to choose that default value (60) instead of the NCBI value (70), and whether it should be changed to 70 to match NCBI or not. Cheers, Francisco J. Ossandon
From cjfields at illinois.edu Wed Feb 22 17:42:04 2012 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 22 Feb 2012 16:42:04 -0600 Subject: [Bioperl-l] Fasta sequence width question In-Reply-To: References: <871478E1-15CD-4F3D-B95E-71193723D593@verizon.net> Message-ID: <4F456F3C.4070903@illinois.edu> No reason, beyond legacy and possibly a fear that changing it could cause unforeseen consequences down the road. chris
From bosborne11 at verizon.net Wed Feb 22 16:50:58 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 22 Feb 2012 16:50:58 -0500 Subject: [Bioperl-l] Fasta sequence width question In-Reply-To: References: <871478E1-15CD-4F3D-B95E-71193723D593@verizon.net> Message-ID: <9A106184-4082-4E23-B91B-6DBCCE0B319E@verizon.net> Francis, There has never been a decision, to my knowledge, to base "global" Bioperl defaults on NCBI's conventions. Yes, there are tools in Bioperl that access NCBI in various ways but Bioperl does not generally refer to NCBI for its definitions. Brian O.
From adlai at refenestration.com Thu Feb 23 19:43:41 2012 From: adlai at refenestration.com (Adlai Burman) Date: Fri, 24 Feb 2012 01:43:41 +0100 Subject: [Bioperl-l] Navigating a genbank file Message-ID: <3F92E96A-2A69-4F6F-AA92-AAA7A2122BD8@refenestration.com> I am struggling with Bioperl to do something which I know is simple (I stumbled on it before, but it was just by dumb luck and I forgot how it is done). Basically what I am trying to do is, given a target gene symbol, extract the intergenic region between that CDS and the next one 3'. One ugly way I could do this would be: (1) Make an array of all of the CDSs in each gb file (2) Make a second run through the file using either the symbol prior or post the target symbol (depending on the strand of the target symbol). This is, of course, cumbersome and unnecessary but I can't figure out how it should be done. Here is a skeletal version of what I understand about getting feature info with bioperl. Can anyone help me figure out how to access the CDS features on either side? Please?
Thank you,

Adlai

#!/usr/bin/perl
use strict;
use warnings;
use IO::String;
use Bio::Perl;
use Bio::SeqIO;

my $target_sym = shift;    # gene symbol given on the command line
# NOTE: -file expects a single file name; a pattern such as this one
# would have to be expanded (e.g. with glob) before being passed in.
my $file = "../Dropbox/local_gb/*";

my $seqio = Bio::SeqIO->new(
    -file   => $file,
    -format => 'GenBank',
);

my $seq = $seqio->next_seq;
for my $feats ($seq->get_SeqFeatures) {
    if ($feats->primary_tag eq "CDS") {
        my $start  = $feats->location->start;
        my $end    = $feats->location->end;
        my $strand = $feats->strand;
        if ($feats->has_tag('gene')) {
            for my $val ($feats->get_tag_values('gene')) {
                # report the start of the CDS whose gene symbol matches
                if ($val eq $target_sym) {
                    print $start."\n";
                    print "$val\n";
                }
            }
        }
    }
}

From abhishek.vit at gmail.com Thu Feb 23 19:55:02 2012
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 23 Feb 2012 16:55:02 -0800
Subject: [Bioperl-l] fetching all alignments from a sam/bam by read header in perl
Message-ID:

I am wondering if there is a slick way to access all the possible alignments for a read present in a sam or bam file, given the read header. Since the existing codebase is in perl I would prefer something which can be done in/via perl.

By default BAMs are indexed by location, so the inbuilt samtools indexing won't work, I guess.

I should also say the input bam file will have on the order of 500 million total alignments, and many reads are expected to be aligned to more than one place in the genome. Given the size of the data, loading it all in one big hash is not turning out to be memory friendly.

PS: I also posted this earlier on Biostar.

Thanks!
-Abhi

From mcoyne at channing.harvard.edu Thu Feb 23 21:01:51 2012
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Thu, 23 Feb 2012 21:01:51 -0500
Subject: [Bioperl-l] Bioperl-l Digest, Vol 106, Issue 18
In-Reply-To:
References:
Message-ID:

Sixty is evenly divisible by three (20 codons), while 70 is not...?
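To make the arithmetic behind that remark concrete, here is a minimal pure-Perl sketch. It does not use BioPerl at all; wrap_seq and line_frames are hypothetical helpers written only for this illustration. It wraps a toy coding sequence at 60 and at 70 columns and reports the reading frame in which each output line starts. Whether this was the actual reason for the Bioperl default is not established anywhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl function: split a sequence into
# lines of at most $width characters, the way a FASTA writer would.
sub wrap_seq {
    my ($seq, $width) = @_;
    return $seq =~ /(.{1,$width})/g;
}

# Reading frame (0, 1 or 2) in which each wrapped line starts.
sub line_frames {
    my ($seq, $width) = @_;
    my $offset = 0;
    my @frames;
    for my $line (wrap_seq($seq, $width)) {
        push @frames, $offset % 3;
        $offset += length $line;
    }
    return @frames;
}

my $cds = 'ATG' x 50;    # toy 150-nt sequence, 50 codons

for my $width (60, 70) {
    my @frames = line_frames($cds, $width);
    print "width $width: lines start in frames @frames\n";
}
# width 60: lines start in frames 0 0 0
# width 70: lines start in frames 0 1 2
```

With a width of 60 every line begins on a codon boundary (20 codons per full line); with 70 the frame shifts by one base on each new line.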
From jason.stajich at gmail.com Fri Feb 24 01:49:38 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 23 Feb 2012 22:49:38 -0800
Subject: [Bioperl-l] Navigating a genbank file
In-Reply-To: <3F92E96A-2A69-4F6F-AA92-AAA7A2122BD8@refenestration.com>
References: <3F92E96A-2A69-4F6F-AA92-AAA7A2122BD8@refenestration.com>
Message-ID:

You just need a $last_CDS variable.
Here's code that does this for genes retrieved from Bio::DB::SeqFeature but the concept is the same.

https://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/get_intergenic_seq.pl

On Feb 23, 2012, at 4:43 PM, Adlai Burman wrote:

> I am struggling with Bioperl to do something which I know is simple (I stumbled on it before, but it was just by dumb luck and I forgot how it is done). Basically what I am trying to do is, given a target gene symbol extract the intergenic region between that CDS and the next one 3'. One ugly way I could do this would be:
> (1) Make an array of all of the CDSs in each gb file
> (2) Make a second run through the file using either the symbol prior or post the target symbol (depending on the strand of the target symbol).
>
> This is, of course, cumbersome and unnecessary but I can't figure out how it should be done.
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From p.j.a.cock at googlemail.com Fri Feb 24 04:24:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 24 Feb 2012 09:24:35 +0000
Subject: [Bioperl-l] fetching all alignments from a sam/bam by read header in perl
In-Reply-To:
References:
Message-ID:

On Fri, Feb 24, 2012 at 12:55 AM, Abhishek Pratap wrote:
> I am wondering if there is a slick way access all the possible
> alignments for a read present in sam or bam file given the read
> header. Since the existing codebase is in perl I would prefer
> something which can be done in/via perl.
>
> By default BAM's are indexed by location so the inbuilt samtools
> indexing wont work I guess.
>
> I should also say the input bam file will have in the order of 500
> million total alignments and many reads are expected to be aligned to
> more than one place in the genome. Given the size of the data loading
> it all in one big hash is not turning out to be memory friendly.

Are you asking for SAM/BAM read lookup by read name?

> PS: I also posted this earlier on Biostar.

Link?

Peter

From briano at bioteam.net Fri Feb 24 09:58:58 2012
From: briano at bioteam.net (Brian Osborne)
Date: Fri, 24 Feb 2012 09:58:58 -0500
Subject: [Bioperl-l] Can't locate object method "seq" via package "Bio::DB::Query::GenBank"
In-Reply-To: <7716984D-CB98-4CC0-9E1A-69D696FB6745@gmail.com>
References: <7716984D-CB98-4CC0-9E1A-69D696FB6745@gmail.com>
Message-ID: <8664C24D-786A-4268-9917-9D630C58D9B4@bioteam.net>

Casandra,

BioPerl questions should be directed to the bioperl-l mailing list, that's where they will get the most attention. I'm CC'ing the list here.

Brian O.

On Feb 24, 2012, at 7:07 AM, Casandra wrote:

> Hi,
> I was training with "Retrieving multiple sequences from a database"
> use Bio::DB::Query::GenBank;
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
> $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query )
> but I'm stuck in the print step. This is what I wrote:
>
> #!/bin/perl -w
>
> use Bio::DB::Query::GenBank;
> use Bio::Seq;
> use Bio::SeqIO;
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]";
> $query_obj = Bio::DB::Query::GenBank->new(-db =>'nucleotide', -query => $query);
>
> #$seqio_obj = Bio::SeqIO->new(-file =>'>Arab46.gb', -format =>'genbank');
> #$seqio_obj->write_seq($query_obj);
>
> print $query_obj->accession_number,"\n";
>
> I've tried to print it to a file with seqio and to print it through terminal with the following:
> print $query_obj->accession_number,"\n";
> print $query_obj->seq,"\n";
> print $query_obj->division,"\n";
>
> But the error message I get when I try to print it to a file is:
> --------------------- WARNING ---------------------
> MSG: Bio::DB::Query::GenBank=HASH(0x965010) is not a SeqI compliant module. Attempting to dump, but may fail!
> ---------------------------------------------------
> Can't locate object method "seq" via package "Bio::DB::Query::GenBank" at /Library/Perl/5.8.8/Bio/SeqIO/genbank.pm line 760.
>
> And when I try to print it normally:
> Can't locate object method "seq" via package "Bio::DB::Query::GenBank" at bp_seq4.pl line 15.
>
>
> I went to cpan to check for this module
> Bio::DB::Query::GenBank
> I install Bioperl again the one that was following "Bio::DB::Query::GenBank".
> But no change in the message error.
>
> could you help me?
>
> Thank you very much.

Brian O.
--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

From abhishek.vit at gmail.com Fri Feb 24 09:58:33 2012
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Fri, 24 Feb 2012 06:58:33 -0800
Subject: [Bioperl-l] fetching all alignments from a sam/bam by read header in perl
In-Reply-To:
References:
Message-ID:

Hi Peter

You got it right. Here is the link:
http://biostar.stackexchange.com/questions/17787/fetching-all-alignments-from-a-sam-bam-by-read-header-in-perl

-A
From sami.kilpinen at medisapiens.com Thu Feb 23 02:20:10 2012
From: sami.kilpinen at medisapiens.com (Sami Kilpinen)
Date: Thu, 23 Feb 2012 09:20:10 +0200
Subject: [Bioperl-l] Problem with Bio::GeneMapper
Message-ID: <6EDB646F-017F-4BA1-BA32-70A7F0972BB2@medisapiens.com>

Hello,

I have been trying to solve the following issues with GeneMapper:

1) ->to_string(); throws an error
2) The mapper doesn't seem to do much of anything

Bioperl 1.6.1 is in use and the relevant parts of the code are as follows:

---------------------------------------------------
use Bio::Coordinate::GeneMapper;
use Bio::Location::Split;
use Bio::Location::Simple;

# get a Bio::Location::Split or an array of Bio::LocationI objects
# holding the start, end and strand of all the exons in chromosomal
# (or entry) coordinates
my $splits = Bio::Location::Split->new();

# $exons come from mongodb, not relevant here as data is printed to screen for debug purposes
while ( my $exon = $exons->next ) {
    my $e_start  = $exon->{'start'};
    my $e_end    = $exon->{'end'};
    my $e_strand = $exon->{'strand'};
    print "Exon_start,Exon-end,Exon-strand\n";
    print $e_start,",",$e_end,",",$e_strand,"\n";
    $splits->add_sub_Location(Bio::Location::Simple->new(-start=>$e_start,-end=>$e_end,-strand=>$e_strand));
}
print "Location::Split object string dump\n";
print $splits->to_FTstring(),"\n";

# get a Bio::RangeI representing the start, end and strand of the CDS
# in chromosomal (or entry) coordinates
my $cds = Bio::Location::Simple->new(-start  => 752890,
                                     -end    => 753369,
                                     -strand => 1 );

# create a gene mapper and set it to map from chromosomal to cds coordinates
my $gene = Bio::Coordinate::GeneMapper->new(-in    => 'chr',
                                            -out   => 'cds',
                                            -cds   => $cds,
                                            -exons => $splits );

# get a Bio::Location or sequence feature in input (chr) coordinates
my $loc = Bio::Location::Simple->new(-start  => 753584,
                                     -end    => 753584,
                                     -strand => 1 );

# Trying to set strict boundaries
$gene->strict('cds');

# map the location into output coordinates and get a new location object
my $newloc = $gene->map($loc);
print "new location in cds coordinates\n";
print $newloc->start(),"\n";

print "Mapper internal data in human readable format\n";
my $tmp=$gene->to_string();
------------------------------

And the result of running this is:

==========================
Exon_start,Exon-end,Exon-strand
752751,753582,1
Exon_start,Exon-end,Exon-strand
754103,755214,1
Location::Split object string dump
join(752751..753582,754103..755214)
new location in cds coordinates
695
Mapper internal data in human readable format
----------------------------------------
chr-gene (1-2)
gene offset: 752889 (753368)
gene strand: 1
gene-intron (2-5)
Can't call method "each_mapper" on an undefined value at /usr/local/share/perl/5.10.1/Bio/Coordinate/GeneMapper.pm line 1013.
==========================

Thus here we have two exons, with the cds lying completely within the first exon. GeneMapper calculates (correctly) that chr coordinate 752892 is nucleotide 2 from the start of the cds (752890), but then throws an error about each_mapper. Then, if I ask for the cds coordinate of a chr coordinate (753769) which is between the exons and beyond the end of the cds (753369), it returns the value 880 and the same error (why does it even claim that such a cds coordinate exists? The cds is 479 long...).

I have tried to test some more complex genes where the cds contains >1 exons. If I ask for a chr=>cds coordinate mapping when the chr coordinate lies in exon 2, it seems that it does not take the intron into account; it just returns "asked chr coordinate" - "cds start coordinate". This simple calculation is all that this module does for me...

What I would like it to do in these more complex cases is to calculate the length of all exon nucleotides between the cds start and the asked chr coordinate. In other words, ("asked chr coordinate" - "cds start in chr coordinates") - length of all introns between those.

Isn't this what GeneMapper is supposed to do?

I would highly appreciate any help with this issue.
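The calculation described above (count only the exonic bases between the CDS start and the queried position) can be sketched in plain Perl, independently of Bio::Coordinate::GeneMapper. This is only a minimal sketch under stated assumptions: plus-strand features, 1-based inclusive coordinates, and non-overlapping exons. chr_to_cds is a hypothetical helper, not a BioPerl method, and the second set of coordinates (the "toy" gene) is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Returns the 1-based CDS
# coordinate of $pos, or nothing (undef) if $pos lies outside the CDS
# or inside an intron. Assumes plus strand and non-overlapping exons.
sub chr_to_cds {
    my ($pos, $cds_start, $cds_end, $exons) = @_;
    return if $pos < $cds_start || $pos > $cds_end;

    my $offset = 0;    # coding bases seen in earlier exons
    for my $exon (sort { $a->[0] <=> $b->[0] } @$exons) {
        my ($start, $end) = @$exon;
        # clip the exon to the CDS boundaries
        $start = $cds_start if $start < $cds_start;
        $end   = $cds_end   if $end   > $cds_end;
        next if $start > $end;                 # exon has no coding part
        return $offset + ($pos - $start + 1)
            if $pos >= $start && $pos <= $end;
        last if $pos < $start;                 # position is in an intron
        $offset += $end - $start + 1;
    }
    return;
}

# Coordinates from the message above: CDS entirely inside exon 1.
my @exons = ([752751, 753582], [754103, 755214]);
for my $pos (752890, 752892, 753584) {
    my $cds_pos = chr_to_cds($pos, 752890, 753369, \@exons);
    print "$pos -> ", $cds_pos // 'undef', "\n";
}

# Made-up two-exon CDS: the intron (201..300) is skipped.
my @toy = ([100, 200], [301, 400]);
print "310 -> ", chr_to_cds(310, 150, 350, \@toy), "\n";
```

Under these assumptions position 753584 yields no CDS coordinate at all, and in the toy gene position 310 maps to 61 rather than the 161 that a plain subtraction from the CDS start would give.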
Regards,
Sami Kilpinen

From arlett.wrona at c-lecta.de Tue Feb 21 04:16:21 2012
From: arlett.wrona at c-lecta.de (Enzyme)
Date: Tue, 21 Feb 2012 01:16:21 -0800 (PST)
Subject: [Bioperl-l] BLAST example doesn't work
Message-ID: <33362540.post@talk.nabble.com>

Hi,

I'm new to BioPerl and want to use it for BLAST, but the example doesn't work. I tried some possible solutions, and at the moment I don't know whether the code is wrong or my BioPerl package doesn't work as it should:

use Bio::Tools::Run::StandAloneBlast;

my $factory = Bio::Tools::Run::StandAloneBlast->new(p => 'blastn',
                                                    d => 'nr',
                                                    e => '1e-5');
my $seq = Bio::PrimarySeq->new(-id  => 'test1',
                               -seq => 'AGATCAGTAGATGATAGGGGTAGA');
my $report = $factory->blastall($seq); # get back a Bio::SearchIO report

I get this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Command 'run' not registered
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:472
STACK: Bio::Tools::Run::WrapperBase::set_parameters C:/Perl/site/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:1203
STACK: Bio::Tools::Run::WrapperBase::new C:/Perl/site/lib/Bio/Tools/Run/WrapperBase/CommandExts.pm:505
STACK: Bio::Tools::Run::StandAloneBlast::new C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:366
STACK: Bio::Tools::Run::StandAloneNCBIBlast::new C:/Perl/site/lib/Bio\Tools\Run\StandAloneNCBIBlast.pm:166
STACK: C:/Perl/site/lib/Bio/Tools/Run/StandAloneBlast.pm:382

Any ideas? I downloaded the database "nr" correctly. The environment paths were created.

Thanks!
Enzyme
--
View this message in context: http://old.nabble.com/BLAST-example-doesn%27t-work-tp33362540p33362540.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From kasandrah at gmail.com Fri Feb 24 10:24:05 2012
From: kasandrah at gmail.com (Casandra)
Date: Fri, 24 Feb 2012 16:24:05 +0100
Subject: [Bioperl-l] Can't locate object method "seq" via package "Bio::DB::Query::GenBank"
In-Reply-To: <8664C24D-786A-4268-9917-9D630C58D9B4@bioteam.net>
References: <7716984D-CB98-4CC0-9E1A-69D696FB6745@gmail.com> <8664C24D-786A-4268-9917-9D630C58D9B4@bioteam.net>
Message-ID: <6AE6A02D-BF93-498D-8FCC-34FE40E13143@gmail.com>

I saw my mistake but thank you anyway : )

On 24/02/2012, at 15:58, Brian Osborne wrote:

> Casandra,
>
> BioPerl questions should be directed to the bioperl-l mailing list,
> that's where they will get the most attention. I'm CC'ing the list
> here.
>
> Brian O.

From adlai at refenestration.com Fri Feb 24 14:24:18 2012
From: adlai at refenestration.com (Adlai Burman)
Date: Fri, 24 Feb 2012 20:24:18 +0100
Subject: [Bioperl-l] Navigating a genbank file
In-Reply-To:
References: <3F92E96A-2A69-4F6F-AA92-AAA7A2122BD8@refenestration.com>
Message-ID:

Thanks, Jason.

Your code looks elegant and should do the trick. Unfortunately it is a little over my quasi-newbie (perl) head. I'm sure I will learn a lot from trying to figure out how to implement it. Who would have guessed that grabbing intergenic regions could be such a nosebleed (for perl flat-foots, that is)?

In the meantime, if you or anybody else can help me out with this in a more neophyte-digestible way, it would be excruciatingly appreciated.

Regards,
Adlai

On Feb 24, 2012, at 7:49 AM, Jason Stajich wrote:

> You just need a $last_CDS variable.
> Here's code that does this for genes retrieved from Bio::DB::SeqFeature but the concept is the same.
>
> https://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/get_intergenic_seq.pl

From adlai at refenestration.com Fri Feb 24 15:43:40 2012
From: adlai at refenestration.com (Adlai Burman)
Date: Fri, 24 Feb 2012 21:43:40 +0100
Subject: [Bioperl-l] Odd problem with get_tag_values
Message-ID: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com>

I have come across a perplexing problem while trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem:

#!/usr/bin/perl
use strict;
use warnings;
use IO::String;
use Bio::Perl;
use Bio::SeqIO;

my @files = ;
foreach my $file(@files){

    my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures;
    my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit.
    .
    .
    .
    #do nifty stuff
}

For some files this approach works just fine. For others the script dies immediately with the error message:

------------- EXCEPTION -------------
MSG: asking for tag value that does not exist gene
STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517
STACK toplevel tosend.pl:16
-------------------------------------

The difference between the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags.

Does anyone know why this is a problem and what can be done to circumvent it?

Thanks,
Adlai

From jason.stajich at gmail.com Fri Feb 24 16:21:07 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 24 Feb 2012 13:21:07 -0800
Subject: [Bioperl-l] Odd problem with get_tag_values
In-Reply-To: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com>
References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com>
Message-ID:

Not all CDS features will be annotated with a 'gene' tag. This is due to variation in how annotation is done; there is no requirement that every CDS feature carry a gene tag.

You can protect your query (we often do this when dealing with data from the wild) by testing for has_tag first.

my %strands;
for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) {
    if( $cds->has_tag('gene') ) {
        my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list
        $strands{$gene} = $cds->strand;
    } else { # look in alternative places for a name, e.g. locus,
        ...
    }
}

An alternative is to loop through your list of tags in order of preference:

my %strands;
for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) {
    my $seen = 0; # reset for every CDS, and declared so it passes 'use strict'
    for my $tag ( qw(gene locus name product accession note) ) {
        if( $cds->has_tag($tag) ) {
            my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list
            $strands{$name} = $cds->strand;
            $seen = 1;
            last;
        }
    }
    # warn only after every candidate tag has been tried
    if( ! $seen ) {
        warn("no tag found for feature at ", $cds->location->to_FTstring, "\n");
    }
}

On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote:

> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use IO::String;
> use Bio::Perl;
> use Bio::SeqIO;
> use IO::String;
>
> my @files = ;
> foreach my $file(@files){
>
>
> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures;
> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit.
> .
> .
> .
> #do nifty stuff
> }
>
> For some files this approach works just fine.
> For others the script dies immediately with the error message:
>
> ------------- EXCEPTION -------------
> MSG: asking for tag value that does not exist gene
> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517
> STACK toplevel tosend.pl:16
> -------------------------------------
>
> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags.
> Does anyone know why this is a problem and what can be done to circumvent it?
>
> Thanks,
> Adlai
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From adlai at refenestration.com Fri Feb 24 16:22:09 2012
From: adlai at refenestration.com (Adlai Burman)
Date: Fri, 24 Feb 2012 22:22:09 +0100
Subject: [Bioperl-l] Odd problem with get_tag_values
In-Reply-To: <1840B7A6-D949-44CA-8CDE-1313F06CD0CD@verizon.net>
References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <1840B7A6-D949-44CA-8CDE-1313F06CD0CD@verizon.net>
Message-ID: <68A216EA-2175-45C1-98CE-E4F949C565FC@refenestration.com>

Hey, Brian.

No, I am not absolutely sure about that, but I am checking now. In the process of checking I found that one of the files that parsed successfully (NC_015139) has NO feature tags other than "source": no "CDS", no "gene", etc.

OK, now I am sure. I checked one record that didn't parse, and all of its CDSs have "gene" tags. Go figure.

Regarding your suggestion: I agree, checking for the existence of a tag is an important thing to do, and everything parses great in a script I wrote which uses that.
This, however, might be problematic for two reasons:
(1) The fact that the aforementioned featureless record parses, while one of the crashers does have a full complement of properly placed "genes", suggests that this might not address the problem, and
(2) On a more humbling note, I don't know how to embed such a check into the one-line hash generator,

my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features;

which would be perfect for what I am coding now.

Thanks for your response,
Adlai

On Feb 24, 2012, at 10:03 PM, Brian Osborne wrote:

> Adlai,
>
> You are absolutely sure that every single CDS feature has a "gene" tag inside it?
>
> If this is not the case then you have to use the "if ($cds_feature->has_tag("gene")) ?" type of logic.
>
> Brian O.
>
> On Feb 24, 2012, at 3:43 PM, Adlai Burman wrote:
>
>> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem:
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use IO::String;
>> use Bio::Perl;
>> use Bio::SeqIO;
>> use IO::String;
>>
>> my @files = ;
>> foreach my $file(@files){
>>
>>
>> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures;
>> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit.
>> .
>> .
>> .
>> #do nifty stuff
>> }
>>
>> For some files this approach works just fine.
>> For others the script dies immediately with the error message:
>>
>> ------------- EXCEPTION -------------
>> MSG: asking for tag value that does not exist gene
>> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517
>> STACK toplevel tosend.pl:16
>> -------------------------------------
>>
>> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags.
>> Does anyone know why this is a problem and what can be done to circumvent it? >> >> Thanks, >> Adlai >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From adlai at refenestration.com Fri Feb 24 16:27:19 2012 From: adlai at refenestration.com (Adlai Burman) Date: Fri, 24 Feb 2012 22:27:19 +0100 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: <1840B7A6-D949-44CA-8CDE-1313F06CD0CD@verizon.net> References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <1840B7A6-D949-44CA-8CDE-1313F06CD0CD@verizon.net> Message-ID: P.S. In case anyone is interested (and I hope they are), here is an example of one record that fails here and one that doesn't: NC_015820 parses and NC_012927 fails. Unlike what I said earlier, some of the good ones have exons and some of the crashers don't. On Feb 24, 2012, at 10:03 PM, Brian Osborne wrote: > Adlai, > > You are absolutely sure that every single CDS feature has a "gene" tag inside it? > > If this is not the case then you have to use the "if ($cds_feature->has_tag("gene")) ?" type of logic. > > Brian O. > > On Feb 24, 2012, at 3:43 PM, Adlai Burman wrote: > >> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use IO::String; >> use Bio::Perl; >> use Bio::SeqIO; >> use IO::String; >> >> my @files = ; >> foreach my $file(@files){ >> >> >> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; >> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. >> . >> . >> . >> #do nifty stuff >> } >> >> For some files this approach works just fine. 
>> For others the script dies immediately with the error message: >> >> ------------- EXCEPTION ------------- >> MSG: asking for tag value that does not exist gene >> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 >> STACK toplevel tosend.pl:16 >> ------------------------------------- >> >> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. >> Does anyone know why this is a problem and what can be done to circumvent it? >> >> Thanks, >> Adlai >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Feb 24 16:28:52 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Feb 2012 21:28:52 +0000 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> Message-ID: <63416EB7-8C06-4904-8E36-84D6A655B6CD@illinois.edu> On 02/24/2012 02:43 PM, Adlai Burman wrote: > I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: > > #!/usr/bin/perl > use strict; > use warnings; > use IO::String; > use Bio::Perl; > use Bio::SeqIO; > use IO::String; > > my @files =; > foreach my $file(@files){ > > > my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; > my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. > . > . > . > #do nifty stuff > } > > For some files this approach works just fine. 
> For others the script dies immediately with the error message: > > ------------- EXCEPTION ------------- > MSG: asking for tag value that does not exist gene > STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 > STACK toplevel tosend.pl:16 > ------------------------------------- There are two possibilities: 1) There is at least one feature w/o a 'gene' tag for those files. 2) This is a bug. Either way it's hard to tell b/c we don't have the example data you are checking. I would note this is *not* the way to screen for features with specific tags, though, at least with the current API. You have to actually check for the presence of the tag first with has_tag('gene'). You could do that within the grep: { $_->primary_tag eq 'CDS' && $_->has_tag('gene') } > The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. > Does anyone know why this is a problem and what can be done to circumvent it? > > Thanks, > Adlai chris From adlai at refenestration.com Fri Feb 24 16:33:31 2012 From: adlai at refenestration.com (Adlai Burman) Date: Fri, 24 Feb 2012 22:33:31 +0100 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> Message-ID: Thanks so much, Jason. I will give that a try in after I get a few hours of much needed sleep :-) On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote: > not all CDS will be annotated with a 'gene' tag, this is due to variation in how annotation is done and that there is not a requirement that there be a gene tag for all CDS features. > > You can protect your query - we often do this when dealing with data from the wild by testing for has_tag first. 
> > my %strands; > for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { > if( $cds->has_tag('gene') ) { > my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list > $strands{$gene} = $cds->strand; > } else { # look in alternative places for a name, e.g. locus, > ... > } > } > > An alternative is to loop through your list of tags in order of preference > > my %strands; > for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { > for my $tag ( qw(gene locus name product accession note) ) { > if( $cds->has_tag($tag) ) { > my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list > $strands{$name} = $cds->strand; > $seen = 1; > last; > } > if( ! $seen ) { > warn("not tag found for feature at ", $cds->location->to_FTstring, "\n"); > } > } > > On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote: > >> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use IO::String; >> use Bio::Perl; >> use Bio::SeqIO; >> use IO::String; >> >> my @files = ; >> foreach my $file(@files){ >> >> >> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; >> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. >> . >> . >> . >> #do nifty stuff >> } >> >> For some files this approach works just fine. 
>> For others the script dies immediately with the error message: >> >> ------------- EXCEPTION ------------- >> MSG: asking for tag value that does not exist gene >> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 >> STACK toplevel tosend.pl:16 >> ------------------------------------- >> >> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. >> Does anyone know why this is a problem and what can be done to circumvent it? >> >> Thanks, >> Adlai >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > From cjfields at illinois.edu Fri Feb 24 16:46:55 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Feb 2012 21:46:55 +0000 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> Message-ID: Using has_tag('gene') as a pre-screen works for me for both example seqs. chris On Feb 24, 2012, at 3:33 PM, Adlai Burman wrote: > Thanks so much, Jason. > I will give that a try in after I get a few hours of much needed sleep :-) > > > On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote: > >> not all CDS will be annotated with a 'gene' tag, this is due to variation in how annotation is done and that there is not a requirement that there be a gene tag for all CDS features. >> >> You can protect your query - we often do this when dealing with data from the wild by testing for has_tag first. 
>> >> my %strands; >> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { >> if( $cds->has_tag('gene') ) { >> my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list >> $strands{$gene} = $cds->strand; >> } else { # look in alternative places for a name, e.g. locus, >> ... >> } >> } >> >> An alternative is to loop through your list of tags in order of preference >> >> my %strands; >> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { >> for my $tag ( qw(gene locus name product accession note) ) { >> if( $cds->has_tag($tag) ) { >> my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list >> $strands{$name} = $cds->strand; >> $seen = 1; >> last; >> } >> if( ! $seen ) { >> warn("not tag found for feature at ", $cds->location->to_FTstring, "\n"); >> } >> } >> >> On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote: >> >>> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use IO::String; >>> use Bio::Perl; >>> use Bio::SeqIO; >>> use IO::String; >>> >>> my @files = ; >>> foreach my $file(@files){ >>> >>> >>> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; >>> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. >>> . >>> . >>> . >>> #do nifty stuff >>> } >>> >>> For some files this approach works just fine. 
>>> For others the script dies immediately with the error message: >>> >>> ------------- EXCEPTION ------------- >>> MSG: asking for tag value that does not exist gene >>> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 >>> STACK toplevel tosend.pl:16 >>> ------------------------------------- >>> >>> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. >>> Does anyone know why this is a problem and what can be done to circumvent it? >>> >>> Thanks, >>> Adlai >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From adlai at refenestration.com Fri Feb 24 16:55:57 2012 From: adlai at refenestration.com (Adlai Burman) Date: Fri, 24 Feb 2012 22:55:57 +0100 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> Message-ID: <20C8CE5C-FBA5-4333-8A85-86804595DB3C@refenestration.com> Jason, Your first solution, indeed, did the trick (though I'm not sure why). There was no need to for checking "else." I'm not sure why some records with a full set of "gene" tags would not parse without the check, but everything parsed with it. Brian, you were right. Thanks again, Adlai On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote: > not all CDS will be annotated with a 'gene' tag, this is due to variation in how annotation is done and that there is not a requirement that there be a gene tag for all CDS features. 
> > You can protect your query - we often do this when dealing with data from the wild by testing for has_tag first. > > my %strands; > for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { > if( $cds->has_tag('gene') ) { > my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list > $strands{$gene} = $cds->strand; > } else { # look in alternative places for a name, e.g. locus, > ... > } > } > > An alternative is to loop through your list of tags in order of preference > > my %strands; > for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { > for my $tag ( qw(gene locus name product accession note) ) { > if( $cds->has_tag($tag) ) { > my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list > $strands{$name} = $cds->strand; > $seen = 1; > last; > } > if( ! $seen ) { > warn("not tag found for feature at ", $cds->location->to_FTstring, "\n"); > } > } > > On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote: > >> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use IO::String; >> use Bio::Perl; >> use Bio::SeqIO; >> use IO::String; >> >> my @files = ; >> foreach my $file(@files){ >> >> >> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; >> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. >> . >> . >> . >> #do nifty stuff >> } >> >> For some files this approach works just fine. 
>> For others the script dies immediately with the error message: >> >> ------------- EXCEPTION ------------- >> MSG: asking for tag value that does not exist gene >> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 >> STACK toplevel tosend.pl:16 >> ------------------------------------- >> >> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. >> Does anyone know why this is a problem and what can be done to circumvent it? >> >> Thanks, >> Adlai >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > From adlai at refenestration.com Fri Feb 24 16:57:24 2012 From: adlai at refenestration.com (Adlai Burman) Date: Fri, 24 Feb 2012 22:57:24 +0100 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> Message-ID: <1230FA01-6821-4C33-9C15-14669E5DCEFA@refenestration.com> On Feb 24, 2012, at 10:46 PM, Fields, Christopher J wrote: > Using has_tag('gene') as a pre-screen works for me for both example seqs. > Me too :-) Dobrou noc and cheers, Adlai > chris > > On Feb 24, 2012, at 3:33 PM, Adlai Burman wrote: > >> Thanks so much, Jason. >> I will give that a try in after I get a few hours of much needed sleep :-) >> >> >> On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote: >> >>> not all CDS will be annotated with a 'gene' tag, this is due to variation in how annotation is done and that there is not a requirement that there be a gene tag for all CDS features. >>> >>> You can protect your query - we often do this when dealing with data from the wild by testing for has_tag first. 
>>> >>> my %strands; >>> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { >>> if( $cds->has_tag('gene') ) { >>> my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list >>> $strands{$gene} = $cds->strand; >>> } else { # look in alternative places for a name, e.g. locus, >>> ... >>> } >>> } >>> >>> An alternative is to loop through your list of tags in order of preference >>> >>> my %strands; >>> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { >>> for my $tag ( qw(gene locus name product accession note) ) { >>> if( $cds->has_tag($tag) ) { >>> my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list >>> $strands{$name} = $cds->strand; >>> $seen = 1; >>> last; >>> } >>> if( ! $seen ) { >>> warn("not tag found for feature at ", $cds->location->to_FTstring, "\n"); >>> } >>> } >>> >>> On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote: >>> >>>> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> use IO::String; >>>> use Bio::Perl; >>>> use Bio::SeqIO; >>>> use IO::String; >>>> >>>> my @files = ; >>>> foreach my $file(@files){ >>>> >>>> >>>> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; >>>> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. >>>> . >>>> . >>>> . >>>> #do nifty stuff >>>> } >>>> >>>> For some files this approach works just fine. 
>>>> For others the script dies immediately with the error message: >>>> >>>> ------------- EXCEPTION ------------- >>>> MSG: asking for tag value that does not exist gene >>>> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 >>>> STACK toplevel tosend.pl:16 >>>> ------------------------------------- >>>> >>>> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. >>>> Does anyone know why this is a problem and what can be done to circumvent it? >>>> >>>> Thanks, >>>> Adlai >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Jason Stajich >>> jason.stajich at gmail.com >>> jason at bioperl.org >>> >>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Fri Feb 24 16:03:49 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 24 Feb 2012 16:03:49 -0500 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> Message-ID: <1840B7A6-D949-44CA-8CDE-1313F06CD0CD@verizon.net> Adlai, You are absolutely sure that every single CDS feature has a "gene" tag inside it? If this is not the case then you have to use the "if ($cds_feature->has_tag("gene")) ?" type of logic. Brian O. On Feb 24, 2012, at 3:43 PM, Adlai Burman wrote: > I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. 
This is the minimal code which shows my problem: > > #!/usr/bin/perl > use strict; > use warnings; > use IO::String; > use Bio::Perl; > use Bio::SeqIO; > use IO::String; > > my @files = ; > foreach my $file(@files){ > > > my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; > my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. > . > . > . > #do nifty stuff > } > > For some files this approach works just fine. > For others the script dies immediately with the error message: > > ------------- EXCEPTION ------------- > MSG: asking for tag value that does not exist gene > STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 > STACK toplevel tosend.pl:16 > ------------------------------------- > > The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. > Does anyone know why this is a problem and what can be done to circumvent it? > > Thanks, > Adlai > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Feb 24 17:38:07 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Feb 2012 22:38:07 +0000 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: <20C8CE5C-FBA5-4333-8A85-86804595DB3C@refenestration.com> References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <20C8CE5C-FBA5-4333-8A85-86804595DB3C@refenestration.com> Message-ID: <665C4737-93DA-4466-986B-380D8F266A58@illinois.edu> There is possibly a slight disconnect here. The primary tag you are looking for is 'CDS'. 
Here is an example of both the primary tag for 'CDS' and 'gene' from NC_012927:

     gene            complement(join(91716..92516,69640..69753))
                     /locus_tag="BaolC_p001"
                     /trans_splicing
                     /db_xref="GeneID:8223103"
     CDS             complement(join(91716..91744,92285..92516,69640..69753))
                     /locus_tag="BaolC_p001"
                     /trans_splicing
                     /codon_start=1
                     /transl_table=11
                     /product="ribosomal protein S12"
                     /protein_id="YP_003029720.1"
                     /db_xref="GI:253729537"
                     /db_xref="GeneID:8223103"
                     /translation="MPTVKQLIRNARQPIRNARKSAALKGCPQRRGTCARVYTINPKK
                     PNSALRKVARVRLTSGFEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYRIIRGTL
                     DAVAVKNRQQGRSKYGVKKPKK"

This 'CDS' example has no 'gene' regular tag, in fact none that I looked at did. But there is another feature (above it) that does have the *primary* tag 'gene' and same locus_tag and dbxref GeneID (it does have tags such as 'locus_tag', 'trans_splicing', etc). Is that what you mean?

The way that BioPerl deals with this is to return two different independent features, one with the 'CDS' primary tag and one with the 'gene' primary tag. Past attempts to somehow combine these (and then disambiguate them again later if needed for output) can be problematic, or at least they once were.

chris

On Feb 24, 2012, at 3:55 PM, Adlai Burman wrote:

> Jason,
> Your first solution, indeed, did the trick (though I'm not sure why). There was no need to for checking "else." I'm not sure why some records with a full set of "gene" tags would not parse without the check, but everything parsed with it.
>
> Brian, you were right.
>
> Thanks again,
>
> Adlai
> On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote:
>
>> not all CDS will be annotated with a 'gene' tag, this is due to variation in how annotation is done and that there is not a requirement that there be a gene tag for all CDS features.
>>
>> You can protect your query - we often do this when dealing with data from the wild by testing for has_tag first.
>> >> my %strands; >> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { >> if( $cds->has_tag('gene') ) { >> my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list >> $strands{$gene} = $cds->strand; >> } else { # look in alternative places for a name, e.g. locus, >> ... >> } >> } >> >> An alternative is to loop through your list of tags in order of preference >> >> my %strands; >> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) { >> for my $tag ( qw(gene locus name product accession note) ) { >> if( $cds->has_tag($tag) ) { >> my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list >> $strands{$name} = $cds->strand; >> $seen = 1; >> last; >> } >> if( ! $seen ) { >> warn("not tag found for feature at ", $cds->location->to_FTstring, "\n"); >> } >> } >> >> On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote: >> >>> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem: >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use IO::String; >>> use Bio::Perl; >>> use Bio::SeqIO; >>> use IO::String; >>> >>> my @files = ; >>> foreach my $file(@files){ >>> >>> >>> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures; >>> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. >>> . >>> . >>> . >>> #do nifty stuff >>> } >>> >>> For some files this approach works just fine. 
>>> For others the script dies immediately with the error message: >>> >>> ------------- EXCEPTION ------------- >>> MSG: asking for tag value that does not exist gene >>> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517 >>> STACK toplevel tosend.pl:16 >>> ------------------------------------- >>> >>> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags. >>> Does anyone know why this is a problem and what can be done to circumvent it? >>> >>> Thanks, >>> Adlai >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From adlai at refenestration.com Fri Feb 24 18:07:37 2012 From: adlai at refenestration.com (Adlai Burman) Date: Sat, 25 Feb 2012 00:07:37 +0100 Subject: [Bioperl-l] Odd problem with get_tag_values In-Reply-To: <665C4737-93DA-4466-986B-380D8F266A58@illinois.edu> References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <20C8CE5C-FBA5-4333-8A85-86804595DB3C@refenestration.com> <665C4737-93DA-4466-986B-380D8F266A58@illinois.edu> Message-ID: <758BCF25-5C17-449E-B8F0-3E3DFDF8784D@refenestration.com> I apologize, Chris. That was a terrible example I sent. I accidentally sent the only record for which there actually are no regular 'gene' tags. Other gb records that failed before Jason's tag check include NC_005086 and they all have regular 'gene' tags. I should have referenced that one. I appreciate your checking into that and sorry about the red herring. Boy is my face blushed. A. 
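
The tag check discussed in this thread can be wrapped in a small helper. What follows is only a minimal sketch under stated assumptions, not code posted to the list: `first_tag_value` is a hypothetical name, the order of preferred tags is an assumption, and the feature is built by hand (with a simplified single-span location) from the NC_012927 CDS shown above so that no GenBank file is needed.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hypothetical helper: return the first value of the first tag in @tags that
# the feature actually carries, or undef when none of them is present.
# get_tag_values is only called after has_tag succeeds, so the
# "asking for tag value that does not exist" exception cannot be raised here.
sub first_tag_value {
    my ($feat, @tags) = @_;
    for my $tag (@tags) {
        next unless $feat->has_tag($tag);
        my ($value) = $feat->get_tag_values($tag);    # returns a list; keep the 1st
        return $value;
    }
    return;
}

# Hand-built stand-in for the NC_012927 CDS: no 'gene' tag, but it does have
# 'locus_tag' and 'product'. The location is flattened to a single span.
my $cds = Bio::SeqFeature::Generic->new(
    -primary => 'CDS',
    -start   => 69640,
    -end     => 92516,
    -strand  => -1,
    -tag     => {
        locus_tag => 'BaolC_p001',
        product   => 'ribosomal protein S12',
    },
);

# Assumed order of preference: gene name first, then locus_tag, then product.
my $name = first_tag_value($cds, qw(gene locus_tag product));
if (defined $name) {
    print join("\t", $name, $cds->strand), "\n";
}
else {
    warn "no usable tag for feature at ", $cds->location->to_FTstring, "\n";
}
```

Returning undef rather than dying leaves the decision about unnamed features (skip, warn, or fall back to coordinates) to the caller.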

From cjfields at illinois.edu Fri Feb 24 18:28:39 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 24 Feb 2012 23:28:39 +0000
Subject: [Bioperl-l] Odd problem with get_tag_values
In-Reply-To: <758BCF25-5C17-449E-B8F0-3E3DFDF8784D@refenestration.com>
References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <20C8CE5C-FBA5-4333-8A85-86804595DB3C@refenestration.com> <665C4737-93DA-4466-986B-380D8F266A58@illinois.edu>, <758BCF25-5C17-449E-B8F0-3E3DFDF8784D@refenestration.com>
Message-ID: <0F3F54D9-84BF-4136-8D23-52A979351038@illinois.edu>

No problem. This highlights the #1 problem with genbank files (and the reasons we keep things as simple as possible), namely lack of consistency. Not a problem with NCBI per se, but a problem nonetheless.

Chris

On Feb 24, 2012, at 5:07 PM, "Adlai Burman" wrote:

> I apologize, Chris. That was a terrible example I sent.
> I accidentally sent the only record for which there actually are no regular 'gene' tags. Other gb records that failed before Jason's tag check include NC_005086 and they all have regular 'gene' tags. I should have referenced that one.
> I appreciate your checking into that and sorry about the red herring. Boy is my face blushed.
>
> A.

From yang.liu0508 at gmail.com Sat Feb 25 01:52:05 2012
From: yang.liu0508 at gmail.com (yang liu)
Date: Sat, 25 Feb 2012 01:52:05 -0500
Subject: [Bioperl-l] extract sequences and save into files by genes
Message-ID:

Dear colleagues,

I have multiple files, each named after a species and each containing ca. 100 different genes. I want to extract the sequences and save them by gene, one output file per gene. In each output file, the sequence header would be the species name instead of the gene name. How should I do this?

The input files would look like this (with the file names Acidosasa.txt, Acorus.txt, ...):

>rps12
ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCTAGAAAATCGCCCGCGC
TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCCCAAAAAACCCAACTC
>psbA
TTATCCATTAAGAGATGGAACTTCAAGAACAGCTAGGTCTAGAGGGAAGTTGTGAGCATTACGTTCGTGC
ATTACCTCCATACCAAGATTAGCACGGTTGATGATATCAGCCCAAGTATTAATAACGCGACCTTGGCTAT
.....

I would like the output files to be named rps12.txt, psbA.txt, ...

Within rps12.txt, the sequences would look like this:

>Acidosasa
ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCTAGAAAATCGCCCGCGC
TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCCCAAAAAACCCAACTC
>Acorus
ATGCCAACTATTAAACAACTTATTAGAAACACAAGACAGCCAATCCGAAATGTC

I am not sure I have explained this clearly.

Thanks.
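
The regrouping asked for here (one input file per species, one output file per gene) can be done in plain Perl without BioPerl. The script below is a minimal sketch under assumptions: the sub name `split_by_gene`, the separate output directory, and the ".txt" suffix handling are illustrative choices based on the example file names, and the gene name is taken to be the first word of each ">" header.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Regroup FASTA records. Input: one file per species, one record per gene.
# Output: one file per gene in $out_dir, one record per species, with the
# species name (file name minus ".txt") as the new header.
# Returns the sorted list of gene names that were written.
sub split_by_gene {
    my ($out_dir, @species_files) = @_;
    my %out;    # gene name => output filehandle, opened once and reused

    for my $file (@species_files) {
        my $species = basename($file, '.txt');    # "Acidosasa.txt" -> "Acidosasa"
        open my $in, '<', $file or die "Cannot read $file: $!";
        my $fh;    # output handle of the record currently being copied
        while (my $line = <$in>) {
            $line =~ s/\r?\n\z//;                 # strip the line ending, if any
            if ($line =~ /^>\s*(\S+)/) {
                my $gene = $1;
                unless ($out{$gene}) {
                    open $out{$gene}, '>', "$out_dir/$gene.txt"
                        or die "Cannot write $out_dir/$gene.txt: $!";
                }
                $fh = $out{$gene};
                print {$fh} ">$species\n";        # header becomes the species name
            }
            elsif (defined $fh && length $line) {
                print {$fh} "$line\n";            # sequence line, copied as is
            }
        }
        close $in;
    }
    close $_ for values %out;
    return sort keys %out;
}

# Command-line use (only when run as a script with arguments):
#   perl split_by_gene.pl out_dir Acidosasa.txt Acorus.txt ...
if (!caller && @ARGV) {
    my ($out_dir, @files) = @ARGV;
    split_by_gene($out_dir, @files);
}
```

Writing into a separate directory keeps the per-gene files from being picked up as input if the script is run again with a `*.txt` glob; with roughly 100 genes, keeping every output handle open at once should stay well within typical per-process file limits.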

From abhishek.vit at gmail.com Sun Feb 26 01:24:14 2012
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Sat, 25 Feb 2012 22:24:14 -0800
Subject: [Bioperl-l] fetching all alignments from a sam/bam by read header in perl
In-Reply-To:
References:
Message-ID:

Hi Guys

Reading the doc page for Bio::DB::Sam, I see there is a way to fetch reads by name (read id), but the documentation also says this is slow (copied below). I need to do about 300-500 million lookups, and if each one is costly I would like to know whether there is another slick low-level way. For my application I would not have the feature location, just the read name.

  -name    Filter on reads with the designated name. Note that
           this can be a slow operation unless accompanied by
           the feature location as well.

-Abhi

On Fri, Feb 24, 2012 at 6:58 AM, Abhishek Pratap wrote:

> Hi Peter
>
> You got it right.
>
> Here is the link:
>
> http://biostar.stackexchange.com/questions/17787/fetching-all-alignments-from-a-sam-bam-by-read-header-in-perl
>
> -A
>
> On Fri, Feb 24, 2012 at 1:24 AM, Peter Cock wrote:
> > On Fri, Feb 24, 2012 at 12:55 AM, Abhishek Pratap wrote:
> >> I am wondering if there is a slick way to access all the possible
> >> alignments for a read present in a sam or bam file given the read
> >> header. Since the existing codebase is in perl I would prefer
> >> something which can be done in/via perl.
> >>
> >> By default BAMs are indexed by location, so the inbuilt samtools
> >> indexing won't work I guess.
> >>
> >> I should also say the input bam file will have in the order of 500
> >> million total alignments and many reads are expected to be aligned to
> >> more than one place in the genome. Given the size of the data, loading
> >> it all in one big hash is not turning out to be memory friendly.
> >
> > Are you asking for SAM/BAM read lookup by read name?
> >
> >> PS: I also posted this earlier on Biostar.
> >
> > Link?
> >
> > Peter

From florent.angly at gmail.com Sun Feb 26 01:44:06 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Sun, 26 Feb 2012 16:44:06 +1000
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
Message-ID: <4F49D4B6.5050301@gmail.com>

Hi all,

I am interested in the Bio::Tools::PCRSimulation module. Supposedly it was added to Bioperl 0.3 and it is also mentioned in the Bio::PrimedSeq module. However, I cannot find it in the current Bioperl codebase. Any idea where it went?

The reason I am asking is that I have some code to do in silico PCR using regular expressions. I wanted to modularize my code more and make it into a module for Bioperl. Of course, if there is something similar in Bioperl already, I need to have a look at it. If there is nothing similar, what namespace do you suggest I use? Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? Bio::Tools::InSilicoPCR?

Thanks,

Florent

From j_martin at lbl.gov Sun Feb 26 11:39:16 2012
From: j_martin at lbl.gov (Joel Martin)
Date: Sun, 26 Feb 2012 08:39:16 -0800
Subject: [Bioperl-l] fetching all alignments from a sam/bam by read header in perl
In-Reply-To:
References:
Message-ID:

Sort the bam by name so all hits for a read are adjacent. If you subsequently need to do random lookups, you could add or alter tags on each read with multiple hits to indicate where those hits are, and then re-sort the bam by coordinate.

Joel

From fossandonc at hotmail.com Sun Feb 26 14:08:55 2012
From: fossandonc at hotmail.com (Francisco J. Ossandón)
Date: Sun, 26 Feb 2012 16:08:55 -0300
Subject: [Bioperl-l] Odd problem with get_tag_values
In-Reply-To: <0F3F54D9-84BF-4136-8D23-52A979351038@illinois.edu>
References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <20C8CE5C-FBA5-4333-8A85-86804595DB3C@refenestration.com> <665C4737-93DA-4466-986B-380D8F266A58@illinois.edu>, <758BCF25-5C17-449E-B8F0-3E3DFDF8784D@refenestration.com> <0F3F54D9-84BF-4136-8D23-52A979351038@illinois.edu>
Message-ID:

I have been parsing Genbank files with Bioperl for a long time now, so for many types of data I wrote code that always checks whether the data exists before asking for it, to avoid script crashes and warnings (I remember I made custom GBKs, deleting specific lines, to manually check for crashes). I use that in all my programs and they work fine.

I use "||" defaults and Perl ternaries for most one-liners:

my $version    = $seq_obj->version || '';
my $definition = $seq_obj->desc || '';
my $dna_shape  = $seq_obj->is_circular ? 'circular' : 'linear';
my $prot_id    = $feat->has_tag('protein_id') ? ($feat->get_tag_values('protein_id'))[0] : '';
my $product    = $feat->has_tag('product') ? ($feat->get_tag_values('product'))[0] : '';

I usually try to code defensively when parsing Genbank files, expecting every step to go wrong. For example, try to parse "Escherichia coli str. K-12 substr. MG1655 chromosome" (NC_000913.gbk), and you will see many things go wrong (like CDS features without /protein_id or /product tags). ;)

Francisco J.
Ossandon

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J
Sent: Friday, 24 February 2012 20:29
To: Adlai Burman
CC: ; Jason Stajich
Subject: Re: [Bioperl-l] Odd problem with get_tag_values

No problem. This highlights the #1 problem with genbank files (and the reasons we keep things as simple as possible), namely lack of consistency. Not a problem with NCBI per se, but a problem nonetheless.

Chris

From roy.chaudhuri at gmail.com Mon Feb 27 05:33:48 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 27 Feb 2012 10:33:48 +0000
Subject: [Bioperl-l] Odd problem with get_tag_values
In-Reply-To: <1230FA01-6821-4C33-9C15-14669E5DCEFA@refenestration.com>
References: <1C2950DB-455A-4B80-8D23-4A241FB857BE@refenestration.com> <1230FA01-6821-4C33-9C15-14669E5DCEFA@refenestration.com>
Message-ID: <4F4B5C0C.8080102@gmail.com>

Just to chip in on this, you can use get_tagset_values instead of get_tag_values. The former has (to me) the more Perl-ish behaviour of returning an empty list if none of the requested tags are present, meaning that you can skip the has_tag step.

Cheers,
Roy.

On 24/02/2012 21:57, Adlai Burman wrote:
>
> On Feb 24, 2012, at 10:46 PM, Fields, Christopher J wrote:
>
>> Using has_tag('gene') as a pre-screen works for me for both example
>> seqs.
>>
>
> Me too :-)
>
> Dobrou noc and cheers,
>
> Adlai
>> chris
>>
>> On Feb 24, 2012, at 3:33 PM, Adlai Burman wrote:
>>
>>> Thanks so much, Jason.
>>> I will give that a try after I get a few hours of much needed sleep :-)

From MEC at stowers.org Mon Feb 27 10:47:51 2012
From: MEC at stowers.org (Cook, Malcolm)
Date: Mon, 27 Feb 2012 09:47:51 -0600
Subject: [Bioperl-l] extract sequences and save into files by genes
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF997D77D30@EXCHMB-02.stowers-institute.org>

You don't need bioperl for this one. The following perl one-liner will do it for you:

perl -p -e 'if (1==$.) {($species = $ARGV) =~ s|\.txt||}; if (s/^>(.*)/">${species}"/e) {$gene=$1; open($O{$gene},qq{>> ${gene}.txt}); select($O{$gene})} ; close ARGV if eof' *.txt

~Malcolm

From rbuels at gmail.com Mon Feb 27 11:24:59 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 27 Feb 2012 11:24:59 -0500
Subject: [Bioperl-l] Update: call for Google Summer of Code project ideas
Message-ID: <4F4BAE5B.40309@gmail.com>

Hi all,

As kindly pointed out by Reece Hart, the previous email I sent out calling for Google Summer of Code project ideas had the wrong due date for project ideas in it. I actually want them all to be in place by Friday, March 2, which is this coming Friday.

== Instructions for Wiki Editing ==

For each of the OBF projects that wants to do GSoC again this year, please:

a.) Update the list of project ideas on your project's GSoC page (BioPython, BioPerl, BioRuby, etc). Add new ones, remove ones that have already been done or are no longer relevant, etc.

b.) Update the list of project ideas on the main OBF GSoC page (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.

c.) Let me know via email that you have done so and it's ready for Google to peruse.
== end instructions ==

Again, please have the updates done by this Friday (March 2). The number and quality of the project ideas are part of the evaluation process for whether OBF is accepted as a Summer of Code organization again this year, so let's come up with some good ones. :-)

Rob

----
Robert Buels
(prospective) 2012 OBF GSoC Organization Admin

From rondonbio at yahoo.com.br Mon Feb 27 12:53:12 2012
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 27 Feb 2012 09:53:12 -0800 (PST)
Subject: [Bioperl-l] unninstall Bioperl
In-Reply-To: <3F5F0A6E-C51E-4DF9-A293-924A5AA5245B@illinois.edu>
References: <1329332355.18626.YahooMailNeo@web130206.mail.mud.yahoo.com>
 <1329335001.11066.YahooMailNeo@web130202.mail.mud.yahoo.com>
 <1329335028.66134.YahooMailNeo@web130206.mail.mud.yahoo.com>
 <3F5F0A6E-C51E-4DF9-A293-924A5AA5245B@illinois.edu>
Message-ID: <1330365192.12597.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi guys! Thanks for all the help.
I want to uninstall BioPerl and then install it again. Do you know how to uninstall it?

Thank you,

Rondon

From l.m.timmermans at students.uu.nl Mon Feb 27 14:34:25 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 27 Feb 2012 20:34:25 +0100
Subject: [Bioperl-l] unninstall Bioperl
In-Reply-To: <1330365192.12597.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <1329332355.18626.YahooMailNeo@web130206.mail.mud.yahoo.com>
 <1329335001.11066.YahooMailNeo@web130202.mail.mud.yahoo.com>
 <1329335028.66134.YahooMailNeo@web130206.mail.mud.yahoo.com>
 <3F5F0A6E-C51E-4DF9-A293-924A5AA5245B@illinois.edu>
 <1330365192.12597.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID:

For uninstalling, you may want to try pm-uninstall (App-pmuninstall on CPAN).

Leon

From cjfields at illinois.edu Mon Feb 27 14:38:51 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 27 Feb 2012 19:38:51 +0000
Subject: [Bioperl-l] unninstall Bioperl
In-Reply-To:
References: <1329332355.18626.YahooMailNeo@web130206.mail.mud.yahoo.com>
 <1329335001.11066.YahooMailNeo@web130202.mail.mud.yahoo.com>
 <1329335028.66134.YahooMailNeo@web130206.mail.mud.yahoo.com>
 <3F5F0A6E-C51E-4DF9-A293-924A5AA5245B@illinois.edu>
 <1330365192.12597.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <4781354A-7B22-4ECA-9532-84198B1B7338@illinois.edu>

There is also './Build install uninst=1'.

chris

From cjfields at illinois.edu Mon Feb 27 16:18:54 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 27 Feb 2012 21:18:54 +0000
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To: <4F49D4B6.5050301@gmail.com>
References: <4F49D4B6.5050301@gmail.com>
Message-ID:

On Feb 26, 2012, at 12:44 AM, Florent Angly wrote:

> Hi all,
>
> I am interested in the Bio::Tools::PCRSimulation module.
> Supposedly it was added to Bioperl 0.3 and is also mentioned in the Bio::PrimedSeq module. However, I cannot find it in the current Bioperl codebase. Any idea where it went?

No idea; I can't find it anywhere in the code base either, and the github repo contains history going back to the original CVS repo. You can try contacting the author, possibly.

> The reason I am asking is because I have some code to do in silico PCR using regular expressions. I wanted to modularize my code more and make it into a module for Bioperl. Of course, if there is something similar in Bioperl already, I need to have a look at it. If there is nothing similar, what namespace do you suggest to use? Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? Bio::Tools::InSilicoPCR?
>
> Thanks,
>
> Florent

Maybe the last (InSilicoPCR).

chris

From MEC at stowers.org Tue Feb 28 11:55:13 2012
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 28 Feb 2012 10:55:13 -0600
Subject: [Bioperl-l] extract sequences and save into files by genes
In-Reply-To:
References: <2C40E43D1F7A56408C4463FD245DDDF997D77D30@EXCHMB-02.stowers-institute.org>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF997D77EDD@EXCHMB-02.stowers-institute.org>

Yang,

I'm replying back on-list.

You wrote in your other email that my one-liner worked once you learned to run perl from the command line under cygwin.

Great. Glad to help. Good luck. Welcome to the fray!

~Malcolm

From: yang liu [mailto:yang.liu0508 at gmail.com]
Sent: Monday, February 27, 2012 10:04 PM
To: Cook, Malcolm
Subject: Re: [Bioperl-l] extract sequences and save into files by genes

Hello Malcolm,

Thanks for your help. But when I ran it, it returned the following line:

'\.txt' is not recognized as an internal or external command, operable program or batch file.

I am using Windows 7; is that the problem? I have perl installed. In the Windows command prompt, I first changed to the folder where the target files are, and then pasted your script line. I am a beginner with perl.
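The error quoted above is what the Windows command prompt (cmd.exe) reports for the one-liner: cmd.exe does not treat single quotes as quoting characters, so the | in s|\.txt|| is read as a pipe. One way around the quoting, sketched here as an assumption rather than as Malcolm's own code, is to put the same logic in a script file. The script name split_by_gene.pl, the sub name split_by_gene and the separate output directory are hypothetical choices made for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append every record of one species file (e.g. Acidosasa.txt) to a
# per-gene file in $outdir, replacing each ">gene" header with ">species".
# (Hypothetical helper; not part of BioPerl.)
sub split_by_gene {
    my ($file, $outdir) = @_;

    # Species name = file name without directory and without ".txt".
    (my $species = $file) =~ s{.*[/\\]}{};
    $species =~ s/\.txt$//;

    open my $in, '<', $file or die "Cannot read $file: $!";
    my $out;
    while (my $line = <$in>) {
        if ($line =~ /^>\s*(.*?)\s*$/) {
            # Header line: switch output to the file named after the gene.
            my $gene = $1;
            open $out, '>>', "$outdir/$gene.txt"
                or die "Cannot append to $outdir/$gene.txt: $!";
            print {$out} ">$species\n";
        }
        elsif ($out) {
            print {$out} $line;    # sequence line: copy unchanged
        }
    }
    close $in;
    close $out if $out;
}

# Usage: perl split_by_gene.pl OUTDIR *.txt
if (@ARGV >= 2) {
    my $outdir = shift @ARGV;
    -d $outdir or mkdir $outdir or die "Cannot create $outdir: $!";
    # cmd.exe does not expand wildcards, so expand them here.
    split_by_gene($_, $outdir) for map { glob($_) } @ARGV;
}
```

Writing the per-gene files into a separate directory is a deliberate choice in this sketch: it keeps a second run from picking up rps12.txt, psbA.txt and so on as if they were species files.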
Thanks again for your help.

Yang.

From crrcri at ibmb.csic.es Fri Feb 24 10:40:48 2012
From: crrcri at ibmb.csic.es (Casandra)
Date: Fri, 24 Feb 2012 16:40:48 +0100
Subject: [Bioperl-l] BLAST in Bioperl - Can't call method "next_result"
References: <43344F4C-9FDE-4A28-A3D0-BE05E68A32EA@gmail.com>
Message-ID: <3BFA0762-0494-4733-9217-341455C81C77@ibmb.csic.es>

Hi,

I am installing BLAST to use it from BioPerl, but I'm having some problems and I can't find how to fix them.

I followed the steps here: http://www.ncbi.nlm.nih.gov/books/NBK52640/ (at least I think I did it right) for Mac, and I downloaded the swissprot.gz db to try with the following script:

#!/bin/perl -w
use Bio::Seq;
use Bio::SearchIO;
use Bio::Tools::Run::StandAloneBlast;

$blast_obj = Bio::Tools::Run::StandAloneBlast->new(-program  => 'blastp',
                                                   -database => '~/scripts/ncbi-blast-2.2.25+/db/swissprot.fa');
$seq_obj = Bio::Seq->new(-id => "test query", -seq => "MMSMLGGL");
$report_obj = $blast_obj->blastall($seq_obj);
$result_obj = $report_obj->next_result;
print $result_obj->num_hits;

I wrote it based on the BLAST section of http://www.bioperl.org/wiki/HOWTO:Beginners. But I'm not sure whether I did this step properly: "The example code assumes that you used the formatdb program to index the database sequence file "db.fa"". I just downloaded the swissprot db, kept it in the /db directory and adapted the script to this. I haven't used "formatdb", so I don't know if this script is out of date or if it should run anyway.
This is what I get:

Mercuri:BioPerl Casandra$ perl bp_blast.pl

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [test query]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [test query]
---------------------------------------------------
Use of uninitialized value in join or string at /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level/File/Spec/Unix.pm line 86.

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at bp_blast.pl line 13.

Just for curiosity, when I change

$report_obj = $blast_obj->blastall($seq_obj);

to

$report_obj = $blast_obj->blast($seq_obj);

I get this message instead:

Mercuri:BioPerl Casandra$ perl bp_blast.pl
Can't locate object method "next_result" via package "Bio::Seq" at bp_blast.pl line 13.

Thanks a lot

From limericksean at gmail.com Tue Feb 28 16:11:13 2012
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Tue, 28 Feb 2012 16:11:13 -0500
Subject: [Bioperl-l] fastq splitter
Message-ID:

Hi,
I'm trying to write a quick script to separate one large PE fastq file into 2 separate files, one for each mate pair

The file is of the format (mate1)
@HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
+
BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA

&& (mate2)

@HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
+
##################################################

My idea is to separate using a regex such that / 1:/ would be the first mate pair and / 2:/ would go in the second mate file.
I implemented the code below but each output file is empty. Can someone spot my error?

Thanks,
Sean.

my $infile = shift;
my $outfile1 = $infile."_1";
my $outfile2 = $infile."_2";

my $seqin = Bio::SeqIO->new(
    -file   => "<$infile",
    -format => "fastq",
);
my $seqout1 = Bio::SeqIO->new(
    -file   => ">$outfile1",
    -format => "fastq",
);

my $seqout2 = Bio::SeqIO->new(
    -file   => ">$outfile2",
    -format => "fastq",
);
while (my $inseq = $seqin->next_seq) {
    if ($seqin->desc =~ / 1:/){
        $seqout1->write_seq($inseq);
    } else {
        $seqout2->write_seq($inseq);
    }
}

From kajendiran56 at googlemail.com Tue Feb 28 05:57:03 2012
From: kajendiran56 at googlemail.com (kajendiran mahendiran)
Date: Tue, 28 Feb 2012 02:57:03 -0800 (PST)
Subject: [Bioperl-l] gerp++
Message-ID: <01356797-31c0-4aae-a7ed-4365d361d8fd@w19g2000vbe.googlegroups.com>

Dear All,

thank you for taking the time to look at my post. I am trying to use the Bioperl module to analyse gerpelem output. I have used the following code:

use Bio::Tools::Run::Phylo::Gerp;

my $parser = Bio::Tools::Phylo::Gerp->new(-file => "test.rates.elems");

while (my $feat = $parser->next_result()) {
    my ($start)    = $feat->start();
    my ($end)      = $feat->end();
    my ($rs_score) = $feat->score();
    my ($p_value)  = ($feat->annotation->get_Annotations('p-value'))[0]->value();
    print $start, "\t", $end, "\t", $rs_score, "\t", $p_value, "\n";
}

The file I am trying to read is:

region 4850138 4850158 19.72 5.48888e-09 223119 0 0
region 221586 221606 19.72 5.48888e-09 223139 0 0
region 865234 865254 19.72 5.48888e-09 223159 0 0
region 4108337 4108490 42.8171 5.57134e-09 223312 0 0
region 2780908 2781050 42.5108 5.67614e-09 223454 0.1 4.3
region 3603457 3603637 42.9148 5.68931e-09 223634 0.1 4.3
region 995791 995961 42.9233 5.72716e-09 223804 0.1 4.3
region 1250933 1251094 42.8291 5.83297e-09 223965 0.1 4.3
region 4219830 4219941 40.856 5.97931e-09 224076 0.1 4.3
region 712708 712796 38.8442 6.04126e-09 224164 0.1 4.3
region 3170105 3170125 19.482 6.0786e-09 224184 0.1 4.3
region 3297610 3297802 42.563 6.08115e-09 224376 0.1 4.3
region 5296586 5296776 42.5578 6.14513e-09 224566 0.1 4.3
region 1400348 1400508 42.6387 6.30399e-09 224726 0.1 4.3
region 4103587 4103685 39.7477 6.48804e-09 224824 0.1 4.3
region 278040 278082 29.8649 6.50312e-09 224866 0.1 4.3
region 5449691 5449882 42.4149 6.58323e-09 225057 0.1 4.3
region 4435258 4435437 42.384 6.90672e-09 225236 0.1 4.3

The program runs without any errors but there is no print out. It appears that the parser is undefined and I cannot seem to figure out why. I am new to bioinformatics and I would appreciate any help.

I am trying to figure out what the columns correspond to in the gerpelems output, the format does not correspond to the format given in the manual for gerp++. There are additional numbers also in the .elems file and I am not sure what they mean.

Thank you once again for your help.

Kajendiran

From mmuratet at hudsonalpha.org Tue Feb 28 16:26:25 2012
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Tue, 28 Feb 2012 15:26:25 -0600
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org>

On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:

> Hi,
> I'm trying to write a quick script to separate one large PE fastq file into
> 2 separate files, one for each mate pair
>
> The file is of the format (mate1)
> @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> +
> BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
>
> && (mate2)
>
> @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> +
> ##################################################
>
>
> My idea is to separate using a regex such that / 1:/ would be the first
> mate pair and / 2:/ would go in the second mate file.
> I implemented the code below but each output file is empty. Can someone
> spot my error?
>
> Thanks,
> Sean.
>
> my $infile = shift;
> my $outfile1 = $infile."_1";
> my $outfile2 = $infile."_2";
>
> my $seqin = Bio::SeqIO->new(
>     -file   => "<$infile",
>     -format => "fastq",
> );
> my $seqout1 = Bio::SeqIO->new(
>     -file   => ">$outfile1",
>     -format => "fastq",
> );
>
> my $seqout2 = Bio::SeqIO->new(
>     -file   => ">$outfile2",
>     -format => "fastq",
> );
> while (my $inseq = $seqin->next_seq) {
>     if ($seqin->desc =~ / 1:/){

Hi Sean

You're calling the desc method on the stream ($seqin), not on the seq object ($inseq).

Cheers

Mike

>         $seqout1->write_seq($inseq);
>     } else {
>         $seqout2->write_seq($inseq);
>     }
> }

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806

From cjfields at illinois.edu Tue Feb 28 16:50:30 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 28 Feb 2012 21:50:30 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>

Sean,

If you trust the data enough, in that:

1) each record is 4 lines,
2) mate pairs are consecutive in the file, and
3) read 1 always precedes read 2 in the pair,

then I would simply iterate through 4 lines at a time and dump to the two separate files, maybe using a flip-flop or simple record count and modulus switch. You can always run a check on the header with a regex if you don't trust it completely.

Just from the sanity point-of-view, unless you're doing a lot of validation I wouldn't use Bio::SeqIO::fastq, unless you have some time on your hands and a relatively low number of seqs (it's notoriously slow at the moment).
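
A minimal sketch of the line-based approach, assuming (as in the list above) that each record is exactly 4 lines and that headers carry " 1:" or " 2:" after the read name, as in Sean's example. Instead of a flip-flop on record order it routes each record by the header check, so it does not rely on mates being consecutive. The sub names mate_number and split_fastq and the script name are made up for illustration; no BioPerl is involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 or 2 for a header such as
#   @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
# or undef if the header does not look like that.
sub mate_number {
    my ($header) = @_;
    return $header =~ /^\@\S+\s+([12]):/ ? $1 : undef;
}

# Read 4-line FASTQ records from $in and copy each one to $out1 or $out2
# according to the mate number in its header. Returns the two counts.
sub split_fastq {
    my ($in, $out1, $out2) = @_;
    my @count = (0, 0);
    while (defined(my $header = <$in>)) {
        my @rest = map { scalar <$in> } 1 .. 3;    # sequence, "+", quality
        die "Truncated record at: $header" if grep { !defined } @rest;
        my $mate = mate_number($header);
        die "Unrecognised header: $header" unless defined $mate;
        print { $mate == 1 ? $out1 : $out2 } $header, @rest;
        $count[$mate - 1]++;
    }
    return @count;
}

# Usage: perl split_mates.pl reads.fastq
# (writes reads.fastq_1 and reads.fastq_2, the naming used in Sean's script)
if (@ARGV) {
    my $infile = shift @ARGV;
    open my $in,   '<', $infile       or die "Cannot read $infile: $!";
    open my $out1, '>', "${infile}_1" or die "Cannot write ${infile}_1: $!";
    open my $out2, '>', "${infile}_2" or die "Cannot write ${infile}_2: $!";
    my ($n1, $n2) = split_fastq($in, $out1, $out2);
    print STDERR "mate 1: $n1 records, mate 2: $n2 records\n";
}
```

Because the script dies on any header it does not recognise, a file in a different header convention fails loudly rather than ending up silently in one output file.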

chris

From cjfields at illinois.edu Tue Feb 28 17:17:47 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 28 Feb 2012 22:17:47 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>
Message-ID:

That's a bit odd. Are you using an old version of the FASTQ parser? It was revised a while ago, prior to the v1.6.1 release (the error matches one in the older parser)

chris

On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote:

> Hi Chris,
> Unfortunately the read pairs are not consecutive. It seems they are cat'd together.
> I could use split -l on the line number that they're glued together I guess.
> If this is an overnight job for a bunch of files, I can wait so don't mind using the module if it worked.
>
> Someone pointed out I need to switch $seqin->desc to $inseq->desc.
> However, now it spits out fasta output instead of fastq and returns a bunch of warnings: Seq/Qual descriptions don't match; using sequence description
>
> Hmm.

From florent.angly at gmail.com Wed Feb 29 17:33:04 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 01 Mar 2012 08:33:04 +1000
Subject: [Bioperl-l] fastq splitter
In-Reply-To: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org>
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org>
Message-ID: <4F4EA7A0.9050002@gmail.com>

Also, the desc() method returns the part
after the whitespace in the FASTA header.
Hence, instead of / 1:/, your regular expression should not have the space and should be written /1:/. In fact, it would be even better (faster) if it were written as an anchored regular expression that matches only the beginning of the description, /^1:/

Note that you are apparently using the latest Illumina format, which does not follow the previous convention for paired-end read headers. Hence your script will not work properly with paired-end files in the older Illumina formats.

Florent

From limericksean at gmail.com Tue Feb 28 17:01:18 2012
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Tue, 28 Feb 2012 17:01:18 -0500
Subject: [Bioperl-l] fastq splitter
In-Reply-To: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>
Message-ID:

Hi Chris,
Unfortunately the read pairs are not consecutive. It seems they are cat'd together.
I could use split -l on the line number that they're glued together I guess.
If this is an overnight job for a bunch of files, I can wait so don't mind using the module if it worked.

Someone pointed out I need to switch $seqin->desc to $inseq->desc.
However, now it spits out fasta output instead of fastq and returns a bunch of warnings: Seq/Qual descriptions don't match; using sequence description

Hmm.

From cjfields at illinois.edu Tue Feb 28 21:40:26 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 02:40:26 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>
Message-ID: <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>

That should work. Can you send the output of 'perldoc Bio::SeqIO::fastq'? That should indicate what is being called.

chris

On Feb 28, 2012, at 5:50 PM, Sean O'Keeffe wrote:

> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.006001
>
> Isn't that 1.6.1 - does it need upgrading ?
>
> On 28 February 2012 18:36, Sean O'Keeffe wrote:
> Could be. I'll check.

From cjfields at illinois.edu Tue Feb 28 21:42:27 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 02:42:27 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To: <4F4EA7A0.9050002@gmail.com>
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org>
 <4F4EA7A0.9050002@gmail.com>
Message-ID:

Frankly, there never seemed to be a real fixed standard in the way that FASTQ headers were written (and just when it seems there is some consensus, Illumina pulls the rug out from under you), hence the reason I leave it alone. We could add some ID munging in there if needed, would just need a qr// with a standard fallback.

chris

On Feb 29, 2012, at 4:33 PM, Florent Angly wrote:

> Also, the desc() method returns the part after the whitespace in the FASTA header.
> Hence, instead of / 1:/, your regular expression should not have the space and should be written /1:/.
> In fact, it would be even better (faster) if it were written as an anchored regular expression that matches only the beginning of the description, /^1:/
>
> Note that you are apparently using the latest Illumina format, which does not follow the previous convention for paired-end read headers. Hence your script will not work properly with non-latest-Illumina paired-end files.
>
> Florent
>
>
>
> On 29/02/12 07:26, Michael Muratet wrote:
>>
>> On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:
>>
>>> Hi,
>>> I'm trying to write a quick script to separate one large PE fastq file into
>>> 2 separate files, one for each mate pair
>>>
>>> The file is of the format (mate1)
>>> @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
>>> CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
>>> +
>>> BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
>>>
>>> && (mate2)
>>>
>>> @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
>>> TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
>>> +
>>> ##################################################
>>>
>>>
>>> My idea is to separate using a regex such that / 1:/ would be the first
>>> mate pair and / 2:/ would go in the second mate file.
>>> I implemented the code below but each output file is empty. Can someone
>>> spot my error?
>>>
>>> Thanks,
>>> Sean.
>>>
>>> my $infile = shift;
>>> my $outfile1 = $infile."_1";
>>> my $outfile2 = $infile."_2";
>>>
>>> my $seqin = Bio::SeqIO->new(
>>> -file => "<$infile",
>>> -format => "fastq",
>>> );
>>> my $seqout1 = Bio::SeqIO->new(
>>> -file => ">$outfile1",
>>> -format => "fastq",
>>> );
>>>
>>> my $seqout2 = Bio::SeqIO->new(
>>> -file => ">$outfile2",
>>> -format => "fastq",
>>> );
>>> while (my $inseq = $seqin->next_seq) {
>>> if ($seqin->desc =~ / 1:/){
>> Hi Sean
>>
>> You're using the desc operator on the stream, not the seq object.
>> >> Cheers >> >> Mike >> >>> $seqout1->write_seq($inseq); >>> } else { >>> $seqout2->write_seq($inseq); >>> } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet at hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From limericksean at gmail.com Tue Feb 28 18:36:43 2012 From: limericksean at gmail.com (Sean O'Keeffe) Date: Tue, 28 Feb 2012 18:36:43 -0500 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> Message-ID: Could be. I'll check. On 28 February 2012 17:17, Fields, Christopher J wrote: > That's a bit odd. Are you using an old version of the FASTQ parser? It > was revised a while ago, prior to the v1.6.1 release (the error matches one > in the older parser) > > chris > > On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote: > > > Hi Chris, > > Unfortunately the read pairs are not consecutive. It seems they are > cat'd together. > > I could use split -l on the line number that they're glued together I > guess. > > If this is an overnight job for a bunch of files, I can wait so don't > mind using the module if it worked. > > > > Someone pointed out I need to switch $seqin->desc to $inseq->desc. > > However, now it spits out fasta output instead of fastq and returns a > bunch of warnings: Seq/Qual descriptions don't match; using sequence > description > > > > Hmm. 
> > > > On 28 February 2012 16:50, Fields, Christopher J > wrote: > > Sean, > > > > If you trust the data enough, in that: > > > > 1) each record is 4 lines, > > 2) mate pairs are consecutive in the file, and > > 3) that read 1 always preceeds read 2 in the pair, > > > > then I would simply iterate through 4 lines at a time and dump to the > two separate files, maybe using a flip-flop or simple record count and > modulus switch. You can always run a check on the header with a regex if > you don't trust it completely. > > > > Just from the sanity point-of-view, unless you're doing a lot of > validation I wouldn't use Bio::SeqIO::fastq, unless you have some time on > your hands and a relatively low number of seqs (it's notoriously slow at > the moment). > > > > chris > > > > On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote: > > > > > Hi, > > > I'm trying to write a quick script to separate one large PE fastq file > into > > > 2 separate files, one for each mate pair > > > > > > The file is of the format (mate1) > > > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG > > > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT > > > + > > > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA > > > > > > && (mate2) > > > > > > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG > > > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC > > > + > > > ################################################## > > > > > > > > > My idea is to separate using a regex such that / 1:/ would be the first > > > mate pair and / 2:/ would go in the second mate file. > > > I implemented the code below but each output file is empty. Can someone > > > spot my error? > > > > > > Thanks, > > > Sean. 
> > > > > > my $infile = shift; > > > my $outfile1 = $infile."_1"; > > > my $outfile2 = $infile."_2"; > > > > > > my $seqin = Bio::SeqIO->new( > > > -file => "<$infile", > > > -format => "fastq", > > > ); > > > my $seqout1 = Bio::SeqIO->new( > > > -file => ">$outfile1", > > > -format => "fastq", > > > ); > > > > > > my $seqout2 = Bio::SeqIO->new( > > > -file => ">$outfile2", > > > -format => "fastq", > > > ); > > > while (my $inseq = $seqin->next_seq) { > > > if ($seqin->desc =~ / 1:/){ > > > $seqout1->write_seq($inseq); > > > } else { > > > $seqout2->write_seq($inseq); > > > } > > > } > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From limericksean at gmail.com Tue Feb 28 18:50:25 2012 From: limericksean at gmail.com (Sean O'Keeffe) Date: Tue, 28 Feb 2012 18:50:25 -0500 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> Message-ID: $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.006001 Isn't that 1.6.1 - does it need upgrading ? On 28 February 2012 18:36, Sean O'Keeffe wrote: > Could be. I'll check. > > On 28 February 2012 17:17, Fields, Christopher J wrote: > >> That's a bit odd. Are you using an old version of the FASTQ parser? It >> was revised a while ago, prior to the v1.6.1 release (the error matches one >> in the older parser) >> >> chris >> >> On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote: >> >> > Hi Chris, >> > Unfortunately the read pairs are not consecutive. It seems they are >> cat'd together. >> > I could use split -l on the line number that they're glued together I >> guess. >> > If this is an overnight job for a bunch of files, I can wait so don't >> mind using the module if it worked. >> > >> > Someone pointed out I need to switch $seqin->desc to $inseq->desc. 
>> > However, now it spits out fasta output instead of fastq and returns a >> bunch of warnings: Seq/Qual descriptions don't match; using sequence >> description >> > >> > Hmm. >> > >> > On 28 February 2012 16:50, Fields, Christopher J >> wrote: >> > Sean, >> > >> > If you trust the data enough, in that: >> > >> > 1) each record is 4 lines, >> > 2) mate pairs are consecutive in the file, and >> > 3) that read 1 always preceeds read 2 in the pair, >> > >> > then I would simply iterate through 4 lines at a time and dump to the >> two separate files, maybe using a flip-flop or simple record count and >> modulus switch. You can always run a check on the header with a regex if >> you don't trust it completely. >> > >> > Just from the sanity point-of-view, unless you're doing a lot of >> validation I wouldn't use Bio::SeqIO::fastq, unless you have some time on >> your hands and a relatively low number of seqs (it's notoriously slow at >> the moment). >> > >> > chris >> > >> > On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote: >> > >> > > Hi, >> > > I'm trying to write a quick script to separate one large PE fastq >> file into >> > > 2 separate files, one for each mate pair >> > > >> > > The file is of the format (mate1) >> > > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG >> > > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT >> > > + >> > > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA >> > > >> > > && (mate2) >> > > >> > > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG >> > > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC >> > > + >> > > ################################################## >> > > >> > > >> > > My idea is to separate using a regex such that / 1:/ would be the >> first >> > > mate pair and / 2:/ would go in the second mate file. >> > > I implemented the code below but each output file is empty. Can >> someone >> > > spot my error? >> > > >> > > Thanks, >> > > Sean. 
>> > > >> > > my $infile = shift; >> > > my $outfile1 = $infile."_1"; >> > > my $outfile2 = $infile."_2"; >> > > >> > > my $seqin = Bio::SeqIO->new( >> > > -file => "<$infile", >> > > -format => "fastq", >> > > ); >> > > my $seqout1 = Bio::SeqIO->new( >> > > -file => ">$outfile1", >> > > -format => "fastq", >> > > ); >> > > >> > > my $seqout2 = Bio::SeqIO->new( >> > > -file => ">$outfile2", >> > > -format => "fastq", >> > > ); >> > > while (my $inseq = $seqin->next_seq) { >> > > if ($seqin->desc =~ / 1:/){ >> > > $seqout1->write_seq($inseq); >> > > } else { >> > > $seqout2->write_seq($inseq); >> > > } >> > > } >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l at lists.open-bio.org >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> >> > From p.j.a.cock at googlemail.com Wed Feb 29 05:32:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 Feb 2012 10:32:20 +0000 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> Message-ID: On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J wrote: > Frankly, there never seemed to be a real fixed standard in the way that FASTQ > headers were written (and just when it seems there is some consensus, Illumina > pulls the rug out from under you), hence the reason I leave it alone. ?We could > add some ID munging in there if needed, would just need a qr// with a standard > fallback. > > chris Indeed - just like FASTA, it seems every company/tool/database has its own conventions about the FASTQ ID line and how to stuff as much meta-data into it as possible. This is a major reason why I hope unaligned reads in SAM/BAM takes off - places like the Sanger and Broad use this in their pipelines. 
http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html

Peter

From mmuratet at hudsonalpha.org Wed Feb 29 08:07:48 2012
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Wed, 29 Feb 2012 07:07:48 -0600
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>
Message-ID:

On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote:

> Hi Chris,
> Unfortunately the read pairs are not consecutive. It seems they are cat'd
> together.
> I could use split -l on the line number that they're glued together I guess.
> If this is an overnight job for a bunch of files, I can wait so don't mind
> using the module if it worked.
>
> Someone pointed out I need to switch $seqin->desc to $inseq->desc.
> However, now it spits out fasta output instead of fastq and returns a bunch
> of warnings: Seq/Qual descriptions don't match; using sequence description

Hi Sean

Apparently the bioperl parser expects the 'second' header line, i.e.,

@first_header
sequence
+second_header
quality_scores

to have the same (redundant) identifier. When it encounters a blank line, which is the way the Illumina pipeline writes it out, it warns you.

I think you have to explicitly write out the quality scores in fastq format.

Cheers

Mike

>
> Hmm.
>
> On 28 February 2012 16:50, Fields, Christopher J wrote:
>
>> Sean,
>>
>> If you trust the data enough, in that:
>>
>> 1) each record is 4 lines,
>> 2) mate pairs are consecutive in the file, and
>> 3) that read 1 always precedes read 2 in the pair,
>>
>> then I would simply iterate through 4 lines at a time and dump to the two
>> separate files, maybe using a flip-flop or simple record count and modulus
>> switch. You can always run a check on the header with a regex if you don't
>> trust it completely.
>> >> Just from the sanity point-of-view, unless you're doing a lot of >> validation I wouldn't use Bio::SeqIO::fastq, unless you have some >> time on >> your hands and a relatively low number of seqs (it's notoriously >> slow at >> the moment). >> >> chris >> >> On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote: >> >>> Hi, >>> I'm trying to write a quick script to separate one large PE fastq >>> file >> into >>> 2 separate files, one for each mate pair >>> >>> The file is of the format (mate1) >>> @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG >>> CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT >>> + >>> BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA >>> >>> && (mate2) >>> >>> @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG >>> TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC >>> + >>> ################################################## >>> >>> >>> My idea is to separate using a regex such that / 1:/ would be the >>> first >>> mate pair and / 2:/ would go in the second mate file. >>> I implemented the code below but each output file is empty. Can >>> someone >>> spot my error? >>> >>> Thanks, >>> Sean. 
>>> >>> my $infile = shift; >>> my $outfile1 = $infile."_1"; >>> my $outfile2 = $infile."_2"; >>> >>> my $seqin = Bio::SeqIO->new( >>> -file => "<$infile", >>> -format => "fastq", >>> ); >>> my $seqout1 = Bio::SeqIO->new( >>> -file => ">$outfile1", >>> -format => "fastq", >>> ); >>> >>> my $seqout2 = Bio::SeqIO->new( >>> -file => ">$outfile2", >>> -format => "fastq", >>> ); >>> while (my $inseq = $seqin->next_seq) { >>> if ($seqin->desc =~ / 1:/){ >>> $seqout1->write_seq($inseq); >>> } else { >>> $seqout2->write_seq($inseq); >>> } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From fs5 at sanger.ac.uk Fri Feb 24 12:28:56 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 24 Feb 2012 17:28:56 +0000 Subject: [Bioperl-l] BLAST example doesn't work In-Reply-To: <33362540.post@talk.nabble.com> References: <33362540.post@talk.nabble.com> Message-ID: <4F47C8D8.9080302@sanger.ac.uk> it seems that the command line is incorrectly constructed as "blastall run ..." and I guess that's because the StandAloneBlast module didn't see your "program" parameter. 
try:

my $factory = Bio::Tools::Run::StandAloneBlast->new(
    -program  => 'blastn',
    -database => 'nr',
    -e        => '1e-5',
);

(parameter keys should start with "-")

Hope that helps

Frank

On 21/02/12 09:16, Enzyme wrote:
> use Bio::Tools::Run::StandAloneBlast;
> my $factory = Bio::Tools::Run::StandAloneBlast->new(p => 'blastn',
>                                                     d => 'nr',
>                                                     e => '1e-5');
> my $seq = Bio::PrimarySeq->new(-id  => 'test1',
>                                -seq => 'AGATCAGTAGATGATAGGGGTAGA');
> my $report = $factory->blastall($seq); # get back a Bio::SearchIO report

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From cjfields at illinois.edu Wed Feb 29 09:38:50 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 14:38:50 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu>
Message-ID: <2DC85C62-F086-41D9-A812-B8BFBBF95517@illinois.edu>

On Feb 29, 2012, at 7:07 AM, Michael Muratet wrote:

> On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote:
>
>> Hi Chris,
>> Unfortunately the read pairs are not consecutive. It seems they are cat'd
>> together.
>> I could use split -l on the line number that they're glued together I guess.
>> If this is an overnight job for a bunch of files, I can wait so don't mind
>> using the module if it worked.
>>
>> Someone pointed out I need to switch $seqin->desc to $inseq->desc.
>> However, now it spits out fasta output instead of fastq and returns a bunch
>> of warnings: Seq/Qual descriptions don't match; using sequence description
> Hi Sean
> Apparently the bioperl parser expects the 'second' header line, i.e.,
>
> @first_header
> sequence
> +second_header
> quality_scores
>
> to have the same (redundant) identifier.
> When it encounters a blank line, which is the way the Illumina pipeline writes it out, it warns you.
>
> I think you have to explicitly write out the quality scores in fastq format.
>
> Cheers
>
> Mike

Actually no, that's not true for the latest versions. The parser was completely refactored in coordination with Peter Cock (Biopython) and the other Bio* toolkits, along with EMBOSS, to parse a wide range of FASTQ data (including the solexa/illumina variants) and to attempt to catch bad formatting issues. See this pub:

http://www.ncbi.nlm.nih.gov/pubmed/20015970

This is one of the primary test examples that passes:

@EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
+
;;3;;;;;;;;;;;;7;;;;;;;88
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+
;;;;;;;;;;;9;7;;.7;393333

chris

From fs5 at sanger.ac.uk Mon Feb 27 04:48:30 2012
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 27 Feb 2012 09:48:30 +0000
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To: <4F49D4B6.5050301@gmail.com>
References: <4F49D4B6.5050301@gmail.com>
Message-ID: <4F4B516E.9090407@sanger.ac.uk>

Hi Florent,

I was recently thinking about something similar when adding delete/insert/ligate methods to Bio::SeqUtils (now in bioperl-live). I was about to add a PCR method as well and I think it would fit in with that module. Similar to other methods in this module, the PCR method should be able to handle sequence features and annotations, so that the amplicons are fully annotated.
I'd be willing to discuss this further and work with you on that if you like.

Cheers,

Frank

On 26/02/12 06:44, Florent Angly wrote:
> Hi all,
>
> I am interested in the Bio::Tools::PCRSimulation module. Supposedly it
> was added to Bioperl 0.3 and is also mentioned in the Bio::PrimedSeq
> module. However, I cannot find it in the current Bioperl codebase. Any idea
> where it went?
> > The reason I am asking is because I have some code to do silico PCR > using regular expressions. I wanted to modularize my code more and make > it into a module for Bioperl. Of course, if there is something similar > in Bioperl already, I need to have a look at it. If there is nothing > similar, what namespace do you suggest to use? > Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? > Bio::Tools::InSilicoPCR? > > Thanks, > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Wed Feb 29 10:23:54 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 29 Feb 2012 15:23:54 +0000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <4F4E3EEF.5050506@cam.ac.uk> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> Message-ID: <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a and it's not there. I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. 
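
To make the "default pure perl method" idea concrete, here is a rough sketch of exact-match primer mapping with index(), which is the approach Roy describes the old module as using. The names revcom and simple_amplicon are hypothetical and not part of BioPerl; the sketch assumes exact, unambiguous primers, a single product on the forward strand, and no handling of features or annotations.

```perl
use strict;
use warnings;

# Reverse-complement a plain DNA string (no IUPAC ambiguity codes handled).
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: return the amplicon bounded by an exact forward-primer
# match and the first downstream exact match of the reverse primer's binding
# site, or undef if either primer does not match.
sub simple_amplicon {
    my ($template, $fwd, $rev) = @_;
    my $start = index($template, $fwd);
    return undef if $start < 0;
    my $site = revcom($rev);    # the reverse primer anneals to the other strand
    my $hit  = index($template, $site, $start + length($fwd));
    return undef if $hit < 0;
    return substr($template, $start, $hit + length($site) - $start);
}

# Toy template: flank + forward site + insert + reverse site + flank
my $template = 'TTT' . 'ACGTAC' . 'CCCC' . 'GGATTC' . 'TTT';
my $amplicon = simple_amplicon($template, 'ACGTAC', 'GAATCC');
print defined $amplicon ? "$amplicon\n" : "no product\n";
```

A pluggable design could keep this as the fallback and swap in Primer3, BLAST or a short-read mapper behind the same interface when mismatches need to be tolerated.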
chris On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: > The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: > http://www.salmonella.org/bioperl/primer3_v0.3.tgz > > (There's supposedly a more recent version here: > http://www.salmonella.org/bioperl/nucleotide_analyses.tgz > but that file seems to be truncated). > > I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. > > Cheers, > Roy. > > > On 27/02/2012 21:18, Fields, Christopher J wrote: >> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >> >>> Hi all, >>> >>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>> it was added to Bioperl 0.3 and is also mentionned in the >>> Bio::PrimedSeq module. However, I cannot find in the current >>> Bioperl codebase. Any idea where it went? >> >> No idea; I can't find it anywhere in the code base either, and the >> github repo contains history going back to the original CVS repo. >> You can try contacting the author, possibly. >> >>> The reason I am asking is because I have some code to do silico PCR >>> using regular expressions. I wanted to modularize my code more and >>> make it into a module for Bioperl. Of course, if there is something >>> similar in Bioperl already, I need to have a look at it. If there >>> is nothing similar, what namespace do you suggest to use? >>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>> Bio::Tools::InSilicoPCR? >>> >>> Thanks, >>> >>> Florent >> >> >> Maybe the last (InSilicoPCR). 
>> >> chris >> >> >> _______________________________________________ Bioperl-l mailing >> list Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed Feb 29 10:27:38 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 29 Feb 2012 15:27:38 +0000 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> Message-ID: <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote: > On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J > wrote: >> Frankly, there never seemed to be a real fixed standard in the way that FASTQ >> headers were written (and just when it seems there is some consensus, Illumina >> pulls the rug out from under you), hence the reason I leave it alone. We could >> add some ID munging in there if needed, would just need a qr// with a standard >> fallback. >> >> chris > > Indeed - just like FASTA, it seems every company/tool/database has its own > conventions about the FASTQ ID line and how to stuff as much meta-data > into it as possible. This is a major reason why I hope unaligned reads in > SAM/BAM takes off - places like the Sanger and Broad use this in their > pipelines. > > http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html > > Peter Unaligned BAM makes the most sense. I've also been talking with the HDF5 folks here sporadically, they're still keen on promoting BioHDF (it is pretty fast), though that has cooled considerably. Anyone working directly with CRAM in their pipelines? 
chris

From p.j.a.cock at googlemail.com Wed Feb 29 10:32:55 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 29 Feb 2012 15:32:55 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To: <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu>
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org>
 <4F4EA7A0.9050002@gmail.com>
 <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu>
Message-ID:

On Wed, Feb 29, 2012 at 3:27 PM, Fields, Christopher J wrote:
> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote:
>
>> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J wrote:
>>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ
>>> headers were written (and just when it seems there is some consensus, Illumina
>>> pulls the rug out from under you), hence the reason I leave it alone. We could
>>> add some ID munging in there if needed, would just need a qr// with a standard
>>> fallback.
>>>
>>> chris
>>
>> Indeed - just like FASTA, it seems every company/tool/database has its own
>> conventions about the FASTQ ID line and how to stuff as much meta-data
>> into it as possible. This is a major reason why I hope unaligned reads in
>> SAM/BAM takes off - places like the Sanger and Broad use this in their
>> pipelines.
>>
>> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html
>>
>> Peter
>
> Unaligned BAM makes the most sense. I've also been talking with the
> HDF5 folks here sporadically, they're still keen on promoting BioHDF
> (it is pretty fast), though that has cooled considerably.
>
> Anyone working directly with CRAM in their pipelines?
>
> chris

I understand that Sanger are looking at moving their pipelines from BAM to
CRAM later this year, but CRAM is still quite new and in flux.
Peter From cjfields at illinois.edu Wed Feb 29 10:56:11 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 29 Feb 2012 15:56:11 +0000 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu> Message-ID: On Feb 29, 2012, at 9:32 AM, Peter Cock wrote: > On Wed, Feb 29, 2012 at 3:27 PM, Fields, Christopher J > wrote: >> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote: >> >>> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J >>> wrote: >>>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ >>>> headers were written (and just when it seems there is some consensus, Illumina >>>> pulls the rug out from under you), hence the reason I leave it alone. We could >>>> add some ID munging in there if needed, would just need a qr// with a standard >>>> fallback. >>>> >>>> chris >>> >>> Indeed - just like FASTA, it seems every company/tool/database has its own >>> conventions about the FASTQ ID line and how to stuff as much meta-data >>> into it as possible. This is a major reason why I hope unaligned reads in >>> SAM/BAM takes off - places like the Sanger and Broad use this in their >>> pipelines. >>> >>> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html >>> >>> Peter >> >> Unaligned BAM makes the most sense. I've also been talking with the >> HDF5 folks here sporadically, they're still keen on promoting BioHDF >> (it is pretty fast), though that has cooled considerably. >> >> Anyone working directly with CRAM in their pipelines? >> >> chris > > I understand that Sanger are looking at moving their pipelines from BAM to > CRAM later this year, but CRAM is still quite new and in flux. > > Peter Yeah, I wasn't sure how the community outside of Sanger is approaching this. 
chris

From rrc22 at cam.ac.uk Wed Feb 29 10:06:23 2012
From: rrc22 at cam.ac.uk (Roy Chaudhuri)
Date: Wed, 29 Feb 2012 15:06:23 +0000
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To:
References: <4F49D4B6.5050301@gmail.com>
Message-ID: <4F4E3EEF.5050506@cam.ac.uk>

The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive:
http://www.salmonella.org/bioperl/primer3_v0.3.tgz

(There's supposedly a more recent version here:
http://www.salmonella.org/bioperl/nucleotide_analyses.tgz
but that file seems to be truncated).

I have no idea how much would be salvageable. It seems to just use index() to map the primers to the sequence; I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper.

Cheers,
Roy.


On 27/02/2012 21:18, Fields, Christopher J wrote:
> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote:
>
>> Hi all,
>>
>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly
>> it was added to Bioperl 0.3 and is also mentioned in the
>> Bio::PrimedSeq module. However, I cannot find it in the current
>> Bioperl codebase. Any idea where it went?
>
> No idea; I can't find it anywhere in the code base either, and the
> github repo contains history going back to the original CVS repo.
> You can try contacting the author, possibly.
>
>> The reason I am asking is because I have some code to do in silico PCR
>> using regular expressions. I wanted to modularize my code more and
>> make it into a module for Bioperl. Of course, if there is something
>> similar in Bioperl already, I need to have a look at it. If there
>> is nothing similar, what namespace do you suggest to use?
>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch?
>> Bio::Tools::InSilicoPCR?
>>
>> Thanks,
>>
>> Florent
>
>
> Maybe the last (InSilicoPCR).
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From limericksean at gmail.com Wed Feb 29 11:30:05 2012
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Wed, 29 Feb 2012 11:30:05 -0500
Subject: [Bioperl-l] fastq splitter
In-Reply-To: <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
Message-ID:

Hi Chris,
Here's the perldoc for fastq - it does seem to indicate that the optional
descriptor (+) must match the first header. (See DESCRIPTION).
Maybe this version doesn't include the updated code you mention which Peter
Cock has worked on.

Sean.

======================================

Bio::SeqIO::fastq(3)  User Contributed Perl Documentation  Bio::SeqIO::fastq(3)

NAME
       Bio::SeqIO::fastq - fastq sequence input/output stream

SYNOPSIS
         ################## pertains to FASTQ parsing only ##################

         # grabs the FASTQ parser, specifies the Illumina variant
         my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
                                  -file      => 'mydata.fq');

         # simple 'fastq' format defaults to 'sanger' variant
         my $out = Bio::SeqIO->new(-format    => 'fastq',
                                   -file      => '>mydata.fq');

         # $seq is a Bio::Seq::Quality object
         while (my $seq = $in->next_seq) {
             $out->write_seq($seq);  # convert Illumina 1.3 to Sanger format
         }

         # for 4x faster parsing, one can do something like this for raw data
         use Bio::Seq::Quality;

         # $data is a hash reference containing all arguments to be passed to
         # the Bio::Seq::Quality constructor
         while (my $data = $in->next_dataset) {
             # process $data, such as trim, etc
             my $seq = Bio::Seq::Quality->new(%$data);

             # for now, write_seq only accepts Bio::Seq::Quality, but may be modified
             # to allow raw hash references for speed
             $out->write_seq($data);
         }

DESCRIPTION
       This object can transform Bio::Seq and Bio::Seq::Quality objects to and
       from FASTQ flat file databases.

       FASTQ is a file format used frequently at the Sanger Centre and in
       next-gen sequencing to bundle a FASTA sequence and its quality data. A
       typical FASTQ entry takes the from:

         @HCDPQ1D0501
         GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT.....
         +HCDPQ1D0501
         !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65.....

       where:

         @ = descriptor, followed by one or more sequence lines
         + = optional descriptor (if present, must match first one), followed by one or
             more qual lines

   FASTQ and Bio::Seq::Quality mapping
       FASTQ files have sequence and quality data on single line or multiple
       lines, and the quality values are single-byte encoded. Data are mapped
       very simply to Bio::Seq::Quality instances:

           Data                                        Bio::Seq::Quality method
           ------------------------------------------------------------------------
           first non-whitespace chars in descriptor    id^
           descriptor line                             desc^
           sequence lines                              seq
           quality                                     qual*
           FASTQ variant                               namespace

           ^ first nonwhitespace chars are id(), everything else after (to end of line)
             is in desc()
           * Converted to PHRED quality scores where applicable ('solexa')

   FASTQ variants
       This parser supports all variants of FASTQ, including Illumina v 1.0
       and 1.3:

           variant                note
           -----------------------------------------------------------
           sanger                 original
           solexa                 Solexa, Inc. (2004), aka Illumina 1.0
           illumina               Illumina 1.3

       The variant can be specified by passing by either passing the
       additional -variant parameter to the constructor:

         my $in = Bio::SeqIO->new(-format    => 'fastq',
                                  -variant   => 'solexa',
                                  -file      => 'mysol.fq');

       or by passing the format and variant together (Bio::SeqIO will now
       handle this and convert it accordingly to the proper argument):

         my $in = Bio::SeqIO->new(-format    => 'fastq-solexa',
                                  -file      => 'mysol.fq');

       Variants can be converted back and forth from one another; however, due
       to the difference in scaling for solexa quality reads, converting from
       'illumina' or 'sanger' FASTQ to solexa is not recommended.
....

============================================

On 28 February 2012 21:40, Fields, Christopher J wrote:

> That should work. Can you send the output of 'perldoc Bio::SeqIO::fastq'?
> That should indicate what is being called.
>
> chris
>
> On Feb 28, 2012, at 5:50 PM, Sean O'Keeffe wrote:
>
> > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > 1.006001
> >
> > Isn't that 1.6.1 - does it need upgrading ?
> >
> > On 28 February 2012 18:36, Sean O'Keeffe wrote:
> > Could be. I'll check.
> >
> > On 28 February 2012 17:17, Fields, Christopher J wrote:
> > That's a bit odd. Are you using an old version of the FASTQ parser? It
> > was revised a while ago, prior to the v1.6.1 release (the error matches
> > one in the older parser)
> >
> > chris
> >
> > On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote:
> >
> > > Hi Chris,
> > > Unfortunately the read pairs are not consecutive. It seems they are cat'd together.
> > > I could use split -l on the line number that they're glued together I guess.
> > > If this is an overnight job for a bunch of files, I can wait so don't mind using the module if it worked.
> > > > > > Someone pointed out I need to switch $seqin->desc to $inseq->desc. > > > However, now it spits out fasta output instead of fastq and returns a > bunch of warnings: Seq/Qual descriptions don't match; using sequence > description > > > > > > Hmm. > > > > > > On 28 February 2012 16:50, Fields, Christopher J < > cjfields at illinois.edu> wrote: > > > Sean, > > > > > > If you trust the data enough, in that: > > > > > > 1) each record is 4 lines, > > > 2) mate pairs are consecutive in the file, and > > > 3) that read 1 always preceeds read 2 in the pair, > > > > > > then I would simply iterate through 4 lines at a time and dump to the > two separate files, maybe using a flip-flop or simple record count and > modulus switch. You can always run a check on the header with a regex if > you don't trust it completely. > > > > > > Just from the sanity point-of-view, unless you're doing a lot of > validation I wouldn't use Bio::SeqIO::fastq, unless you have some time on > your hands and a relatively low number of seqs (it's notoriously slow at > the moment). > > > > > > chris > > > > > > On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote: > > > > > > > Hi, > > > > I'm trying to write a quick script to separate one large PE fastq > file into > > > > 2 separate files, one for each mate pair > > > > > > > > The file is of the format (mate1) > > > > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG > > > > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT > > > > + > > > > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA > > > > > > > > && (mate2) > > > > > > > > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG > > > > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC > > > > + > > > > ################################################## > > > > > > > > > > > > My idea is to separate using a regex such that / 1:/ would be the > first > > > > mate pair and / 2:/ would go in the second mate file. 
> > > > I implemented the code below but each output file is empty. Can > someone > > > > spot my error? > > > > > > > > Thanks, > > > > Sean. > > > > > > > > my $infile = shift; > > > > my $outfile1 = $infile."_1"; > > > > my $outfile2 = $infile."_2"; > > > > > > > > my $seqin = Bio::SeqIO->new( > > > > -file => "<$infile", > > > > -format => "fastq", > > > > ); > > > > my $seqout1 = Bio::SeqIO->new( > > > > -file => ">$outfile1", > > > > -format => "fastq", > > > > ); > > > > > > > > my $seqout2 = Bio::SeqIO->new( > > > > -file => ">$outfile2", > > > > -format => "fastq", > > > > ); > > > > while (my $inseq = $seqin->next_seq) { > > > > if ($seqin->desc =~ / 1:/){ > > > > $seqout1->write_seq($inseq); > > > > } else { > > > > $seqout2->write_seq($inseq); > > > > } > > > > } > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > From p.j.a.cock at googlemail.com Wed Feb 29 11:39:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 Feb 2012 16:39:13 +0000 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu> Message-ID: On Wed, Feb 29, 2012 at 4:30 PM, Sean O'Keeffe wrote: > Hi Chris, > Here's the perldoc for fastq - it does seem to indicate that the optional > descriptor (+) must match the first header. (See DESCRIPTION). i.e. If present, it must match. But the repeated descriptor can (and for space efficiency should) be omitted. As Chris mentioned earlier, there are sample files in the test suite which omit the repeated descriptor so this should be working OK. 
Peter

From cjfields at illinois.edu Wed Feb 29 12:00:39 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 17:00:39 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
Message-ID: <17CE7C71-FBCD-43D0-99F6-0C5305B0F354@illinois.edu>

The key part of that is that the second descriptor (i.e. the one for the
qual line) is *optional*; if anything is present there, it must match the
descriptor for the sequence.

My concern is that the error you mentioned doesn't exist in that version.
The only explanation is that the wrong fastq parser version is being used.
Is it possible you have two versions of bioperl installed (maybe a system
one and a local one)? Or is the script you have invoked directly as an
executable, maybe calling a specific version of perl with its own @INC?

chris

On Feb 29, 2012, at 10:30 AM, Sean O'Keeffe wrote:

> Hi Chris,
> Here's the perldoc for fastq - it does seem to indicate that the optional
> descriptor (+) must match the first header. (See DESCRIPTION).
> Maybe this version doesn't include the updated code you mention which
> Peter Cock has worked on.
>
> Sean.
>
> [...]

From p.j.a.cock at googlemail.com Wed Feb 29 12:03:03 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 29 Feb 2012 17:03:03 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
Message-ID:

On Wed, Feb 29, 2012 at 4:56 PM, Sean O'Keeffe wrote:
> But wouldn't that result in a 3 line fastq output line which would screw up
> other programs expecting 4 fastq lines? - e.g. bowtie.

No, I mean the third line is just the plus character and the new line.
Again, I refer you to the example Chris quoted earlier:
http://lists.open-bio.org/pipermail/bioperl-l/2012-February/036271.html

Peter

From cjfields at illinois.edu Wed Feb 29 12:05:57 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 17:05:57 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
Message-ID:

No, the output by default leaves off the optional descriptor:

[cjfields at pyrimidine]$ cat test.pl
#!/usr/bin/env perl

use strict;
use warnings;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-fh => \*STDIN,  -format => 'fastq');
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fastq');

while (my $seq = $in->next_seq) {$out->write_seq($seq)};

[cjfields at pyrimidine]$ perl test.pl < example.fastq
@EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
+
;;3;;;;;;;;;;;;7;;;;;;;88
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+
;;;;;;;;;;;9;7;;.7;393333

chris

On Feb 29, 2012, at 10:56 AM, Sean O'Keeffe wrote:

> But wouldn't that result in a 3 line fastq output line which would screw
> up other programs expecting 4 fastq lines? - e.g. bowtie.
>
> On 29 February 2012 11:39, Peter Cock wrote:
> On Wed, Feb 29, 2012 at 4:30 PM, Sean O'Keeffe wrote:
> > Hi Chris,
> > Here's the perldoc for fastq - it does seem to indicate that the optional
> > descriptor (+) must match the first header. (See DESCRIPTION).
>
> i.e. If present, it must match. But the repeated descriptor can
> (and for space efficiency should) be omitted.
>
> As Chris mentioned earlier, there are sample files in the test suite
> which omit the repeated descriptor so this should be working OK.
>
> Peter
>

From cjfields at illinois.edu Wed Feb 29 12:13:03 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 17:13:03 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID:

Sean,

To follow up just in case it was a bug, tested with your seq examples and
they also work, so my guess is something else is wrong locally.

[cjfields at pyrimidine-laptop sean]$ perl test.pl < example2.fastq
@HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
+
BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
@HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
+
##################################################

chris

On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:

> [...]

From limericksean at gmail.com Wed Feb 29 11:56:44 2012
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Wed, 29 Feb 2012 11:56:44 -0500
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
Message-ID:

But wouldn't that result in a 3 line fastq output line which would screw up
other programs expecting 4 fastq lines? - e.g. bowtie.

On 29 February 2012 11:39, Peter Cock wrote:

> On Wed, Feb 29, 2012 at 4:30 PM, Sean O'Keeffe wrote:
> > Hi Chris,
> > Here's the perldoc for fastq - it does seem to indicate that the optional
> > descriptor (+) must match the first header. (See DESCRIPTION).
>
> i.e. If present, it must match. But the repeated descriptor can
> (and for space efficiency should) be omitted.
>
> As Chris mentioned earlier, there are sample files in the test suite
> which omit the repeated descriptor so this should be working OK.
>
> Peter
>

From limericksean at gmail.com Wed Feb 29 12:33:01 2012
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Wed, 29 Feb 2012 12:33:01 -0500
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID:

Yes. I ran my script on a cluster which may have had bioperl installed, not
sure.
Running it locally = success.

Thanks all!
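
For readers who want to try the "4 lines at a time" approach suggested in
this thread without going through Bio::SeqIO, the following is a minimal
sketch in plain Perl. It is not code from any poster and not part of
BioPerl. It assumes strict 4-line records (no wrapped sequence or quality
lines) and headers of the kind shown in the thread, where the mate number
follows the first space (" 1:" or " 2:"). The sub name split_fastq_by_mate
and the in-memory demo data are illustrative only. Because each record is
routed by its own header, the mates do not need to be consecutive in the
input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): copy each 4-line FASTQ
# record from $in to $out1 or $out2 depending on the mate number found in
# the header, e.g. "@HWI-...:1039 1:N:0:ATCACG" goes to $out1.
# Returns a hash reference with the number of records written per mate.
sub split_fastq_by_mate {
    my ($in, $out1, $out2) = @_;
    my %count = (1 => 0, 2 => 0);

    while (defined(my $header = <$in>)) {
        # Read the remaining three lines: sequence, '+' line, quality.
        my @rest = (scalar <$in>, scalar <$in>, scalar <$in>);
        die "Truncated record at: $header"
            if grep { !defined $_ } @rest;
        die "Not a 4-line FASTQ record at: $header"
            unless $header =~ /^\@/ && $rest[1] =~ /^\+/;

        # Assumed header layout: the mate number is the digit that follows
        # the first space and precedes a colon.
        my ($mate) = $header =~ /^\@\S+ ([12]):/
            or die "No mate number in header: $header";

        print { $mate == 1 ? $out1 : $out2 } $header, @rest;
        $count{$mate}++;
    }
    return \%count;
}

# Demo on the two records quoted in the thread, using in-memory filehandles
# so the sketch runs without touching the disk.
my $fastq = join '', map { "$_\n" } (
    '@HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG',
    'CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT',
    '+',
    'BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA',
    '@HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG',
    'TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC',
    '+',
    '##################################################',
);

my ($mate1, $mate2) = ('', '');
open my $in,   '<', \$fastq or die "Cannot open input: $!";
open my $out1, '>', \$mate1 or die "Cannot open mate 1 output: $!";
open my $out2, '>', \$mate2 or die "Cannot open mate 2 output: $!";

my $count = split_fastq_by_mate($in, $out1, $out2);
close $_ for $in, $out1, $out2;

print "mate 1 records: $count->{1}\n";
print "mate 2 records: $count->{2}\n";
```

For real data the in-memory handles would be replaced by ordinary file
opens (for example on "$infile", "${infile}_1" and "${infile}_2"). The
sketch does no quality-score validation, so it trades the checks that
Bio::SeqIO::fastq performs for speed and simplicity.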

From thomas.sharpton at gmail.com Wed Feb 29 13:11:39 2012
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 29 Feb 2012 10:11:39 -0800
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID:

This was an interesting thread to follow (I'm about to dive into Illumina
data). Glad you found the cause of the problem, Sean.

FYI - you may already know this trick, but when I work on a cluster, my
first command in my submission script is to always source my bash profile
(.profile, .bashrc, etc. depending on your setup).

This way you can control the structure of the PERL5LIB variable (among
others) on the slave nodes and ensure your local perl modules are
preferentially called.

Of course there are other solutions to this problem too.

Best,
Tom

On Feb 29, 2012 9:38 AM, "Sean O'Keeffe" wrote:

> Yes. I ran my script on a cluster which may have had bioperl installed, not
> sure.
> Running it locally = success.
>
> Thanks all!
>
> [...]

From cjfields at illinois.edu Wed Feb 29 13:23:27 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 29 Feb 2012 18:23:27 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID:

Just want to say, if you can set up a local perl and local::lib it makes
your life a LOT easier. Particularly if you are running jobs on older
versions of RHEL, which notoriously stuck with outdated/broken versions of
perl (as well as other tools).

chris

On Feb 29, 2012, at 12:11 PM, Thomas Sharpton wrote:

> [...]

From Scott.Markel at accelrys.com Wed Feb 29 13:25:05 2012
From: Scott.Markel at accelrys.com (Scott Markel)
Date: Wed, 29 Feb 2012 10:25:05 -0800
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E9A9DC1-DE74-4EFD-84D1-53FCF44CB0E4@illinois.edu> <5AFAB73B-7284-4B42-9810-646B89C83B20@illinois.edu>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC97704652DE6F5@EXCH1-COLO.accelrys.net>

The leading character ('+') of the optional descriptor line is still
written, just not the rest of the line. The line count shouldn't change.

Scott

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sean O'Keeffe
Sent: Wednesday, 29 February 29 2012 8:57 AM
To: Peter Cock
Cc: Chris Fields;
Subject: Re: [Bioperl-l] fastq splitter

But wouldn't that result in a 3 line fastq output line which would screw up
other programs expecting 4 fastq lines? - e.g. bowtie.
On 29 February 2012 11:39, Peter Cock wrote:

> On Wed, Feb 29, 2012 at 4:30 PM, Sean O'Keeffe
> wrote:
> > Hi Chris,
> > Here's the perldoc for fastq - it does seem to indicate that the optional
> > descriptor (+) must match the first header. (See DESCRIPTION).
>
> i.e. If present, it must match. But the repeated descriptor can
> (and for space efficiency should) be omitted.
>
> As Chris mentioned earlier, there are sample files in the test suite
> which omit the repeated descriptor so this should be working OK.
>
> Peter
>

From hartzell at alerce.com Wed Feb 29 14:52:39 2012
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 29 Feb 2012 11:52:39 -0800
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References:
Message-ID: <20302.33287.509248.407270@gargle.gargle.HOWL>

Fields, Christopher J writes:
> Just want to say, if you can set up a local perl and local::lib it
> makes your life a LOT easier. Particularly if you are running jobs
> on older versions of RHEL, which notoriously stuck with
> outdated/broken versions of perl (as well as other tools).
> [...]

And Perlbrew takes away your last excuse for not building perls and
setting up local::lib's.

http://perlbrew.pl/

g.

From cjfields at illinois.edu Wed Feb 29 21:30:38 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 1 Mar 2012 02:30:38 +0000
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To: <4F501063.4010109@gmail.com>
References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com>
Message-ID:

There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning.
Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to both (1) add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it.

As an example, I can easily see something like Bio::SearchIO::blastxml living on its own since it has a set of outside dependencies.

BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path.

chris

On Mar 1, 2012, at 6:12 PM, Florent Angly wrote:

> Thanks for everybody's feedback.
>
> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There are Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), adding some sanity to them and making sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc).
>
> I will make a remote branch today to make it easier for interested parties to experiment and contribute.
>
> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro.
>
> Florent
>
> On 01/03/12 01:23, Fields, Christopher J wrote:
>> Seems like it was meant to be added at some point but was never committed.
>> Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag:
>>
>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a
>>
>> and it's not there.
>>
>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo.
>>
>> chris
>>
>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote:
>>
>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive:
>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz
>>>
>>> (There's supposedly a more recent version here:
>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz
>>> but that file seems to be truncated).
>>>
>>> I have no idea how much would be salvageable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper.
>>>
>>> Cheers,
>>> Roy.
>>>
>>>
>>> On 27/02/2012 21:18, Fields, Christopher J wrote:
>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly
>>>>> it was added to Bioperl 0.3 and is also mentioned in the
>>>>> Bio::PrimedSeq module. However, I cannot find it in the current
>>>>> Bioperl codebase. Any idea where it went?
>>>> No idea; I can't find it anywhere in the code base either, and the
>>>> github repo contains history going back to the original CVS repo.
>>>> You can try contacting the author, possibly.
>>>>
>>>>> The reason I am asking is because I have some code to do in silico PCR
>>>>> using regular expressions. I wanted to modularize my code more and
>>>>> make it into a module for Bioperl.
>>>>> Of course, if there is something
>>>>> similar in Bioperl already, I need to have a look at it. If there
>>>>> is nothing similar, what namespace do you suggest to use?
>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch?
>>>>> Bio::Tools::InSilicoPCR?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Florent
>>>>
>>>> Maybe the last (InSilicoPCR).
>>>>
>>>> chris

From cjfields at illinois.edu Wed Feb 29 21:42:06 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 1 Mar 2012 02:42:06 +0000
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To:
References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com>
Message-ID: <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu>

Florent,

Just want to add, my previous response isn't meant as an admonishment. I hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to share my opinion: I find releasing one's own code much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into users' hands easier, more flexible, and more consistent, I think that is always a better path.

chris

On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote:

> There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning.
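
Florent's regex-based in silico PCR and Roy's remark about mapping primers with `index` can be illustrated with a short sketch. This is not Florent's code or any BioPerl module. The names `revcom`, `primer_to_regex` and `find_amplicons` are hypothetical, only the forward strand of the template is searched, and no mismatches are allowed beyond IUPAC degenerate codes in the primers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',     U => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Reverse complement, including degenerate codes.
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse uc $seq;
    $rc =~ tr/ACGTURYSWKMBDHVN/TGCAAYRSWMKVHDBN/;
    return $rc;
}

# Turn a primer (possibly degenerate) into a regex fragment.
sub primer_to_regex {
    my ($primer) = @_;
    my $pattern = '';
    for my $base (split //, uc $primer) {
        die "Unknown nucleotide code '$base' in primer $primer\n"
            unless exists $IUPAC{$base};
        $pattern .= $IUPAC{$base};
    }
    return $pattern;
}

# Return one hash ref per forward-primer site that is followed by the
# reverse-primer binding site: { start => 1-based position, sequence => product }.
# Both primers are given 5'->3'. $max_len is optional.
# Assumes the template is a single string without line breaks.
sub find_amplicons {
    my ($template, $fwd, $rev, $max_len) = @_;
    $template = uc $template;
    $template =~ tr/U/T/;

    my $fwd_re = primer_to_regex($fwd);
    my $rev_re = primer_to_regex(revcom($rev));   # site as it appears on the forward strand

    my @amplicons;
    # The lookahead lets every forward-primer site be tried, even when
    # candidate products overlap; .*? picks the nearest reverse site.
    while ($template =~ /(?=($fwd_re.*?$rev_re))/g) {
        my ($start, $amplicon) = ($-[1] + 1, $1);
        next if defined $max_len && length($amplicon) > $max_len;
        push @amplicons, { start => $start, sequence => $amplicon };
    }
    return @amplicons;
}
```

Calling `find_amplicons($template, $fwd, $rev)` returns a list of hash references holding the 1-based `start` and the `sequence` of each product. Running it a second time on `revcom($template)` would cover the other orientation, and a pluggable mapper of the kind Chris and Roy mention (Primer3, BLAST, a short read mapper) would replace the regex step.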