From skirov at utk.edu Sat Oct 1 12:50:02 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Sat Oct 1 12:49:57 2005
Subject: [Bioperl-l] Off topic: 'Intelligent Design' in schools in US
Message-ID: <433EBE3A.6080504@utk.edu>

This is off topic, but some people may want to sign...

http://shovelbums.org/index.php?option=com_mospetition&Itemid=506

From hlapp at gmx.net Sat Oct 1 15:48:26 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Oct 1 15:48:34 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To:
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
Message-ID: <12f0b79067078d5a5d8b2686f5dffe75@gmx.net>

I've tried to listen in on the exchange but I'm not sure I understand
what the issue is.

I.e., does the parser need to seek in the stream? If yes, then piping
won't do any good if it works at all. If no, then the parser should be
perfectly fine with the filename being output from a pipe, and possibly
accept a file handle in substitution too. In that case, the caller can
pipe the actual input through any commands he/she wishes by simply
passing the piped command(s) (e.g. as in "gzip -d -c file.asn.gz|gene2xml|").

The parser doing this auto-magically isn't necessary and doesn't save a
caller that much. Instead, it exposes the parser to liabilities like
path of gzip, path of gene2xml and similar stuff which may not be
identical on all platforms.

What am I missing?

-hilmar

On Sep 30, 2005, at 12:55 PM, Michael Seewald wrote:

> On 9/30/05, Mingyi Liu wrote:
>>
>> I didn't say indexing would break, but the performance of retrieval
>> would be horrible. That's why in most situations there's no need to use
>> pipe - after all, any one who needs to use index & ID-based retrieval
>> would convert the binary ASN to text file anyway (using a script,
>> hopefully).
>
> Absolutely right, seeking would be horrible.
>
> Best wishes,
> Michael
>
> PS: And thanks for providing this great parser!!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From skirov at utk.edu Sat Oct 1 16:01:04 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Sat Oct 1 16:01:12 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To: <12f0b79067078d5a5d8b2686f5dffe75@gmx.net>
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
    <12f0b79067078d5a5d8b2686f5dffe75@gmx.net>
Message-ID: <433EEB00.4050800@utk.edu>

Hilmar,
As of now the parser does not seek through the stream, but hopefully it
will as soon as I can sit down and do it (by the way it is weird but
gene2xml will not parse the gunzipped file, so you should not use gzip -d).
I don't think you are missing anything as far as I can tell.
Stefan

Hilmar Lapp wrote:

> I've tried to listen in on the exchange but I'm not sure I understand
> what the issue is.
>
> I.e., does the parser need to seek in the stream? If yes, then piping
> won't do any good if it works at all. If no, then the parser should be
> perfectly fine with the filename being output from a pipe, and
> possibly accept a file handle in substitution too. In that case, the
> caller can pipe the actual input through any commands he/she wishes by
> simply passing the piped command(s) (e.g. as in "gzip -d -c file.asn.gz|gene2xml|").
>
> The parser doing this auto-magically isn't necessary and doesn't save
> a caller that much. Instead, it exposes the parser to liabilities like
> path of gzip, path of gene2xml and similar stuff which may not be
> identical on all platforms.
>
> What am I missing?
>
> -hilmar
>
> On Sep 30, 2005, at 12:55 PM, Michael Seewald wrote:
>
>> [...]

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

From hlapp at gmx.net Sat Oct 1 21:57:03 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Oct 1 21:56:54 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To: <433EEB00.4050800@utk.edu>
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
    <12f0b79067078d5a5d8b2686f5dffe75@gmx.net> <433EEB00.4050800@utk.edu>
Message-ID:

On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:

> Hilmar,
> As of now the parser does not seek through the stream, but hopefully
> it will as soon as I can sit down and do it

What advantage would that have?

Note that not allowing streams first off makes entrezgene different from
all other formats, and second, together with the gene2xml conversion
requirement would require you to call it in a different manner than all
other SeqIO parsers (i.e., just passing a string, w/ or w/o trailing
pipe, wouldn't suffice; you'd have to do a preprocessing step).
If seeking in the file can outweigh that with some significant
advantages, then great, but even then it should be optional if it can be
within reason.

-hilmar

> (by the way it is weird but gene2xml will not parse the gunzipped
> file, so you should not use gzip -d).
> I don't think you are missing anything as far as I can tell.
> Stefan
>
> Hilmar Lapp wrote:
>
>> [...]

From mseewald at gmail.com Sun Oct 2 11:27:17 2005
From: mseewald at gmail.com (Michael Seewald)
Date: Sun Oct 2 11:33:26 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To:
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
    <12f0b79067078d5a5d8b2686f5dffe75@gmx.net> <433EEB00.4050800@utk.edu>
Message-ID:

On 10/2/05, Hilmar Lapp wrote:
>
>
> On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:
>
> > Hilmar,
> > As of now the parser does not seek through the stream, but hopefully
> > it will as soon as I can sit down and do it
>
> What advantage would that have?

You could create an index and seek by gene id. Could be handy in some
cases, although I wouldn't use it (personally).

> Note that not allowing streams first off makes entrezgene different
> from all other formats, and second, together with the gene2xml
> conversion requirement would require you to call it in a different
> manner than all other SeqIO parsers (i.e., just passing a string, w/ or
> w/o trailing pipe, wouldn't suffice; you'd have to do a preprocessing
> step). If seeking in the file can outweigh that with some significant
> advantages, then great, but even then it should be optional if it can
> be within reason.

The "streaming use" shouldn't be affected. BTW, there are other bioperl
modules that use indices.
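Michael's index-and-seek idea, and Hilmar's point that a parser which has to seek cannot read from a pipe, can both be seen in a small stand-alone Perl sketch. It assumes a made-up record format (a ">ID" line followed by one data line) instead of EntrezGene ASN.1, and it only illustrates the general tell()/seek() byte-offset technique; it is not how Bio::ASN1::EntrezGene::Indexer or any Bio::Index module is actually implemented.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Hypothetical record format for illustration only: a ">ID" line
# followed by a single data line (NOT the EntrezGene ASN.1 layout).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} ">1\nalpha\n>2\nbeta\n>3\ngamma\n";
close $out or die "close failed: $!";

# Indexing pass: remember the byte offset at which each record starts.
open my $in, '<', $path or die "cannot open $path: $!";
my %index;
while (1) {
    my $offset = tell $in;
    my $line   = <$in>;
    last unless defined $line;
    $index{$1} = $offset if $line =~ /^>(\S+)/;
}

# Retrieval: jump straight to one record instead of re-reading the file.
sub fetch_record {
    my ($fh, $idx, $id) = @_;
    return unless exists $idx->{$id};
    seek $fh, $idx->{$id}, SEEK_SET or die "seek failed: $!";
    my $header = <$fh>;    # the ">ID" line
    my $body   = <$fh>;    # the record's data line
    chomp $body;
    return $body;
}

my $record = fetch_record($in, \%index, '2');
print "$record\n";    # prints "beta"

# The same trick is unavailable on a pipe: seek() returns false there.
open my $pipe, '-|', $^X, '-e', 'print qq{>1\nalpha\n}'
    or die "cannot start pipe: $!";
my $pipe_seekable = seek($pipe, 0, SEEK_SET) ? 1 : 0;
close $pipe;
print $pipe_seekable ? "pipe is seekable\n" : "pipe is not seekable\n";
```

Because the index stores byte offsets into a regular file, it only helps when the input can be re-read at arbitrary positions; input streamed from a command such as gene2xml would first have to be written to disk, which is one reason indexing sits more naturally in a separate module than in a streaming SeqIO parser.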
Best wishes,
Michael

From hlapp at gmx.net Sun Oct 2 15:03:57 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Oct 2 15:04:38 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To:
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
    <12f0b79067078d5a5d8b2686f5dffe75@gmx.net> <433EEB00.4050800@utk.edu>
Message-ID: <6a4c5cc1e5c136969f8c5bf3b87e126c@gmx.net>

On Oct 2, 2005, at 8:27 AM, Michael Seewald wrote:

> On 10/2/05, Hilmar Lapp wrote:
>>
>>
>> On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:
>>
>>> Hilmar,
>>> As of now the parser does not seek through the stream, but hopefully
>>> it will as soon as I can sit down and do it
>>
>> What advantage would that have?
>
> You could create an index and seek by gene id. Could be handy in some
> cases, although I wouldn't use it (personally).
>
> [...]
>
> The "streaming use" shouldn't be affected. BTW, there are other bioperl
> modules that use indices.

Right, but they use a different implementation (module) for indexing,
not the SeqIO parser (it only parses the entry).

-hilmar

> Best wishes,
> Michael

From skirov at utk.edu Sun Oct 2 15:08:42 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Sun Oct 2 15:09:41 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To: <6a4c5cc1e5c136969f8c5bf3b87e126c@gmx.net>
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
    <12f0b79067078d5a5d8b2686f5dffe75@gmx.net> <433EEB00.4050800@utk.edu>
    <6a4c5cc1e5c136969f8c5bf3b87e126c@gmx.net>
Message-ID: <4340303A.30103@utk.edu>

Yes, Hilmar is right- there should be Bio::Index::entrezgene to do that.
I agree this functionality is better kept separated from Bio::SeqIO
parser.
Stefan

Hilmar Lapp wrote:

> On Oct 2, 2005, at 8:27 AM, Michael Seewald wrote:
>
>> [...]
>>
>> The "streaming use" shouldn't be affected. BTW, there are other bioperl
>> modules that use indices.
>
> Right, but they use a different implementation (module) for indexing,
> not the SeqIO parser (it only parses the entry).
>
> -hilmar

--
Stefan Kirov, Ph.D.
"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From wes.barris at csiro.au Sun Oct 2 19:21:16 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Sun Oct 2 19:36:08 2005
Subject: [Bioperl-l] get_nof_contigs returns undef
In-Reply-To: <200509260710.32624.heikki.lehvaslaiho@gmail.com>
References: <433379F6.4000306@csiro.au>
    <200509241011.45932.heikki.lehvaslaiho@gmail.com>
    <17206.61348.329628.13021@satchel.alerce.com>
    <200509260710.32624.heikki.lehvaslaiho@gmail.com>
Message-ID: <43406B6C.5050005@csiro.au>

Heikki Lehvaslaiho wrote:
> Ah, thanks, the extra square brackets do the trick!
>
> return scalar( @{[moose()]} );
>
> sub get_nof_contigs {
>     my $self = shift;
>     return scalar( @{[$self->get_contig_ids()]} );
> }
>
> Talk about clean syntax! ;-)

Yes, one might get a headache trying to figure out what that line is
doing.

I notice that this fix has not made it into the code yet. Should I
submit a bug report to get this fix pushed through?

>
> Seriously, I do agree with you that it is better to be wordy and clear
> than terse and hard to read.
>
> -Heikki
>
> On Sunday 25 September 2005 19:42, George Hartzell wrote:
>
>>Heikki Lehvaslaiho writes:
>> > So the question is how to force list context for a subroutine. This is
>> > something I've often wondered and have not found a clean solution.
>> > [...]
>>
>>No, [I think] that's not the problem here.
>>
>>The problem is that there's a call to sort() buried in the
>>get_nof_contigs return statement and sort is a Surprising function (it
>>returns undef when called in a scalar context).
>>
>>
>>The cleanest fix would be to petition the Perl community to change the
>>semantics of the sort() function so that it's not so Surprising.... ;)
>>
>>
>>In the meantime, I tend to avoid doing real work in a return
>>statement, I'd do the work w/in the body of the function, assign the
>>results to an array (e.g. @results), then return that.
>>
>>It looks a bit wordy and people think I'm paranoid, but it's just so
>>much easier for me to be safe than to try to remember whether Perl's
>>going to Do The Right Thing or just bite me in the ass....
>>
>>If you really want to force something into a list context, the second
>>paragraph of the documentation for scalar() says this:
>>
>>    There is no equivalent operator to force an expression to be
>>    interpolated in list context because in practice, this is never
>>    needed. If you really wanted to do so, however, you could use
>>    the construction "@{[ (some expression) ]}", but usually a
>>    simple "(some expression)" suffices.
>>
>>
>>g

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From mingyi.liu at gpc-biotech.com Mon Oct 3 08:02:07 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Mon Oct 3 08:04:09 2005
Subject: [Bioperl-l] Re: entrezgene binary ASN
In-Reply-To: <4340303A.30103@utk.edu>
References: <432B16DC.6040901@utk.edu> <433006AE.5040406@utk.edu>
    <433D6686.3000301@gpc-biotech.com> <433D8EF5.7000003@gpc-biotech.com>
    <12f0b79067078d5a5d8b2686f5dffe75@gmx.net> <433EEB00.4050800@utk.edu>
    <6a4c5cc1e5c136969f8c5bf3b87e126c@gmx.net> <4340303A.30103@utk.edu>
Message-ID: <43411DBF.1040206@gpc-biotech.com>

Yep, I agree with pretty much all the points Hilmar raised. I don't
think the opinions on pipes or index/seek differ that much.

Just want to note that Bio::ASN1::EntrezGene::Indexer (included in the
download of Bio::ASN1::EntrezGene and a child class of
Bio::Index::AbstractSeq) already does the indexing and retrieval of
Bio::Seq objects for Entrez Gene records by ID.
And Bio::ASN1::EntrezGene returns a filehandle through the function fh,
which can be used for seeking (in fact, Bio::ASN1::EntrezGene::Indexer
uses that function to do seeking for a non-bioperl-standard function it
provides). So seeking is already possible through the fh function of
the low-level entrezgene parser; it is just not handled directly in the
parser (though it is used in the indexer).

Mingyi

Stefan Kirov wrote:

> Yes, Hilmar is right- there should be Bio::Index::entrezgene to do
> that. I agree this functionality is better kept separated from
> Bio::SeqIO parser.
> Stefan

From rvosa at sfu.ca Mon Oct 3 10:40:11 2005
From: rvosa at sfu.ca (Rutger Vos)
Date: Mon Oct 3 11:06:55 2005
Subject: [Bioperl-l] BioCorba and phylogenetic trees
Message-ID: <434142CB.1040607@sfu.ca>

Dear all,

I am curious about the following:

* what's the status of BioCorba, and how does it relate to BioPerl
* have bioperl-corba-clients and/or bioperl-corba-servers successfully
  been deployed on Win32? Using CORBA::ORBit?
* are there IDLs for tree objects/data structures in BioCorba?

Some background: I'm "the perl guy" in this project:
http://www.phylo.org

We are looking to integrate phylogenetics software using corba. It'd be
good if we were compatible with BioPerl, hence the questions.

Cheers,

Rutger

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625     Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From muratem at eng.uah.edu Mon Oct 3 12:12:07 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Mon Oct 3 12:36:47 2005
Subject: [Bioperl-l] SeqFeature::add_tag_value
Message-ID:

Greetings

When trying to add a feature:

$newcds->add_tag_value('translation', $seqobj->translate->seq());

This is exactly like the examples in the Pasteur Inst. tutorial.

It prints in the output:

/gene="Bio::Annotation::SimpleValue=HASH(0x889581c)"

What really simple perl or bioperl point have I missed?

Thanks

Mike

From hlapp at gmx.net Mon Oct 3 12:52:03 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Oct 3 12:58:34 2005
Subject: [Bioperl-l] SeqFeature::add_tag_value
In-Reply-To:
References:
Message-ID: <0ab7fb037f6e447c2f4218cb1520b905@gmx.net>

The "simple point" you've missed is that this is a known 1.5 bug and you
need to upgrade, either using CVS, or, probably much easier, using the
latest release candidate for 1.5.1 at
http://bioperl.org/DIST/bioperl-1.5.1-rc3.tar.gz

-hilmar

On Oct 3, 2005, at 9:12 AM, Mike Muratet wrote:

> Greetings
>
> When trying to add a feature:
>
> $newcds->add_tag_value('translation', $seqobj->translate->seq());
>
> This is exactly like the examples in the Pasteur Inst. tutorial.
>
> It prints in the output:
>
> /gene="Bio::Annotation::SimpleValue=HASH(0x889581c)"
>
> What really simple perl or bioperl point have I missed?
>
> Thanks
>
> Mike

From heikki at ebi.ac.uk Mon Oct 3 13:00:56 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Oct 3 12:59:59 2005
Subject: [Bioperl-l] bioperl-live CVS installation
In-Reply-To:
References:
Message-ID: <200510031800.57249.heikki@ebi.ac.uk>

On Thursday 29 September 2005 14:56, Paul G Cantalupo wrote:
> Hello,
>
> I am using the CVS bioperl-live repository and trying to understand how
> to properly use CVS. After I typed 'cvs update', I got the following:
>
> ? Makefile
> ? blib
> ? pm_to_blib
> ? maintenance/symlink_scripts.pl
> cvs update: Updating .
> cvs update: Updating Bio..............
>
> Is it necessary to delete the files/directories that are preceded with
> a question mark (?) before 'cvs update'?

No. It just means that cvs does not know about those files (they are
not under version control).

Since you seem to be new to cvs: you can silence cvs to some extent to
better see the changes:

   > cat /home/heikki/.cvsrc
   cvs -q
   diff -c
   update -dP

-d to update is important as it tells update to:

   "Create any directories that exist in the repository if they're
   missing from the working directory. Normally, update acts only on
   directories and files that were already enrolled in your working
   directory."

> Also, after I perform a 'cvs update', is it necessary to run the usual:
>
> perl Makefile.PL
> make
> make test
> make install
>

Not necessary.
The only necessary step is to point perl to the installation directory,
e.g. (in your ~/.bash_profile):

   export PERL5LIB="$HOME/src/bioperl-live:$HOME/src/bioperl-run"

However, it is advisable to run the first one (perl Makefile.PL) to see
which CPAN modules you might be missing for some specific functionality
(and 'make test' if you want to help test the code, of course).

-Heikki

> Thank you,
>
> Paul
>
> Paul Cantalupo
> Research Specialist/Systems Programmer
> 559 Crawford Hall
> Department of Biological Sciences
> University of Pittsburgh
> Pittsburgh, PA 15260
> Work: 412-624-4687
> Fax: 412-624-4759
>
> Ask me about Toastmasters: www.toastmasters.org
> Midday Club Treasurer

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/     Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From muratem at eng.uah.edu Mon Oct 3 13:32:47 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Mon Oct 3 13:34:41 2005
Subject: [Bioperl-l] SeqFeature::add_tag_value
In-Reply-To: <0ab7fb037f6e447c2f4218cb1520b905@gmx.net>
Message-ID:

On Mon, 3 Oct 2005, Hilmar Lapp wrote:

> The "simple point" you've missed is that this is a known 1.5 bug and
> you need to upgrade, either using CVS, or, probably much easier, using
> the latest release candidate for 1.5.1 at
> http://bioperl.org/DIST/bioperl-1.5.1-rc3.tar.gz
>

Hilmar

Thanks. Works fine now.
Mike

From heikki.lehvaslaiho at gmail.com Mon Oct 3 07:34:56 2005
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon Oct 3 14:31:49 2005
Subject: [Bioperl-l] get_nof_contigs returns undef
In-Reply-To: <43406B6C.5050005@csiro.au>
References: <433379F6.4000306@csiro.au>
    <200509260710.32624.heikki.lehvaslaiho@gmail.com>
    <43406B6C.5050005@csiro.au>
Message-ID: <200510031234.56570.heikki.lehvaslaiho@gmail.com>

It is in now. I thought that you had access to the CVS and had done it
already.

Sorry,

-Heikki

On Monday 03 October 2005 00:21, Wes Barris wrote:
> Heikki Lehvaslaiho wrote:
> > Ah, thanks, the extra square brackets do the trick!
> >
> > return scalar( @{[moose()]} );
> >
> > sub get_nof_contigs {
> >     my $self = shift;
> >     return scalar( @{[$self->get_contig_ids()]} );
> > }
> >
> > Talk about clean syntax! ;-)
>
> Yes, one might get a headache trying to figure out what that line is
> doing.
>
> I notice that this fix has not made it into the code yet. Should I
> submit a bug report to get this fix pushed through?
>
> > [...]

From hlapp at gnf.org Mon Oct 3 19:59:17 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Oct 3 20:39:51 2005
Subject: [Bioperl-l] Re: [BioSQL-l] removing features from a sequence
In-Reply-To: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
References: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
Message-ID:

Yeah, I guess this is one of the gotchas that deserve better
documentation.
The bioperl-db adaptors will not automatically 'sync' the database with
an object. The reason is that there are too many flavors of what you
could possibly want as a user, so instead of making decisions for you,
the adaptors require you to make explicit what you want; however, doing
so should be reasonably simple.

So here's what's going on and how you can fix it.

First, $dbseq->remove_SeqFeatures() is a Bio::SeqI method present for
all SeqI objects, not just persistent ones. Except in a few cases where
lazy loading is already implemented, methods from the native Bioperl API
are not overridden for persistent objects; i.e., you can manipulate your
persistent object to your heart's content and nothing will happen to the
respective row(s) in the database. You need to say $dbseq->store() to
let your changes take effect. But see below!

Second, $dbseq->store() will only store it; i.e., it will update the
object and either update or insert all attached objects (like features,
annotations, etc). If you want to delete attached objects then you need
to do so explicitly by calling $pobj->remove().

For example, in your case:

    # ... find $dbseq ...
    # delete all features from the database
    # Note: I could use $dbseq->get_SeqFeatures() if I
    # wanted to keep the features on the in-memory object
    foreach my $pfeat ($dbseq->remove_SeqFeatures()) {
        $pfeat->remove();
    }
    # now $dbseq and the object in the db don't have features

Same thing for annotation. You can check out some of the sample closure
implementations for merging objects provided in the scripts/biosql
directory of bioperl-db; for instance, freshen-annot.pl deletes all
annotation (in the db) from the existing object.

Hth,

-hilmar

On Oct 3, 2005, at 10:59 AM, Amit Indap wrote:

> Hi,
>
> I was trying to remove features for sequences stored in my BioSQL
> database.
> Once I run the code snippet below to remove sequence
> features, I tested to see if the features really had been removed by
> running a script that retrieves seq features from bioentries.
> Unfortunately, the features are still there. I'm still learning my way
> around the Bio::DB API.
>
> Here is my code that attempts to remove sequence features:
>
> foreach (@accs) {
>
>     my $acc = $_;
>     my $adp = $dbadp->get_object_adaptor("Bio::SeqI");
>
>     my $seq = Bio::Seq->new(-accession_number => $acc,
>                             -namespace => $namespace
>                             );
>
>     my $dbseq = $adp->find_by_unique_key($seq);
>     warn $acc, " not found in database $namespace" unless $dbseq;
>
>     $dbseq->remove_SeqFeatures(); # remove seqfeatures
>
>     $dbseq->commit();
>     print LOG "removed all seq features for $acc\n";
> }
>
> --
> Amit Indap
> http://www.bscb.cornell.edu/Homepages/Amit_Indap/
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

From indapa at gmail.com Tue Oct 4 14:02:24 2005
From: indapa at gmail.com (Amit Indap)
Date: Tue Oct 4 14:02:20 2005
Subject: [Bioperl-l] Re: [BioSQL-l] removing features from a sequence
In-Reply-To:
References: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
Message-ID: <3cfaa4040510041102l77fbc6d6v28968a13a9a4a515@mail.gmail.com>

Thanks Hilmar, this snippet worked:

foreach my $pfeat ( $dbseq->remove_SeqFeatures() ) {
    $pfeat->remove();
}
$dbseq->store();
$dbseq->commit();

> Second, $dbseq->store() will only store it; i.e., it will update the
> object and either update or insert all attached objects (like features,
> annotations, etc). If you want to delete attached objects then you need
> to do so explicitly by calling $pobj->remove().
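To make the store()/remove() distinction Hilmar describes concrete, here is a toy Perl model. The ToySeq and ToyFeature classes and the %db hash are hypothetical stand-ins invented for this sketch, not bioperl-db classes or tables; the model only mimics the described behaviour: store() writes the parent only when it is dirty and upserts whatever features are still attached, while rows for detached features disappear only after an explicit remove() on each of them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database: one key per stored feature row.
my %db;

package ToyFeature;

sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub key    { my $self = shift; return "$self->{parent}:$self->{name}" }
sub store  { my $self = shift; $db{ $self->key } = 1; return }     # upsert row
sub remove { my $self = shift; delete $db{ $self->key }; return }  # delete row

package ToySeq;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, features => [], dirty => 0, updates => 0 }, $class;
}

sub add_feature {
    my ($self, $fname) = @_;
    push @{ $self->{features} },
        ToyFeature->new(parent => $self->{name}, name => $fname);
    return;
}

# Detach features from the in-memory object only; no database call here.
sub remove_features {
    my $self = shift;
    my @detached = @{ $self->{features} };
    $self->{features} = [];
    return @detached;
}

# Update the parent only if it changed, then upsert what is still attached.
# Rows of detached features are never deleted by store().
sub store {
    my $self = shift;
    $self->{updates}++ if $self->{dirty};
    $self->{dirty} = 0;
    $_->store for @{ $self->{features} };
    return;
}

package main;

my $seq = ToySeq->new('seq1');
$seq->add_feature($_) for qw(cds1 cds2);
$seq->store;

my @gone = $seq->remove_features;
$seq->store;                    # features were detached in memory only
my $after_store = keys %db;     # so both rows are still there

$_->remove for @gone;           # explicit removal deletes the rows
my $after_remove = keys %db;

printf "rows after store: %d, after remove: %d, parent updates: %d\n",
    $after_store, $after_remove, $seq->{updates};
```

This prints "rows after store: 2, after remove: 0, parent updates: 0". Detaching features does not mark the toy parent as dirty, so the second store() issues no parent update, which mirrors the is_dirty() remark Hilmar makes later in the thread.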

Thanks for clarifying this point. I was a bit confused about whether I
needed to call store or remove. But you are right, I need to explicitly
call remove.

--
Amit Indap
http://www.bscb.cornell.edu/Homepages/Amit_Indap/

From hlapp at gnf.org Tue Oct 4 14:19:28 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Oct 4 14:21:16 2005
Subject: [Bioperl-l] Re: [BioSQL-l] removing features from a sequence
In-Reply-To: <3cfaa4040510041102l77fbc6d6v28968a13a9a4a515@mail.gmail.com>
References: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com> <3cfaa4040510041102l77fbc6d6v28968a13a9a4a515@mail.gmail.com>
Message-ID:

On Oct 4, 2005, at 11:02 AM, Amit Indap wrote:

> Thanks Hilmar, this snippet worked:
>
> foreach my $pfeat( $dbseq->remove_SeqFeatures() ) {
>
>     $pfeat->remove();
>
> }
> $dbseq->store();

Just as a small and technical note: if the only changes you made to the
object were to delete attached objects, then you don't need to call
$dbseq->store(). It doesn't hurt either, though, because the persistence
adaptor will check the is_dirty() property of a persistent object before
issuing the update command, and so if you only removed attached objects
then is_dirty() should still be false.

-hilmar

> $dbseq->commit();

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From heikki.lehvaslaiho at gmail.com Tue Oct 4 15:52:33 2005
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Tue Oct 4 19:32:53 2005
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] How to obtain all x-linked genes from human genome?
Message-ID: <200510042052.33739.heikki@ebi.ac.uk>

----------  Forwarded Message  ----------

Subject: [Bioperl-guts-l] How to obtain all x-linked genes from human genome?
Date: Tuesday 04 October 2005 18:34
From: Sally Li
To: Hilmar Lapp
Cc: bioperl-guts-l@bioperl.org

Hi,

Does anyone know how to obtain all gi numbers (gene
ID in GenBank) of X-linked genes from the human genome?

Thanks!

Sally

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l

-------------------------------------------------------

--
Heikki Lehvaslaiho    heikki at_ebi _ac _uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
http://www.ebi.ac.uk/mutations/

From sdavis2 at mail.nih.gov Wed Oct 5 09:25:33 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Oct 5 10:42:39 2005
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] How to obtain all x-linked genes from human genome?
In-Reply-To: <200510042052.33739.heikki@ebi.ac.uk>
Message-ID:

On 10/4/05 3:52 PM, "Heikki Lehvaslaiho" wrote:

> Does anyone know how to obtain all gi numbers (gene
> ID in GenBank) of X-linked genes from the human genome?

The file gene2refseq.gz located here:

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz

contains information regarding RefSeqs (and the associated GIs).
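Below is a minimal Perl sketch of that approach. Two of its details are assumptions made here, not taken from Sean's reply. The first is the set of 0-based column positions listed in the comments, which should be checked against the README linked below. The second is identifying X-linked records by a genomic accession of NC_000023, the RefSeq record for human chromosome X. x_linked_rna_gis is a hypothetical helper. It keeps the RNA GI and the GeneID of each matching human (tax_id 9606) row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return { RNA gi => GeneID } for human (tax_id 9606) rows of a
# gene2refseq-style, tab-delimited stream whose genomic accession is
# NC_000023 (human chromosome X).
# Assumed 0-based columns (verify against the NCBI README):
#   0 tax_id, 1 GeneID, 3 RNA accession.version, 4 RNA gi,
#   7 genomic accession.version
sub x_linked_rna_gis {
    my ($fh) = @_;
    my %gene_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;          # skip a header line, if any
        chomp $line;
        my @col = split /\t/, $line;
        next unless @col >= 8;
        next unless $col[0] eq '9606';
        next unless $col[7] =~ /^NC_000023\./;
        next if $col[4] eq '-';         # '-' marks a missing value
        $gene_for{ $col[4] } = $col[1];
    }
    return \%gene_for;
}

# Usage: perl x_linked.pl gene2refseq   (file already gunzipped)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $gis = x_linked_rna_gis($in);
    print "$_\t$gis->{$_}\n" for sort { $a <=> $b } keys %$gis;
    close $in;
}
```

Run it on the uncompressed file; the script name is arbitrary. It prints one GI and GeneID pair per line.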

The data structure is described here:

ftp://ftp.ncbi.nih.gov/gene/DATA/README

Human tax_id is 9606. GI numbers are there for the RefSeq sequences. If
you need more sequences, you could get the GenBank accessions from UCSC
using the Table Browser and limiting to the X chromosome, but I'm not
sure if/where they store the GI numbers.

Sean

From olenka.m at gmail.com Wed Oct 5 15:35:33 2005
From: olenka.m at gmail.com (Olena Morozova)
Date: Wed Oct 5 16:31:24 2005
Subject: [Bioperl-l] Connecting to EnsEMBL databases help!
Message-ID: <259a224c0510051235q77ffdeabse69786ee70d85e8f@mail.gmail.com>

I am using the following script to connect to the Ensembl core. I keep
getting the message "unknown database homo_sapiens_core". Does anyone know
a solution to this?
I would really appreciate any help!
Thanks
Olena

#!/bin/perl -w

use lib 'C:/perl/lib/ensembl/modules';
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::DBSQL::SliceAdaptor;

my $host = 'ensembldb.ensembl.org';
my $user = 'anonymous';
my $dbname = 'homo_sapiens_core';
my $db = new Bio::EnsEMBL::DBSQL::DBAdaptor(
    -host => $host,
    -user => $user,
    -dbname => $dbname);

my $slice_adaptor = $db->get_SliceAdaptor();
my $slice = $slice_adaptor->fetch_by_region('chromosome', 'X');

From skirov at utk.edu Wed Oct 5 16:46:36 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Oct 5 16:46:16 2005
Subject: [Bioperl-l] Connecting to EnsEMBL databases help!
In-Reply-To: <259a224c0510051235q77ffdeabse69786ee70d85e8f@mail.gmail.com>
References: <259a224c0510051235q77ffdeabse69786ee70d85e8f@mail.gmail.com>
Message-ID: <43443BAC.2040806@utk.edu>

There is no such database. It is homo_sapiens_core_34_35g. If you have the
mysql client (or you can install mysql browser, see mysql.com), you can run

  mysql -h ensembldb.ensembl.org -u anonymous
  show databases;

to see all active databases.
Also, ensembl-dev is the list where you should send these questions.
Good luck.
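Following Stefan's suggestion, a script can also pick the release-specific name itself. The Perl sketch below assumes core database names of the form <species>_core_<release>_<build>, like the homo_sapiens_core_34_35g quoted above, and returns the one with the highest release number. latest_core_db and list_databases are hypothetical helpers. list_databases also assumes that DBI and DBD::mysql are installed and that the server accepts an anonymous connection without a database name. It loads DBI only when it is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick the core database with the highest release number for a species.
# Assumes names like homo_sapiens_core_<release>_<build>.
sub latest_core_db {
    my ($species, @names) = @_;
    my ($best, $best_release);
    for my $name (@names) {
        next unless $name =~ /^\Q$species\E_core_(\d+)_\w+$/;
        my $release = $1;
        if (!defined($best_release) || $release > $best_release) {
            ($best, $best_release) = ($name, $release);
        }
    }
    return $best;    # undef when no name matched
}

# Hypothetical helper: the equivalent of "show databases;" in the mysql
# client. DBI (and DBD::mysql) are only loaded when this sub is called.
sub list_databases {
    my ($host, $user) = @_;
    require DBI;
    my $dbh = DBI->connect("DBI:mysql:host=$host", $user, '',
                           { RaiseError => 1, PrintError => 0 });
    my $names = $dbh->selectcol_arrayref('SHOW DATABASES');
    $dbh->disconnect;
    return @$names;
}
```

The chosen name would then replace the bare 'homo_sapiens_core' in Olena's script, e.g. my $dbname = latest_core_db('homo_sapiens', list_databases('ensembldb.ensembl.org', 'anonymous'));. The newest release may not be the right choice, though. That depends on which Ensembl API checkout is installed.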
Stefan

From venomousanimal at web.de Thu Oct 6 02:56:55 2005
From: venomousanimal at web.de (venomousanimal)
Date: Thu Oct 6 04:31:20 2005
Subject: [Bioperl-l] Connecting to EnsEMBL databases help!
In-Reply-To: <259a224c0510051235q77ffdeabse69786ee70d85e8f@mail.gmail.com>
References: <259a224c0510051235q77ffdeabse69786ee70d85e8f@mail.gmail.com>
Message-ID: <4344CAB7.5020501@web.de>

Olena Morozova wrote:
> I am using the following script to connect to the Ensembl core. I keep
> getting the message "unknown database homo_sapiens_core".

u have to say which homo_sapiens_core DB u want, e.g. homo_sapiens_core35e.
u can look up at the Ensembl homepage which one is the most recent at the
moment, or look in the API help.

good luck

From ctemp2 at free.fr Thu Oct 6 11:11:55 2005
From: ctemp2 at free.fr (ctemp2@free.fr)
Date: Thu Oct 6 12:08:37 2005
Subject: [Bioperl-l] Need help for a GO request
Message-ID: <1128611515.43453ebb681f0@imp2-g19.free.fr>

Hello,

I have a list of genes (accession numbers) and I would like to find the
associated information in Gene Ontology (I work with these data: finding
biological processes, common nodes between 2 genes...).

I am a newbie with Bioperl and I would like to know:

- whether there is a module to send a request to the NCBI to obtain the GO
information for those genes

- or whether there is a module to send the GO id to the GO site (I tried to
use Bio::Ontology::SimpleGOEngine but the example doesn't work; I get an
error with the 'parse' method, which Perl does not find in the module).

Thanks if you can help me.

Regards.

C. Tobini

From skirov at utk.edu Thu Oct 6 12:35:17 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Oct 6 12:35:02 2005
Subject: [Bioperl-l] Need help for a GO request
In-Reply-To: <1128611515.43453ebb681f0@imp2-g19.free.fr>
References: <1128611515.43453ebb681f0@imp2-g19.free.fr>
Message-ID: <43455245.5040609@utk.edu>

There are a bunch of ways to do that, including existing tools. See
genereg.ornl.gov/webgestalt and genereg.ornl.gov/gotm for example (I
believe it is doing precisely what you need). You may need to convert your
ids to Entrez Gene ids: go to genereg.ornl.gov/gkdb/ -> Gateway -> ID
conversion.
You may also install the GO database (which is MySQL) locally from
www.godatabase.org.
Stefan

From olenka.m at gmail.com Thu Oct 6 16:47:54 2005
From: olenka.m at gmail.com (Olena Morozova)
Date: Thu Oct 6 17:14:05 2005
Subject: [Bioperl-l] GeneKeyDB -- finding orthologues question
Message-ID: <259a224c0510061347q573a794cte5323b2883e090a6@mail.gmail.com>

I have a list of human genes, and I need to find the corresponding
orthologs in the mouse. GKDB looks like a good way to do it, except that
the program's output does not indicate if a particular gene does not have
an ortholog. What I mean is that if I input a list of 50 genes, it will
give me a list of 46 orthologs (let's say) in the mouse, and I have no way
of telling which ortholog corresponds to which gene. Is there any way to
get around this problem? For instance, in the output of the ID converter
script, the program leaves blank spaces where it could not find the
corresponding ID.

Alternatively, is there another way to find orthologs? I am still having
troubles with EnsEMBL....

Thank you very much for your help, and I am sorry if my questions are
stupid -- I am a total newbie here...

Thanks again,
Olena

From skirov at utk.edu Thu Oct 6 19:33:20 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Oct 6 19:33:11 2005
Subject: [Bioperl-l] bug
Message-ID: <4345B440.60709@utk.edu>

Matt,
try this:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
#use Benchmark;

my $file = shift;
#my $t0 = new Benchmark;
my $s15;

# get sequence
{
    my $fasta_in = Bio::SeqIO->new( -file => "<$file", -format => 'fasta' );
    my $seq_obj = $fasta_in->next_seq();
    $s15 = $seq_obj->seq();
}
use lib "bioperl-branch-1.4";
my $fasta_in = Bio::SeqIO->new( -file => "<$file", -format => 'fasta' );
my $seq_obj = $fasta_in->next_seq();
my $s14 = $seq_obj->seq();
print "same\n" if ($s15 eq $s14);
exit();

From brian_osborne at cognia.com Thu Oct 6 21:19:44 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Oct 6 21:35:30 2005
Subject: [Bioperl-l] Need help for a GO request
In-Reply-To: <1128611515.43453ebb681f0@imp2-g19.free.fr>
Message-ID:

C.,

There are different approaches, depending on what your accessions are. If
your accessions are Locus/Gene ids from LocusLink/Entrez Gene then simply
use the "entrezgene" format of SeqIO.
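For the GenBank-nucleotide case described in the next paragraph, a gene2accession lookup could be sketched in Perl as below. accessions_to_geneids is a hypothetical helper. The column positions are assumptions to verify against the README at ftp://ftp.ncbi.nih.gov/gene/DATA/README: GeneID at 0-based column 1, RNA nucleotide accession.version at column 3, and genomic nucleotide accession.version at column 7. The helper ignores version suffixes, so an accession matches whether or not it is given with ".1". Every requested accession appears in the result, with an empty list when nothing matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map GenBank nucleotide accessions to Entrez Gene ids using a
# gene2accession-style, tab-delimited stream.
# Assumed 0-based columns (verify against the NCBI README):
#   1 GeneID, 3 RNA nucleotide accession.version,
#   7 genomic nucleotide accession.version
sub accessions_to_geneids {
    my ($fh, @accs) = @_;
    my %wanted;
    for my $acc (@accs) {
        (my $base = $acc) =~ s/\.\d+$//;   # ignore any ".version" suffix
        $wanted{$base} = 1;
    }
    my %genes_for;                          # accession => { GeneID => 1 }
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my @col = split /\t/, $line;
        next unless @col >= 8;
        for my $acc (@col[3, 7]) {
            next if $acc eq '-';
            (my $base = $acc) =~ s/\.\d+$//;
            $genes_for{$base}{ $col[1] } = 1 if $wanted{$base};
        }
    }
    # Every requested accession gets an entry; no match means an empty list.
    my %result;
    for my $base (keys %wanted) {
        $result{$base} = [ sort { $a <=> $b } keys %{ $genes_for{$base} || {} } ];
    }
    return \%result;
}
```

Brian suggests passing the returned Entrez Gene ids to the entrezgene SeqIO format to get the GO terms.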

If your accessions are GenBank nucleotide then you may be able to find the
corresponding Entrez Gene ids by parsing the gene2accession file of Entrez
Gene yourself; there is no module to do this in Bioperl, but it will be
easy. With these in hand you'll, again, use SeqIO and entrezgene to find
the corresponding GO terms.

You'll notice that my suggestions center around GenBank and Entrez Gene,
but there are other databases that assign GO terms besides these.

If you're new to Bioperl then taking a look at various HOWTOs could be
useful (bioperl.org/HOWTOs).

Brian O.

From brian_osborne at cognia.com Thu Oct 6 21:41:58 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Oct 6 21:46:49 2005
Subject: [Bioperl-l] GeneKeyDB -- finding orthologues question
In-Reply-To: <259a224c0510061347q573a794cte5323b2883e090a6@mail.gmail.com>
Message-ID:

Olena,

On 10/6/05 4:47 PM, "Olena Morozova" wrote:

> Alternatively, is there another way to find orthologs? I am still
> having troubles with EnsEMBL....

You're going to have to be more specific in order to get help. What
commands have you issued using the MySQL client and what have the results
been?

Brian O.

From radhika.narayanan at siritech.com Wed Oct 5 23:42:41 2005
From: radhika.narayanan at siritech.com (radhika.narayanan@siritech.com)
Date: Fri Oct 7 09:22:06 2005
Subject: [Bioperl-l] Problem while downloading bioperl on windows 2000
Message-ID:

Dear Sir,

I downloaded ActivePerl from
http://www.activestate.com/Products/ActivePerl/, versions 5.6.1.638 and
5.6.1.813, with MSI for Windows. I installed ActivePerl with all default
options. I opened a command prompt (Menus Start->Run and type cmd) and ran
the PPM shell (C:\>ppm). I added new PPM repositories with the following
commands:

ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
ppm> rep add Bribes http://www.Bribes.org/perl/ppm

After that I gave another command, as mentioned in the guide:

ppm> search Bioperl

With this command it threw a message saying that it could not find any
matches for Bioperl. Hence I am now unable to install Bioperl for Windows
2000 on my machine. I would be grateful if you could give solutions to
this problem asap.

Regards,
Radhika.N

From wes.barris at csiro.au Fri Oct 7 01:31:15 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Fri Oct 7 09:22:10 2005
Subject: [Bioperl-l] Assembly error when parsing .ace files
In-Reply-To: <43336BC5.9090304@csiro.au>
References: <43335836.9050809@csiro.au> <43336BC5.9090304@csiro.au>
Message-ID: <43460823.2070906@csiro.au>

Wes Barris wrote:
> Jason Stajich wrote:
>
>> Maybe you can make the changes locally, get it to work, then post a
>> patch? Assuming the authors were writing it for phrap ace assemblies
>> and coded it as such.
>
>
> Ok.
> I'm not sure about how you want this formatted but here is my fix:
>
> Bio/Assembly/IO/ace.pm line 384
> Old:
> my ($phdfilename,$chromatfilename);
> New:
> my ($phdfilename,$chromatfilename) = qw(unset unset);

Hi Jason,

I notice that this has not yet made its way into the bioperl source. I
don't have CVS access. Can you make the change?

>
>>
>> -jason
>>
>> On Sep 22, 2005, at 9:19 PM, Wes Barris wrote:
>>
>>> Hi,
>>>
>>> I have a .ace (and associated files) that I want to parse with bioperl.
>>> The following code produces the displayed error when run:
>>>
>>> #!/usr/bin/perl -w
>>> use strict;
>>> use Bio::Assembly::IO;
>>> my $usage = "Usage: $0 \n";
>>> my $infile = shift or die $usage;
>>> my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace');
>>> my $assembly = $io->next_assembly();
>>>
>>> Use of uninitialized value in concatenation (.) or string at /usr/
>>> lib/perl5/site_perl/5.8.5/Bio/Assembly/IO/ace.pm line 392,
>>> line 1.
>>> Use of uninitialized value in concatenation (.) or string at /usr/
>>> lib/perl5/site_perl/5.8.5/Bio/Assembly/IO/ace.pm line 392,
>>> line 2.
>>>
>>> The problem is in the following code fragment from ace.pm:
>>>
>>> my ($phdfilename,$chromatfilename);
>>> if ($seq->desc() =~ /PHD_FILE: (\S+)/) {
>>>     $phdfilename = $1;
>>> }
>>> if ($seq->desc() =~ /CHROMAT_FILE: (\S+)/) {
>>>     $chromatfilename = $1;
>>> }
>>> (my $phdfile = $singletsfilename) =~ s/edit_dir.*//;
>>> $phdfile .= "phd_dir/$phdfilename";
>>>
>>> The above code is reading the .singlets file and looking for the string
>>> "PHD_FILE:" in the defline. That string does not exist in my singlets
>>> deflines. My .ace (and associated files) are created using the cap3
>>> application.
>>>
>>> Perhaps ace.pm should be modified to gracefully handle the case where
>>> singlet deflines do not contain the "PHD_FILE:" string.
>>>
>>> I am running a CVS version of bioperl updated on Sep 22 on a Linux
>>> RHEL 4.1 system. Perl version 5.8.5. The defline of my singlet file
>>> is:
>>>
>>> >BF654941 11920073 | BF654941.1 CLONE: unknown CLONE_LIB: MARC 3BOV
>>> LEN: 491 bp FILE: 284.fa 5-PRIME DEFN: 279305 MARC 3BOV Bos taurus
>>> cDNA 5', mRNA sequence. tissuetype=[pooled] organism=[Bos taurus]
>>>
>>> --
>>> Wes Barris
>>> E-Mail: Wes.Barris@csiro.au
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12/
>>
>
>

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From cjfields at uiuc.edu Fri Oct 7 10:28:55 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Oct 7 10:40:30 2005
Subject: [Bioperl-l] Problem while downloading bioperl on windows 2000
In-Reply-To:
Message-ID: <000001c5cb4b$730329a0$15327e82@pyrimidine>

I would try switching over to the newest ActiveState Perl 5.8 version
(you're running 5.6). The newest version of Perl 5.8 is 5.8.7.813. That
may help a bit. I checked the ppd files at bioperl.org/DIST and the newest
versions look like they require 5.8, although a search with PPM shows that
a number of them are available for "MSWin32-x86-multi-thread", which could
be version 5.6. I'm not quite sure.

You can also download a newer ppm at gmod.org (set the GMOD repository to
http://www.gmod.org/ggb/ppm). It's listed as v 1.511 but really
corresponds to the CVS version of bioperl checked out at that time, with
all relevant updates for GBrowse. This last one is really made for
compatibility with GBrowse but works fine for everyday use; I haven't had
any problems with it here.

I think an official 1.51 release is also on the way (several release
candidates have been sent out, but all are tarballed UNIX archives). A PPM
should be available shortly after that.

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From bmoore at genetics.utah.edu Fri Oct 7 10:56:31 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri Oct 7 11:04:06 2005
Subject: [Bioperl-l] Problem while downloading bioperl on windows 2000
Message-ID:

Radhika-

That is odd. Is it possible that your network connection is down? What
happens if you do something like 'ppm>search CGI'? You could download the
ppd files from their respective repositories, store them locally, and add
another local ppm repository, but that would be a pita and you certainly
shouldn't have to do that.
Since it sounds like you've just got ActivePerl running, try installing some other packages like IO::String, Text::Wrap or HTML::Parser. Any luck with those? Finally, if you can't get that to work, you can compile and install BioPerl from source. You'll need to install the dependencies first. You can find a list of these in http://bioperl.org/Core/Latest/INSTALL. If ppm isn't working you'll have to install these from source as well. This process will take some work and patience, but theoretically it can be done on Windows - although I've never done a complete bioperl installation this way, I've install a lot of other modules this way and it does work). Basically you'll do this: 1. Get nmake the windows c comiler from: (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake1 5.exe) and put it somewhere in your path. 2. Get the standard bioperl installation document from http://bioperl.org/Core/Latest/INSTALL 3. Installation of perl modules from source basically involves four steps: perl Makefile.pl nmake nmake test nmake install 3. Download and install each dependency from www.cpan.org. Some of them may have their own dependencies, and you so you will have to read the documentation for each one. 4. Download and install the bioperl distribution. Barry -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of radhika.narayanan@siritech.com Sent: Wednesday, October 05, 2005 9:43 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Problem while downloading bioperl on windows 2000 Dear Sir, I downloaded the Active perl from http://www.activestate.com/Products/ActivePerl/ and downloaded the 5.6.1.638 ans also 5.6.1.813 with MSI for windows. I installed active perl with all default options. I opend a command prompt (Menus Start->Run and type cmd) and ran the PPM shell (C:\>ppm). 
Added two new PPM repositories with the following commands: ppm> rep add Bioperl http://bioperl.org/DIST ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms ppm> rep add Bribes http://www.Bribes.org/perl/ppm After which i gave another command as mentioned in the guide ppm> search Bioperl. With this command it threw a message saying that it could not find any matches to bioperl. Hence now am unable to install bioperl for windows 2000 on my machine. i shall be greatful if you could give solutions to solve this problem asap. Regards, Radhika.N _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Oct 7 11:22:12 2005 From: cjfields at uiuc.edu (Chris Fields) Date: Fri Oct 7 11:36:59 2005 Subject: [Bioperl-l] Problem while downloading bioperl on windows 2000 In-Reply-To: Message-ID: <000401c5cb52$e4607290$15327e82@pyrimidine> The source installation method (using nmake) still works, although don't be surprised if it decides to crash out at the end. It usually doesn't mater by that point as Bioperl will be installed. I haven't used this in a long time but I don't think anything has changed to cause problems with installation using this method. Out of curiosity, has anyone tried using MinGW's make or other Win32 make versions besides nmake to do this? I'm thinking along the lines for testing purposes using 'make test,' as nmake was always a bit testy. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Barry Moore Sent: Friday, October 07, 2005 9:57 AM To: radhika.narayanan@siritech.com; bioperl-l@bioperl.org Subject: RE: [Bioperl-l] Problem while downloading bioperl on windows 2000 Radhika- That is odd. 
Is is possible that your network connection is down? What happens if you do something like 'ppm>search CGI'? You could download the ppd files from their respective repositories, store them locally, and add another local ppm repository, but that would be a pita and you certainly shouldn't have to do that. Since it sounds like you've just got ActivePerl running, try installing some other packages like IO::String, Text::Wrap or HTML::Parser. Any luck with those? Finally, if you can't get that to work, you can compile and install BioPerl from source. You'll need to install the dependencies first. You can find a list of these in http://bioperl.org/Core/Latest/INSTALL. If ppm isn't working you'll have to install these from source as well. This process will take some work and patience, but theoretically it can be done on Windows - although I've never done a complete bioperl installation this way, I've install a lot of other modules this way and it does work). Basically you'll do this: 1. Get nmake the windows c comiler from: (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake1 5.exe) and put it somewhere in your path. 2. Get the standard bioperl installation document from http://bioperl.org/Core/Latest/INSTALL 3. Installation of perl modules from source basically involves four steps: perl Makefile.pl nmake nmake test nmake install 3. Download and install each dependency from www.cpan.org. Some of them may have their own dependencies, and you so you will have to read the documentation for each one. 4. Download and install the bioperl distribution. 
Barry From jason.stajich at duke.edu Fri Oct 7 12:20:01 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Oct 7 12:18:45 2005 Subject: [Bioperl-l] Assembly error when parsing .ace files In-Reply-To: <43460823.2070906@csiro.au> References: <43335836.9050809@csiro.au> <43336BC5.9090304@csiro.au> <43460823.2070906@csiro.au> Message-ID: done. thought it had already gone in. thx.
On Oct 7, 2005, at 1:31 AM, Wes Barris wrote: >> my ($phdfilename,$chromatfilename) = qw(unset unset); -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason at cgt.duhs.duke.edu Fri Oct 7 14:53:08 2005 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Oct 7 15:34:43 2005 Subject: [Bioperl-l] Fwd: Help with BPLite References: <7930EE6CD7CA354D93B444D0433C061120DE15@NIHCESMLBX6.nih.gov> Message-ID: <8E7B5246-FFFB-440A-8227-AB83EA18FB2F@cgt.duhs.duke.edu> Begin forwarded message: > From: "Li, Jianying (NIH/NIEHS)" > Date: October 7, 2005 1:58:20 PM EDT > Subject: Help with BPLite > > Dear all, > > I am trying to use the BPlite package from BioPerl on a Windows machine, and got a warning message. > > Here is the sample code from the BioPerl docs:
>
> use Bio::Tools::BPlite;
> open (FH, "blastResult.txt") || die "Can't open file for input.\n";
> my $report = new Bio::Tools::BPlite(-fh=>\*FH);
> {
>     while(my $sbjct = $report->nextSbjct) {
>         print ">",$sbjct->name,"\n";
>         while(my $hsp = $sbjct->nextHSP) {
>             print "\t",$hsp->start,"..",$hsp->end," ",$hsp->bits,"\n";
>         }
>     }
> }
> exit 0;
>
> Then I tried running it on a Linux machine, and it worked. > > Could you please give me some information on how I can do this properly on a Windows system? > > Any help is appreciated. > > Jianying Li > Bioinformatics Scientist > NIEHS ITSS Contractor > NIEHS MicroArray Center > 111 Alexander Drive, Rm. C250 > P.O. Box 12233 > Research Triangle Park, NC 27709 > VM:919-316-4612 Fax: 919-541-1506 > li11@niehs.nih.gov > http://www.niehs.nih.gov > http://dir.niehs.nih.gov/microarray From bmoore at genetics.utah.edu Fri Oct 7 16:02:26 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Fri Oct 7 16:14:08 2005 Subject: [Bioperl-l] Fwd: Help with BPLite Message-ID: Jianying- Your script works fine for me on Windows XP, bioperl 1.4. Give us some details about your problem. What errors are you getting? What version of bioperl are you using?
From bmoore at genetics.utah.edu Fri Oct 7 17:30:49 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Fri Oct 7 17:25:32 2005 Subject: [Bioperl-l] Fwd: Help with BPLite Message-ID: Jianying- There may be better ways, but you can check the version with perl -MBio::Root::Version -e 'print "$Bio::Root::Version::VERSION\n"'. Do you want to send the Blast report that you are having trouble parsing and I'll try it?
Barry -----Original Message----- From: Li, Jianying (NIH/NIEHS) [mailto:li11@niehs.nih.gov] Sent: Friday, October 07, 2005 2:09 PM To: Barry Moore Subject: RE: [Bioperl-l] Fwd: Help with BPLite Thank you for your attention! I am using Windows XP, but I am not sure about my bioperl version. How can I check this? I used PPM to install bioperl (PPM install Bioperl), but never checked the version. The error message is pretty general, exactly like this: ------------------WARNING--------------------------- MSG: Possible error (1) while parsing BLAST report! ---------------------------------------------------- Thanks. Jianying Li Bioinformatics Scientist NIEHS ITSS Contractor NIEHS MicroArray Center 111 Alexander Drive, Rm. C250 P.O. Box 12233 Research Triangle Park, NC 27709 VM:919-316-4612 Fax: 919-541-1506 li11@niehs.nih.gov http://www.niehs.nih.gov http://dir.niehs.nih.gov/microarray From jason.stajich at duke.edu Sat Oct 8 20:02:36 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Oct 8 21:27:39 2005 Subject: [Bioperl-l] release candidate 4 (hopefully the last!), release information Message-ID: Several bug fixes went in last week, so I've made what I hope will be the last release candidate before 1.5.1 goes out. Brian and I have been updating the module documentation so that the bug-info email points to the bugzilla page only. It is boring and tedious, so we've not done all of them yet, but I don't see this as a showstopper for a developer release.
Files are here:
http://bioperl.org/DIST/bioperl-1.5.1-rc4.tar.gz
http://bioperl.org/DIST/bioperl-1.5.1-rc4.zip
http://bioperl.org/DIST/bioperl-run-1.5.1-rc4.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.1-rc4.zip
http://bioperl.org/DIST/bioperl-ext-1.5.1-rc4.tar.gz
http://bioperl.org/DIST/bioperl-ext-1.5.1-rc4.zip
I have put in CVS tags called bioperl-release-1-5-1-rc4 in bioperl-live, bioperl-run-release-1-5-1-rc4 in bioperl-run, and bioperl-ext-release-1-5-1-rc4 in bioperl-ext. If you make any changes that should go in 1.5.1, let us know; we will need to make a branch and apply the changes to the branch only. From now on I will make the release only from the branch. This will not be the branch for 1.6, but I expect we need to discuss a plan for that. There are still positions available for a release master/pumpking for the 1.6 release. I want 1.6 to go out in Q1 2006, but I'm willing to hear debate on the topic. Please also have a look at the Changes file; if there is something missing, add it! Hilmar, I want to make a bioperl-db release: is this from the HEAD or is there a branch? If you want to make the release files please do; otherwise let me know which tags it should have (and whether you are ready). In terms of version number, I don't care too much if it is 1.5.1 or something else. We just need to document it on a website/FAQ. I guess I would lean towards not calling it 1.5.1, but I don't care that much. I plan to make the release on Thursday or Friday of this week (13-Oct or 14-Oct) if there are no holdups.
-jason -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From skirov at utk.edu Sat Oct 8 22:02:34 2005 From: skirov at utk.edu (Stefan Kirov) Date: Sat Oct 8 22:09:33 2005 Subject: [Bioperl-l] release candidate 4 (hopefully the last!), release information In-Reply-To: References: Message-ID: <43487A3A.1000705@utk.edu> Jason, I added Bio::SeqIO::entrezgene to Changes and listed Bio::ASN1::EntrezGene as a dependency for Bio::SeqIO::entrezgene in Makefile.PL. My guess is both should go into 1.5.1. Stefan -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From jason at portal.open-bio.org Sat Oct 8 18:39:24 2005 From: jason at portal.open-bio.org (Jason Stajich) Date: Sat Oct 8 22:57:59 2005 Subject: [Bioperl-l] perlcast Message-ID: <9995F05D-E1A9-4B8C-8964-B4DB3F06C220@bioperl.org> Hi folks - At some point some of us will answer questions on the Perlcast podcast (http://perlcast.com/). If you have burning questions about the project, its future, development, etc. that you want us to answer in digital audio form (instead of digital email), feel free to post them on Josh's blog.
http://perlcast.com/2005/09/14/questions-suggestions-for-bio-perl/ -jason From hlapp at gmx.net Sat Oct 8 22:32:44 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Oct 8 22:58:57 2005 Subject: [Bioperl-l] release candidate 4 (hopefully the last!), release information In-Reply-To: References: Message-ID: <57d1c596c0fa63c4954711bb6124d528@gmx.net> On Oct 8, 2005, at 5:02 PM, Jason Stajich wrote: > Hilmar, I want to make a bioperl-db release, is this from the HEAD or > is there a branch? If you want to make the release files please do, > otherwise let me know which tags it should have (and if you are > ready). I mentioned the caveats that apply - primarily documentation. So I guess it is ready for release. There is no particular tag yet, but I guess once it is released we will want to branch-tag for the release as well as alias-tag for the bioperl release name. If you can include bioperl-db in one sweep then please go ahead; otherwise I may just hold things up until I understand each step. And thanks in advance, I do appreciate you jumping in and taking care of this. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From jason.stajich at duke.edu Sun Oct 9 12:08:31 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Oct 9 13:03:47 2005 Subject: [Bioperl-l] release candidate 4 (hopefully the last!), release information In-Reply-To: <200510091647.38703.heikki.lehvaslaiho@gmail.com> References: <200510091647.38703.heikki.lehvaslaiho@gmail.com> Message-ID: Wow, thank you! And thank goodness for boring and tedious people... ;-) I'll sync them to the branch I just made. To do this I'll check out the branch and merge the changes. Here is how it is done, so everyone can see:
$ cvs -d:ext:....
-r bioperl-branch-1-5-1 -d bioperl-1.5.1-branch bioperl
$ cd bioperl-1.5.1-branch
$ cvs update -j HEAD
$ cvs commit -m "merged changes from HEAD to 1.5.1 branch"
And so on for bioperl-run and bioperl-ext. -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From heikki.lehvaslaiho at gmail.com Sun Oct 9 11:47:38 2005 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Sun Oct 9 21:30:01 2005 Subject: [Bioperl-l] release candidate 4 (hopefully the last!), release information In-Reply-To: References: Message-ID:
<200510091647.38703.heikki.lehvaslaiho@gmail.com> Jason, On Sunday 09 October 2005 01:02, Jason Stajich wrote: > Several bug fixes went in last week so I've made what I hope to be > the last release candidate before 1.5.1 goes out. Brian and I have > been in the process of updating the module documentation so that the > bug-info email points to the bugzilla page only. It is boring and > tedious so we've not really done all of them yet, but I don't see > this as a showstopper for a developer release. I agree it is boring and tedious work, but it is now done in all three repositories. However, being a boring and tedious person, I did not read the rest of your mail before committing all those changes to CVS HEAD. ;-( Could you put in new tags and include these changes in the release? I did not touch any Perl code. -Heikki P.S. Everyone: your next 'cvs update' will be HUGE, so make sure you have a fast net connection and allow plenty of time. ;-) -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason at cgt.duhs.duke.edu Tue Oct 11 13:52:30 2005 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Oct 11 15:09:31 2005 Subject: [Bioperl-l] Re: bioperl-live make test failure In-Reply-To: <1129048757.3929.56.camel@localhost.localdomain> References: <1129048757.3929.56.camel@localhost.localdomain> Message-ID: <1C14B8CD-86E3-4DC5-8654-679BE6E098DB@cgt.duhs.duke.edu> fixed. On Oct 11, 2005, at 12:39 PM, Scott Cain wrote: > Hi Jason, > > You are still working on the 1.5.1 release from HEAD, right?
I just did a cvs update -d, and make test fails thusly:
>
> t/tigrxml....................ok 1/48Bio::SeqIO: tigrxml cannot be found
> Exception
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Failed to load module Bio::SeqIO::tigrxml. Can't locate XML/SAX/Writer.pm in @INC (@INC contains: t /Users/cain/cvs_stuff/bioperl-live/blib/lib /Users/cain/cvs_stuff/bioperl-live/blib/arch /sw/lib/perl5 /sw/lib/perl5/darwin /usr/local/lib/perl5/5.8.4/darwin-2level /usr/local/lib/perl5/5.8.4 /usr/local/lib/perl5/site_perl/5.8.4/darwin-2level /usr/local/lib/perl5/site_perl/5.8.4 /usr/local/lib/perl5/site_perl .) at /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO/tigrxml.pm line 71.
> BEGIN failed--compilation aborted at /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO/tigrxml.pm line 71.
> Compilation failed in require at /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Root/Root.pm line 396.
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Root/Root.pm:328
> STACK: Bio::Root::Root::_load_module /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Root/Root.pm:398
> STACK: Bio::SeqIO::_load_format_module /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO.pm:552
> STACK: Bio::SeqIO::new /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO.pm:379
> STACK: t/tigrxml.t:22
> -----------------------------------------------------------
>
> For more information about the SeqIO system please see the SeqIO docs.
> This includes ways of checking for formats at compile time, not run time
> Can't call method "verbose" on an undefined value at t/tigrxml.t line 26.
> t/tigrxml....................dubious
> Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 2-48
> Failed 47/48 tests, 2.08% okay
>
> This is on Mac OSX 10.3, perl 5.8.4 built from source.
> > Thanks, > Scott > > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > From arareko at campus.iztacala.unam.mx Wed Oct 12 03:05:18 2005 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed Oct 12 03:31:20 2005 Subject: [Bioperl-l] Proposal for AI methods in Bioperl Message-ID: <434CB5AE.3040307@campus.iztacala.unam.mx> Hi all, As far as I know, there aren't any modules in Bioperl providing access to Artificial Intelligence (AI) methods such as neural networks, genetic algorithms and cellular automata, among others. At the moment, I'm interested in developing an interface for using genetic algorithms. Through this interface, users could apply these methods to generate optimized schemes based on sets of homologous sequences. I think the generated results could be a nice alternative for motif/pattern searching in diverse novel sequences (instead of regular expressions, HMMs and other methods). What do you think? Any guidance would be really helpful. If you know of any external application that you can recommend for use with the interface, a suggested location in the code tree, etc., please let me know. I've thought of something like a Bio::AI family that could be included in bioperl-live, or maybe a separate bioperl-ai package. I'm open to all kinds of thoughts. Thanks in advance. Regards, Mauricio.
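To make the genetic-algorithm idea concrete, here is a minimal pure-Perl sketch. It is not a proposed Bio::AI API and uses no BioPerl code; every function name, parameter, default and the toy sequences are hypothetical. It evolves a fixed-length DNA motif whose fitness is its best ungapped match summed over a set of sequences, using tournament selection, single-point crossover, point mutation and elitism.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical toy GA for motif finding; illustrative only.
my @BASES = qw(A C G T);

# Best number of matching positions for $motif at any ungapped offset in $seq.
sub best_match {
    my ($motif, $seq) = @_;
    my $k    = length $motif;
    my $best = 0;
    for my $i (0 .. length($seq) - $k) {
        my $window = substr($seq, $i, $k);
        my $hits = grep { substr($window, $_, 1) eq substr($motif, $_, 1) } 0 .. $k - 1;
        $best = $hits if $hits > $best;
    }
    return $best;
}

# Fitness: best-match scores summed over all sequences (maximum k * #sequences).
sub fitness {
    my ($motif, $seqs) = @_;
    return sum(0, map { best_match($motif, $_) } @$seqs);
}

sub random_motif {
    my ($k) = @_;
    return join '', map { $BASES[int rand @BASES] } 1 .. $k;
}

# Tournament selection: the fitter of two randomly chosen individuals wins.
sub tournament {
    my ($pop, $fit) = @_;
    my ($i, $j) = (int rand @$pop, int rand @$pop);
    return $fit->[$i] >= $fit->[$j] ? $pop->[$i] : $pop->[$j];
}

# Index of the fittest individual.
sub best_index {
    my ($fit) = @_;
    my $best = 0;
    for my $i (1 .. $#$fit) { $best = $i if $fit->[$i] > $fit->[$best] }
    return $best;
}

# Evolve a motif of length k (k >= 2); returns (motif, score, best score per generation).
sub evolve_motif {
    my (%args) = @_;
    my ($seqs, $k) = @args{qw(seqs k)};
    my $size  = $args{pop_size}    // 30;
    my $gens  = $args{generations} // 50;
    my $mrate = $args{mutation}    // 0.05;

    my @pop = map { random_motif($k) } 1 .. $size;
    my @history;
    for my $gen (1 .. $gens) {
        my @fit = map { fitness($_, $seqs) } @pop;
        my $bi  = best_index(\@fit);
        push @history, $fit[$bi];
        my @next = ($pop[$bi]);    # elitism: the current best survives unchanged
        while (@next < $size) {
            my $p1    = tournament(\@pop, \@fit);
            my $p2    = tournament(\@pop, \@fit);
            my $cut   = 1 + int rand($k - 1);    # single-point crossover
            my $child = substr($p1, 0, $cut) . substr($p2, $cut);
            for my $pos (0 .. $k - 1) {          # point mutation
                substr($child, $pos, 1) = $BASES[int rand @BASES] if rand() < $mrate;
            }
            push @next, $child;
        }
        @pop = @next;
    }
    my @fit = map { fitness($_, $seqs) } @pop;
    my $bi  = best_index(\@fit);
    return ($pop[$bi], $fit[$bi], \@history);
}

srand(42);    # fixed seed so repeated runs give the same result
my @seqs = qw(TTGACGTCAA GGTGACGTCT ATGACGTCAG CCTGACGTCA);    # made-up toy input
my ($motif, $score, $hist) = evolve_motif(seqs => \@seqs, k => 6);
print "best motif: $motif (score $score of ", 6 * @seqs, ")\n";
```

Elitism guarantees the best score never decreases between generations. Scoring every window of every sequence in pure Perl is slow, so this only illustrates the technique; a real implementation would likely push the inner loops into compiled code.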
-- MAURICIO HERRERA CUADRA arareko@campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From arareko at campus.iztacala.unam.mx Wed Oct 12 03:29:23 2005 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed Oct 12 03:32:56 2005 Subject: [Bioperl-l] FreeBSD ports In-Reply-To: <20050929183054.GF61107@iib.unsam.edu.ar> References: <433C16A0.2010901@campus.iztacala.unam.mx> <17212.7697.291176.750558@satchel.alerce.com> <20050929183054.GF61107@iib.unsam.edu.ar> Message-ID: <434CBB53.4040000@campus.iztacala.unam.mx> Hi folks, Just a quick mail to tell you that I'm now porting p5-Bio-ASN1-EntrezGene so that it is ready for the upcoming bioperl-live port update (Fernan is its kind maintainer). What do you think about bringing freebsd-bio@ back to life? Regards, Mauricio. -- MAURICIO HERRERA CUADRA arareko@campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From thechans at citiz.net Wed Oct 12 09:06:38 2005 From: thechans at citiz.net (Zhe Chen) Date: Wed Oct 12 09:13:26 2005 Subject: [Bioperl-l] strange problem about seq->alphabet Message-ID: <200510121313.j9CDDInq010721@portal.open-bio.org> I have a problem extracting the seq->alphabet information from a GenBank file. Some sequences are mRNA while others are DNA, but when I run the code it ignores mRNA and reports all of them as DNA. Why does this happen and how can I solve it?
Zhe Chen thechans@citiz.net 2005-10-12 From mseewald at gmail.com Wed Oct 12 03:53:53 2005 From: mseewald at gmail.com (Michael Seewald) Date: Wed Oct 12 09:38:39 2005 Subject: [Bioperl-l] Proposal for AI methods in Bioperl In-Reply-To: <434CB5AE.3040307@campus.iztacala.unam.mx> References: <434CB5AE.3040307@campus.iztacala.unam.mx> Message-ID: Hi Mauricio, You might want to check out R (r-project.org) and its biostatistical modules (bioconductor.org). As much as I love perl/bioperl, R is the place to go for open-source biostatistical computing. You will duplicate a lot of functionality if you start implementing e.g. neural networks in perl/bioperl. It *may* be worthwhile, however, to take some of the C code from R in order to use it in perl calls. I would consider this only if the perl code/speed/parsing outweighs the R code you need. Michael From Marc.Logghe at DEVGEN.com Wed Oct 12 09:48:30 2005 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed Oct 12 10:00:05 2005 Subject: [Bioperl-l] strange problem about seq->alphabet Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F563F@ANTARESIA.be.devgen.com> > i have some problem on extracting info. about seq->alphabet > from gb file.Because some sequences are mRNA while others are > DNA, but after running the code. it ignores mRNA but report > all are DNA. why it happen and how to solve it? Hi, You need to call $seq->molecule instead. A call to alphabet will return the sequence alphabet, the valid types being 'dna', 'rna' and 'protein'. Only if the sequence of the GenBank record actually contains Us instead of Ts *should* alphabet return 'rna'. Molecule returns the molecule property extracted from the GenBank LOCUS line. HTH, Marc From hlapp at gmx.net Wed Oct 12 12:09:30 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Oct 12 12:15:34 2005 Subject: [Bioperl-l] Proposal for AI methods in Bioperl In-Reply-To: <434CB5AE.3040307@campus.iztacala.unam.mx> References: <434CB5AE.3040307@campus.iztacala.unam.mx> Message-ID: You might want to look at the WEKA project for some design guidance and inspiration. The home page is at http://www.cs.waikato.ac.nz/ml/weka/ and the API is documented at http://weka.sourceforge.net/doc/ Something equally well structured could, I think, be a generally useful contribution. I'd expect that to use a lot of compiled code, though; number crunching just isn't what perl is best at. Also, it may be worth checking out the dp package in Biojava for dynamic programming algorithms.
-hilmar

On Oct 12, 2005, at 12:05 AM, Mauricio Herrera Cuadra wrote:

> Hi all,
>
> As far as I know, there aren't any modules in Bioperl providing access
> to Artificial Intelligence (AI) methods such as neural networks,
> genetic algorithms, cellular automata, among others.
>
> At the moment, I'm interested in developing some interface for using
> genetic algorithms. By using the interface, users could access these
> methods for generating optimized schemes based on sets of homologous
> sequences. I think that the generated results could be a nice
> alternative for motif/pattern searching in diverse novel sequences
> (instead of regular expressions, HMMs and other methods). What do you
> think?
>
> Any guidance would be really helpful. If you know of any external
> application that you can recommend for using with the interface, a
> suggested location in the code tree, etc., please let me know. I've
> thought of something like a Bio::AI family that can be included in the
> bioperl-live or maybe as a bioperl-ai package. I'm open to all kinds of
> thoughts. Thanks in advance.
>
> Regards,
> Mauricio.
> --
> MAURICIO HERRERA CUADRA
> arareko@campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From sdavis2 at mail.nih.gov Wed Oct 12 12:40:40 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Oct 12 12:46:26 2005
Subject: [Bioperl-l] Proposal for AI methods in Bioperl
In-Reply-To:
Message-ID:

On 10/12/05 12:09 PM, "Hilmar Lapp" wrote:

> You might want to look at the WEKA project for some design guidance and
> inspiration. The home page is at http://www.cs.waikato.ac.nz/ml/weka/
> and the API is documented at http://weka.sourceforge.net/doc/
>
> Something that's equally well structured I think could be a generally
> useful contribution. I'd expect that to use a lot of compiled code
> though, number crunching just isn't what perl is best at.
>
> Also, it may be worth checking out the dp package in Biojava for
> dynamic programming algorithms.

Moving slightly farther away from the original topic, but the Bioconductor project, and R more generally, offer many machine-learning tools and packages.

http://www.bioconductor.org
http://www.r-project.org

Sean

From palmeida at igc.gulbenkian.pt Wed Oct 12 12:59:57 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Oct 12 13:16:57 2005
Subject: [Bioperl-l] Proposal for AI methods in Bioperl
In-Reply-To:
References: <434CB5AE.3040307@campus.iztacala.unam.mx>
Message-ID: <200510121759.58082.palmeida@igc.gulbenkian.pt>

I have used GALib in the past, a C++ library for Genetic Algorithms, which could give you some ideas.

It is located here: http://lancet.mit.edu/ga/
There is an overview here: http://lancet.mit.edu/galib-2.4/Overview.html

-Paulo

On Wednesday 12 October 2005 17:09, Hilmar Lapp wrote:
> You might want to look at the WEKA project for some design guidance and
> inspiration. The home page is at http://www.cs.waikato.ac.nz/ml/weka/
> and the API is documented at http://weka.sourceforge.net/doc/
>
> Something that's equally well structured I think could be a generally
> useful contribution.
> I'd expect that to use a lot of compiled code
> though, number crunching just isn't what perl is best at.
>
> Also, it may be worth checking out the dp package in Biojava for
> dynamic programming algorithms.
>
> -hilmar
>
> On Oct 12, 2005, at 12:05 AM, Mauricio Herrera Cuadra wrote:
> > Hi all,
> >
> > As far as I know, there aren't any modules in Bioperl providing access
> > to Artificial Intelligence (AI) methods such as neural networks,
> > genetic algorithms, cellular automata, among others.
> >
> > At the moment, I'm interested in developing some interface for using
> > genetic algorithms. By using the interface, users could access these
> > methods for generating optimized schemes based on sets of homologous
> > sequences. I think that the generated results could be a nice
> > alternative for motif/pattern searching in diverse novel sequences
> > (instead of regular expressions, HMMs and other methods). What do you
> > think?
> >
> > Any guidance would be really helpful. If you know of any external
> > application that you can recommend for using with the interface, a
> > suggested location in the code tree, etc., please let me know. I've
> > thought of something like a Bio::AI family that can be included in the
> > bioperl-live or maybe as a bioperl-ai package. I'm open to all kinds of
> > thoughts. Thanks in advance.
> >
> > Regards,
> > Mauricio.
> > --
> > MAURICIO HERRERA CUADRA
> > arareko@campus.iztacala.unam.mx
> > Laboratorio de Genética
> > Unidad de Morfofisiología y Función
> > Facultad de Estudios Superiores Iztacala, UNAM
> >

--
Paulo Almeida
Tel: +351 21 4464635, Fax: +351 21 4407970
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
P-2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt

From harish85_bt at yahoo.com Wed Oct 12 14:02:00 2005
From: harish85_bt at yahoo.com (Harish S)
Date: Wed Oct 12 14:07:54 2005
Subject: [Bioperl-l] Batch retrieval of seq from swiss-prot
Message-ID: <20051012180200.13650.qmail@web51415.mail.yahoo.com>

Hi gurus,

I am new to this group and am using bioperl version 1.5.0. I am trying to retrieve a list of Swiss-Prot sequences from Swiss-Prot. The file sample1.eco has one Swiss-Prot ID per line.

The code:
----
open(SEQID,'sample1.eco') || die 'Cannot open file',$!;
@seqids=<SEQID>;
for ($i=0;$i<@seqids;$i++)
{
use Bio::DB::SwissProt;
$database= new Bio::DB::SwissProt;
$seq = $database->get_Seq_by_id($seqids[$i]);
print $seq->seq(), "\n\n";
}
----
works, but it takes a long time because it retrieves the sequences one by one.

So I tried to use get_Stream_by_batch, but it gave me an error saying it is deprecated and suggested that I use get_Stream_by_id.

So I tried this:
----
open(SEQID,'sample1.eco') || die 'Cannot open file',$!;
@seqids=<SEQID>;
$ref=\@seqids;
for ($i=0;$i<@seqids;$i++)
{
use Bio::DB::SwissProt;
$database= new Bio::DB::SwissProt;
$seq = $database->get_Stream_by_id($ref);
print $seq->seq(), "\n\n";
}
----
but this gave me an error saying
Can't locate object method "seq" via package "Bio::SeqIO::swiss"

How do I proceed?
Thanks in advance.

HARISH.S

From jason.stajich at duke.edu Wed Oct 12 14:11:14 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Oct 12 14:09:27 2005
Subject: [Bioperl-l] BioCorba and phylogenetic trees
In-Reply-To: <434142CB.1040607@sfu.ca>
References: <434142CB.1040607@sfu.ca>
Message-ID: <671E19E6-4BA2-494B-BCB0-EFAA03B81E86@duke.edu>

On Oct 3, 2005, at 10:40 AM, Rutger Vos wrote:

> Dear all,
>
> I am curious about the following:
>
> * what's the status of BioCorba, and how does it relate to BioPerl
>
It was a good idea, but CORBA + WAN just doesn't work. And nobody really ever ended up having a real use case where they wanted to have a BioJava server but run Bioperl on it, or vice versa (or s/Perl/Python/ in there). There are client/server libraries for wrapping and accessing objects in the three languages, but that development stopped years ago and the APIs of the projects have continued to evolve, so I don't know if the libraries are compatible any more.

> * have bioperl-corba-clients and/or bioperl-corba-servers
> successfully been deployed on Win32? Using CORBA::ORBit?
>
Yes, I believe so, but don't quote me on that; it was years ago when we actually did full testing of the client/server stuff. CORBA::ORBit on Linux of course worked fine.

> * are there IDLs for tree objects/data structures in BioCorba?
>
When we started there were no tree objects in BioPerl; I don't know if the OMG project ended up with that or not. I stepped away from all of this two or three years ago, so I don't really know what has been going on.

I think a lot of people would argue for SOAP over CORBA -- this certainly seems to be the trend. But I really don't do distributed/web services, so I am not the right person to comment.

> Some background: I'm "the perl guy" in this project: http://www.phylo.org
> We are looking to integrate phylogenetics software using CORBA.
> It'd be good if we were compatible with BioPerl, hence the questions.
> Cheers,
>
> Rutger
>
> --
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> Rutger Vos, PhD. candidate
> Department of Biological Sciences
> Simon Fraser University
> 8888 University Drive
> Burnaby, BC, V5A1S6
> Phone: 604-291-5625 Fax: 604-291-3496
> Personal site: http://www.sfu.ca/~rvosa
> FAB* lab: http://www.sfu.ca/~fabstar
> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Wed Oct 12 14:27:11 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Oct 12 14:25:24 2005
Subject: [Bioperl-l] Batch retrieval of seq from swiss-prot
In-Reply-To: <20051012180200.13650.qmail@web51415.mail.yahoo.com>
References: <20051012180200.13650.qmail@web51415.mail.yahoo.com>
Message-ID: <6F24587C-7F61-4123-95F9-8D7B1C7B09DD@duke.edu>

Well, I'm sure this isn't the major cause of your slowness, but you are re-initializing the db handle each time through your loop. Code like this avoids that:

#!/usr/bin/perl -w
use strict;
use Bio::DB::SwissProt;
my $database = new Bio::DB::SwissProt;
open(SEQIDS, 'sample1.eco') || die "$!";
while(<SEQIDS>) {
  my $seqid = $_;
  chomp($seqid);
  my $seq = $database->get_Seq_by_acc($seqid);
  print $seq->seq(), "\n\n";
}

You can also switch the Swiss-Prot provider to a local mirror, for example with $database->hostlocation('australia') if you are in Australia.
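Building on the reply above, here is a hedged sketch (not part of the original thread) of the two fixes the question needs: read the IDs with their newlines stripped, and treat what get_Stream_by_id returns as a stream. The error in the second script ("Can't locate object method "seq" via package "Bio::SeqIO::swiss"") shows that get_Stream_by_id hands back a Bio::SeqIO stream rather than a sequence, so each sequence has to be pulled out with next_seq, Bio::SeqIO's iterator method. The helper names read_ids and fetch_sequences are hypothetical, and the code only assumes an object that has get_Stream_by_id, so it can be exercised without network access.

```perl
use strict;
use warnings;

# Read one identifier per line, trimming the trailing newline and any
# surrounding whitespace, and skipping blank lines.
sub read_ids {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    close $fh;
    return @ids;
}

# Ask the database object for all IDs in one call, then drain the returned
# stream with next_seq. Calling ->seq on the stream itself is what produced
# the error in the question.
sub fetch_sequences {
    my ($db, @ids) = @_;
    my $stream = $db->get_Stream_by_id(\@ids);
    my @seqs;
    while (my $seq = $stream->next_seq) {
        push @seqs, $seq;
    }
    return @seqs;
}
```

With Bioperl installed, usage might look like `my $db = Bio::DB::SwissProt->new; print $_->seq, "\n\n" for fetch_sequences($db, read_ids('sample1.eco'));`, with the handle created once rather than inside a loop. Since the Swiss-Prot web interface itself has no batch mode (as the reply goes on to explain), this makes the script correct but not necessarily faster; for long ID lists, printing inside the while loop instead of collecting into @seqs keeps memory use flat.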
For now you have to read the code to see which mirrors are available. Here is what I've defined in the module (there may be more mirrors now, I'm not sure):

'hosts' => {
  'switzerland' => 'ch.expasy.org',
  'canada'      => 'ca.expasy.org',
  'china'       => 'cn.expasy.org',
  'taiwan'      => 'tw.expasy.org',
  'australia'   => 'au.expasy.org',
  'korea'       => 'kr.expasy.org',
  'us'          => 'us.expasy.org',
},

You can see the module's code here:
perldoc -m Bio::DB::SwissProt

The real problem is that the Swiss-Prot web interface doesn't support a batch mode - this is a limitation of the data provider and bioperl has no control over it. Your options are (if you want fully annotated proteins and not just a FASTA file):
a) download Swiss-Prot and use Bio::Index::Swissprot to index it locally and get superfast access
b) use Bio::DB::GenPept and get back NCBI-ized Swiss-Prot records (batch mode does work for NCBI)

If you just want FASTA, either use Bio::DB::GenPept or download the Swiss-Prot db from NCBI and index it with Bio::Index::Fasta or Bio::DB::Fasta.

Good luck,
-jason

On Oct 12, 2005, at 2:02 PM, Harish S wrote:

> Hi gurus,
> I am new to this group and am using bioperl version 1.5.0.
> I am trying to retrieve a list of Swiss-Prot sequences
> from Swiss-Prot. The file sample1.eco has one
> Swiss-Prot ID per line.
> The code:
> ----
> open(SEQID,'sample1.eco') || die 'Cannot open file',$!;
> @seqids=<SEQID>;
> for ($i=0;$i<@seqids;$i++)
> {
> use Bio::DB::SwissProt;
> $database= new Bio::DB::SwissProt;
> $seq = $database->get_Seq_by_id($seqids[$i]);
> print $seq->seq(), "\n\n";
> }
> ----
> works, but it takes a long time because it
> retrieves the sequences one by one.
>
> So I tried to use get_Stream_by_batch, but it gave me
> an error saying it is deprecated and suggested that I use
> get_Stream_by_id.
>
> So I tried this:
> ----
> open(SEQID,'sample1.eco') || die 'Cannot open file',$!;
> @seqids=<SEQID>;
> $ref=\@seqids;
> for ($i=0;$i<@seqids;$i++)
> {
> use Bio::DB::SwissProt;
> $database= new Bio::DB::SwissProt;
> $seq = $database->get_Stream_by_id($ref);
> print $seq->seq(), "\n\n";
> }
> ----
> but this gave me an error saying
> Can't locate object method "seq" via package
> "Bio::SeqIO::swiss"
>
> How do I proceed?
> Thanks in advance.
>
> HARISH.S
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From harish85_bt at yahoo.com Wed Oct 12 14:26:05 2005
From: harish85_bt at yahoo.com (Harish S)
Date: Wed Oct 12 14:32:03 2005
Subject: [Bioperl-l] IRC for Bioperl
Message-ID: <20051012182606.83386.qmail@web51410.mail.yahoo.com>

Hi,

Does anyone know of an IRC channel for bioperl, so that I can post my queries?

Rgds
Harish

HARISH.S

From heikki.lehvaslaiho at gmail.com Wed Oct 12 16:24:43 2005
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Wed Oct 12 19:08:38 2005
Subject: [Bioperl-l] IRC for Bioperl
In-Reply-To: <20051012182606.83386.qmail@web51410.mail.yahoo.com>
References: <20051012182606.83386.qmail@web51410.mail.yahoo.com>
Message-ID: <200510122124.43723.heikki.lehvaslaiho@gmail.com>

Sorry Harish,

There is no IRC for bioperl. Since there is only a handful of people answering questions and everyone is busy doing something else most of the time, email is the most interactive way of communication we can manage. I am happy to be proven wrong, but I think most bioperlers will agree with me.

Patience, please!

...
and besides, in my opinion, getting a reply from Jason in 25 min 11 sec is pretty good service! :-)

-Heikki

On Wednesday 12 October 2005 19:26, Harish S wrote:
> Hi,
>
> Does anyone know of an IRC channel for bioperl,
> so that I can post my queries?
>
> Rgds
> Harish
>
> HARISH.S
>

--
Heikki Lehvaslaiho heikki at_ebi _ac _uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
http://www.ebi.ac.uk/mutations/

From allenday at ucla.edu Wed Oct 12 19:37:11 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Oct 12 19:59:33 2005
Subject: [Bioperl-l] IRC for Bioperl
In-Reply-To: <200510122124.43723.heikki.lehvaslaiho@gmail.com>
References: <20051012182606.83386.qmail@web51410.mail.yahoo.com> <200510122124.43723.heikki.lehvaslaiho@gmail.com>
Message-ID:

Sounds like an interesting idea; I'll try idling on IRC. How about #bioperl on irc.freenode.net? I'm there now. No promises to do much more than lurk, though!

-allen

On Wed, 12 Oct 2005, Heikki Lehvaslaiho wrote:

> Sorry Harish,
>
> There is no IRC for bioperl. Since there is only a handful of people answering
> questions and everyone is busy doing something else most of the time, email
> is the most interactive way of communication we can manage. I am happy to be
> proven wrong, but I think most bioperlers will agree with me.
>
> Patience, please!
>
> ...
> and besides, in my opinion, getting a reply from Jason in 25 min 11 sec is
> pretty good service! :-)
>
> -Heikki
>
>
> On Wednesday 12 October 2005 19:26, Harish S wrote:
> > Hi,
> >
> > Does anyone know of an IRC channel for bioperl,
> > so that I can post my queries?
> >
> > Rgds
> > Harish
> >
> > HARISH.S
>

From rvosa at sfu.ca Wed Oct 12 20:20:47 2005
From: rvosa at sfu.ca (Rutger Vos)
Date: Wed Oct 12 20:46:59 2005
Subject: [Bioperl-l] BioCorba and phylogenetic trees
In-Reply-To: <671E19E6-4BA2-494B-BCB0-EFAA03B81E86@duke.edu>
References: <434142CB.1040607@sfu.ca> <671E19E6-4BA2-494B-BCB0-EFAA03B81E86@duke.edu>
Message-ID: <434DA85F.5000404@sfu.ca>

Too bad... it was kind of my impression already (confirmed, by the way, by Ewan Birney). It seems to me that CORBA is much more scalable than SOAP, but somehow SOAP is "winning". It's Betamax all over again, I'm tellin' ya.

Thanks though!

Rutger

Jason Stajich wrote:
>
> On Oct 3, 2005, at 10:40 AM, Rutger Vos wrote:
>
>> Dear all,
>>
>> I am curious about the following:
>>
>> * what's the status of BioCorba, and how does it relate to BioPerl
>>
> It was a good idea, but CORBA + WAN just doesn't work. And nobody
> really ever ended up having a real use case where they wanted
> to have a BioJava server but run Bioperl on it, or vice versa
> (or s/Perl/Python/ in there). There are client/server libraries for
> wrapping and accessing objects in the three languages, but that
> development stopped years ago and the APIs of the projects have
> continued to evolve, so I don't know if the libraries are compatible
> any more.
>
>> * have bioperl-corba-clients and/or bioperl-corba-servers
>> successfully been deployed on Win32? Using CORBA::ORBit?
>>
> Yes, I believe so, but don't quote me on that; it was years ago when
> we actually did full testing of the client/server stuff. CORBA::ORBit
> on Linux of course worked fine.
>
>> * are there IDLs for tree objects/data structures in BioCorba?
>>
> When we started there were no tree objects in BioPerl; I don't know
> if the OMG project ended up with that or not. I stepped away
> from all of this two or three years ago, so I don't really know what
> has been going on.
>
> I think a lot of people would argue for SOAP over CORBA -- this
> certainly seems to be the trend. But I really don't do
> distributed/web services, so I am not the right person to comment.
>
>> Some background: I'm "the perl guy" in this project: http://www.phylo.org
>> We are looking to integrate phylogenetics software using CORBA. It'd
>> be good if we were compatible with BioPerl, hence the questions.
>> Cheers,
>>
>> Rutger
>>
>> --
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Rutger Vos, PhD. candidate
>> Department of Biological Sciences
>> Simon Fraser University
>> 8888 University Drive
>> Burnaby, BC, V5A1S6
>> Phone: 604-291-5625 Fax: 604-291-3496
>> Personal site: http://www.sfu.ca/~rvosa
>> FAB* lab: http://www.sfu.ca/~fabstar
>> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From saldroubi at yahoo.com Thu Oct 13 11:25:02 2005
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Thu Oct 13 11:39:17 2005
Subject: [Bioperl-l] How to extract promoter region seq from genbank or another source?
Message-ID: <20051013152502.1441.qmail@web34301.mail.mud.yahoo.com>

Hello,

I am totally new to BioPerl. I was able to install it and retrieve data from GenBank. I have a list of accession numbers for genes, and I want to use BioPerl to get the promoter region (1000 bp before the start of the gene). Can someone point me in the right direction on how to accomplish this?

Tech info: Using bioperl-1.5 on a SuSE 9.3 Professional machine.

Thank you.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi@yahoo.com

From angshu96 at gmail.com Thu Oct 13 19:28:14 2005
From: angshu96 at gmail.com (Angshu Kar)
Date: Thu Oct 13 19:27:28 2005
Subject: [Bioperl-l] needs help...
Message-ID:

Hi,

I need your help in solving a bioinformatics implementation problem. I'm stating my problem stepwise below:

1. There are amino acid sequences stored (clustered) in the database (PostgreSQL) in raw format, e.g. Cluster A has 5 sequences, Cluster B has 3, and so on.
2. Each of those sequences (from one cluster at a time; say, we pick the 5 sequences from Cluster A, one by one) needs to be picked up and converted to FASTA format.
3. These FASTA files then need to be fed into a multiple sequence alignment tool (e.g. CLUSTALW).
4. The tool needs to be executed and the results will be displayed.
5. The output of the tool (aligned sequences) then needs to be fed into a scoring tool (e.g. BLOSUM).
6. The scoring tool needs to be run and the scores need to be displayed in histogram format (say Cluster A has score 5.5, Cluster B has score 3.3, and so on).

All of this needs to be done using perl (or bioperl), and I'm completely new to this. Could you please suggest how to implement steps 3-5?

Thanking you,
Angshu

From angshu96 at gmail.com Wed Oct 12 15:48:27 2005
From: angshu96 at gmail.com (Angshu Kar)
Date: Thu Oct 13 20:59:21 2005
Subject: [Bioperl-l] Please help
In-Reply-To:
References:
Message-ID:

Hi,

I'm completely new to perl. I have to work in biology using perl, PostgreSQL (as the database) and ClustalW (as the alignment tool). I'll state my problem briefly:

In the PostgreSQL db the data is clustered using complete-linkage clustering. I have to connect to that db, fetch the data, feed it to the multiple alignment tool, run it and show the results. Then I have to feed those alignments into a scoring tool and show the results in the form of a histogram. All of this needs to be automated using perl. I've installed bioperl but can't figure out how to write code using it. I'd be grateful if you could help me by giving me an idea of how to feed the db data (in FASTA format) into ClustalW.

Thanks,
Angshu

From Matthew.Betts at bccs.uib.no Thu Oct 13 03:07:41 2005
From: Matthew.Betts at bccs.uib.no (Matthew Betts)
Date: Thu Oct 13 20:59:22 2005
Subject: [Bioperl-l] Proposal for AI methods in Bioperl
Message-ID:

There is also a perl library for genetic programming: http://perlgp.org/

> Date: Wed, 12 Oct 2005 17:59:57 +0100
> From: Paulo Almeida
> Subject: Re: [Bioperl-l] Proposal for AI methods in Bioperl
> To: bioperl-l@portal.open-bio.org
> Message-ID: <200510121759.58082.palmeida@igc.gulbenkian.pt>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I have used GALib in the past, a C++ library for Genetic Algorithms, which
> could give you some ideas.
>
> It is located here: http://lancet.mit.edu/ga/
> There is an overview here: http://lancet.mit.edu/galib-2.4/Overview.html
>
> -Paulo
>
> On Wednesday 12 October 2005 17:09, Hilmar Lapp wrote:
> > You might want to look at the WEKA project for some design guidance and
> > inspiration. The home page is at http://www.cs.waikato.ac.nz/ml/weka/
> > and the API is documented at http://weka.sourceforge.net/doc/
> >
> > Something that's equally well structured I think could be a generally
> > useful contribution. I'd expect that to use a lot of compiled code
> > though, number crunching just isn't what perl is best at.
> >
> > Also, it may be worth checking out the dp package in Biojava for
> > dynamic programming algorithms.
> >
> > -hilmar
> >
> > On Oct 12, 2005, at 12:05 AM, Mauricio Herrera Cuadra wrote:
> > > Hi all,
> > >
> > > As far as I know, there aren't any modules in Bioperl providing access
> > > to Artificial Intelligence (AI) methods such as neural networks,
> > > genetic algorithms, cellular automata, among others.
> > >
> > > At the moment, I'm interested in developing some interface for using
> > > genetic algorithms. By using the interface, users could access these
> > > methods for generating optimized schemes based on sets of homologous
> > > sequences. I think that the generated results could be a nice
> > > alternative for motif/pattern searching in diverse novel sequences
> > > (instead of regular expressions, HMMs and other methods). What do you
> > > think?
> > >
> > > Any guidance would be really helpful. If you know of any external
> > > application that you can recommend for using with the interface, a
> > > suggested location in the code tree, etc., please let me know. I've
> > > thought of something like a Bio::AI family that can be included in the
> > > bioperl-live or maybe as a bioperl-ai package. I'm open to all kinds of
> > > thoughts. Thanks in advance.
> > >
> > > Regards,
> > > Mauricio.
> > > --
> > > MAURICIO HERRERA CUADRA
> > > arareko@campus.iztacala.unam.mx
> > > Laboratorio de Genética
> > > Unidad de Morfofisiología y Función
> > > Facultad de Estudios Superiores Iztacala, UNAM
> > >
>
> --
> Paulo Almeida
> Tel: +351 21 4464635, Fax: +351 21 4407970
> Instituto Gulbenkian de Ciência
> Rua da Quinta Grande, 6
> P-2780-156 Oeiras
> Portugal
> http://www.igc.gulbenkian.pt

From s19109014 at ym.edu.tw Thu Oct 13 05:09:54 2005
From: s19109014 at ym.edu.tw (=?Big5?B?tqep+qXNrOy0v7S6wkU=?=)
Date: Thu Oct 13 20:59:23 2005
Subject: [Bioperl-l] A question about rpsblast
Message-ID: <434E2462.2060304@ym.edu.tw>

Dear sir,

Sorry for this unexpected mail. Recently, I have tried to set up a web page in PHP that calls the *rpsblast* program to search for conserved domains of proteins. When I tried to use the PHP system() function

system("/xxx/blast-2.2.12/bin/rpsblast -d /xxx/cdd/CDD/Cdd -i $uploadfile -e 0.001 -o /xxx/$filename.rpsout");

the command doesn't work. I even tried to use PHP's system() to call a perl script that does the rpsblast search, but there is still no output. However, using the same method, I am able to get BLAST programs using *blastall* to work. What can I do to make rpsblast work in a web service environment? Is this an rpsblast bug, or is there another way to get the rpsblast output file?

Thanks for your time; I look forward to your reply.

PS: I have sent this problem to NCBI (blast-help@ncbi.nlm.nih.gov), but they have not replied yet.

Ching-Hung Tzeng
Undergraduate student, Department of Life Sciences,
National Yang-Ming University.
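One hedged first diagnostic for the "no output" symptom above (a sketch, not a fix confirmed anywhere in this thread): run the command from Perl without a shell and check how it exited, since a program that fails to start or exits with an error otherwise leaves nothing behind. The helper name run_checked is hypothetical; the list form of system, $? and $! are standard Perl.

```perl
use strict;
use warnings;

# Run a program directly (list form of system, so no shell quoting issues)
# and return its exit code. Dies if the program could not be started at all,
# or if it was killed by a signal.
sub run_checked {
    my @cmd = @_;
    my $rc = system(@cmd);
    die "could not start $cmd[0]: $!\n" if $rc == -1;
    die sprintf("%s was killed by signal %d\n", $cmd[0], $? & 127) if $? & 127;
    return $? >> 8;
}
```

Applied to the command in the question (the /xxx/ paths are left elided as in the original): `my $code = run_checked('/xxx/blast-2.2.12/bin/rpsblast', '-d', '/xxx/cdd/CDD/Cdd', '-i', $uploadfile, '-e', '0.001', '-o', "/xxx/$filename.rpsout");`. If the same call succeeds from a terminal but fails under the web server, the difference likely lies in the environment the server process runs in (user, file permissions, working directory); that is a general possibility worth ruling out, not a diagnosis of this particular case.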
From jason.stajich at duke.edu Thu Oct 13 21:08:19 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Oct 13 21:06:28 2005
Subject: [Bioperl-l] 1.5.1 info for the devs
Message-ID:

I put in a release tag in each of bioperl-live, bioperl-run and bioperl-ext:

bioperl-release-1-5-1
bioperl-run-release-1-5-1
bioperl-ext-release-1-5-1

-jason
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Thu Oct 13 21:00:38 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Oct 13 21:10:07 2005
Subject: [Bioperl-l] bioperl 1.5.1 release
Message-ID: <99C2FCCA-9282-43B0-8C0A-2DA3495DDFF1@duke.edu>

I am EXTREMELY pleased to announce the release of Bioperl 1.5.1.

http://bioperl.org/DIST/bioperl-1.5.1.tar.gz
http://bioperl.org/DIST/bioperl-1.5.1.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.1.zip
http://bioperl.org/DIST/bioperl-run-1.5.1.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.1.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.1.zip
http://bioperl.org/DIST/bioperl-ext-1.5.1.tar.gz
http://bioperl.org/DIST/bioperl-ext-1.5.1.tar.bz2
http://bioperl.org/DIST/bioperl-ext-1.5.1.zip

MD5sums are here for those who like that sort of thing:
http://bioperl.org/DIST/SIGNATURES.md5

This is a developer release which has been thoroughly tested and is very much a warm-up for a bioperl 1.6.0 release. It replaces bioperl 1.5.0, which had several bugs that made it incompatible with several bioperl-1.4.0 scripts. The current release should reverse these incompatibilities as well as fix several outstanding bugs (see the Changelog below and the Changes file in the distribution for more information). We also re-synced the SearchIO blast parser and the RemoteBlast client with the NCBI RemoteBlast server, so these should work now. This has been tested on the latest Gbrowse candidate and it all works great.
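For readers who want to check a downloaded tarball against the MD5 list mentioned above, here is a minimal sketch using the core Digest::MD5 module. The layout of SIGNATURES.md5 is not shown in the announcement, so the parser assumes either md5sum-style lines (`<digest>  <filename>`) or BSD-style lines (`MD5 (<filename>) = <digest>`); file_md5, parse_md5_list and verify_file are hypothetical helper names.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file, read in binary mode so line endings are not altered.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Turn checksum-list text into { filename => lowercase hex digest }.
# Assumed formats: "<32 hex>  <name>" (md5sum) or "MD5 (<name>) = <32 hex>" (BSD).
sub parse_md5_list {
    my ($text) = @_;
    my %sum;
    for my $line (split /\n/, $text) {
        if ($line =~ /^([0-9a-fA-F]{32})\s+\*?(\S+)\s*$/) {
            $sum{$2} = lc $1;
        }
        elsif ($line =~ /^MD5\s*\((.+)\)\s*=\s*([0-9a-fA-F]{32})\s*$/) {
            $sum{$1} = lc $2;
        }
    }
    return \%sum;
}

# True only if the list has an entry for $name and the file's digest matches it.
sub verify_file {
    my ($path, $name, $sums) = @_;
    return exists $sums->{$name} && file_md5($path) eq $sums->{$name};
}
```

Usage might look like `verify_file('bioperl-1.5.1.tar.gz', 'bioperl-1.5.1.tar.gz', parse_md5_list($signatures_text))`, where $signatures_text holds the downloaded checksum list. A missing entry is treated as a failure rather than silently passing.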
Please see the HOWTO documents provided in doc/howto/pdf or on the website http://bioperl.org/HOWTOs/ for information on using the modules, as well as the API documentation at http://doc.bioperl.org. The code from this release is currently provided as an ever-updating link under the bioperl-live, bioperl-run and bioperl-ext links under the Active Code link. We will make another doc package for the 1.6 release.

I am also releasing bioperl-run and bioperl-ext packages labeled with the 1.5.1 version, as these should all be used in sync. I have not released bioperl-db (the bioperl BioSQL implementation) at this time, but I expect with Hilmar's help we will do that release in the next few weeks.

Some thank-yous: Heikki Lehvaslaiho and Brian Osborne for many helpful last-minute bug fixes and POD fixes, and for generally finding problems and keeping things consistent. Brian was also instrumental in getting the DocBook to work properly and converting us over to XML from my initial SGML template. I also want to thank several new developers who have pitched in lately to both help with the details and give big-picture views on things: Albert Villa, Stefan Kirov, George Hartzell. Thanks to Koen van der Drift for providing the OS X port of bioperl on fink and to George and Mauricio for working to support bioperl on FreeBSD. I hope the Linux aficionados will also continue to help put RPMs up for the toolkit. Also thanks to Sean Davis, Barry Moore, and Marc Logghe, who have been extremely helpful at answering questions on the list. We are also grateful for all the bug reports that have come in; they have continued to make this a more robust and better-tested release.

Would you like to see your name in lights (or at least in the thank-you above; I'm sure this email is forwarded around the world...=)? Consider helping out the development team. We need help adding information to the FAQ. If you read the mailing list, YOU can help with the FAQ by distilling the various Q & A on the list into useful questions in the FAQ.
On behalf of Bioperl developers, Jason Stajich I have appended the Change log from Bioperl core compents below 1.5.1 Developer release o Major problem with how Annotations were written out with Bio::Seq is fixed by reverting to old behavior for Bio::Annotation objects. o Bio::SeqIO - genbank.pm * bug #1871; REFLOOP' parsing loop, I changed the pattern to expect at l east 9 spaces at the beginning of a line to indicate line wrapping. * Treat multi-line SOURCE sections correctly, this defect broke both common_name() and classification() * parse swissprot fields in genpept file * parse WGS genbank records - embl.pm * Changed regexp for ID line. The capturing parentheses are the same, the difference is an optional repeated-not-semi- colon expression following the captured \S+. This means the regexp works when the division looks like /PRO;/ or when the division looks like /ANG ;/ - the latter is from EMBL repbase * fix ID line parsing: the molecule string can have spaces in it. Like: "genomic DNA" - swiss.pm: bugs #1727, #1734 - entrezgene.pm * Added parser for entrezgene ASN1 (text format) files. Uses Bio::ASN1::EntrezGene as a low level parser (get it from CPAN) o Bio::AlignIO - maf.pm coordinate problem fixed o Bio::Taxonomy and Bio::DB::Taxonomy - Parse NCBI XML now so that nearly all the taxonomy up-and-down can be done via Web without downloading all the sequence. o Bio::Tools::Run::RemoteBlast supports more options and complies to changes to the NCBI interface. It is reccomended that you retrieve the data in XML instead of plain-text BLAST report to insure proper parsing and retrieval of all information as NCBI fully expects to change things in the future. o Bio::Tree and Bio::TreeIO - Fixes so that re-rooting a tree works properly - Writing out nhx format from a newick/nexus file will properly output bootstrap information. The use must move the internal node labels over to bootstraps. for my $node ( grep { ! 
$_->is_Leaf } $tree->get_nodes ) { $node->bootstrap($node->id); $node->id(''); } - Nexus parsing is much more flexible now and does not care about LF. - Cladogram drawing module in Bio::Tree::Draw - Node height and depth now properly calculated - fix tree pruning algorithm so that a node with 1 child gets merged o Graphics tweaks. Glyph::xyplot improved. Many other small to medium sized bugs were fixed and improvements added; see the Gbrowse mailing list for most of these. o Bio::DB::GFF partially supports GFF3. See information about the gff3_munge flag in scripts/Bio-DB-GFF/bulk_load_gff.pl. o Better location parsing in Bio::Factory::FTLocationFactory - this is part of the engine for parsing EMBL/GenBank feature table locations. Nested join/order-by/complement are allowed now o Bio::PrimarySeqI->translate now takes named parameters o Bio::Tools::Phylo::PAML - parsing RST (ancestral sequence reconstruction) is now supported. Parsing different models and branch-specific parameters is now supported. o Bio::Factory::FTLocationFactory - parse hierarchical locations (joins of joins) o Bio::Matrix::DistanceMatrix returns arrayrefs instead of arrays for getter/setter functions o Bio::SearchIO - blast bug #1739; match scientific notation in score and possible e+ values - blast.pm reads more WU-BLAST parameters and matches a full database pathname - Handle NCBI WEB and newer BLAST formats specifically: the (Query|Sbjct): match in alignment blocks can now be (Query|Sbjct). - psl off-by-one error fixed - exonerate parsing much improved; CIGAR and VULGAR can be parsed and HSPs can be constructed from them. - HSPs query/hit now have a seqdesc field filled out (this was always available via $hit->description and $result->query_description) - hmmer.pm can parse -A0 hmmpfam files - Writer::GbrowseGFF more customizable.
o Bio::Tools::Hmmpfam - make e-value the default score displayed in GFF, rather than the raw score - allow parsing of multiple records -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cain at cshl.edu Tue Oct 11 12:39:17 2005 From: cain at cshl.edu (Scott Cain) Date: Thu Oct 13 22:06:12 2005 Subject: [Bioperl-l] bioperl-live make test failure Message-ID: <1129048757.3929.56.camel@localhost.localdomain> Hi Jason, You are still working on the 1.5.1 release from HEAD, right? I just did a cvs update -d, and make test fails thusly: t/tigrxml....................ok 1/48Bio::SeqIO: tigrxml cannot be found Exception ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Failed to load module Bio::SeqIO::tigrxml. Can't locate XML/SAX/Writer.pm in @INC (@INC contains: t /Users/cain/cvs_stuff/bioperl-live/blib/lib /Users/cain/cvs_stuff/bioperl-live/blib/arch /sw/lib/perl5 /sw/lib/perl5/darwin /usr/local/lib/perl5/5.8.4/darwin-2level /usr/local/lib/perl5/5.8.4 /usr/local/lib/perl5/site_perl/5.8.4/darwin-2level /usr/local/lib/perl5/site_perl/5.8.4 /usr/local/lib/perl5/site_perl .) at /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO/tigrxml.pm line 71. BEGIN failed--compilation aborted at /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO/tigrxml.pm line 71. Compilation failed in require at /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Root/Root.pm line 396. STACK: Error::throw STACK: Bio::Root::Root::throw /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Root/Root.pm:328 STACK: Bio::Root::Root::_load_module /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Root/Root.pm:398 STACK: Bio::SeqIO::_load_format_module /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO.pm:552 STACK: Bio::SeqIO::new /Users/cain/cvs_stuff/bioperl-live/blib/lib/Bio/SeqIO.pm:379 STACK: t/tigrxml.t:22 ----------------------------------------------------------- For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time Can't call method "verbose" on an undefined value at t/tigrxml.t line 26. t/tigrxml....................dubious Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 2-48 Failed 47/48 tests, 2.08% okay This is on Mac OSX 10.3, perl 5.8.4 built from source. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From brian_osborne at cognia.com Thu Oct 13 21:59:44 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Oct 13 22:06:21 2005 Subject: [Bioperl-l] Please help In-Reply-To: Message-ID: Angshu, You need to study Bioperl's documentation. Take a look at relevant HOWTOs (Beginners, SeqIO, SearchIO perhaps). Also, install the bioperl-run package and look at the Bio::Tools::Run::Clustalw module. Brian O. On 10/12/05 3:48 PM, "Angshu Kar" wrote: > Hi , > > I'm completely new to perl. I'm have to work in biology using perl, > postgresql (as database) and clustalw(as the alignment > tool). I'm stating my problem briefly: > > In the postgresql db the data is clustered using complete linkage > clustering. I've to connect to that db, fetch those data, feed it to > the multiple alignment tool, run it and show the results.Again, > feed those alignments into a scoring tool and show the results in > form of a histogram.All these needs to be automated using perl. > I've installed bio-perl but can't get how to write code using it. > > I'll be obliged if you help me with this by providing the idea of how > to feed the db data (in fasta format) into clustalw. 
> > Thanks, > Angshu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From bmoore at genetics.utah.edu Thu Oct 13 23:01:16 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Thu Oct 13 22:58:22 2005 Subject: [Bioperl-l] Please help Message-ID: Angshu- Brian is right: you need to read the documentation and ask a more specific question on the list. I'll suggest a few extra modules for you to study along with the ones Brian told you about. Look for the non-bio modules on CPAN http://search.cpan.org/. BTW, the project you've described and the modules that you will need to use will be a challenge to jump straight into if you've never used perl at all before. You will almost certainly want to start by reading some introductory and intermediate perl books (or the free perl documentation). The standard book for perl beginners is Learning Perl by Randal Schwartz et al., and the standard perl Bible is Programming Perl by Larry Wall et al. You will want to read at least the first one and have the second one handy as you jump into the documentation below. DBI, DBD::Pg and perhaps Class::DBI - For perl communicating with your database. Bio::SeqIO - For reading your sequences into Bioperl Bio::Tools::Run::Alignment::Clustalw - For aligning your sequences Bio::AlignIO::clustalw - For simple writing of the alignments. Depending on what you had in mind for scoring, you may have to write your own code for that and the histogram display. Look at the GD module for your graphics needs. You mention several times that you want to display your results, and I'm going to guess you mean via a web site. I think CGI::Application is a great way to build the kind of database-driven websites that I think you are considering here.
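As Barry notes, the scoring histogram will probably need custom code. A minimal core-Perl sketch of the binning step follows, assuming the alignment scores have already been collected into a plain list of numbers (how they are computed is left open, as in the thread). The helper names `bin_scores` and `text_histogram` and the sample scores are hypothetical; a graphical version would draw the bars with GD rather than print `#` characters.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: count scores into fixed-width bins.
# Returns a hash ref mapping each bin's lower edge to its count.
sub bin_scores {
    my ($scores, $width) = @_;
    die "bin width must be positive\n" unless $width > 0;
    my %counts;
    for my $s (@$scores) {
        # floor (not int) so that negative scores fall into the lower bin
        my $lower = POSIX::floor($s / $width) * $width;
        $counts{$lower}++;
    }
    return \%counts;
}

# Hypothetical helper: render the bins as text, one line per bin,
# lowest bin first, with one '#' per score in the bin.
sub text_histogram {
    my ($counts, $width) = @_;
    my @lines;
    for my $lower (sort { $a <=> $b } keys %$counts) {
        push @lines, sprintf("%6g-%-6g %s",
            $lower, $lower + $width, '#' x $counts->{$lower});
    }
    return join("\n", @lines) . "\n";
}

# Made-up scores, for illustration only.
my @scores = (12, 15, 18, 22, 25, 27, 29, 31, 44);
my $bins   = bin_scores(\@scores, 10);
print text_histogram($bins, 10);
```

Keying each bin by its lower edge keeps the code short and makes the bins easy to sort; the same hash could be handed to GD for a graphical chart.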
Barry -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne Sent: Thursday, October 13, 2005 8:00 PM To: Angshu Kar; bioperl-l Subject: Re: [Bioperl-l] Please help Angshu, You need to study Bioperl's documentation. Take a look at relevant HOWTOs (Beginners, SeqIO, SearchIO perhaps). Also, install the bioperl-run package and look at the Bio::Tools::Run::Clustalw module. Brian O. On 10/12/05 3:48 PM, "Angshu Kar" wrote: > Hi , > > I'm completely new to perl. I'm have to work in biology using perl, > postgresql (as database) and clustalw(as the alignment > tool). I'm stating my problem briefly: > > In the postgresql db the data is clustered using complete linkage > clustering. I've to connect to that db, fetch those data, feed it to > the multiple alignment tool, run it and show the results.Again, > feed those alignments into a scoring tool and show the results in > form of a histogram.All these needs to be automated using perl. > I've installed bio-perl but can't get how to write code using it. > > I'll be obliged if you help me with this by providing the idea of how > to feed the db data (in fasta format) into clustalw. > > Thanks, > Angshu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From senger at ebi.ac.uk Thu Oct 13 22:56:33 2005 From: senger at ebi.ac.uk (Martin Senger) Date: Thu Oct 13 23:12:16 2005 Subject: [Bioperl-l] Re: BioCorba and phylogenetic trees Message-ID: > It seems to me that corba is much more scalable than soap, but somehow > soap is "winning". It's betamax all over again, I'm tellin' ya. > Exactly! You said it all. BTW, I am still using CORBA, but I hide it in internal implementation - the outside world sees just SOAP messages.
BTW, I am still using CORBA, but I hide it in internal implementation - the outside word sees just SOAP messages. Marketing (and firewalls) make the word. Cheers, Martin -- Martin Senger email: martin.senger@gmail.com skype: martinsenger consulting for: International Rice Research Institute Biometrics and Bioinformatics Unit DAPO BOX 7777, Metro Manila Philippines, phone: +63-2-580-5600 (ext.2324) From avilella at gmail.com Fri Oct 14 04:39:44 2005 From: avilella at gmail.com (Albert Vilella) Date: Fri Oct 14 09:17:18 2005 Subject: [Bioperl-l] SimpleAlign add_seq ("%s/%d-%d",$id,$start,$end) Message-ID: <1129279184.7466.6.camel@localhost.localdomain> Hi all, I was wondering why the add_seq method in SimpleAlign will always give the name of the sequence accompanied by a start and end tag. SimpleAlign.pm line 256: $name = sprintf("%s/%d-%d",$id,$start,$end); In one of my scripts I am using this method by I would not like to have this "/%d-%d" tail in the sequences' names. How do you think this should be addressed? Am I using a bad combination of method calls for my script? Bests, Albert. From zhangchn2004 at gmail.com Fri Oct 14 04:07:03 2005 From: zhangchn2004 at gmail.com (Zhang Chen) Date: Fri Oct 14 10:06:25 2005 Subject: [Bioperl-l] A question about rpsblast In-Reply-To: <434E2462.2060304@ym.edu.tw> References: <434E2462.2060304@ym.edu.tw> Message-ID: <24301eb00510140107l759e0edfu@mail.gmail.com> Firstly, are you sure your php script have the permission to run the script?Are you sure your php/apache process have the correct uid/pid to write theblast output?Maybe you need to change some of the permission of the outputdirectory(perhaps /tmp/)?or you can try to setuid the script? 
2005/10/13, Ching-Hung Tzeng wrote: > > Dear sir, > > Sorry for this unexpected mail. > > Recently, I have tried to set up a web page in PHP for calling > *rpsblast* program to search conserved domains of proteins. > When I tried to use the *php* *system* function > "system("/xxx/blast-2.2.12/bin/rpsblast -d /xxx/cdd/CDD/Cdd -i $uploadfile -e 0.001 -o /xxx/$filename.rpsout");", the command doesn't > work. I even tried to use PHP's system to call a perl script that does > the rpsblast search, but still no output. However, using the same > method, I am able to get BLAST programs using *blastall* to work. > What can I do to make rpsblast work in a web service environment? Is > this an rpsblast bug or is there another way to solve this problem to get > the rpsblast output file? > Thanks for your time; I look forward to your reply. > PS: I have sent this problem to NCBI (blast-help@ncbi.nlm.nih.gov), but > they have not replied yet. > Ching-Hung Tzeng > Undergraduate student, > Department of Life Sciences, National Yang-Ming University. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From angshu96 at gmail.com Fri Oct 14 16:47:51 2005 From: angshu96 at gmail.com (Angshu Kar) Date: Fri Oct 14 16:47:05 2005 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::Clustalw Message-ID: Hi, Has anyone used Bio::Tools::Run::Alignment::Clustalw in Windows? If so, could you please let me know the steps for running the tool for a seq.fa file? Thanks, Angshu From smarkel at scitegic.com Fri Oct 14 17:44:33 2005 From: smarkel at scitegic.com (Scott Markel) Date: Fri Oct 14 17:57:24 2005 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::Clustalw In-Reply-To: References: Message-ID: <435026C1.5000309@scitegic.com> Angshu, I run ClustalW on Windows via BioPerl, but inside our data pipelining product.
The BioPerl-related lines that I use are: $ENV{CLUSTALDIR} = $clustalDirectory; my @parameters = ("dnamatrix" => $dnaMatrix, "gapopen" => $gapOpeningPenalty, "gapext" => $gapExtensionPenalty, "matrix" => $proteinMatrix, "outfile" => $outputFile); my $clustalFactory = Bio::Tools::Run::Alignment::Clustalw->new(@parameters); my $alignment = $clustalFactory->align($fastaFile); Let me know if you need more context. Scott Angshu Kar wrote: > Hi, > Has anyone used Bio::Tools::Run::Alignment::Clustalw in Windows? If so, > could you please let me know the steps for running the tool for a seq.fa file? > Thanks, > Angshu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From angshu96 at gmail.com Fri Oct 14 19:00:46 2005 From: angshu96 at gmail.com (Angshu Kar) Date: Fri Oct 14 20:42:54 2005 Subject: [Bioperl-l] running clustalw Message-ID: Could anyone please provide me with a bioperl script to run clustalw (with parameters) on a Windows machine? Thanking you, Angshu From Steve_Chervitz at affymetrix.com Fri Oct 14 21:24:11 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Fri Oct 14 21:34:49 2005 Subject: [Bioperl-l] Re: BioCorba and phylogenetic trees In-Reply-To: Message-ID: > From: Martin Senger > Date: Fri, 14 Oct 2005 03:56:33 +0100 (BST) > > On Oct 12, 2005, Rutger Vos wrote: >> >> It seems to me that corba is much more scalable than soap, but somehow >> soap is "winning". It's betamax all over again, I'm tellin' ya. >> > Exactly! You said it all. BTW, I am still using CORBA, but I hide it in > internal implementation - the outside world sees just SOAP > messages.
Marketing (and firewalls) make the world. CORBA does seem to have an image problem, deservedly or not. Here are some articles that provide some insight: SOAP raises the bar for CORBA: http://bdn.borland.com/article/0,1410,28737,00.html Web Services/SOAP and CORBA: http://www.xs4all.nl/~irmen/comp/CORBA_vs_SOAP.html Web services: Is it CORBA redux? http://news.com.com/2010-1071-954808.html Alternatives to CORBA and SOAP: Second Generation Web Services http://webservices.xml.com/pub/a/ws/2002/02/06/rest.html Internet Communication Engine: http://www.zeroc.com/ Also worth noting here, the open-bio community has its own approach to distributed computing called DAS, which employs a REST-ful architecture: http://biodas.org/ Steve From brian_osborne at cognia.com Fri Oct 14 21:28:56 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Oct 14 21:43:11 2005 Subject: FW: [Bioperl-l] How to extract promoter region seq from genbank or another source? In-Reply-To: <20051014210538.15794.qmail@web34309.mail.mud.yahoo.com> Message-ID: ENSEMBL experts? ------ Forwarded Message From: Sam Al-Droubi Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT) To: Brian Osborne Subject: Re: [Bioperl-l] How to extract promoter region seq from genbank or another source? Hi Brian, Thank you for the response. I looked at it but it seems that ensembl does not use accession numbers. It seems that they have their own numbering scheme. If so, how do I get the mapping between the two? If I can't get the promoter region sequence, then do you know if there is a way I can get the entire chromosome sequence? If so, I can then try to find the gene within it and then grab the promoter region. I am new to all this so I am sorry if I sound ignorant in this area. On the surface, it seems that one should be able to do this easily but it has not been easy so far. Thank you. Brian Osborne wrote: > Sam, > > ensembl may be one solution, I think it provides a good API for these sorts > of queries. See the ensembl API documentation for more information
See the ensembl API documentation for more information > (http://www.ensembl.org/info/software/core/core_tutorial.html). > > Brian O. > > > > On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote: > >> > Hello, >> > >> > I am totally new to BioPerl. I was able to install it and retrieve data >> from >> > GenBank. I have a list of accession numbers for genes but I want to use >> > BioPerl to get the promoter region (1000 bp before the start of the gene). >> > Can someone point me in the right direction on how to accomplish this. >> > >> > Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine. >> > >> > Thank you. >> > >> > >> > >> > >> > Sincerely, >> > Sam Al-Droubi, M.S. >> > saldroubi@yahoo.com >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l@portal.open-bio.org >> > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > Sincerely, Sam Al-Droubi, M.S. saldroubi@yahoo.com ------ End of Forwarded Message From skirov at utk.edu Fri Oct 14 22:46:44 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Oct 14 22:46:36 2005 Subject: FW: [Bioperl-l] How to extract promoter region seq from genbank or another source? In-Reply-To: References: Message-ID: <43506D94.8030909@utk.edu> Sam, You can use MART to convert to ensembl id (in most cases). I don't think they support genebank. You can try to use genekeydb (genereg.ornl.gov/gkdb), either download it or use the online converter, but my guess is you are not going to get too many ids. One thing I may fix in the future, but right now... Still may be worth a try. Look at seqhound too (http://www.blueprint.org/seqhound/index.html). Stefan Brian Osborne wrote: >ENSEMBL experts? > >------ Forwarded Message >From: Sam Al-Droubi >Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT) >To: Brian Osborne >Subject: Re: [Bioperl-l] How to extract promoter region seq from genbank or >another source? > >Hi Brian, > >Thank you for the response. 
I looked at it but it seems that enembl does >not use accession numbers. It seems that they have their own numbering >scheme. If so how do I get the mapping between the two. If I can't get the >promoter region sequence then do you know if there is a way I can get the >entire chromosome sequence? If so, I can then try to find the gene within >it and then grab the promoter region. >I am new to all this so I am sorry if I sound ignorant in this area. > >On the surface, it seems that one should be able to do this easily but it >has not been easy so far. > >Thank you. > > >Brian Osborne wrote: > > >>Sam, >> >>ensembl may be one solution, I think it provides a good API for these sorts >>of queries. See the ensembl API documentation for more information >>(http://www.ensembl.org/info/software/core/core_tutorial.html). >> >>Brian O. >> >> >> >>On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote: >> >> >> >>>>Hello, >>>> >>>>I am totally new to BioPerl. I was able to install it and retrieve data >>>> >>>> >>>from >>> >>> >>>>GenBank. I have a list of accession numbers for genes but I want to use >>>>BioPerl to get the promoter region (1000 bp before the start of the gene). >>>>Can someone point me in the right direction on how to accomplish this. >>>> >>>>Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine. >>>> >>>>Thank you. >>>> >>>> >>>> >>>> >>>>Sincerely, >>>>Sam Al-Droubi, M.S. >>>>saldroubi@yahoo.com >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >> >> > > >Sincerely, >Sam Al-Droubi, M.S. >saldroubi@yahoo.com > >------ End of Forwarded Message > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From sdavis2 at mail.nih.gov Sat Oct 15 09:57:01 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat Oct 15 10:02:08 2005 Subject: [Bioperl-l] How to extract promoter region seq from genbank or another source? In-Reply-To: <43506D94.8030909@utk.edu> Message-ID: Sam, One of the simplest ways to do this is to use the UCSC table browser. Go to: http://genome.ucsc.edu/cgi-bin/hgTables Choose your organism and assembly of choice. Choose group "mRNA and EST tracks" and then choose either track Human ESTs (if your species is human, as an example) or Human mRNAs. Click region "genome". Click "paste list" or "upload list" to give your accessions. Then choose output format "sequence". Click "get output". You will get a new page. Choose "promoter/Upstream by" and choose the number of bases you want. You can also deselect the other things on the page if you don't want other sequence. You get the idea. If you started with ESTs, then go back and do the same with the mRNA table. I don't think you need to split your accession list--just use the same both times. After this, you will have two fasta files (which you can combine into one). Then use Bio::SeqIO to read them in. Note that many accessions will align to multiple regions of the genome and will, therefore, be represented by multiple promoter regions. You may want to filter these accessions out. Sean You can download the tables or batch query them with On 10/14/05 10:46 PM, "Stefan Kirov" wrote: > Sam, > You can use MART to convert to ensembl id (in most cases). I don't think > they support GenBank.
You can try to use genekeydb > (genereg.ornl.gov/gkdb), either download it or use the online converter, > but my guess is you are not going to get too many ids. One thing I may > fix in the future, but right now... Still may be worth a try. Look at > seqhound too (http://www.blueprint.org/seqhound/index.html). > Stefan > > Brian Osborne wrote: > >> ENSEMBL experts? >> >> ------ Forwarded Message >> From: Sam Al-Droubi >> Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT) >> To: Brian Osborne >> Subject: Re: [Bioperl-l] How to extract promoter region seq from genbank or >> another source? >> >> Hi Brian, >> >> Thank you for the response. I looked at it but it seems that enembl does >> not use accession numbers. It seems that they have their own numbering >> scheme. If so how do I get the mapping between the two. If I can't get the >> promoter region sequence then do you know if there is a way I can get the >> entire chromosome sequence? If so, I can then try to find the gene within >> it and then grab the promoter region. >> I am new to all this so I am sorry if I sound ignorant in this area. >> >> On the surface, it seems that one should be able to do this easily but it >> has not been easy so far. >> >> Thank you. >> >> >> Brian Osborne wrote: >> >> >>> Sam, >>> >>> ensembl may be one solution, I think it provides a good API for these sorts >>> of queries. See the ensembl API documentation for more information >>> (http://www.ensembl.org/info/software/core/core_tutorial.html). >>> >>> Brian O. >>> >>> >>> >>> On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote: >>> >>> >>> >>>>> Hello, >>>>> >>>>> I am totally new to BioPerl. I was able to install it and retrieve data >>>>> >>>>> >>>> from >>>> >>>> >>>>> GenBank. I have a list of accession numbers for genes but I want to use >>>>> BioPerl to get the promoter region (1000 bp before the start of the gene). >>>>> Can someone point me in the right direction on how to accomplish this. 
>>>>> >>>>> Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine. >>>>> >>>>> Thank you. >>>>> >>>>> >>>>> >>>>> >>>>> Sincerely, >>>>> Sam Al-Droubi, M.S. >>>>> saldroubi@yahoo.com >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>> >>> >> >> >> Sincerely, >> Sam Al-Droubi, M.S. >> saldroubi@yahoo.com >> >> ------ End of Forwarded Message >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> From bmoore at genetics.utah.edu Sat Oct 15 11:09:04 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Sat Oct 15 11:06:12 2005 Subject: FW: [Bioperl-l] Please help Message-ID: -----Original Message----- From: Angshu Kar [mailto:angshu96@gmail.com] Sent: Friday, October 14, 2005 1:38 PM To: Barry Moore Subject: Re: [Bioperl-l] Please help Hi Barry, I've written the follwing code: use DBI; use Bio::SeqIO; my $db_host = 'YYYYYY; my $db_user = 'WWWW'; my $db_pass = 'XXXXXX'; my $db_name = ''ZZZZZ'; # Connect to a PostgreSQL database. my $db = "dbi:PgPP:dbname=${db_name};host=${db_host}"; # Connect to database. $dbh = DBI->connect($db, $db_user, $db_pass, { RaiseError => 1, AutoCommit => 0 } ) || die "Error connecting to the database: $DBI::errstr\n"; # My query. my $query = "select seq from biosequence where biosequence_id = 12"; # Run query. $seq = $dbh->selectcol_arrayref($query); # Convert the raw string to fasta $in = <$seq>; #$in = Bio::SeqIO->new($seq, # -format => 'raw'); $out = Bio::SeqIO->new($seq, -seq_name => "ABCDE01", -format => 'Fasta'); # Print results. print join("\n", @$seq); Everything was running fine before I tried to convert the raw sequecne into fasta using the code in bold. Could you please let me know where I'm going wrong? 
Thanks, Angshu On 10/13/05, Barry Moore wrote: Angshu- Brain is right, you need to read the documentation and ask a more specific question on the list. I'll suggest a few extra modules for you to study along with the ones Brian told you about. Look for the non-bio modules on CPAN http://search.cpan.org/. BTW, the project you've described and the modules that you will need to use will be a challenge to jump straight into if you've never used perl at all before. You will almost certainly want to start by reading some introductory and intermediate perl books (or the free perl documentation) The standard books for perl beginners is Learning Perl by Randal Schwartz et al. and the standard perl Bible is Programming Perl by Larry Wall et al. You will want read at least the first one and have the second one handy as you jump into the documentation below. DBI, DBD::Pg and perhaps Class::DBI - For perl communicating with your database. Bio::SeqIO - For reading your sequences into Bioperl Bio::Tools::Run::Alignment::Clustalw - For aligning your sequences Bio::AlignIO::clustalw - For simple writing of the alignments. Depending on what you had in mind for scoring you may have to write your own code for that and the histogram display. Look at the GD module for your graphics needs. You mention several times that you want to display your results, and I'm going to guess you mean via a web site. I think CGI::Application is a great way to build the kind of database driven websites that I think you are considering here. Barry -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto: bioperl-l-bounces@portal.open-bio.org ] On Behalf Of Brian Osborne Sent: Thursday, October 13, 2005 8:00 PM To: Angshu Kar; bioperl-l Subject: Re: [Bioperl-l] Please help Angshu, You need to study Bioperl's documentation. Take a look at relevant HOWTOs (Beginners, SeqIO, SearchIO perhaps). Also, install the bioperl-run package and look at the Bio::Tools::Run::Clustalw module. Brian O. 
On 10/12/05 3:48 PM, "Angshu Kar" < angshu96@gmail.com> wrote: > Hi , > > I'm completely new to perl. I'm have to work in biology using perl, > postgresql (as database) and clustalw(as the alignment > tool). I'm stating my problem briefly: > > In the postgresql db the data is clustered using complete linkage > clustering. I've to connect to that db, fetch those data, feed it to > the multiple alignment tool, run it and show the results.Again, > feed those alignments into a scoring tool and show the results in > form of a histogram.All these needs to be automated using perl. > I've installed bio-perl but can't get how to write code using it. > > I'll be obliged if you help me with this by providing the idea of how > to feed the db data (in fasta format) into clustalw. > > Thanks, > Angshu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Sat Oct 15 11:34:17 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Sat Oct 15 11:39:02 2005 Subject: [Bioperl-l] Please help In-Reply-To: Message-ID: Angshu, Take a look at the documentation on selectcol_arrayref() ("perldoc DBI"). $seq is a reference to an array, in order to get to the records or data in that array you need to dereference it, something like: for $sequence ( @{$seq} ) { # SeqIO stuff } Please include error messages in your emails. Brian O. 
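Following Brian's advice, the column values returned by `selectcol_arrayref` have to be dereferenced with `@{$seq}` before they can be written out. The sketch below is a core-Perl illustration that substitutes a literal array reference for the DBI result so it runs without a database; the helper `to_fasta`, the numbered IDs built from the `ABCDE01` prefix used in Angshu's script, and the 60-column default are assumptions. With BioPerl installed, one could instead create `Bio::Seq` objects and write them through `Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta')`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an array reference of raw sequence strings
# (such as the value returned by $dbh->selectcol_arrayref) into FASTA text.
# IDs are built as "<prefix>_1", "<prefix>_2", ... for illustration only.
sub to_fasta {
    my ($seqs, $id_prefix, $width) = @_;
    $width ||= 60;                       # residues per line
    my $out = '';
    my $n   = 0;
    for my $raw (@{$seqs}) {             # dereference the array ref
        $n++;
        (my $clean = $raw) =~ s/\s+//g;  # drop whitespace stored in the column
        $out .= ">${id_prefix}_$n\n";
        for (my $i = 0; $i < length $clean; $i += $width) {
            $out .= substr($clean, $i, $width) . "\n";
        }
    }
    return $out;
}

# Stand-in for: my $seq = $dbh->selectcol_arrayref($query);
my $seq = [ "ACGTACGTAC\nGTAC", "MKV LLA" ];
print to_fasta($seq, 'ABCDE01', 8);
```

The FASTA text produced this way can be written to a file and passed to the ClustalW wrapper discussed earlier in the thread.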
From bmoore at genetics.utah.edu Sat Oct 15 11:54:45 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Sat Oct 15 11:51:28 2005
Subject: [Bioperl-l] Please help
Message-ID:

Angshu,

A couple of suggestions: it's best to keep these mails on the list so that
others can contribute to and benefit from the discussion, and more detail on
what's not working would help (i.e. are you getting error messages or just no
output?). Are you getting an error like "Not a GLOB reference..."?

Your angle operator in this command, $in = <$seq>, shouldn't work. The angle
operator is intended for filehandles. Try it like this:

    $in = shift @$seq;

or, if your query returns more than one sequence, wrap your Bio::Seq calls in
a foreach my $in (@$seq) {} loop.

Barry

-----Original Message-----
From: Angshu Kar [mailto:angshu96@gmail.com]
Sent: Friday, October 14, 2005 1:38 PM
To: Barry Moore
Subject: Re: [Bioperl-l] Please help

Hi Barry,

I've written the following code:

    use DBI;
    use Bio::SeqIO;

    my $db_host = 'YYYYYY';
    my $db_user = 'WWWW';
    my $db_pass = 'XXXXXX';
    my $db_name = 'ZZZZZ';

    # Connect to a PostgreSQL database.
    my $db = "dbi:PgPP:dbname=${db_name};host=${db_host}";

    # Connect to database.
    $dbh = DBI->connect($db, $db_user, $db_pass,
                        { RaiseError => 1, AutoCommit => 0 }
                       ) || die "Error connecting to the database: $DBI::errstr\n";

    # My query.
    my $query = "select seq from biosequence where biosequence_id = 12";

    # Run query.
    $seq = $dbh->selectcol_arrayref($query);

    # Convert the raw string to fasta
    $in = <$seq>;

    #$in = Bio::SeqIO->new($seq,
    #                      -format => 'raw');
    $out = Bio::SeqIO->new($seq,
                           -seq_name => "ABCDE01",
                           -format   => 'Fasta');

    # Print results.
    print join("\n", @$seq);

Everything was running fine before I tried to convert the raw sequence into
fasta using the code after "# Convert the raw string to fasta". Could you
please let me know where I'm going wrong?

Thanks,
Angshu

On 10/13/05, Barry Moore wrote:

Angshu-

Brian is right, you need to read the documentation and ask a more specific
question on the list. I'll suggest a few extra modules for you to study along
with the ones Brian told you about. Look for the non-bio modules on CPAN
http://search.cpan.org/. BTW, the project you've described and the modules
that you will need to use will be a challenge to jump straight into if you've
never used perl at all before. You will almost certainly want to start by
reading some introductory and intermediate perl books (or the free perl
documentation). The standard book for perl beginners is Learning Perl by
Randal Schwartz et al., and the standard perl Bible is Programming Perl by
Larry Wall et al. You will want to read at least the first one and have the
second one handy as you jump into the documentation below.

DBI, DBD::Pg and perhaps Class::DBI - for perl communicating with your database.
Bio::SeqIO - for reading your sequences into Bioperl.
Bio::Tools::Run::Alignment::Clustalw - for aligning your sequences.
Bio::AlignIO::clustalw - for simple writing of the alignments.

Depending on what you had in mind for scoring, you may have to write your own
code for that and for the histogram display. Look at the GD module for your
graphics needs. You mention several times that you want to display your
results, and I'm going to guess you mean via a web site. I think
CGI::Application is a great way to build the kind of database-driven websites
that I think you are considering here.

Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne
Sent: Thursday, October 13, 2005 8:00 PM
To: Angshu Kar; bioperl-l
Subject: Re: [Bioperl-l] Please help

Angshu,

You need to study Bioperl's documentation. Take a look at the relevant HOWTOs
(Beginners, SeqIO, SearchIO perhaps). Also, install the bioperl-run package
and look at the Bio::Tools::Run::Alignment::Clustalw module.

Brian O.

From bmoore at genetics.utah.edu Sat Oct 15 12:32:48 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Sat Oct 15 12:29:55 2005
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::Clustalw
Message-ID:

Angshu-

I was under the mistaken impression that clustalw wasn't ported to Windows
(I thought clustalx was the only Windows option). Clearly this is not the
case, so disregard my previous reply to your e-mail. If you don't already
have them, the binaries for clustalw for Windows are here:
ftp://ftp.ebi.ac.uk/pub/software/dos/clustalw/

Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Scott Markel
Sent: Friday, October 14, 2005 3:45 PM
To: Angshu Kar
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] Bio::Tools::Run::Alignment::Clustalw

Angshu,

I run ClustalW on Windows via BioPerl, but inside our data pipelining
product. The BioPerl-related lines that I use are

    $ENV{CLUSTALDIR} = $clustalDirectory;
    my @parameters = ("dnamatrix" => $dnaMatrix,
                      "gapopen"   => $gapOpeningPenalty,
                      "gapext"    => $gapExtensionPenalty,
                      "matrix"    => $proteinMatrix,
                      "outfile"   => $outputFile);
    my $clustalFactory = Bio::Tools::Run::Alignment::Clustalw->new(@parameters);
    my $alignment = $clustalFactory->align($fastaFile);

Let me know if you need more context.

Scott

Angshu Kar wrote:
> Hi,
> Has anyone used Bio::Tools::Run::Alignment::Clustalw in Windows? If so,
> could you please let me know the steps for running the tool for a seq.fa file?
> Thanks,
> Angshu

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect    email:  smarkel@scitegic.com
SciTegic Inc.                         mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401      voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                   fax:    +1 858 279 8804
USA                                   web:    http://www.scitegic.com

From jason.stajich at duke.edu Sat Oct 15 14:44:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Oct 15 14:42:58 2005
Subject: [Bioperl-l] SimpleAlign add_seq ("%s/%d-%d",$id,$start,$end)
In-Reply-To: <1129279184.7466.6.camel@localhost.localdomain>
References: <1129279184.7466.6.camel@localhost.localdomain>
Message-ID: <9ABF7867-841D-4B2D-A083-EB01B3D52E9F@duke.edu>

I assume this is in the context of writing out the alignment?
Do this before you write out the alignment:

    $aln->set_displayname_flat(1);

-jason

On Oct 14, 2005, at 4:39 AM, Albert Vilella wrote:

> Hi all,
>
> I was wondering why the add_seq method in SimpleAlign will always give
> the name of the sequence accompanied by a start and end tag.
>
> SimpleAlign.pm
> line 256:
> $name = sprintf("%s/%d-%d",$id,$start,$end);
>
> In one of my scripts I am using this method, but I would not like to have
> this "/%d-%d" tail in the sequences' names.
>
> How do you think this should be addressed? Am I using a bad combination
> of method calls for my script?
>
> Bests,
>
> Albert.
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From saldroubi at yahoo.com Sat Oct 15 22:54:22 2005
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sat Oct 15 23:00:45 2005
Subject: [Bioperl-l] bioperl genbank output is different than web?
Message-ID: <20051016025423.90068.qmail@web34314.mail.mud.yahoo.com>

All,

I am new to this so please forgive me for any dumb questions.

The problem: I am getting the genbank data for a gene using the simple code
below and writing it to a file. The output is not the same as when I do a
web search via NCBI Entrez. I want to parse out which chromosome the gene is
on. The web output gives me the correct chromosome number (i.e.
/chromosome="3"), while the output from this code gives me
/chromosome="Bio::Annotation::SimpleValue=HASH(0x88be668)".

Can someone help me figure out what is wrong, please?

The Code:
--------------------
use Bio::DB::GenBank;
use Bio::SeqIO;
use Bio::Seq;

$db_obj = Bio::DB::GenBank->new;
$seq_obj = $db_obj->get_Seq_by_acc("NM_011674");
$seqio_obj = Bio::SeqIO->new(-file =>">data/genes/NM_011674.fa", -format=>"fasta");
$seqio_obj = Bio::SeqIO->new(-file =>">data/genes/NM_011674.gb", -format=>"genbank");
$seqio_obj->write_seq($seq_obj);
---------------------

Sincerely,
Sam Al-Droubi, M.S.
saldroubi@yahoo.com

From bmoore at genetics.utah.edu Sun Oct 16 07:21:50 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Sun Oct 16 07:30:43 2005
Subject: [Bioperl-l] bioperl genbank output is different than web?
Message-ID:

Sam-

This is a known bug in bioperl 1.5 that has been fixed in the latest release.
Get the new release (bioperl-1.5.1-rc3.tar.gz) here: http://bioperl.org/DIST
and upgrade.
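The odd /chromosome value is what Perl produces whenever a blessed object reference, rather than the string it wraps, ends up in string context; the output suggests an annotation object was written out where its plain value was expected. A minimal core-Perl sketch of that symptom follows. `My::SimpleValue` is a hypothetical stand-in, not the real Bio::Annotation::SimpleValue, and the fix given in this thread is still to upgrade BioPerl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object that wraps one string.
package My::SimpleValue;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

sub value { return $_[0]{value} }

package main;

my $tag = My::SimpleValue->new(-value => '3');

# Interpolating the object itself stringifies the reference:
# /chromosome="My::SimpleValue=HASH(0x...)"
my $wrong = qq{/chromosome="$tag"};

# Asking the object for its value gives the expected qualifier.
my $right = '/chromosome="' . $tag->value . '"';

print "$wrong\n$right\n";
```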
Barry

From james.wasmuth at ed.ac.uk Sun Oct 16 06:47:32 2005
From: james.wasmuth at ed.ac.uk (James Wasmuth)
Date: Sun Oct 16 07:38:12 2005
Subject: [Bioperl-l] bioperl genbank output is different than web?
In-Reply-To: <20051016025423.90068.qmail@web34314.mail.mud.yahoo.com>
References: <20051016025423.90068.qmail@web34314.mail.mud.yahoo.com>
Message-ID: <43522FC4.8050600@ed.ac.uk>

Hi Sam,

your code works fine for me.

>
> /mol_type="mRNA"
> /db_xref="taxon:10090"
> /strain="FVB/N"
> /chromosome="3"
> /organism="Mus musculus"

What version are you running?
If you just want the chromosome number, then have a look at the Feature
Annotation HOWTO:

http://bioperl.org/HOWTOs/html/Feature-Annotation.html

The code below is adapted from that.

#!/usr/bin/perl -w
use strict;
use Bio::DB::RefSeq;
use Bio::SeqIO;
use Bio::Seq;

my $db_obj  = Bio::DB::RefSeq->new;
my $seq_obj = $db_obj->get_Seq_by_acc("NM_011674");

for my $feat_object ($seq_obj->get_SeqFeatures) {
    next unless $feat_object->primary_tag eq 'source';
    for my $tag ($feat_object->get_all_tags) {
        if ($tag eq 'chromosome') {
            for my $value ($feat_object->get_tag_values($tag)) {
                print "chromosome = ", $value, "\n";
            }
        }
    }
}

-james

--
"The best model of a cat is another cat or, better, the cat itself"
   -Norbert Wiener

Blaxter Nematode Genomics Group    |
Institute of Evolutionary Biology  |
Ashworth Laboratories, KB          | tel: +44 131 650 7403
University of Edinburgh            | web: www.nematodes.org/~james
Edinburgh                          |
EH9 3JT                            |
UK                                 |

From saldroubi at yahoo.com Sun Oct 16 09:42:37 2005
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sun Oct 16 09:48:32 2005
Subject: [Bioperl-l] Can't find gene sequence in choromosome sequence
Message-ID: <20051016134237.67298.qmail@web34308.mail.mud.yahoo.com>

All,

I downloaded the fasta sequence for a mouse gene from genbank with accession
number NM_01167. I also downloaded the mouse chromosome 3 fasta file from
NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/Assembled_chromosomes/mm_chr3.fa.gz).

The problem is that I cannot find the gene sequence in the chromosome
sequence. I used Perl's index($chr_obj->seq, $seq_obj->seq) and I get -1,
meaning no match. I then searched by hand using grep and emacs and, to my
surprise, the gene sequence is not in the mm_chr3.fa file.

What am I doing wrong? Do I have the wrong chromosome file? I am positive
that this gene is on this chromosome according to genbank.

By the way, I am doing this so that I can extract the promoter region right
before the gene starts on the chromosome.

Thank you in advance.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi@yahoo.com

From pavlidis at imbb.forth.gr Sun Oct 16 12:37:51 2005
From: pavlidis at imbb.forth.gr (pavlidis@imbb.forth.gr)
Date: Sun Oct 16 13:31:22 2005
Subject: [Bioperl-l] question about bioperl and microarrays
Message-ID: <1129480671.435281dfd04e6@webmail.imbb.forth.gr>

Hi,

I would like to ask if there is any way, using bioperl, to retrieve
microarray datasets from GEO or the Stanford database.
I am a newbie in bioperl and perl, so I have been struggling with this for
days. Specifically, I would like to retrieve all the datasets which have a
particular gene in them.

Thank you very much,
Pavlos

From olenka.m at gmail.com Sun Oct 16 20:17:05 2005
From: olenka.m at gmail.com (Olena Morozova)
Date: Sun Oct 16 20:41:21 2005
Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration
Message-ID: <259a224c0510161717u1591de3ayc01f83008dbd5d98@mail.gmail.com>

Hi again,

I just figured out how to obtain a list of conserved domains for a given
sequence using the SeqHound.pm module available at
http://www.blueprint.org/seqhound/apifunctslist.html

Now I have a list of conserved domains for a given sequence, and I need to
extract information about what these domains are and which ones are
DNA-binding. Any help on this will be greatly appreciated.

Thanks again,
Olena

From bmoore at genetics.utah.edu Mon Oct 17 00:20:10 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon Oct 17 00:16:50 2005
Subject: [Bioperl-l] Can't find gene sequence in choromosome sequence
Message-ID:

Sam,

Go to the UCSC Genome Browser (http://genome.ucsc.edu) and click on Table
Browser. Fill in the fields: your accession number goes under "region:
position", and you want to select "sequence" as the output. Click on "get
output", choose "genomic", and you'll then get the option to specify how much
upstream sequence you'd like to include.
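Perl's index() only reports an exact, contiguous match on the strand and in the letter case it is given. A minimal core-Perl sketch that ignores case and also tries the reverse complement is below; the helper names and toy sequences are made up for illustration. Even with both strands checked, an mRNA record such as an NM_ accession will generally not occur contiguously in genomic sequence when the gene has introns, which is why a genome browser or a spliced aligner such as BLAT is the better tool for this job.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (IUPAC ambiguity codes are ignored here).
sub revcomp {
    my ($s) = @_;
    ($s = reverse $s) =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Return (strand, 0-based offset) of the first exact match, or () if none.
sub find_on_either_strand {
    my ($genome, $query) = @_;
    my $g = uc $genome;
    my $q = uc $query;
    my $pos = index($g, $q);
    return ('+', $pos) if $pos >= 0;
    $pos = index($g, revcomp($q));
    return ('-', $pos) if $pos >= 0;
    return;
}

# Toy data: the query occurs only on the minus strand, in lower case.
my $chr   = 'NNNNacgtttgcaNNNN';
my $query = 'TGCAAACGT';

my @hit = find_on_either_strand($chr, $query);
print @hit ? "strand $hit[0], offset $hit[1]\n" : "no match\n";
```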
Barry

From olenka.m at gmail.com Sun Oct 16 19:17:12 2005
From: olenka.m at gmail.com (Olena Morozova)
Date: Mon Oct 17 00:31:15 2005
Subject: [Bioperl-l] grouping sequences by DNA-binding domains
Message-ID: <259a224c0510161617k163c3e98qfa5256bee84a79b2@mail.gmail.com>

I have a list of transcription factor sequences, and I need to group them
according to their DNA-binding domains, based on the classification in
TRANSFAC or any other database. Basically, I just need to extract the
DNA-binding domain information for a particular TF from a database like
TRANSFAC (I don't know which other databases would have this information,
but any will do). Does anyone have any idea how to do this?
Thank you very much for your help and time,
Olena

From arareko at campus.iztacala.unam.mx Mon Oct 17 01:21:53 2005
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon Oct 17 01:24:48 2005
Subject: [Bioperl-l] Re: Proposal for AI methods in Bioperl
In-Reply-To: <434CB5AE.3040307@campus.iztacala.unam.mx>
References: <434CB5AE.3040307@campus.iztacala.unam.mx>
Message-ID: <435334F1.1080008@campus.iztacala.unam.mx>

Thank you all (Michael, Hilmar, Sean, Paulo, Matthew & Rutger) for your kind
suggestions. I'll take a look at all the software you recommended. Surely
I'll find some excellent code to begin this implementation experiment with.
Maybe I can get something useful to share with all of you. As soon as I
finish the design I'll let you know, to get further feedback. Thanks again
for your time.

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko@campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From jason.stajich at duke.edu Mon Oct 17 11:45:18 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Oct 17 12:36:42 2005
Subject: [Bioperl-l] bioperl genbank output is different than web?
In-Reply-To:
References:
Message-ID: <7BA62208-AD3F-46A3-8DA3-7D455D094C9B@duke.edu>

Actually, grab the official bioperl 1.5.1 release. Release candidates are
archived but are no longer in the main DIST directory:
http://news.open-bio.org/archives/2005_10.html#000084

On Oct 16, 2005, at 7:21 AM, Barry Moore wrote:

> Sam-
>
> This is a known bug in bioperl 1.5 that has been fixed in the latest
> release. Get the new release (bioperl-1.5.1-rc3.tar.gz) here:
> http://bioperl.org/DIST and upgrade.
>
> Barry

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From saldroubi at yahoo.com Mon Oct 17 15:40:46 2005
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Mon Oct 17 15:46:33 2005
Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a database sequence file?
Message-ID: <20051017194046.82515.qmail@web34304.mail.mud.yahoo.com>

All,

Under the BLAST section, the Beginners HOWTO
(http://bioperl.org/HOWTOs/html/Beginners.html) says "The example code
assumes that you used the formatdb program to index the database sequence
file "db.fa"." Could someone tell me how to go about creating the db.fa file?
I looked on the website but there was no reference to a formatdb program.

My goal is: I have a gene sequence and I want to align it against the
chromosome sequence; both are in fasta format. Can I do this with the blast
program on my own computer?

Thank you in advance.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi@yahoo.com

From Peter.Robinson at t-online.de Mon Oct 17 15:27:06 2005
From: Peter.Robinson at t-online.de (Peter.Robinson@t-online.de)
Date: Mon Oct 17 16:27:22 2005
Subject: [Bioperl-l] FASTA 2 GenBank
In-Reply-To: <2d4f32050926162040f2d8d3@mail.gmail.com>
References: <932ed26e050921064073bc5936@mail.gmail.com> <778CAFE2-2112-4561-B1E8-465250ECDAD5@duke.edu> <2d4f32050926162040f2d8d3@mail.gmail.com>
Message-ID: <20051017192706.GA6558@anna>

Dear bioperlers,

forgive what may be a simple question, but consulting the HOWTOs and Google
did not reveal an answer to me.
I am in the process of analyzing ESTs from a nonmodel organism and would like
to build GenBank-style files for the contig sequences by adding in
information about sequence features.
I would like to start by adding info about the presumed ORF as follows:

## 1) This is the 'new' sequence
my $seqio = new Bio::SeqIO('-file' => $inname , '-format' => 'fasta');
my $seq = $seqio->next_seq();

## 2) This is the feature I would like to add, with $startpos
## and $endpos being the start/end of the ORF based on translations
## and alignments
my $feat = new Bio::SeqFeature::Generic ( -start   => $startpos,
                                          -end     => $endpos,
                                          -strand  => 1,
                                          -primary => 'CDS',
                                          -source  => 'Manual annotation of CDS',
                                        );
$seq->add_SeqFeature($feat);
## 3) Here I would like to output the sequence in GenBank format
my $out = Bio::SeqIO->new(-file   => ">$outputfilename",
                          -format => 'EMBL');
$out->write_seq($seq);

### However, I get this:

ID   ABC2002.1 standard; DNA; UNK; 5914 BP.
XX
AC   unknown;
XX
DE   /early=858 /middle=1093 /late=436
XX
FH   Key             Location/Qualifiers
FH
FT   CDS             104..4501
XX
SQ   Sequence 5914 BP; 1088 A; 1893 C; 1748 G; 1174 T; 11 other;
     acgt....

But I would like to get something like this:

LOCUS       XM_213440               5804 bp    mRNA    linear   ROD 15-APR-2005
DEFINITION  PREDICTED: Rattus norvegicus collagen, type 1, alpha 1 (Col1a1),
            mRNA.
ACCESSION   XM_213440
VERSION     XM_213440.3  GI:62656859
KEYWORDS    .
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Rattus.
COMMENT     MODEL REFSEQ: This record is predicted by automated computational
            analysis. This record is derived from an annotated genomic sequence
            (NW_047337) using gene prediction method: GNOMON, supported by mRNA
            and EST evidence.
            Also see:
                Documentation of NCBI's Annotation Process

            On Apr 15, 2005 this sequence version replaced gi:34873454.
FEATURES             Location/Qualifiers
     source          1..5804
                     /organism="Rattus norvegicus"
                     /mol_type="mRNA"
                     /strain="BN/SsNHsdMCW"
                     /db_xref="taxon:10116"
                     /chromosome="10"
     gene            1..5804
                     /gene="Col1a1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: GNOMON. Supporting evidence
                     includes similarity to: 2 mRNAs, 48 ESTs, 1 Protein"
                     /db_xref="GeneID:29393"
                     /db_xref="RGD:61817"
     CDS             95..4456
                     /gene="Col1a1"
                     /codon_start=1
                     /product="similar to Collagen alpha1"
                     /protein_id="XP_213440.1"
                     /db_xref="GI:27688933"
                     /db_xref="GeneID:29393"
                     /db_xref="RGD:61817"
                     /translation="MFSFVDLRLLLLLGATALLTHGQEDIPEVSCIHNGLRVPNGETW
                     KPDVCLICICHNGTAVCDGVLCKEDLDCPNPQKREGECCPFCPEEYVSPDAEVIGVEG
                     etc "
ORIGIN
        1 gacggagcag gaggcacacg gagtgaggcc acgcatgagc cgaagctaac cccccacccc
       61 agccgcaaag agtctacatg tctagggtct agacatgttc a

I would be happy if I could get the CDS bit right, and very happy if I could
add some further information in the above style. At the moment some
downstream applications are not working because the GenBank format is
incorrect.

Thanks,

Peter

From jason.stajich at duke.edu Mon Oct 17 16:39:52 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Oct 17 16:37:49 2005
Subject: [Bioperl-l] FASTA 2 GenBank
In-Reply-To: <20051017192706.GA6558@anna>
References: <932ed26e050921064073bc5936@mail.gmail.com> <778CAFE2-2112-4561-B1E8-465250ECDAD5@duke.edu> <2d4f32050926162040f2d8d3@mail.gmail.com> <20051017192706.GA6558@anna>
Message-ID: <0DF1B7AA-65D6-4F7F-9593-78491F7CC43C@duke.edu>

Well, asking for 'EMBL' format instead of 'genbank' is your first problem;
you won't get very far without specifying the right format.

As for all the rest of the information, you need to add it to the sequence
object yourself. See the Feature-Annotation HOWTO. You'll want to add
annotations and features.
This is described here: http://bioperl.org/HOWTOs/html/Feature-Annotation.html#annotation -jason On Oct 17, 2005, at 3:27 PM, Peter.Robinson@t-online.de wrote: > Dear bioperlers, > > forgive what may be a simple question, but consulting the howtos > and Google did not reveal an answer to me. > I am in the process of analyzing ESTs from a nonmodel organism and > would like to build GenBank style files for the contig sequences by > adding in information about sequence features. I would like to > start by adding info about the presumed ORF as follows: > > > ## 1) This is the 'new' sequence > my $seqio = new Bio::SeqIO('-file' => $inname , '-format' => > 'fasta'); > my $seq = $seqio->next_seq(); > > ## 2) This is the feature I would like to add, with $startpos > ## and $endpos being the start/end of the ORF based on translations > ## and alignments > my $feat = new Bio::SeqFeature::Generic ( -start => $startpos, > -end => $endpos, > -strand => 1, > -primary => 'CDS', > -source => 'Manual annotation of CDS', > ); > $seq->add_SeqFeature($feat); > ## 3) Here I would like to output the sequence in GenBank format > my $out = Bio::SeqIO->new(-file => ">$outputfilename", > -format => 'EMBL'); > $out->write_seq($seq); > > > ### However, I get this: > > ID ABC2002.1 standard; DNA; UNK; 5914 BP. > XX > AC unknown; > XX > DE /early=858 /middle=1093 /late=436 > XX > FH Key Location/Qualifiers > FH > FT CDS 104..4501 > XX > SQ Sequence 5914 BP; 1088 A; 1893 C; 1748 G; 1174 T; 11 other; > acgt.... > > But I would like to get something like this: > > LOCUS XM_213440 5804 bp mRNA linear ROD > 15-APR-2005 > DEFINITION PREDICTED: Rattus norvegicus collagen, type 1, alpha 1 > (Col1a1), > mRNA. > ACCESSION XM_213440 > VERSION XM_213440.3 GI:62656859 > KEYWORDS . 
> SOURCE Rattus norvegicus (Norway rat) > ORGANISM Rattus norvegicus > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; > Euteleostomi; > Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; > Sciurognathi; Muroidea; Muridae; Murinae; Rattus. > COMMENT MODEL REFSEQ: This record is predicted by automated > computational > analysis. This record is derived from an annotated > genomic sequence > (NW_047337) using gene prediction method: GNOMON, > supported by mRNA > and EST evidence. > Also see: > Documentation of NCBI's Annotation Process > > On Apr 15, 2005 this sequence version replaced gi: > 34873454. > FEATURES Location/Qualifiers > source 1..5804 > /organism="Rattus norvegicus" > /mol_type="mRNA" > /strain="BN/SsNHsdMCW" > /db_xref="taxon:10116" > /chromosome="10" > gene 1..5804 > /gene="Col1a1" > /note="Derived by automated computational > analysis using > gene prediction method: GNOMON. Supporting > evidence > includes similarity to: 2 mRNAs, 48 ESTs, 1 > Protein" > /db_xref="GeneID:29393" > /db_xref="RGD:61817" > CDS 95..4456 > /gene="Col1a1" > /codon_start=1 > /product="similar to Collagen alpha1" > /protein_id="XP_213440.1" > /db_xref="GI:27688933" > /db_xref="GeneID:29393" > /db_xref="RGD:61817" > / > translation="MFSFVDLRLLLLLGATALLTHGQEDIPEVSCIHNGLRVPNGETW > > KPDVCLICICHNGTAVCDGVLCKEDLDCPNPQKREGECCPFCPEEYVSPDAEVIGVEG > etc " > ORIGIN > 1 gacggagcag gaggcacacg gagtgaggcc acgcatgagc cgaagctaac > cccccacccc > 61 agccgcaaag agtctacatg tctagggtct agacatgttc a > > > I would be happy if I could get the CDS bit right and very happy if > I could add some further information in the above style. At the > moment some downstream applications are not working because the > GenBank format is incorrect. 
> > Thanks , > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From bmoore at genetics.utah.edu Mon Oct 17 17:14:55 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Mon Oct 17 17:11:30 2005 Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a databasesequence file? Message-ID: Sam- You can run blast on your own computer. formatdb is a program that comes with the BLAST distribution; it formats the sequences in a fasta file into a BLAST database for use in BLAST searches. If you have installed BLAST locally, then you have formatdb. Documentation for formatdb is available via man formatdb on a Linux system, or online at: ftp://ftp.ncbi.nih.gov/blast/documents/ The db.fa file is any fasta file that you want to turn into a BLAST database with formatdb. A couple of your questions suggest that you would greatly benefit from exploring the UCSC genome browser at http://genome.ucsc.edu/. In this case you have a gene sequence that you want to align to a genome. BLAST will get you started, but you're better off with BLAT for that job. If you're doing this for one of the many genomes covered in the UCSC genome browser, then they've probably already done this for you. If you're doing it for something new or for a lot of genes, then you can run BLAT locally, and Bioperl can help you with that: Bio::Tools::Run::Alignment::Blat.html Bio::Tools::Blat.html Barry -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sam Al-Droubi Sent: Monday, October 17, 2005 1:41 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a databasesequence file?
All, Under the BLAST section, the Beginners HOWTO (http://bioperl.org/HOWTOs/html/Beginners.html ) says "The example code assumes that you used the formatdb program to index the database sequence file "db.fa"." Could someone tell me who to go about creating the db.fa file. looked on the webiste but there was no reference to a formatdb program. My goal is: I have a gene sequence and I want to align it again the chromosome sequence, both are fasta format. Can I do this with the blast program on my own computer? Thank you in advance. Sincerely, Sam Al-Droubi, M.S. saldroubi@yahoo.com _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Mon Oct 17 16:11:07 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon Oct 17 17:12:09 2005 Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a databasesequence file? In-Reply-To: <20051017194046.82515.qmail@web34304.mail.mud.yahoo.com> Message-ID: <006901c5d356$e94dded0$2f01a8c0@GOLHARMOBILE1> db.fa is a fasta file containing the sequences you want to make a blast'able database out of. formatdb is a program that comes with BLAST or the NCBI Toolkit. You need to download it from NCBI. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Sam Al-Droubi Sent: Monday, October 17, 2005 3:41 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a databasesequence file? All, Under the BLAST section, the Beginners HOWTO (http://bioperl.org/HOWTOs/html/Beginners.html ) says "The example code assumes that you used the formatdb program to index the database sequence file "db.fa"." Could someone tell me who to go about creating the db.fa file. looked on the webiste but there was no reference to a formatdb program. 
My goal is: I have a gene sequence and I want to align it again the chromosome sequence, both are fasta format. Can I do this with the blast program on my own computer? Thank you in advance. Sincerely, Sam Al-Droubi, M.S. saldroubi@yahoo.com _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From bmoore at genetics.utah.edu Mon Oct 17 19:00:46 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Mon Oct 17 18:57:28 2005 Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a databasesequence file? Message-ID: Sam- I appreciate your plight. I've been there myself many times. I wish I could offer more help, but I don't really know what the answer to your problem is. The reason I am trying to keep these questions on the bioperl list and on the bioperl topic is that those of us answering questions on this list have to balance our time between answering questions and getting our own work done. I actually tried to recreate your error message on my system, and the only way I could do it was to remove the read permissions on the file. I can see from your directory listing that that isn't your problem. Here are some things to try: Try specifying the full path to your ecoli.nt in your -d flag. Set the environment variable BLASTDB to point to your db directory. If you have root access, change the ownership of those files to you; if you don't have root access, talk to your SA. Write to the NCBI help desk (they're sometimes slow, but they should answer). Try some of the bioinformatics forums at bioinformatics.org; they are more general and might be able to help. Barry > -----Original Message----- > From: Sam Al-Droubi [mailto:saldroubi@yahoo.com] > Sent: Monday, October 17, 2005 4:00 PM > To: Barry Moore > Subject: RE: [Bioperl-l] HOWTO Beginners: formatdb program/indexing a > databasesequence file?
> > > Barry, > > I did check the file and it is there and the > permissions are read for all and I even made them > owned by oracle (the unix user). The reason I > installed blast is to use it via bioperl. I don't > know who else I can turn to for help. I tried google > but it didn't come up with anything useful. So I > really appreciate any help. > > > The directory list is below: > m1:/usr/src/blast/blast-2.2.12/data # ls -l > total 6707 > drwxr-xr-x 2 3755 5333 1096 Oct 17 22:57 . > drwxr-xr-x 5 3755 5333 144 Aug 28 10:38 .. > -rw-r--r-- 1 oracle users 2122 Aug 28 10:38 > BLOSUM45 > -rw-r--r-- 1 oracle users 2122 Aug 28 10:38 > BLOSUM62 > -rw-r--r-- 1 oracle users 2124 Aug 28 10:38 > BLOSUM80 > -rw-r--r-- 1 oracle users 1521 Aug 28 10:38 > KSat.flt > -rw-r--r-- 1 oracle users 736 Aug 28 10:38 > KSchoth.flt > -rw-r--r-- 1 oracle users 1521 Aug 28 10:38 > KSgc.flt > -rw-r--r-- 1 oracle users 852 Aug 28 10:38 > KShopp.flt > -rw-r--r-- 1 oracle users 940 Aug 28 10:38 > KSkyte.flt > -rw-r--r-- 1 oracle users 1320 Aug 28 10:38 > KSpcc.mat > -rw-r--r-- 1 oracle users 1521 Aug 28 10:38 > KSpur.flt > -rw-r--r-- 1 oracle users 1521 Aug 28 10:38 > KSpyr.flt > -rw-r--r-- 1 oracle users 2666 Aug 28 10:38 PAM30 > -rw-r--r-- 1 oracle users 2666 Aug 28 10:38 PAM70 > -rw-r--r-- 1 oracle users 72720 Aug 28 10:38 > asn2ff.prt > -rw-r--r-- 1 oracle users 148763 Aug 28 10:38 > bstdt.val > -rw-r--r-- 1 oracle users 4763013 Oct 17 22:55 > ecoli.nt > -rw-r--r-- 1 oracle users 60292 Oct 17 22:57 > ecoli.nt.nhr > -rw-r--r-- 1 oracle users 4876 Oct 17 22:57 > ecoli.nt.nin > -rw-r--r-- 1 oracle users 3200 Oct 17 22:57 > ecoli.nt.nnd > -rw-r--r-- 1 oracle users 60 Oct 17 22:57 > ecoli.nt.nni > -rw-r--r-- 1 oracle users 58320 Oct 17 22:57 > ecoli.nt.nsd > -rw-r--r-- 1 oracle users 1264 Oct 17 22:57 > ecoli.nt.nsi > -rw-r--r-- 1 oracle users 1165813 Oct 17 22:57 > ecoli.nt.nsq > -rw-r--r-- 1 oracle users 7708 Aug 28 10:38 > featdef.val > -rw-r--r-- 1 oracle users 173 Oct 17 22:57 > 
formatdb.log > -rw-r--r-- 1 oracle users 3297 Aug 28 10:38 gc.val > -rw-r--r-- 1 oracle users 50078 Aug 28 10:38 > humrep.fsa > -rw-r--r-- 1 oracle users 109993 Aug 28 10:38 > lineages.txt > -rw-r--r-- 1 oracle users 64663 Aug 28 10:38 > makerpt.prt > -rw-r--r-- 1 oracle users 49810 Aug 28 10:38 > objprt.prt > -rw-r--r-- 1 oracle users 1112 Aug 28 10:38 > pubkey.enc > -rw-r--r-- 1 oracle users 6034 Aug 28 10:38 > seqcode.val > -rw-r--r-- 1 oracle users 154962 Aug 28 10:38 > sequin.hlp > -rw-r--r-- 1 oracle users 480 Aug 28 10:38 > sgmlbb.ent > -rw-r--r-- 1 oracle users 33461 Aug 28 10:38 > taxlist.txt > > > > --- Barry Moore wrote: > > > Sam- > > > > We're getting a bit off topic here for the bioperl > > list. You've checked > > that ecoli.nt.nin is there and has the correct > > permissions. > > > > Barry > > > > > -----Original Message----- > > > From: Sam Al-Droubi [mailto:saldroubi@yahoo.com] > > > Sent: Monday, October 17, 2005 3:38 PM > > > To: Barry Moore > > > Subject: RE: [Bioperl-l] HOWTO Beginners: formatdb > > program/indexing a > > > databasesequence file? > > > > > > Hi Barry, > > > > > > Thank you very very much for this info. I > > downloaded > > > blast and installed it following the directions. > > > > > > I am trying to run the ecoli test as suggested but > > it > > > does not seem that blastall knows where the data > > > directory is even though it is specified in the > > > .ncbirc file as follows: > > > > > > [NCBI] > > > Data="/usr/src/blast/blast-2.2.12/data/" > > > > > > I am getting this error: > > > > > > blastall -p blastn -d ecoli.nt -i test.txt -o > > test.out > > > [blastall] WARNING: Test: Unable to open > > > > > > Any idea what could be wrong? > > > > > > Thank you again. > > > > > > --- Barry Moore wrote: > > > > > > > Sam- > > > > > > > > You can run blast on your own computer. 
> > formatdb is > > > > a program that > > > > comes with the blast distribution that formats > > > > sequences in a fasta file > > > > into a blast database for use in blast searches. > > If > > > > you have installed > > > > BLAST locally, then you have formatdb. > > > > Documentation for formatdb is > > > > found with man formatdb on a linux system, or > > here > > > > for an online source: > > > > ftp://ftp.ncbi.nih.gov/blast/documents/ > > > > > > > > The db.fa file is any fasta file that you want > > to > > > > create a blast > > > > database out of with formatdb. > > > > > > > > A couple of your questions suggest that you > > would > > > > greatly benefit from > > > > exploring the UCSC genome browser at > > > > http://genome.ucsc.edu/. In this > > > > case you have a gene sequence that you want to > > align > > > > to a genome. BLAST > > > > will get you started, but you're better off with > > > > BLAT for that job. If > > > > you're doing this for one of the many genomes > > > > covered in the UCSC genome > > > > browser, then they've probably already done this > > for > > > > you. If you're > > > > doing it for something new or for a lot of genes > > > > then you can run BLAT > > > > locally, and Bioperl can help you with the that: > > > > > > > > Bio::Tools::Run::Alignment::Blat.html > > > > Bio::Tools::Blat.html > > > > > > > > Barry > > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces@portal.open-bio.org > > > > [mailto:bioperl-l-bounces@portal.open-bio.org] > > On > > > > Behalf Of Sam > > > > Al-Droubi > > > > Sent: Monday, October 17, 2005 1:41 PM > > > > To: bioperl-l@portal.open-bio.org > > > > Subject: [Bioperl-l] HOWTO Beginners: formatdb > > > > program/indexing a > > > > databasesequence file? 
> > > > > > > > All, > > > > > > > > Under the BLAST section, the Beginners HOWTO > > > > (http://bioperl.org/HOWTOs/html/Beginners.html > > ) > > > > says > > > > > > > > "The example code assumes that you used the > > formatdb > > > > program to index > > > > the database sequence file "db.fa"." > > > > > > > > Could someone tell me who to go about creating > > the > > > > db.fa file. looked > > > > on the webiste but there was no reference to a > > > > formatdb program. > > > > > > > > My goal is: I have a gene sequence and I want > > to > > > > align it again the > > > > chromosome sequence, both are fasta format. Can > > I > > > > do this with the > > > > blast program on my own computer? > > > > > > > > > > > > Thank you in advance. > > > > > > > > > > > > > > > > Sincerely, > > > > Sam Al-Droubi, M.S. > > > > saldroubi@yahoo.com > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > Sincerely, > > > Sam Al-Droubi, M.S. > > > saldroubi@yahoo.com > > > > > Sincerely, > Sam Al-Droubi, M.S. > saldroubi@yahoo.com From golharam at umdnj.edu Mon Oct 17 21:34:34 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon Oct 17 22:33:42 2005 Subject: [Bioperl-l] HOWTO Beginners: formatdb program/indexing adatabasesequence file? In-Reply-To: Message-ID: <009001c5d384$18d2f730$2f01a8c0@GOLHARMOBILE1> NCBI is pretty good at answering questions. Try emailing NCBI. I forget their email address, but they give it on the web page. Also, I've built an RPM for the NCBI toolkit available at http://serine.umdnj.edu/~golharam/biorpms that installs the toolkit in /usr/local/ncbi. It places a shell script in /etc/profile.d to automatically set necessary environment variables. You can download the RPM and install it, then put your database files in /usr/local/ncbi/db. That should be it. 
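A small sanity check in the spirit of Barry's and Ryan's suggestions, written as a hypothetical Perl helper (unreadable_blast_index_files is not part of BLAST or BioPerl). It assumes a nucleotide database built with formatdb -p F, which produces the .nhr, .nin and .nsq files visible in the ecoli.nt directory listing earlier in this thread. Relative names are resolved against BLASTDB only as a rough approximation; the sketch does not reproduce blastall's own lookup rules.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: given a database name as it would be passed to
# blastall -d, return the formatdb index extensions that cannot be read.
# Assumes a nucleotide database (formatdb -p F => .nhr/.nin/.nsq files).
sub unreadable_blast_index_files {
    my ($db) = @_;
    my $base = $db;
    # Rough approximation of Barry's BLASTDB suggestion: resolve relative
    # names against $ENV{BLASTDB} when it is set.
    if (!File::Spec->file_name_is_absolute($db) && $ENV{BLASTDB}) {
        $base = File::Spec->catfile($ENV{BLASTDB}, $db);
    }
    return grep { !-r "$base.$_" } qw(nhr nin nsq);
}

# Example: check before calling blastall -p blastn -d ecoli.nt ...
my @missing = unreadable_blast_index_files('ecoli.nt');
print @missing ? "Cannot read: @missing\n" : "All index files readable\n";
```

If this reports nothing missing but blastall still cannot open the database, the problem is more likely in how blastall locates the files than in the files themselves, which is where Barry's full-path suggestion comes in.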
Ryan _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From fangl at genomics.org.cn Tue Oct 18 01:43:34 2005 From: fangl at genomics.org.cn (Magic Fang) Date: Tue Oct 18 02:01:15 2005 Subject: [Bioperl-l] can i get phrap assmebly error rate? Message-ID: <43548B86.8010805@genomics.org.cn> hi, if i can use assembly package to get the phrap assmebly error rate from a ace file? an error rate should like the one shown in consed as err/10k etc. thank you. From dbichko at aveopharma.com Tue Oct 18 03:49:10 2005 From: dbichko at aveopharma.com (Dmitri Bichko) Date: Tue Oct 18 03:51:12 2005 Subject: [Bioperl-l] Alphabet guessing Message-ID: Hi, Is being unable to guess the sequence alphabet really an unrecoverable error? I'm referring to this bit in PrimarySeq.pm: my $str = $self->seq(); $str =~ s/[-.?x]//gi; my $total = CORE::length($str); if( $total == 0 ) { $self->throw("Got a sequence with no letters in it ".
"cannot guess alphabet [$str]"); } Problem is that if you happen on a seq that's all X's, you get a fatal exception, which can be very annoying when you are in the middle of a 15 million sequence fasta stream (where you don't care about, nor even expect the alphabet type; and the docs suggest that you can't necessarily recover after catching exceptions). Might not something along these lines make more sense: if( $total == 0 ) { $self->warn("Got a sequence with no letters in it, assuming 'dna' alphabet."); $self->alphabet('dna'); return 'dna'; } Or should the seqio factories catch the guessing exceptions? Thanks, Dmitri From jason.stajich at duke.edu Tue Oct 18 08:06:41 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Oct 18 08:05:13 2005 Subject: [Bioperl-l] Alphabet guessing In-Reply-To: References: Message-ID: <10830F5B-46C3-4CCE-AB0F-8D492F1B45E6@duke.edu> From the Bio::SeqIO documentation -alphabet Sets the alphabet ('dna', 'rna', or 'protein'). When the alphabet is set then Bioperl will not attempt to guess what the alphabet is. This may be important because Bioperl does not always guess correctly. You can pre-specify the alphabet: $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "fifteen_million_sequence_file.fa", -alphabet => 'dna'); -jason On Oct 18, 2005, at 3:49 AM, Dmitri Bichko wrote: > Hi, > > Is being unable to guess the sequence alphabet really an unrecoverable > error? I'm referring to this bit in PrimarySeq.pm: > > my $str = $self->seq(); > $str =~ s/[-.?x]//gi; > my $total = CORE::length($str); > if( $total == 0 ) { > $self->throw("Got a sequence with no letters in it ". 
> "cannot guess alphabet [$str]"); > } > > Problem is that if you happen on a seq that's all X's, you get a fatal > exception, which can be very annoying when you are in the middle of > a 15 > million sequence fasta stream (where you don't care about, nor even > expect the alphabet type; and the docs suggest that you can't > necessarily recover after catching exceptions). > > Might not something along these lines make more sense: > > if( $total == 0 ) { > $self->warn("Got a sequence with no letters in it, assuming 'dna' > alphabet."); > $self->alphabet('dna'); > return 'dna'; > } > > Or should the seqio factories catch the guessing exceptions? > > Thanks, > Dmitri > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From brian_osborne at cognia.com Tue Oct 18 09:45:48 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Oct 18 09:50:23 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration In-Reply-To: <259a224c0510161717u1591de3ayc01f83008dbd5d98@mail.gmail.com> Message-ID: Olena, What database contains the information you're looking for? Brian O. On 10/16/05 8:17 PM, "Olena Morozova" wrote: > Hi agian, > > I just figured out how to obtain a list of conserved domains for a > given sequence using the SeqHound.pm module available at > http://www.blueprint.org/seqhound/apifunctslist.html > > Now I have a list of conserved domains for a given sequence and I need > to extract information as to what these domains are and which ones are > DNA-binding. Any help on this will be greatly appreciated > > Thanks again, > Olena > > > On 10/16/05, Olena Morozova wrote: >> I have a list of transcription factor sequences, and I need to group >> them according to the DNA-binding domains based on the classification >> by TRANSFAC or any other database. 
Basically, I just need to extract >> the DNA-binding domain information for a particular TF from a database >> like TRANSFAC (I don't know what other databases would have this >> information, but any will do) Anyone has any idea how to do this? >> Thank you very much for your help and time >> >> Olena >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gnf.org Tue Oct 18 12:55:56 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Oct 18 13:40:46 2005 Subject: [Bioperl-l] Re: Problem with Bio::DB::DBI::Pg after upgrading to core-1.5.1 and latest cvs In-Reply-To: <4354C0FD.1070904@biologie.uni-freiburg.de> References: <4354C0FD.1070904@biologie.uni-freiburg.de> Message-ID: <504d32ec41926e5f89b8c454535a6e37@gnf.org> Bioperl-db required some changes to work fine with the 1.5.x releases, so it is critical that you upgrade bioperl-db as well if you upgrade Bioperl to 1.5. I believe the reason you're getting the error is a version incompatibility. SimpleDBContext does implement dsn(), and Pg.pm doesn't use SimpleDBContext as a literal anywhere. However, you're saying that you are using the latest cvs update of bioperl-db, so maybe you haven't installed your upgraded bioperl-db version but did install the previous one? You can check for individual versions of files by grep'ing for the $Id tag. Here's what you should see: reigen: 9:49 19>perldoc -m Bio::DB::SimpleDBContext.pm | grep '$Id' # $Id: SimpleDBContext.pm,v 1.5 2005/08/26 19:34:14 lapp Exp $ reigen: 9:51 20>perldoc -m Bio::DB::DBI::Pg.pm | grep '$Id' # $Id: Pg.pm,v 1.2 2005/08/26 19:34:14 lapp Exp $ reigen: 9:52 21> Did you run the tests? Was there a problem? If the tests run fine (they should) then it is almost certainly older modules installed somewhere else in your @INC that interfere with the new ones. 
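Hilmar's suggestion of checking the $Id tag can be extended to every copy of a module on @INC, which makes a shadowing older install easy to spot. This is a minimal sketch: module_copies is a hypothetical helper, not part of BioPerl. It ignores .pmc files and code-ref hooks in @INC, so treat "the first entry is what use loads" as the usual case rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: list every copy of a module found along @INC, in the
# order Perl searches the directories, with its CVS $Id line if present.
sub module_copies {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;
    $rel .= '.pm';
    my @found;
    for my $dir (grep { !ref } @INC) {   # skip code-ref hooks in @INC
        my $path = "$dir/$rel";
        next unless -f $path;
        my $id;
        if (open my $fh, '<', $path) {
            while (my $line = <$fh>) {
                if ($line =~ /(\$Id:[^\$]*\$)/) { $id = $1; last; }
            }
            close $fh;
        }
        push @found, { path => $path, id => $id };
    }
    return @found;
}

# More than one entry here suggests an older copy may be shadowing the new one.
for my $copy (module_copies('Bio::DB::DBI::Pg')) {
    printf "%s\n    %s\n", $copy->{path}, $copy->{id} // 'no $Id line';
}
```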
-hilmar On Oct 18, 2005, at 2:31 AM, Daniel Lang wrote: > Hi, > > I've just upgraded my bioperl-cvs-version(december 2004, the one right > before 1.5 and the trouble with how Annotations were written out using > Bio::Seq) to bioperl-1.5.1 in order to see if the features are now > written out correctly. (See also "[BioSQL-l] strange error after > changing to RC1.5" 09.03.2005) > > Seems like the core code is now working like it used to. But now the > bioperl-db code for Pg has problems:( > > When I try to retrieve a sequence from a Pg-biosql db I receive the > following error: > > Can't locate object method "dsn" via package "Bio::DB::SimpleDBContext" > (perhaps you forgot to load "Bio::DB::SimpleDBContext"?) at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/DBI/Pg.pm line 221.! > > I'm using the cvs-version (today) of bioperl-db with the 1.5.1 of core > and run. > > I tried to use SimpleDBContext in Pg.pm - no effect. > > I don't really get it what is happening there... > > Thanks in advance. > > Regards Daniel:) > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From Steve_Chervitz at affymetrix.com Tue Oct 18 14:16:42 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Tue Oct 18 14:15:03 2005 Subject: [Bioperl-l] Alphabet guessing In-Reply-To: <10830F5B-46C3-4CCE-AB0F-8D492F1B45E6@duke.edu> Message-ID: Back in the Bio::Root::Object days, one could decide to take one's fate in one's own hands and have all throw() calls converted to warn() using $object->strict(-2), see http://doc.bioperl.org/bioperl-live/Bio/Root/Object.html#POD18 But you can't do this with current bioperl objects which are based on Bio::Root::Root which lacks strict().
I suppose we left it out of Root.pm to ensure it wouldn't fall into the wrong hands. What do people think about reviving strict() for those 'damn the torpedoes' situations, where you don't want to be interrupted by any unforeseen exception and you're willing to assume the risk for any consequences? Perhaps bioperl strict() could be responsive to the 'use strict' pragma so that bioperl could become more strict when people turn on perl strictness (as well they should most of the time). Of course it wouldn't be advertised in the general docs, but only in the POD. Steve > From: Jason Stajich > Date: Tue, 18 Oct 2005 08:06:41 -0400 > To: Dmitri Bichko > Cc: > Subject: Re: [Bioperl-l] Alphabet guessing > > From the Bio::SeqIO documentation > > -alphabet > > Sets the alphabet ('dna', 'rna', or 'protein'). When the alphabet is > set then Bioperl will not attempt to guess what the alphabet is. This > may be important because Bioperl does not always guess correctly. > > > You can pre-specify the alphabet: > > $seqio = Bio::SeqIO->new(-format => 'fasta', > -file => > "fifteen_million_sequence_file.fa", > -alphabet => 'dna'); > > -jason > On Oct 18, 2005, at 3:49 AM, Dmitri Bichko wrote: > >> Hi, >> >> Is being unable to guess the sequence alphabet really an unrecoverable >> error? I'm referring to this bit in PrimarySeq.pm: >> >> my $str = $self->seq(); >> $str =~ s/[-.?x]//gi; >> my $total = CORE::length($str); >> if( $total == 0 ) { >> $self->throw("Got a sequence with no letters in it ". >> "cannot guess alphabet [$str]"); >> } >> >> Problem is that if you happen on a seq that's all X's, you get a fatal >> exception, which can be very annoying when you are in the middle of >> a 15 >> million sequence fasta stream (where you don't care about, nor even >> expect the alphabet type; and the docs suggest that you can't >> necessarily recover after catching exceptions). 
>> >> Might not something along these lines make more sense: >> >> if( $total == 0 ) { >> $self->warn("Got a sequence with no letters in it, assuming 'dna' >> alphabet."); >> $self->alphabet('dna'); >> return 'dna'; >> } >> >> Or should the seqio factories catch the guessing exceptions? >> >> Thanks, >> Dmitri >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From olenka.m at gmail.com Tue Oct 18 14:26:58 2005 From: olenka.m at gmail.com (Olena Morozova) Date: Tue Oct 18 14:26:07 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration In-Reply-To: References: <259a224c0510161717u1591de3ayc01f83008dbd5d98@mail.gmail.com> Message-ID: <259a224c0510181126u598d8a6ew5e132c9f1ec18c87@mail.gmail.com> Hi Brian, Thank you for your reply. It is the CDD (Conserved Domain Database) on the NCBI web site. Olena On 10/18/05, Brian Osborne wrote: > Olena, > > What database contains the information you're looking for? > > Brian O. > > > On 10/16/05 8:17 PM, "Olena Morozova" wrote: > > > Hi agian, > > > > I just figured out how to obtain a list of conserved domains for a > > given sequence using the SeqHound.pm module available at > > http://www.blueprint.org/seqhound/apifunctslist.html > > > > Now I have a list of conserved domains for a given sequence and I need > > to extract information as to what these domains are and which ones are > > DNA-binding. 
Any help on this will be greatly appreciated > > > > Thanks again, > > Olena > > > > > > On 10/16/05, Olena Morozova wrote: > >> I have a list of transcription factor sequences, and I need to group > >> them according to the DNA-binding domains based on the classification > >> by TRANSFAC or any other database. Basically, I just need to extract > >> the DNA-binding domain information for a particular TF from a database > >> like TRANSFAC (I don't know what other databases would have this > >> information, but any will do) Anyone has any idea how to do this? > >> Thank you very much for your help and time > >> > >> Olena > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From mayagao1999 at yahoo.com Tue Oct 18 15:03:57 2005 From: mayagao1999 at yahoo.com (Alex Zhang) Date: Tue Oct 18 15:09:42 2005 Subject: [Bioperl-l] About array CGH data based on BAC clones Message-ID: <20051018190358.56705.qmail@web53505.mail.yahoo.com> Hello everyone, Does anybody have experience analyzing array CGH data based on BAC clones to identify the BACs which are amplified or deleted (gain or loss)? Are there any software tools or packages you would recommend? Thank you very much ahead of time! Sincerely, Alex From brian_osborne at cognia.com Tue Oct 18 15:10:23 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Oct 18 15:15:16 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration In-Reply-To: <259a224c0510181126u598d8a6ew5e132c9f1ec18c87@mail.gmail.com> Message-ID: Olena, I'm pretty sure that there's no code in Bioperl that accesses or parses CDD; hopefully I'll be corrected if I'm wrong. Brian O. On 10/18/05 2:26 PM, "Olena Morozova" wrote: > Hi Brian, > > Thank you for your reply.
It is the CDD (Conserved Domain Database) on > the NCBI web site. > Olena > > On 10/18/05, Brian Osborne wrote: >> Olena, >> >> What database contains the information you're looking for? >> >> Brian O. >> >> >> On 10/16/05 8:17 PM, "Olena Morozova" wrote: >> >>> Hi agian, >>> >>> I just figured out how to obtain a list of conserved domains for a >>> given sequence using the SeqHound.pm module available at >>> http://www.blueprint.org/seqhound/apifunctslist.html >>> >>> Now I have a list of conserved domains for a given sequence and I need >>> to extract information as to what these domains are and which ones are >>> DNA-binding. Any help on this will be greatly appreciated >>> >>> Thanks again, >>> Olena >>> >>> >>> On 10/16/05, Olena Morozova wrote: >>>> I have a list of transcription factor sequences, and I need to group >>>> them according to the DNA-binding domains based on the classification >>>> by TRANSFAC or any other database. Basically, I just need to extract >>>> the DNA-binding domain information for a particular TF from a database >>>> like TRANSFAC (I don't know what other databases would have this >>>> information, but any will do) Anyone has any idea how to do this? 
>>>> Thank you very much for your help and time >>>> >>>> Olena >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> From skirov at utk.edu Tue Oct 18 15:33:36 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Oct 18 15:32:46 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration In-Reply-To: References: Message-ID: <43554E10.8070405@utk.edu> Actually Brian, Bio::SeqIO::entrezgene will extract this data from the ASN1 file:

use strict;
use Bio::SeqIO;

my $file = shift;   # Entrez Gene ASN.1 file (not set in the original snippet)
my $fs   = "\t";    # output field separator (not set in the original snippet)
my $eio = Bio::SeqIO->new(-file => $file, -format => 'entrezgene',
                          -debug => 'off', -service_record => 'no');
my ($seq, $struct, $uncapt) = $eio->next_seq;
my $gid = $seq->accession_number;   # assumption: holds the Entrez Gene ID
my @contigs = $struct->get_members();   #(-authority=>'genomic');
foreach my $contig (@contigs) {
    if ($contig->authority eq 'Product') {
        foreach my $sf ($contig->get_SeqFeatures) {
            foreach my $dblink ($sf->annotation->get_Annotations('dblink')) {
                my $key = $dblink->{_anchor} ? $dblink->{_anchor} : $dblink->optional_id;
                my $db  = $dblink->database;
                # keep only CDD links or "conserved" (domain) features
                next unless (($db =~ /cdd/i) || ($sf->primary_tag =~ /conserved/i));
                my $desc;
                if ($key =~ /:/) {
                    ($key, $desc) = split(/:/, $key);   # key looks like "id:description"
                }
                print join($fs, $gid, $contig->id, $desc, $key, $sf->score,
                           '', '', $db, $sf->start, $sf->end), "\n";
            }
        }
    }
}

I guess it is really a good time to write these docs :-) Stefan Brian Osborne wrote: >Olena, > >I'm pretty sure that there's no code in Bioperl that accesses or parses CDD, >hopefully I'm corrected if I'm wrong. > >Brian O. > > >On 10/18/05 2:26 PM, "Olena Morozova" wrote: > > > >>Hi Brian, >> >>Thank you for your reply. It is the CDD (Conserved Domain Database) on >>the NCBI web site. >>Olena >> >>On 10/18/05, Brian Osborne wrote: >> >> >>>Olena, >>> >>>What database contains the information you're looking for? >>> >>>Brian O.
>>> >>> >>>On 10/16/05 8:17 PM, "Olena Morozova" wrote: >>> >>> >>> >>>>Hi agian, >>>> >>>>I just figured out how to obtain a list of conserved domains for a >>>>given sequence using the SeqHound.pm module available at >>>>http://www.blueprint.org/seqhound/apifunctslist.html >>>> >>>>Now I have a list of conserved domains for a given sequence and I need >>>>to extract information as to what these domains are and which ones are >>>>DNA-binding. Any help on this will be greatly appreciated >>>> >>>>Thanks again, >>>>Olena >>>> >>>> >>>>On 10/16/05, Olena Morozova wrote: >>>> >>>> >>>>>I have a list of transcription factor sequences, and I need to group >>>>>them according to the DNA-binding domains based on the classification >>>>>by TRANSFAC or any other database. Basically, I just need to extract >>>>>the DNA-binding domain information for a particular TF from a database >>>>>like TRANSFAC (I don't know what other databases would have this >>>>>information, but any will do) Anyone has any idea how to do this? >>>>>Thank you very much for your help and time >>>>> >>>>>Olena >>>>> >>>>> >>>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From sdavis2 at mail.nih.gov Tue Oct 18 15:58:29 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Oct 18 15:57:33 2005 Subject: [Bioperl-l] About array CGH data based on BAC clones In-Reply-To: <20051018190358.56705.qmail@web53505.mail.yahoo.com> Message-ID: On 10/18/05 3:03 PM, "Alex Zhang" wrote: > Hello everyone, > > Is there anybody who has the experience of > analyzing array CGH data based on BAC clones > to identify the BACs which are amplified or > deleted(gain or loss)? Any soft tools or > packages recommended? You really probably want to be looking at the bioconductor project (http://www.bioconductor.org). Perl just isn't (currently) the right tool for the job. Bioconductor is probably the most advanced and flexible framework for much of array analysis, including CGH. Look at the aCGH, DNAcopy, and GLAD packages, for starters. Sean From jason at portal.open-bio.org Tue Oct 18 15:26:53 2005 From: jason at portal.open-bio.org (Jason Stajich) Date: Tue Oct 18 16:26:34 2005 Subject: [Bioperl-l] Re: a problem with DNAStatistics In-Reply-To: <43552571.4809e59d.4b40.ffffd95a@mx.gmail.com> References: <43552571.4809e59d.4b40.ffffd95a@mx.gmail.com> Message-ID: <842466C5-445F-41B9-BF82-C60AFC6DDBAE@bioperl.org> Hmm are you sure you are using bioperl 1.4? Did someone upgrade it without telling you? You can check by running this: perl -MBio::Align::DNAStatistics -e 'print $Bio::Align::DNAStatistics::VERSION, "\n"' The API changed after bioperl 1.4 so that it returned a Matrix object not an array reference. The errors you are seeing suggest that $mat is not an array reference, I am guessing it is a Bio::Matrix::PhylipDist object. 
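The example in this thread asks DNAStatistics for 'Jukes-Cantor' distances. For reference, given two aligned DNA sequences with a proportion p of differing sites, the Jukes-Cantor distance is d = -3/4 ln(1 - 4p/3). The pure-Perl sketch below computes that formula for a single pair. It is not Bioperl's implementation, and skipping columns with gaps or non-ACGT characters is an assumption; Bioperl may handle such columns differently.

```perl
use strict;
use warnings;

# Jukes-Cantor distance for two aligned DNA sequences:
#   d = -3/4 * ln(1 - 4/3 * p), p = proportion of differing sites.
# Columns where either sequence has a gap or a non-ACGT character are
# skipped (an assumption; other tools may handle them differently).
sub jukes_cantor {
    my ($s1, $s2) = map { uc($_) } @_;
    die "sequences must be aligned (equal length)\n"
        unless length($s1) == length($s2);
    my ($sites, $diffs) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        $diffs++ if $x ne $y;
    }
    die "no comparable sites\n" unless $sites;
    my $p = $diffs / $sites;
    return 0 if $p == 0;
    die "p >= 0.75: Jukes-Cantor distance is undefined\n" if $p >= 0.75;
    return -0.75 * log(1 - (4 / 3) * $p);
}

# One difference in ten sites (p = 0.1)
printf "%.4f\n", jukes_cantor('ACGTACGTAC', 'ACGTACGTAA');   # prints 0.1073
```

A full distance matrix simply applies this to every pair of rows in the alignment. In Bioperl after 1.4, that matrix comes back as an object rather than an array of arrays.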
See the updated documentation: http://doc.bioperl.org/bioperl-live/Bio/Align/DNAStatistics.html You'll want to call $mat->print_matrix to see the matrix. -jason On Oct 18, 2005, at 12:40 PM, eran elhaik wrote: > Dear Sir > > I am having a problem with the module: DNAStatistics > > I can not repeat the example provided on the web site: http://doc.bioperl.org/releases/bioperl-1.4/Bio/Align/DNAStatistics.html > > use Bio::AlignIO; > > use Bio::Align::DNAStatistics; > > my $stats = new Bio::Align::DNAStatistics; > > my $alignin = new Bio::AlignIO(-format => 'emboss', > > -file => 't/data/insulin.water'); > > my $aln = $alignin->next_aln; > > my $jc = $stats->distance(-align => $aln, > > -method => 'Jukes-Cantor'); > > foreach my $d ( @$jc ) { > > print "\t"; > > foreach my $r ( @$d ) { > > print "$r\t"; > > } > > print "\n"; > > This is the example. > > I repeat the same exact thing in my code with the attached sample > file: > > Notice that the code: > > ###################################### > > foreach my $d ( @$mat ) > > { > > print "\t"; > > foreach my $r ( @$d ) > > { > > print "$r\t"; > > } > > print "\n"; > > } > > ###################################### > > In my code is the same exact as in the example but I get the error: > > Not an ARRAY reference at trees.pl line 101. > > To run my program write: perl trees.pl Sample.txt > > Thank you for your help! > > ____________________________________ > > Eran Elhaik: Lab Phone: (713) 743-2312 > > Doctoral Student > > University of Houston > > http://nsmn1.uh.edu/~dgraur/eran/main.htm > > ____________________________________
-- Jason Stajich jason@bioperl.org http://jason.open-bio.org/ From brian_osborne at cognia.com Tue Oct 18 17:08:44 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Oct 18 17:07:53 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration In-Reply-To: <43554E10.8070405@utk.edu> Message-ID: Stefan, Yes, the hyperlinks are in the text just like they were in our old friend LocusLink. But it seems that Olena wanted information about the domains, like whether or not the domain was DNA-binding - is this in the ASN? In my too-brief response I was attempting to say that starting with a list of domains, or domain ids, and finding out whether they were DNA-binding domains or not seems to imply working with an ontology. Brian O. On 10/18/05 3:33 PM, "Stefan Kirov" wrote: > Actually Brian, Bio::SeqIO::entrezgene will extract this data from the > ASN1 file: > > use Bio::SeqIO; > my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene', > -debug=>'off',-service_record=>'no'); > ($seq,$struct,$uncapt)=$eio->next_seq; > my @contigs=$struct->get_members();#(-authority=>'genomic'); > foreach my $contig (@contigs) { > if ($contig->authority eq 'Product') { > foreach my $sf ($contig->get_SeqFeatures) { > foreach my $dblink ($sf->annotation->get_Annotations(dblink)) { > my > $key=$dblink->{_anchor}?$dblink->{_anchor}:$dblink->optional_id; > my $db=$dblink->database; > next unless (($db =~/cdd/i)||($sf->primary_tag=~ > /conserved/i)); > my $desc; > if ($key =~ /:/) { > ($key,$desc)=split(/:/,$key); > } > print join($fs, > $gid,$contig->id,$desc,$key,$sf->score,'','',$db,$sf->start,$sf->end),"\n"; > } > } > } > } > > I guess it is really a good time time to write thise docs :-) > Stefan > > Brian Osborne wrote: > >> Olena, >> >> I'm pretty sure that there's no code in Bioperl that accesses or parses CDD, >> hopefully I'm corrected if I'm wrong.
>> >> Brian O. >> >> >> On 10/18/05 2:26 PM, "Olena Morozova" wrote: >> >> >> >>> Hi Brian, >>> >>> Thank you for your reply. It is the CDD (Conserved Domain Database) on >>> the NCBI web site. >>> Olena >>> >>> On 10/18/05, Brian Osborne wrote: >>> >>> >>>> Olena, >>>> >>>> What database contains the information you're looking for? >>>> >>>> Brian O. >>>> >>>> >>>> On 10/16/05 8:17 PM, "Olena Morozova" wrote: >>>> >>>> >>>> >>>>> Hi agian, >>>>> >>>>> I just figured out how to obtain a list of conserved domains for a >>>>> given sequence using the SeqHound.pm module available at >>>>> http://www.blueprint.org/seqhound/apifunctslist.html >>>>> >>>>> Now I have a list of conserved domains for a given sequence and I need >>>>> to extract information as to what these domains are and which ones are >>>>> DNA-binding. Any help on this will be greatly appreciated >>>>> >>>>> Thanks again, >>>>> Olena >>>>> >>>>> >>>>> On 10/16/05, Olena Morozova wrote: >>>>> >>>>> >>>>>> I have a list of transcription factor sequences, and I need to group >>>>>> them according to the DNA-binding domains based on the classification >>>>>> by TRANSFAC or any other database. Basically, I just need to extract >>>>>> the DNA-binding domain information for a particular TF from a database >>>>>> like TRANSFAC (I don't know what other databases would have this >>>>>> information, but any will do) Anyone has any idea how to do this? 
>>>>>> Thank you very much for your help and time >>>>>> >>>>>> Olena >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>>> >>>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> From skirov at utk.edu Tue Oct 18 17:27:56 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Oct 18 17:27:04 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- elaboration In-Reply-To: References: Message-ID: <435568DC.5050506@utk.edu> Certainly you are right Brian- there is no particular domain type as for example in a controlled vocabulary. One can grep the DNA & binding ones, which is not perfect... Anyway, I had the feeling Olena needs to know what is the CDD description, given the CDD identifier, which is possible using the parser (though it is not the most efficient way). Stefan Brian Osborne wrote: >Stefan, > >Yes, the hyperlinks are in the text just like they were in our old friend >LocusLink. But it seems that Olena wanted information about the domains, >like whether or not the domain was DNA-binding - is this in the ASN? > >In my too-brief response I was attempting to say that starting with a list >of domains, or domain ids, and finding out whether they were DNA-binding >domains or not seems to imply working with an ontology. > >Brian O. 
> > >On 10/18/05 3:33 PM, "Stefan Kirov" wrote: > > > >>Actually Brian, Bio::SeqIO::entrezgene will extract this data from the >>ASN1 file: >> >>use Bio::SeqIO; >>my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene', >>-debug=>'off',-service_record=>'no'); >>($seq,$struct,$uncapt)=$eio->next_seq; >>my @contigs=$struct->get_members();#(-authority=>'genomic'); >>foreach my $contig (@contigs) { >> if ($contig->authority eq 'Product') { >> foreach my $sf ($contig->get_SeqFeatures) { >> foreach my $dblink ($sf->annotation->get_Annotations(dblink)) { >> my >>$key=$dblink->{_anchor}?$dblink->{_anchor}:$dblink->optional_id; >> my $db=$dblink->database; >> next unless (($db =~/cdd/i)||($sf->primary_tag=~ >>/conserved/i)); >> my $desc; >> if ($key =~ /:/) { >> ($key,$desc)=split(/:/,$key); >> } >> print join($fs, >>$gid,$contig->id,$desc,$key,$sf->score,'','',$db,$sf->start,$sf->end),"\n"; >> } >> } >> } >>} >> >>I guess it is really a good time time to write thise docs :-) >>Stefan >> >>Brian Osborne wrote: >> >> >> >>>Olena, >>> >>>I'm pretty sure that there's no code in Bioperl that accesses or parses CDD, >>>hopefully I'm corrected if I'm wrong. >>> >>>Brian O. >>> >>> >>>On 10/18/05 2:26 PM, "Olena Morozova" wrote: >>> >>> >>> >>> >>> >>>>Hi Brian, >>>> >>>>Thank you for your reply. It is the CDD (Conserved Domain Database) on >>>>the NCBI web site. >>>>Olena >>>> >>>>On 10/18/05, Brian Osborne wrote: >>>> >>>> >>>> >>>> >>>>>Olena, >>>>> >>>>>What database contains the information you're looking for? >>>>> >>>>>Brian O. 
>>>>> >>>>> >>>>>On 10/16/05 8:17 PM, "Olena Morozova" wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>Hi agian, >>>>>> >>>>>>I just figured out how to obtain a list of conserved domains for a >>>>>>given sequence using the SeqHound.pm module available at >>>>>>http://www.blueprint.org/seqhound/apifunctslist.html >>>>>> >>>>>>Now I have a list of conserved domains for a given sequence and I need >>>>>>to extract information as to what these domains are and which ones are >>>>>>DNA-binding. Any help on this will be greatly appreciated >>>>>> >>>>>>Thanks again, >>>>>>Olena >>>>>> >>>>>> >>>>>>On 10/16/05, Olena Morozova wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>I have a list of transcription factor sequences, and I need to group >>>>>>>them according to the DNA-binding domains based on the classification >>>>>>>by TRANSFAC or any other database. Basically, I just need to extract >>>>>>>the DNA-binding domain information for a particular TF from a database >>>>>>>like TRANSFAC (I don't know what other databases would have this >>>>>>>information, but any will do) Anyone has any idea how to do this? 
>>>>>>>Thank you very much for your help and time >>>>>>> >>>>>>>Olena >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l@portal.open-bio.org >>>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> > > > > -- From jason at portal.open-bio.org Tue Oct 18 22:06:01 2005 From: jason at portal.open-bio.org (Jason Stajich) Date: Wed Oct 19 01:01:42 2005 Subject: [Bioperl-l] Re: a problem with DNAStatistics In-Reply-To: <4355556f.16865e99.02b7.ffffc9c6@mx.gmail.com> References: <4355556f.16865e99.02b7.ffffc9c6@mx.gmail.com> Message-ID: <67ADF5A5-4034-4E5A-B5D2-E14C6091C022@bioperl.org> [I am CCing bioperl-l so that others can learn from these Q&A] Well ... what do you want to do? See the documentation for Bio::Matrix::PhylipDist http://doc.bioperl.org/bioperl-live/Bio/Matrix/PhylipDist.html It holds all the pairwise distances. If you want to print it out use the code I sent which includes the line to $matrix->print_matrix. You can write it out with Bio::Matrix::IO module. You can build NJ trees from it with Bio::Tree::DistanceFactory -jason On Oct 18, 2005, at 4:05 PM, eran elhaik wrote: > Hi Jason thanks for your answer, > > I am using BioPrel 1.5 > > > > How should I address the Matrix object? > > > > > > _____ > > From: Jason Stajich [mailto:jason@bioperl.org] > Sent: Tuesday, October 18, 2005 2:27 PM > To: eran elhaik > Cc: bioperl-ml List > Subject: Re: a problem with DNAStatistics > > > > Hmm are you sure you are using bioperl 1.4? Did someone upgrade it > without > telling you? 
> > You can check by running this: > > perl -MBio::Align::DNAStatistics -e 'print $Bio::Align::DNAStatistics::VERSION, "\n"' > > The API changed after bioperl 1.4 so that it returned a Matrix object not an array reference. > > The errors you are seeing suggest that $mat is not an array reference, I am guessing it is a Bio::Matrix::PhylipDist object. > > See the updated documentation: > > http://doc.bioperl.org/bioperl-live/Bio/Align/DNAStatistics.html > > You'll want to call > > $mat->print_matrix > > to see the matrix. > > -jason > > On Oct 18, 2005, at 12:40 PM, eran elhaik wrote: > > Dear Sir > > I am having a problem with the module: DNAStatistics > > I can not repeat the example provided on the web site: http://doc.bioperl.org/releases/bioperl-1.4/Bio/Align/DNAStatistics.html > > use Bio::AlignIO; > > use Bio::Align::DNAStatistics; > > my $stats = new Bio::Align::DNAStatistics; > > my $alignin = new Bio::AlignIO(-format => 'emboss', > > -file => 't/data/insulin.water'); > > my $aln = $alignin->next_aln; > > my $jc = $stats->distance(-align => $aln, > > -method => 'Jukes-Cantor'); > > foreach my $d ( @$jc ) { > > print "\t"; > > foreach my $r ( @$d ) { > > print "$r\t"; > > } > > print "\n"; > > This is the example.
> > I repeat the same exact thing in my code with the attached sample file: > > Notice that the code: > > ####################################### > > foreach my $d ( @$mat ) > > { > > print "\t"; > > foreach my $r ( @$d ) > > { > > print "$r\t"; > > } > > print "\n"; > > } > > ####################################### > > In my code is the same exact as in the example but I get the error: > > Not an ARRAY reference at trees.pl line 101. > > To run my program write: perl trees.pl Sample.txt > > Thank you for your help! > > ____________________________________ > > Eran Elhaik: Lab Phone: (713) 743-2312 > > Doctoral Student > > University of Houston > > http://nsmn1.uh.edu/~dgraur/eran/main.htm > > ____________________________________ > > -- > > Jason Stajich > > jason@bioperl.org > > http://jason.open-bio.org/ -- Jason Stajich jason@bioperl.org http://jason.open-bio.org/ From jason.stajich at duke.edu Wed Oct 19 08:20:02 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Oct 19 08:19:15 2005 Subject: [Bioperl-l] Fwd: Could you help me to solve the problem about installing bioperl under Windous XP References: Message-ID: 1.5.1 is not in a PPM right now, it requires someone to pack it up, I don't know when it will be. Sounds like you need to get make for Windows.
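Each perl build records the name of the make program it expects "perl Makefile.PL" output to be built with, in $Config{make}. On some Windows builds this is nmake or dmake rather than make. The core-Perl sketch below reports that setting and whether the program can be found on PATH. The helper name find_in_path is hypothetical, and the list of Windows executable extensions it tries is an assumption.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Return the full path of $prog if it is found in a PATH directory, else undef.
# On Windows, also try the usual executable extensions (assumed list).
sub find_in_path {
    my ($prog) = @_;
    my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');
    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $file = File::Spec->catfile($dir, $prog . $ext);
            return $file if -f $file && -x _;
        }
    }
    return;
}

my $make  = $Config{make} || 'make';
my $where = find_in_path($make);
print "This perl expects '$make': ",
      (defined $where ? "found at $where" : 'not found on PATH'), "\n";
```

If the configured program is missing, installing that specific make tool and rerunning "perl Makefile.PL" is usually the next step. Using a different make than the one perl was configured with can produce Makefiles it does not understand.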
I'm sure the Windows users on the list will be able to provide some help or point you to already written documentation. -jason Begin forwarded message: > From: > Date: October 19, 2005 7:04:47 AM EDT > To: jason.stajich@duke.edu > Subject: Could you help me to solve the problem about installing > bioperl under Windous XP > > > Dr.jason.stajich: > I am very glad to communicate with you to talk about > bioperl-1.5.1 . I am a new guy about perl and need your help. > I cannot install the latest Core - bioperl-1.5.1 in my computer > (OS: windows xp).According to the http://bioperl.org/Core/Latest/ > INSTALL.WIN , like this: > > ppm> rep add Bioperl http://bioperl.org/DIST > ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > ppm> rep add Bribes http://www.Bribes.org/perl/ppm > > ppm> search Bioperl > > ppm> install , > I could only install the Core bioperl-1.4.0. > > And if I download the latest bundle from http://bioperl.org/ > DIST/bioperl-1.5.1.tar.gz and install in local-computer ,like this: > > >perl Makefile.PL > >make > >make install > I had a problem when I run ">make", the system warned:" 'make' > is not recognized as an internal or external command, operable > program or batch file." > I hope you could help me to resolve this problem that is very > easy about you.Thank you very much. > > > > Wang > Guodong > Kunming > Institute of Zoology > > CHINA,KUNMIN > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From cain at cshl.edu Wed Oct 19 09:25:46 2005 From: cain at cshl.edu (Scott Cain) Date: Wed Oct 19 09:34:52 2005 Subject: [Bioperl-l] Fwd: Could you help me to solve the problem about installing bioperl under Windous XP In-Reply-To: References: Message-ID: <1129728346.3010.8.camel@localhost.localdomain> While there is not yet a full ppm build of bioperl 1.5.1, I created one specifically for use with GBrowse.
It can be obtained at http://www.gmod.org/ggb/ppm : ppm> rep add gmod http://www.gmod.org/ggb/ppm ppm> rep up gmod ppm> rep up gmod .. (until it is at the top of the list) ppm> search bioperl .. install I would suggest that you uninstall the old bioperl first, as I don't trust the ppm manager to do everything right. Note that this ppm does not enforce most of the optional prereqs for bioperl, just the minimum required for GBrowse. Of course, after you install bioperl, you can install GBrowse too: ppm>install Generic-Genome-Browser (as long as you already have Apache installed as well). Scott On Wed, 2005-10-19 at 08:20 -0400, Jason Stajich wrote: > 1.5.1 is not in a PPM right now, it requires someone to pack it up, I > don't know when it will be. > > Sounds like you need to get make for windows. I'm sure the windows > users on the list will be able to provide some help or point you to > already written documentation. > > -jason > > Begin forwarded message: > > > From: > > Date: October 19, 2005 7:04:47 AM EDT > > To: jason.stajich@duke.edu > > Subject: Could you help me to solve the problem about installing > > bioperl under Windous XP > > > > > > Dr.jason.stajich: > > I am very glad to communicate with you to talk about > > bioperl-1.5.1 . I am a new guy about perl and need your help. > > I cannot install the lastest Core - bioperl-1.5.1 in my computer > > (OS: windows xp).According to the http://bioperl.org/Core/Latest/ > > INSTALL.WIN , like this: > > > > ppm> rep add Bioperl http://bioperl.org/DIST > > ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > > ppm> rep add Bribes http://www.Bribes.org/perl/ppm > > > > ppm> search Bioperl > > > > ppm> install , > > I just install the Core bioperl-1.4.0. 
> > > > And if I download the latest bundle from http://bioperl.org/ > > DIST/bioperl-1.5.1.tar.gz and install in local-computer ,like this: > > > > >perl Makefile.PL > > >make > > >make install > > I had a problem when I run ">make", the system warned:" 'make' > > is not recognized as an internal or external command, operable > > program or batch file." > > I hope you could help me to resolve this problem that is very > > easy about you.Thank you very much. > > > > > > > > Wang > > Guodong > > Kunming > > Institute of Zoology > > > > CHINA,KUNMIN > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From szhan at uoguelph.ca Wed Oct 19 10:11:35 2005 From: szhan at uoguelph.ca (szhan@uoguelph.ca) Date: Wed Oct 19 10:26:30 2005 Subject: [Bioperl-l] looking for modules to retrieve signals and genomic positton for each probe in the probe set of Affymetrix Array Message-ID: <1129731095.43565417b4ecb@webmail.uoguelph.ca> Hello, Bioperl Users, I am analyzing gene expression raw data (.cel and .cdf files) of an Affymetrix genome gene expression array. Do you know which Bioperl module (or another way) can retrieve the signals and genomic position for each probe pair in the probe set of the Affymetrix array? I read the perldoc for the Bio::Affymetrix module. It seems that it can only retrieve the signals for the probe set, not for each probe and its genomic position.
I also have a problem installing this module (Bio-Affymetrix-0.5.tar) on a PC running Windows XP, Perl 5.8 and Bioperl 1.4. Any information will be highly appreciated! Joshua From sdavis2 at mail.nih.gov Wed Oct 19 11:02:42 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed Oct 19 11:01:45 2005 Subject: [Bioperl-l] looking for modules to retrieve signals and genomic positton for each probe in the probe set of Affymetrix Array In-Reply-To: <1129731095.43565417b4ecb@webmail.uoguelph.ca> Message-ID: On 10/19/05 10:11 AM, "szhan@uoguelph.ca" wrote: > Hello, Bioperl Users, > I am analyzing gene expression raw data (.cel and .cdf files) of an Affymetrix > genome gene expression array. > Do you know which bioperl module (or another way) can retrieve the signals and > genomic position for each probe pair in the probe set of the Affymetrix array? > I > read the perldoc for the Bio::Affymetrix module. It seems that it can only > retrieve > the signals for the probe set, not for each probe and its genomic position. I > also have a problem installing this module (Bio-Affymetrix-0.5.tar) on a PC > running Windows XP, Perl 5.8 and Bioperl 1.4. > Any information will be highly appreciated! > Joshua Your best bet for working with affymetrix arrays is to use R (http://www.r-project.org) and Bioconductor (http://www.bioconductor.org). These packages provide MANY, MANY functions for dealing with affymetrix arrays, from normalization to analysis to annotation. Mapping probes to genomic position is fraught with difficulty, as 25-mers will typically align to many places in the genome at some threshold. That said, I think that Ensembl does this mapping, if you are interested. You can use their public MySQL server to access these data, if I recall correctly.
Sean From daniel.lang at biologie.uni-freiburg.de Wed Oct 19 11:18:22 2005 From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang) Date: Wed Oct 19 11:42:53 2005 Subject: [Bioperl-l] Re: Problem with Bio::DB::DBI::Pg after upgrading to core-1.5.1 and latest cvs In-Reply-To: <504d32ec41926e5f89b8c454535a6e37@gnf.org> References: <4354C0FD.1070904@biologie.uni-freiburg.de> <504d32ec41926e5f89b8c454535a6e37@gnf.org> Message-ID: <435663BE.40109@biologie.uni-freiburg.de> Hi Hilmar, Yes, I have installed the latest bioperl-db: lang@frontend-0-0:~/bioperl> perldoc -m Bio::DB::SimpleDBContext.pm | grep '$Id' # $Id: SimpleDBContext.pm,v 1.5 2005/08/26 19:34:14 lapp Exp $ lang@frontend-0-0:~/bioperl> perldoc -m Bio::DB::DBI::Pg.pm | grep '$Id' # $Id: Pg.pm,v 1.2 2005/08/26 19:34:14 lapp Exp $ But your guess was right, I had a stone-age version in my @INC :( Now everything works fine... Thanks! Regards, Daniel:) Hilmar Lapp wrote: > Bioperl-db required some changes to work fine with the 1.5.x releases, > so it is critical that you upgrade bioperl-db as well if you upgrade > Bioperl to 1.5. I believe the reason you're getting the error is a > version incompatibility. SimpleDBContext does implement dsn(), and Pg.pm > doesn't use SimpleDBContext as a literal anywhere. > > However, you're saying that you are using the latest cvs update of > bioperl-db, so maybe you haven't installed your upgraded bioperl-db > version but did install the previous one? > > You can check for individual versions of files by grep'ing for the $Id > tag. Here's what you should see: > > reigen: 9:49 19>perldoc -m Bio::DB::SimpleDBContext.pm | grep '$Id' > # $Id: SimpleDBContext.pm,v 1.5 2005/08/26 19:34:14 lapp Exp $ > reigen: 9:51 20>perldoc -m Bio::DB::DBI::Pg.pm | grep '$Id' > # $Id: Pg.pm,v 1.2 2005/08/26 19:34:14 lapp Exp $ > reigen: 9:52 21> > > Did you run the tests? Was there a problem? 
If the tests run fine (they > should) then it is almost certainly older modules installed somewhere > else in your @INC that interfere with the new ones. > > -hilmar > > On Oct 18, 2005, at 2:31 AM, Daniel Lang wrote: > From Philippe.Hupe at curie.fr Wed Oct 19 06:08:35 2005 From: Philippe.Hupe at curie.fr (Philippe Hupé) Date: Wed Oct 19 12:40:59 2005 Subject: [Bioperl-l] Re: [BiO BB] About array CGH data based on BAC clones In-Reply-To: <20051018190358.56705.qmail@web53505.mail.yahoo.com> References: <20051018190358.56705.qmail@web53505.mail.yahoo.com> Message-ID: <43561B23.3000302@curie.fr> Alex Zhang wrote: >Hello everyone, > >Is there anybody who has the experience of >analyzing array CGH data based on BAC clones >to identify the BACs which are amplified or >deleted (gain or loss)? Any software tools or >packages recommended? > >Thank you very much ahead of time! > >Sincerely, > Alex > > > > >_______________________________________________ >Bioinformatics.Org general forum - BiO_Bulletin_Board@bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > Dear colleague, The bioinformatics team of Institut Curie has developed several tools related to the analysis of array CGH data: - GLAD for breakpoint detection - MAIA for automatic microarray image analysis - MANOR for normalisation of microarray data - VAMP, a Java graphical interface for visualisation and analysis of CGH profiles. - CAPweb, a suite of tools for the management, visualization and analysis of CGH-arrays. VAMP can be requested at vamp@curie.fr , MAIA at maia@curie.fr , CAPweb at capweb@curie.fr , GLAD at glad@curie.fr and MANOR at manor@curie.fr. A VAMP demo is available at http://bioinfo.curie.fr/vamp (Then click on Direct Launch and File->Import) Three movies give you an overview of VAMP software capabilities.
- http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo1.html - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo2.html - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo3.html You can visit our Web site at http://bioinfo.curie.fr You can try CAPweb, a complete web platform, at the following URL: http://bioinfo.curie.fr/CAPweb. It lets you analyze your data directly from the gpr file and the clone info file. It includes normalization, breakpoint detection, data storage and visualization. This environment can be installed directly in our lab. Do not hesitate to send questions to capweb@curie.fr Best regards, Philippe Hupé -- Philippe Hupé UMR 144 - Service Bioinformatique Institut Curie Laboratoire de Transfert (4th floor) 26 rue d'Ulm 75005 Paris - France Email : Philippe.Hupe@curie.fr Tel : +33 (0)1 44 32 42 75 Fax : +33 (0)1 42 34 65 28 website : http://bioinfo.curie.fr From hlapp at gmx.net Wed Oct 19 13:03:04 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Oct 19 13:22:10 2005 Subject: [Bioperl-l] Re: [BiO BB] About array CGH data based on BAC clones In-Reply-To: <43561B23.3000302@curie.fr> References: <20051018190358.56705.qmail@web53505.mail.yahoo.com> <43561B23.3000302@curie.fr> Message-ID: <39dc54bf83b7cdf55327fa72cf1d50f3@gmx.net> Philippe, what is the license on these software packages? Except for GLAD (which presumably is licensed as OSS compatible with Bioconductor), the website states the notorious 'available upon request,' leaving it to everybody's guess what license applies upon whose request. Is there a reason not to openly and explicitly state the license(s)? -hilmar On Oct 19, 2005, at 3:08 AM, Philippe Hupé wrote: > Alex Zhang wrote: > >> Hello everyone, >> >> Is there anybody who has the experience of >> analyzing array CGH data based on BAC clones >> to identify the BACs which are amplified or >> deleted (gain or loss)? Any software tools or packages recommended? >> >> Thank you very much ahead of time!
>> >> Sincerely, >> Alex >> >> >> >> >> _______________________________________________ >> Bioinformatics.Org general forum - >> BiO_Bulletin_Board@bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> >> >> > Dear colleague, > > > The bioinformatics team of Institut Curie has developed several tools > related to the analysis of array CGH data: > - GLAD for breakpoint detection > - MAIA for automatic microarray image analysis > - MANOR for normalisation of microarray data - VAMP, a Java graphical > interface for visualisation and analysis of CGH profiles. > - CAPweb, a suite of tools for the management, visualization and > analysis of CGH-arrays > VAMP can be requested at vamp@curie.fr , MAIA at maia@curie.fr , > CAPweb at capweb@curie.fr , GLAD at glad@curie.fr and MANOR at > manor@curie.fr > > A VAMP demo is available at http://bioinfo.curie.fr/vamp (Then click > on Direct Launch and File->Import) > Three movies give you an overview of VAMP software capabilities. > - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo1.html > - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo2.html > - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo3.html > > > You can visit our Web site at http://bioinfo.curie.fr > > > You can try CAPweb, a complete web platform, at the following > URL: http://bioinfo.curie.fr/CAPweb. It lets you analyze your data > directly from the gpr file and the clone info file. It includes > normalization, breakpoint detection, data storage and > visualization. This environment can be installed directly in our lab. > Do not hesitate to send questions to capweb@curie.fr > > > Best regards, > > > Philippe Hupé > > -- > Philippe Hupé
> UMR 144 - Service Bioinformatique > Institut Curie > Laboratoire de Transfert (4th floor) > 26 rue d'Ulm > 75005 Paris - France > > Email : Philippe.Hupe@curie.fr > Tel : +33 (0)1 44 32 42 75 > Fax : +33 (0)1 42 34 65 28 > > website : http://bioinfo.curie.fr > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From bmoore at genetics.utah.edu Wed Oct 19 18:09:05 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Wed Oct 19 18:05:41 2005 Subject: [Bioperl-l] Fwd: Could you help me to solve the problem aboutinstalling bioperl under Windous XP Message-ID: In the INSTALL.WIN doc that you are quoting in your mail you will find a link for getting nmake from Microsoft. This is the Microsoft version of make that you need to substitute for make. Download it and put it in your path. Then, as before:
nmake
(you can skip the test step with nmake)
nmake install
> -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Jason Stajich > Sent: Wednesday, October 19, 2005 6:20 AM > To: Bioperl List > Cc: puddingwang@hotmail.com > Subject: [Bioperl-l] Fwd: Could you help me to solve the problem > aboutinstalling bioperl under Windous XP > > 1.5.1 is not in a PPM right now, it requires someone to pack it up, I > don't know when it will be. > > Sounds like you need to get make for windows. I'm sure the windows > users on the list will be able to provide some help or point you to > already written documentation.
> > -jason > > Begin forwarded message: > > > From: > > Date: October 19, 2005 7:04:47 AM EDT > > To: jason.stajich@duke.edu > > Subject: Could you help me to solve the problem about installing > > bioperl under Windous XP > > > > > > Dr.jason.stajich: > > I am very glad to communicate with you to talk about > > bioperl-1.5.1 . I am new to Perl and need your help. > > I cannot install the latest Core - bioperl-1.5.1 on my computer > > (OS: Windows XP). Following http://bioperl.org/Core/Latest/ > > INSTALL.WIN , I did this: > > > > ppm> rep add Bioperl http://bioperl.org/DIST > > ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms > > ppm> rep add Bribes http://www.Bribes.org/perl/ppm > > > > ppm> search Bioperl > > > > ppm> install , > > but I could only install the Core bioperl-1.4.0. > > > > And if I download the latest bundle from http://bioperl.org/ > > DIST/bioperl-1.5.1.tar.gz and install it on my local computer like this: > > > > >perl Makefile.PL > > >make > > >make install > > I had a problem when I ran ">make"; the system warned: " 'make' > > is not recognized as an internal or external command, operable > > program or batch file." > > I hope you can help me to resolve this problem, which is probably very > > easy for you. Thank you very much.
> > > > > > > > Wang > > Guodong > > Kunming > > Institute of Zoology > > > > CHINA,KUNMIN > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From bmoore at genetics.utah.edu Wed Oct 19 18:21:27 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Wed Oct 19 18:17:57 2005 Subject: [Bioperl-l] Re: grouping sequences by DNA-binding domains --elaboration Message-ID: Olena- If all you want is the description from the CDD ID, then grepping or hashing or otherwise working with this file will take care of your needs. ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl Barry > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Stefan Kirov > Sent: Tuesday, October 18, 2005 3:28 PM > To: Brian Osborne > Cc: bioperl-l; Olena Morozova > Subject: Re: [Bioperl-l] Re: grouping sequences by DNA-binding domains -- > elaboration > > Certainly you are right Brian- there is no particular domain type as > for example in a controlled vocabulary. One can grep the DNA & binding > ones, which is not perfect... > Anyway, I had the feeling Olena needs to know what is the CDD > description, given the CDD identifier, which is possible using the > parser (though it is not the most efficient way). > Stefan > > Brian Osborne wrote: > > >Stefan, > > > >Yes, the hyperlinks are in the text just like they were in our old friend > >LocusLink. But it seems that Olena wanted information about the domains, > >like whether or not the domain was DNA-binding - is this in the ASN?
> > > >In my too-brief response I was attempting to say that starting with a > list > >of domains, or domain ids, and finding out whether they were DNA-binding > >domains or not seems to imply working with an ontology. > > > >Brian O. > > > > > >On 10/18/05 3:33 PM, "Stefan Kirov" wrote: > > > > > > > >>Actually Brian, Bio::SeqIO::entrezgene will extract this data from the > >>ASN1 file: > >>
> >>use Bio::SeqIO;
> >># ASSUMPTION: $file, $fs and $gid are defined earlier in the script
> >>my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene',
> >>    -debug=>'off',-service_record=>'no');
> >>my ($seq,$struct,$uncapt)=$eio->next_seq;
> >>my @contigs=$struct->get_members();#(-authority=>'genomic');
> >>foreach my $contig (@contigs) {
> >>    if ($contig->authority eq 'Product') {
> >>        foreach my $sf ($contig->get_SeqFeatures) {
> >>            foreach my $dblink ($sf->annotation->get_Annotations('dblink')) {
> >>                my $key=$dblink->{_anchor}?$dblink->{_anchor}:$dblink->optional_id;
> >>                my $db=$dblink->database;
> >>                next unless (($db =~/cdd/i)||($sf->primary_tag=~/conserved/i));
> >>                my $desc;
> >>                if ($key =~ /:/) {
> >>                    ($key,$desc)=split(/:/,$key);
> >>                }
> >>                print join($fs,
> >>                    $gid,$contig->id,$desc,$key,$sf->score,'','',$db,$sf->start,$sf->end),"\n";
> >>            }
> >>        }
> >>    }
> >>}
> >> > >>I guess it is really a good time to write these docs :-) > >>Stefan > >> > >>Brian Osborne wrote: > >> > >> > >> > >>>Olena, > >>> > >>>I'm pretty sure that there's no code in Bioperl that accesses or parses > CDD, > >>>hopefully I'm corrected if I'm wrong. > >>> > >>>Brian O. > >>> > >>> > >>>On 10/18/05 2:26 PM, "Olena Morozova" wrote: > >>> > >>> > >>> > >>> > >>> > >>>>Hi Brian, > >>>> > >>>>Thank you for your reply. It is the CDD (Conserved Domain Database) on > >>>>the NCBI web site. > >>>>Olena > >>>> > >>>>On 10/18/05, Brian Osborne wrote: > >>>> > >>>> > >>>> > >>>> > >>>>>Olena, > >>>>> > >>>>>What database contains the information you're looking for? > >>>>> > >>>>>Brian O.
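Barry's suggestion of hashing NCBI's cddid.tbl, together with Stefan's caveat that grepping descriptions for "DNA" and "binding" is imperfect, might look roughly like the Perl sketch below. It assumes (this is not confirmed in the thread) that cddid.tbl is tab-separated with the PSSM-Id, accession, short name and description in the first four columns; the subroutine names load_cddid and looks_dna_binding are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup table from NCBI's cddid.tbl.
# ASSUMPTION: each line is tab-separated as
#   PSSM-Id <TAB> accession <TAB> short name <TAB> description <TAB> ...
# Both the numeric PSSM-Id and the accession are used as keys.
sub load_cddid {
    my ($path) = @_;
    my %cdd;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($pssm_id, $acc, $short, $desc) = split /\t/, $line;
        next unless defined $desc;              # skip malformed lines
        my $entry = { short => $short, desc => $desc };
        $cdd{$pssm_id} = $entry;
        $cdd{$acc}     = $entry;
    }
    close $fh;
    return \%cdd;
}

# Crude keyword test on the free-text description only (no ontology),
# in the spirit of "grep the DNA & binding ones".
sub looks_dna_binding {
    my ($entry) = @_;
    return 0 unless $entry;
    return $entry->{desc} =~ /DNA[- ]binding/i ? 1 : 0;
}

# Example use (file path and @domain_ids are placeholders):
#   my $cdd = load_cddid('cddid.tbl');
#   for my $id (@domain_ids) {
#       my $e = $cdd->{$id} or next;
#       printf "%s\t%s\t%s\n", $id, $e->{short},
#           looks_dna_binding($e) ? 'DNA-binding?' : '-';
#   }
```

As Brian points out, a keyword match only flags candidates; a reliable DNA-binding classification would need a curated source or an ontology.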
> >>>>> > >>>>> > >>>>>On 10/16/05 8:17 PM, "Olena Morozova" wrote: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>>Hi again, > >>>>>> > >>>>>>I just figured out how to obtain a list of conserved domains for a > >>>>>>given sequence using the SeqHound.pm module available at > >>>>>>http://www.blueprint.org/seqhound/apifunctslist.html > >>>>>> > >>>>>>Now I have a list of conserved domains for a given sequence and I > need > >>>>>>to extract information as to what these domains are and which ones > are > >>>>>>DNA-binding. Any help on this will be greatly appreciated. > >>>>>> > >>>>>>Thanks again, > >>>>>>Olena > >>>>>> > >>>>>> > >>>>>>On 10/16/05, Olena Morozova wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>I have a list of transcription factor sequences, and I need to > group > >>>>>>>them according to the DNA-binding domains based on the > classification > >>>>>>>by TRANSFAC or any other database. Basically, I just need to > extract > >>>>>>>the DNA-binding domain information for a particular TF from a > database > >>>>>>>like TRANSFAC (I don't know what other databases would have this > >>>>>>>information, but any will do). Does anyone have any idea how to do this?
> >>>>>>>Thank you very much for your help and time > >>>>>>> > >>>>>>>Olena > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>_______________________________________________ > >>>>>>Bioperl-l mailing list > >>>>>>Bioperl-l@portal.open-bio.org > >>>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l@portal.open-bio.org > >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>> > >>> > > > > > > > > > > -- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From saldroubi at yahoo.com Thu Oct 20 12:04:49 2005 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Thu Oct 20 12:10:30 2005 Subject: [Bioperl-l] 1.5.1 install on top of old one? Message-ID: <20051020160449.79860.qmail@web34311.mail.mud.yahoo.com> All, I already have 1.5.0 installed. Do I just download and install 1.5.1 on top of the old one, following the steps outlined below, or do I need to remove the old bioperl somehow? If so, how? Thank you very much. Download, then unpack the tar file. For example:
>gunzip bioperl-1.2.tar.gz
>tar xvf bioperl-1.2.tar
>cd bioperl-1.2
Now issue the make commands:
>perl Makefile.PL
>make
>make test
Sincerely, Sam Al-Droubi, M.S. saldroubi@yahoo.com From brian_osborne at cognia.com Thu Oct 20 12:32:04 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Oct 20 12:36:16 2005 Subject: [Bioperl-l] 1.5.1 install on top of old one? In-Reply-To: <20051020160449.79860.qmail@web34311.mail.mud.yahoo.com> Message-ID: Sam, You can install 1.5.1 on top of 1.5.0. Brian O. On 10/20/05 12:04 PM, "Sam Al-Droubi" wrote: > All, > > I already have 1.5.0 installed.
Do I just download > and install 1.5.1 on top of the old one, following the > steps outlined below, or do I need to remove the old > bioperl somehow? If so, how? > > Thank you very much. > > > Download, then unpack the tar file. For example: > >> gunzip bioperl-1.2.tar.gz >> tar xvf bioperl-1.2.tar >> cd bioperl-1.2 > > Now issue the make commands: > >> perl Makefile.PL >> make >> make test > > Sincerely, > Sam Al-Droubi, M.S. > saldroubi@yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From fgarret at ub.edu Fri Oct 21 13:58:34 2005 From: fgarret at ub.edu (Filipe Garrett) Date: Fri Oct 21 14:16:12 2005 Subject: [Bioperl-l] Patch to Bioperl Modules Message-ID: <43592C4A.2080201@ub.edu> Hi all, I've been using the PAML parser and wanted to capture the number of parameters for each model. Since it was not implemented, I've made some changes to two Bioperl Modules (Tree.pm and PAML.pm). I've put the number of parameters as a tree description. For that I've added the method description to Tree.pm. Another slight change that I made was to change the hardcoded "mlc" file name in the run method of Codeml to the outfile variable that is declared some lines before in the method. Please take these changes into consideration.
Bests
Filipe Vieira

diff -N of the changes

Codeml.pm
497c497
< $parser = new Bio::Tools::Phylo::PAML(-file => "$tmpdir/$outfile",
---
>

Tree.pm
242,258d241
< =head2 description
<
< Title : description
< Usage : $obj->description($newval)
< Function: Get/Set the description string
< Returns : value of description
< Args : newvalue (optional)
<
<
< =cut
<
< sub description{
< my $self = shift;
< $self->{'_description'} = shift @_ if @_;
< return $self->{'_description'};
< }
<
279a263
>
318d301
<

PAML.pm
830c830
< my ($instancecount,$num_param,$loglikelihood,$score,$done,$treelength) = (0,0,0,0,0);
---
> my ($instancecount,$loglikelihood,$score,$done,$treelength) = (0,0,0,0,0);
848,850c848,849
< } elsif( /^\s*lnL\(.+np\:\s*(\d+)\)\:\s+(\S+)/ ) {
< $num_param = $1;
< $loglikelihood = $2;
---
> } elsif( /^\s*lnL\(.+\)\:\s+(\S+)/ ) {
> $loglikelihood = $1;
859d857
< $tree->description("num_param: $num_param");
1395,1400d1392
< sub _parse_rst {
< my ($self) = @_;
<
< }
<
<

From jason.stajich at duke.edu Fri Oct 21 14:44:12 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Oct 21 16:12:27 2005 Subject: [Bioperl-l] Patch to Bioperl Modules In-Reply-To: <43592C4A.2080201@ub.edu> References: <43592C4A.2080201@ub.edu> Message-ID: Looks good - What about using the already existing 'id' field for a tree rather than adding the description field? -jason On Oct 21, 2005, at 1:58 PM, Filipe Garrett wrote: > Hi all, > > I've been using the PAML parser and wanted to capture the number of > parameters for each model. Since it was not implemented, I've made > some changes to two Bioperl Modules (Tree.pm and PAML.pm). I've put > the number of parameters as a tree description. For that I've added > the method description to Tree.pm. > > Another slight change that I made was to change the hardcoded "mlc" > file name in the run method of Codeml to the outfile variable that > is declared some lines before in the method. > > Please take these changes into consideration.
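The patched PAML.pm regex captures both the parameter count (np) and the log-likelihood from codeml's lnL summary line. The Perl sketch below exercises that regex on its own. The sample input line is an assumption about the usual codeml layout, not taken from real output, and the helper name parse_lnl_line is hypothetical.

```perl
use strict;
use warnings;

# Apply the regex from the patched PAML.pm to one line of codeml output.
# Returns (num_param, loglikelihood) on a match, or an empty list otherwise.
sub parse_lnl_line {
    my ($line) = @_;
    return unless $line =~ /^\s*lnL\(.+np\:\s*(\d+)\)\:\s+(\S+)/;
    return ($1, $2);
}

# ASSUMPTION: an illustrative lnL line in the usual codeml layout.
my ($np, $lnl) = parse_lnl_line('lnL(ntime:  7  np:  9):  -1234.567890   +0.000000');
print "num_param: $np, lnL: $lnl\n";   # num_param: 9, lnL: -1234.567890
```

The captured count could then be stored on the tree, either with the proposed description() method or, as Jason suggests, in the tree's existing id field.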
> > Bests > > Filipe Vieira > > diff -N of the changes > > Codeml.pm > 497c497 > < $parser = new Bio::Tools::Phylo::PAML(-file => "$tmpdir/ > $outfile", > --- > > > > Tree.pm > 242,258d241 > < =head2 description > < > < Title : description > < Usage : $obj->description($newval) > < Function: Get/Set the description string > < Returns : value of description > < Args : newvalue (optional) > < > < > < =cut > < > < sub description{ > < my $self = shift; > < $self->{'_description'} = shift @_ if @_; > < return $self->{'_description'}; > < } > < > 279a263 > > > 318d301 > < > > > PAML.pm > 830c830 > < my ($instancecount,$num_param,$loglikelihood,$score,$done, > $treelength) = (0,0,0,0,0); > --- > > my ($instancecount,$loglikelihood,$score,$done,$treelength) = > (0,0,0,0,0); > 848,850c848,849 > < } elsif( /^\s*lnL\(.+np\:\s*(\d+)\)\:\s+(\S+)/ ) { > < $num_param = $1; > < $loglikelihood = $2; > --- > > } elsif( /^\s*lnL\(.+\)\:\s+(\S+)/ ) { > > $loglikelihood = $1; > 859d857 > < $tree->description("num_param: $num_param"); > 1395,1400d1392 > < sub _parse_rst { > < my ($self) = @_; > < > < } > < > < > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From osborne1 at optonline.net Sat Oct 22 11:17:40 2005 From: osborne1 at optonline.net (Brian Osborne) Date: Sat Oct 22 21:36:45 2005 Subject: [Bioperl-l] Cigar? Message-ID: bioperl-l, SimpleAlign::cigar_line is not working, I'd like to fix it. I'm seeing 2 definitions floating around, this one is from http://www.ensembl.org/info/glossary.html: Cigar - Cigar stands for Compact Idiosyncratic Gapped Alignment Report and defines the sequence of matches/mismatches and deletions (or gaps). The cigar line defines the sequence of matches/mismatches and deletions (or gaps).
For example, this cigar line 2MD3M2D2M will mean that the alignment contains 2 matches/mismatches, 1 deletion (number 1 is omitted in order to save some space), 3 matches/mismatches, 2 deletions and 2 matches/mismatches. If the original sequence is:
Original sequence: AACGCTT
The aligned sequence will be:
cigar line: 2MD3M2D2M
M M D M M M D D M M
A A - C G C - - T T
This one is from the SimpleAlign documentation:
Function : Generates a "cigar" (Compact Idiosyncratic Gapped Alignment Report) line for each sequence in the alignment
The format is simply A-1,60;B-1,1:4,60;C-5,10:12,58 where A,B,C, etc. are the sequence identifiers, and the numbers refer to conserved positions within the alignment
Bioperl uses the second, yes? Brian O. PS There was no test for cigar_line in SimpleAlign.t From osborne1 at optonline.net Sat Oct 22 12:53:31 2005 From: osborne1 at optonline.net (Brian Osborne) Date: Sat Oct 22 21:36:50 2005 Subject: [Bioperl-l] Cigar? In-Reply-To: Message-ID: bioperl-l, SimpleAlign::cigar_line appears to be broken, I'd like to fix it. I'm seeing 2 definitions of cigar format floating around, this one is from http://www.ensembl.org/info/glossary.html: Cigar - Cigar stands for Compact Idiosyncratic Gapped Alignment Report and defines the sequence of matches/mismatches and deletions (or gaps). The cigar line defines the sequence of matches/mismatches and deletions (or gaps). For example, this cigar line 2MD3M2D2M will mean that the alignment contains 2 matches/mismatches, 1 deletion (number 1 is omitted in order to save some space), 3 matches/mismatches, 2 deletions and 2 matches/mismatches.
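To make the Ensembl-style definition concrete, here is a small Perl sketch that expands a cigar string against the ungapped sequence and also performs the reverse conversion. It follows only the Ensembl glossary description (M = match/mismatch, D = deletion, an omitted count means 1). It is not Bioperl's SimpleAlign::cigar_line, whose documented output is the different range-based format, and both subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Expand an Ensembl-style cigar string against the ungapped sequence.
# M = match/mismatch (copy bases), D = deletion/gap (emit '-');
# a missing count means 1, so "D" is the same as "1D".
sub apply_cigar {
    my ($seq, $cigar) = @_;
    die "Malformed cigar: $cigar\n" unless $cigar =~ /^(?:\d*[MD])+$/;
    my ($aligned, $pos) = ('', 0);
    while ($cigar =~ /(\d*)([MD])/g) {
        my ($n, $op) = ($1, $2);
        my $len = length($n) ? $n : 1;
        if ($op eq 'M') {
            die "Cigar runs past end of sequence\n"
                if $pos + $len > length($seq);
            $aligned .= substr($seq, $pos, $len);
            $pos     += $len;
        }
        else {
            $aligned .= '-' x $len;
        }
    }
    die "Cigar does not cover the whole sequence\n" if $pos != length($seq);
    return $aligned;
}

# Reverse direction: collapse a gapped row back into a cigar string.
# Adjacent residues are merged into one M run, so the output is canonical.
sub cigar_from_gapped {
    my ($gapped) = @_;
    my $cigar = '';
    while ($gapped =~ /(-+|[^-]+)/g) {
        my $run = $1;
        my $op  = substr($run, 0, 1) eq '-' ? 'D' : 'M';
        my $len = length $run;
        $cigar .= ($len == 1 ? '' : $len) . $op;
    }
    return $cigar;
}

print apply_cigar('AACGCTT', '2MD3M2D2M'), "\n";   # AA-CGC--TT
print cigar_from_gapped('AA-CGC--TT'), "\n";        # 2MD3M2D2M
```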
If the original sequence is:
Original sequence: AACGCTT
The aligned sequence will be:
cigar line: 2MD3M2D2M
M M D M M M D D M M
A A - C G C - - T T
This one is from the SimpleAlign documentation:
Function : Generates a "cigar" (Compact Idiosyncratic Gapped Alignment Report) line for each sequence in the alignment
The format is simply A-1,60;B-1,1:4,60;C-5,10:12,58 where A,B,C, etc. are the sequence identifiers, and the numbers refer to conserved positions within the alignment
Bioperl uses this second, yes? Brian O. PS There was no test for cigar_line in SimpleAlign.t From jason.stajich at duke.edu Sat Oct 22 21:07:28 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Oct 22 23:00:37 2005 Subject: [Bioperl-l] Re: [Bioperl-guts-l] please can anyone help me.. In-Reply-To: References: Message-ID: Can you at least tell folks on the list what you tried? Did you follow the directions to download nmake.exe and try it? My suspicion is that this is only going to work if you have a unix-like environment on your windows machine, something like Cygwin. Please direct this question to the bioperl-l@bioperl.org list for more help as there will hopefully be Bioperl users with windows experience that can help. -jason On Oct 21, 2005, at 9:59 PM, Tahira Farid wrote: > > hi...I need the bioperl-ext package because I want to use > Bio::Tools::dpAlign for Smith-Waterman alignment. > > I just want to know how to install the additional bioperl packages > like bioperl-ext so that I can use Smith-Waterman local alignment. > > thanks > > > >> From: Jason Stajich >> To: Tahira Farid >> CC: bioperl-guts-l@portal.open-bio.org >> Subject: Re: [Bioperl-guts-l] please can anyone help me.. >> Date: Fri, 21 Oct 2005 13:20:23 -0400 >> >> Are you sure you really want to install this package!!!!!!!! >> What did you already do!!!!! You read that you need to use >> nmake, so what happens when you use nmake?!!!!!
>> >> >> On Oct 21, 2005, at 11:49 AM, Tahira Farid wrote: >> >> >>> Hi, please someone help me. I have not got any replies >>> yet....!!!!!!!!!! >>> >>> >>> I have Windows XP and I have successfully installed ActivePerl >>> 5.8 and the bioperl core, but I need some help in installing the >>> bioperl-ext package. There is a lot of information available >>> online but it is very scattered. I read that I have to use nmake.exe >>> to install bioperl-ext, and I have also downloaded the bioperl- >>> ext zip file from bioperl.org. >>> >>> >>> Can anyone please help me with installing the package? I >>> would really appreciate your help. >>> >>> thanks. >>> >>> >>> _______________________________________________ >>> Bioperl-guts-l mailing list >>> Bioperl-guts-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l >>> >>> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > > -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From fgarret at ub.edu Tue Oct 25 06:45:33 2005 From: fgarret at ub.edu (Filipe Garrett) Date: Tue Oct 25 07:02:56 2005 Subject: [Bioperl-l] Unroot a Bio::Tree Message-ID: <435E0CCD.7080509@ub.edu> Hi all, I have been looking at Bio::Tree and I would like to know if there is any way to unroot a tree. Any ideas? Thanks in advance, FG From clusterbuilder at gmail.com Tue Oct 25 14:05:53 2005 From: clusterbuilder at gmail.com (Nick I) Date: Tue Oct 25 15:50:13 2005 Subject: [Bioperl-l] Ask the Cluster Expert Message-ID: Hi, Thanks to the response from many in the community I have added sections about diskless clusters and information on 32-bit and 64-bit processors at the site I help run, www.ClusterBuilder.org. I also added a section called Ask the Cluster Expert (http://www.clusterbuilder.org/pages/ask-the-expert.php) for people to submit questions they have about cluster and grid computing.
I post the questions at an FAQ page (http://www.clusterbuilder.org/pages/ask-the-expert/faq.php) and then research the answer as well as allow those knowledgeable in the community to submit a response to the question. I want to build a valuable knowledgebase of high performance computing information. I need you to share your knowledge by adding to the question responses and also submitting questions/answers to common problems you've experienced in the past and are experiencing now. Thanks, Nick From indapa at gmail.com Tue Oct 25 16:48:09 2005 From: indapa at gmail.com (Amit Indap) Date: Tue Oct 25 22:38:49 2005 Subject: [Bioperl-l] extracting subsequences Message-ID: <3cfaa4040510251348g78471ccas8c634391150cd64a@mail.gmail.com> Hi, I have to extract subsequences from fasta files containing entire human chromosomes. For example I would like to extract bp 167506667..167523040. I know how to do this using the Bio::Seq and Bio::SeqIO APIs. The problem is it takes a long time to read in an entire fasta file containing a chromosome. Is there a way I can speed this up? The bp indices are taken from BLAT-ing my sequences to the genome. I could use megablast to find which contigs my sequences lie on, and then read in those files rather than the whole chromosome. Any suggestions would be helpful. Thanks. Amit -- Amit Indap http://www.bscb.cornell.edu/Homepages/Amit_Indap/ From jason.stajich at duke.edu Tue Oct 25 22:57:22 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Oct 25 23:24:52 2005 Subject: [Bioperl-l] extracting subsequences In-Reply-To: <3cfaa4040510251348g78471ccas8c634391150cd64a@mail.gmail.com> References: <3cfaa4040510251348g78471ccas8c634391150cd64a@mail.gmail.com> Message-ID: <4744AD72-2F6E-470A-99FB-8432BB439A76@duke.edu> Bio::DB::Fasta is the best way to do this for big sequences. -jason On Oct 25, 2005, at 4:48 PM, Amit Indap wrote: > Hi, > > I have to extract subsequences from fasta files containing entire > human chromosomes. 
For example I would like to extract bp > 167506667..167523040. I know how to do this using the Bio::Seq and > Bio::SeqIO APIs. The problem is it takes a long time to read in an > entire fasta file containing a chromosome. Is there a way I can speed > this up? > > The bp indices are taken from BLAT-ing my sequences to the genome. I > could use megablast to find which contigs my sequences lie on, and > then read in those files rather than the whole chromosome. > > Any suggestions would be helpful. Thanks. > > Amit > -- > Amit Indap > http://www.bscb.cornell.edu/Homepages/Amit_Indap/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From sdavis2 at mail.nih.gov Wed Oct 26 06:36:48 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed Oct 26 06:42:08 2005 Subject: [Bioperl-l] extracting subsequences In-Reply-To: <4744AD72-2F6E-470A-99FB-8432BB439A76@duke.edu> Message-ID: As an alternative, Jim Kent and UCSC have tools for working with .nib and twobit files (smart storage for large sequences) that are also very fast. They have names like "twoBitToFa". That said, I really like Bio::DB::Fasta and use it. Sean On 10/25/05 10:57 PM, "Jason Stajich" wrote: > Bio::DB::Fasta is the best way to do this for big sequences. > > -jason > On Oct 25, 2005, at 4:48 PM, Amit Indap wrote: > >> Hi, >> >> I have to extract subsequences from fasta files containing entire >> human chromosomes. For example I would like to extract bp >> 167506667..167523040. I know how to do this using the Bio::Seq and >> Bio::SeqIO APIs. The problem is it takes a long time to read in an >> entire fasta file containing a chromosome. Is there a way I can speed >> this up? >> >> The bp indices are taken from BLAT-ing my sequences to the genome. 
I >> could use megablast to find which contigs my sequences lie on, and >> then read in those files rather than the whole chromosome. >> >> Any suggestions would be helpful. Thanks. >> >> Amit >> -- >> Amit Indap >> http://www.bscb.cornell.edu/Homepages/Amit_Indap/ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From cain at cshl.edu Tue Oct 25 23:06:48 2005 From: cain at cshl.edu (Scott Cain) Date: Wed Oct 26 07:09:22 2005 Subject: [Bioperl-l] extracting subsequences In-Reply-To: <3cfaa4040510251348g78471ccas8c634391150cd64a@mail.gmail.com> References: <3cfaa4040510251348g78471ccas8c634391150cd64a@mail.gmail.com> Message-ID: <1130296008.22733.148.camel@localhost.localdomain> Amit, Look at Bio::DB::Fasta. It builds a BerkeleyDB of the fasta files and results in significantly faster substringing. Scott On Tue, 2005-10-25 at 16:48 -0400, Amit Indap wrote: > Hi, > > I have to extract subsequences from fasta files containing entire > human chromosomes. For example I would like to extract bp > 167506667..167523040. I know how to do this using the Bio::Seq and > Bio::SeqIO APIs. The problem is it takes a long time to read in an > entire fasta file containing a chromosome. Is there a way I can speed > this up? > > The bp indices are taken from BLAT-ing my sequences to the genome. I > could use megablast to find which contigs my sequences lie on, and > then read in those files rather than the whole chromosome. > > Any suggestions would be helpful. Thanks. 
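Bio::DB::Fasta is fast for this because it indexes where each sequence starts in the file, so a region can be fetched with a seek instead of parsing the whole chromosome. The sketch below is not Bio::DB::Fasta's code; it only illustrates the same offset arithmetic in core Perl, assuming a single-record FASTA file whose sequence lines all have the same width (except the last) and end in a single Unix newline. The subroutine name `fasta_subseq` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return residues $start..$end (1-based, inclusive)
# from a single-record FASTA file by seeking straight to them, instead of
# reading the whole chromosome into memory.
# Assumes every sequence line except the last has the same width and
# ends in a single "\n" (Unix line endings).
sub fasta_subseq {
    my ($file, $start, $end) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";

    my $header = <$fh>;                       # e.g. ">chr1\n"
    die "$file does not start with a FASTA header\n"
        unless defined $header && $header =~ /^>/;
    my $seq_offset = tell $fh;                # byte offset of residue 1

    my $first_line = <$fh>;
    die "$file has no sequence lines\n" unless defined $first_line;
    chomp $first_line;
    my $width = length $first_line;           # residues per full line

    # Byte offset of 1-based residue $pos: skip whole lines (width + 1
    # bytes each, counting the newline), then the column within the line.
    my $offset_of = sub {
        my $i = $_[0] - 1;
        return $seq_offset + int($i / $width) * ($width + 1) + $i % $width;
    };

    my $from = $offset_of->($start);
    my $to   = $offset_of->($end);
    seek $fh, $from, 0 or die "seek failed: $!";
    my $buf;
    defined read($fh, $buf, $to - $from + 1) or die "read failed: $!";
    close $fh;

    $buf =~ tr/\n//d;                         # drop embedded newlines
    return $buf;
}
```

With BioPerl installed, the library route suggested in this thread looks like `my $db = Bio::DB::Fasta->new($dir); my $seq = $db->seq('chr1', 167506667 => 167523040);`, where `$dir` is a directory of FASTA files and `chr1` stands in for whatever ID the header line carries. The index is built once, and later lookups reuse it.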
> > Amit > -- > Amit Indap > http://www.bscb.cornell.edu/Homepages/Amit_Indap/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From tembe at bioanalysis.org Wed Oct 26 11:12:37 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Wed Oct 26 11:18:54 2005 Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq Message-ID: <435F9CE5.1070601@bioanalysis.org> Hello List, This is not a BioPerl question, but I could not find a satisfactory answer from other sources and would appreciate any help. I am trying to use bl2seq for comparing query "q" and another genome "g". Now, for "q" I already have blastall output from an nt database containing >2 million sequences. I understand that to get compatible e values, I need to set -d parameter for bl2seq to the theoretical data size of that nt database. Which number from the following 4 (taken from blastall output) should be used for -d ? length of database: 12,254,801,043 effective length of database: 12,167,805,299 effective search space: 48671221196 effective search space used: 48671221196 Any pointers/website/docs will be appreciated. Thank you. 
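For reference, the Karlin-Altschul statistics used by both blastall and bl2seq show why the database size matters: the E-value is proportional to the search space. In this sketch, $m'$ and $n'$ are the effective query and database lengths, $S$ is the raw score, $S'$ the bit score, and $K$, $\lambda$ the scoring-system parameters. The ratio line assumes, for illustration only, that the value given to -d is used directly as $n'$:

```latex
E = K\,m'n'\,e^{-\lambda S} = m'n'\,2^{-S'}

\frac{E(\text{length of database})}{E(\text{effective length of database})}
  \approx \frac{12{,}254{,}801{,}043}{12{,}167{,}805{,}299} \approx 1.0071
```

So the two candidate database sizes change E by well under 1%, while the "effective search space" figure corresponds to the whole product $m'n'$ rather than to $n'$ alone.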
Tembe From khoueiry at ibdm.univ-mrs.fr Wed Oct 26 12:44:05 2005 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Wed Oct 26 13:03:57 2005 Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq In-Reply-To: <435F9CE5.1070601@bioanalysis.org> References: <435F9CE5.1070601@bioanalysis.org> Message-ID: <1130345045.3308.8.camel@DavidLinux> Tembe, I did a quick search and found the following. I have bl2seq installed on my machine, and a simple "man bl2seq" describes the parameters; it says that -d corresponds to: "-d N (bl2seq)--- Use theoretical DB size of N (zero stands for the real size)". A quick Google search gave a similar result, which you can find at http://hits.isb-sib.ch/doc/motif_score.shtml. Briefly, they say that when calculating an E-value, especially when converting from a normalized score, you have to use the database size in residues. So I think that in your case it corresponds to "length of database: 12,254,801,043". I hope this is the right answer, and that others can give you more details if possible. Pierre On Wed, 2005-10-26 at 11:12 -0400, Waibhav Tembe wrote: > Hello List, > > This is not a BioPerl question, but I could not find a satisfactory answer > from other sources and would appreciate any help. > > I am trying to use bl2seq for comparing query "q" and another genome "g". > Now, for "q" I already have blastall output from an nt database > containing >2 million > sequences. I understand that to get compatible e values, I need to set > -d parameter > for bl2seq to the theoretical data size of that nt database. Which > number from > the following 4 (taken from blastall output) should be used for -d ? > > length of database: 12,254,801,043 > effective length of database: 12,167,805,299 > effective search space: 48671221196 > effective search space used: 48671221196 > > Any pointers/website/docs will be appreciated. > > Thank you.
> > Tembe > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From Zaigang.Liu at usa.dupont.com Wed Oct 26 11:55:23 2005 From: Zaigang.Liu at usa.dupont.com (Zaigang Liu) Date: Wed Oct 26 13:11:50 2005 Subject: [Bioperl-l] perl memory problem Message-ID: Hello everyone, I am new to this list and am looking for help with a problem I have run into. I am trying to run a Perl program that does calculations on a large data set. It uses an array to store the input data, and by my estimate the array will reach 8 GB once all the data is loaded. The process always stops with an "out of memory" error when memory usage reaches about 2 GB. I am using a Solaris machine with 40 GB of memory, so it has not really run out of memory when the program stops. Does anyone know where this problem could come from and how to fix it? Thanks in advance, Zaigang This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.
Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean http://www.DuPont.com/corp/email_disclaimer.html From jbedell at oriongenomics.com Wed Oct 26 13:08:54 2005 From: jbedell at oriongenomics.com (Joseph Bedell) Date: Wed Oct 26 13:21:33 2005 Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq Message-ID: <434AF352F9D03C4C896782B8CC78BC769855FC@VADER.oriongenomics.com> Hi Tembe, >-----Original Message----- >From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >bounces@portal.open-bio.org] On Behalf Of Waibhav Tembe >Sent: Wednesday, October 26, 2005 10:13 AM >To: bioperl-l >Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq > >Hello List, > >This is not a BioPerl question, but I could not find a satisfactory answer >from other sources and would appreciate any help. > >I am trying to use bl2seq for comparing query "q" and another genome "g". >Now, for "q" I already have blastall output from an nt database >containing >2 million >sequences. I understand that to get compatible e values, I need to set >-d parameter >for bl2seq to the theoretical data size of that nt database. Which >number from >the following 4 (taken from blastall output) should be used for -d ? > >length of database: 12,254,801,043 >effective length of database: 12,167,805,299 >effective search space: 48671221196 >effective search space used: 48671221196 I believe that you would set -d to "length of database", as opposed to effective length; however the best bet would be to set -Y to "effective search space" since Effective length is actually given in the -Y usage statement. I'm not completely sure if -d is looking for the effective size or actual size of the database. At the end of the day, the numbers differ by so little that you probably won't see a true diff in the e-values between actual vs. effective database size. Joey ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Joseph A Bedell, Ph.D. 
office: 314-615-6979 Director, Bioinformatics fax: 314-615-6975 Orion Genomics cell: 314-518-1343 4041 Forest Park Ave St. Louis, MO 63108 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >Any pointers/website/docs will be appreciated. > >Thank you. > >Tembe > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l From tobias.straub at lmu.de Wed Oct 26 13:18:47 2005 From: tobias.straub at lmu.de (Tobias Straub) Date: Wed Oct 26 13:34:19 2005 Subject: [Bioperl-l] retrieving gene model from gff database Message-ID: <04c9643d330c0cf4410c5fd0a4bc8974@lmu.de> dear all, probably kind of a basic question but I'm somehow unable to answer it without help: I have the most recent gadfly GFF database loaded. now I simply want to get the gene models for selected genes - like searching with get_feature_by_name(-class => "Gene", -name=>"Gene_of_interest") - and to find out whether they contain annotated introns or not. first problem is that the class gene does not seem to be supported by the GFF database. do I need to use an aggregator for that? if I have to use an aggregator, how do I search for the gene then? so far I was only able to get the gene when searching for class 'Sequence'. but this won't yield a gene-structure then, right? thank you in advance! Tobias ====================================================================== Dr. Tobias Straub Adolf-Butenandt-Institute, Molecular Biology tel: +49-89-2180 75 439 Schillerstr. 44, 80336 Munich, Germany ====================================================================== From jason.stajich at duke.edu Wed Oct 26 13:29:42 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Oct 26 14:37:31 2005 Subject: [Bioperl-l] perl memory problem In-Reply-To: References: Message-ID: On Oct 26, 2005, at 11:55 AM, Zaigang Liu wrote: > > > > > Hello, everyone, > > I am new to this list and I am looking for help for one problem I met. 
> > I am trying to run a perl program doing calculation on large data > set. It > uses an array to store the input data and the array size will reach 8G > after all the data is loaded by my estimation. The process always > stops and > gives a "out of memory" error when the memory usage reached about > 2G. I > used a solaris machine with 40G memory so it is not really run out of > memory when the program stopped. > there is probably a limit on how much memory perl can address in the version you have compiled. Have you considered using a data structure which does not require the entire array to be in memory at once. the DB_File module is very useful for this as you can tie your array to a flatfile Berkeley DB. You can also use something like SQLite or a full fledged relational database instead if you are crunching such a large number of items. > Does anyone know where could this program come from and how to fix it. > > Thanks in advance, > > Zaigang > > This communication is for use by the intended recipient and contains > information that may be Privileged, confidential or copyrighted under > applicable law. If you are not the intended recipient, you are hereby > formally notified that any use, copying or distribution of this e- > mail, > in whole or in part, is strictly prohibited. Please notify the > sender by > return e-mail and delete this e-mail from your system. Unless > explicitly > and conspicuously designated as "E-Contract Intended", this e-mail > does > not constitute a contract offer, a contract amendment, or an > acceptance > of a contract offer. This e-mail does not constitute a consent to the > use of sender's contact information for direct marketing purposes > or for > transfers of data to third parties. 
> > > > Francais Deutsch Italiano Espanol Portugues Japanese Chinese > > Korean > > > > http://www.DuPont.com/corp/email_disclaimer.html > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From gbazykin at Princeton.EDU Wed Oct 26 17:27:07 2005 From: gbazykin at Princeton.EDU (Georgii Bazykin) Date: Wed Oct 26 17:34:06 2005 Subject: [Bioperl-l] suggestions for additions to Tree Message-ID: <148174979677.20051026172707@princeton.edu>

Hi,

here are some tree-related methods I needed and added to my bioperl. I hope someone else finds some of them useful as well.

Yegor Bazykin

=============================================
To NodeI:

# modified from total_branch_length in the Bio::Tree::Tree module
# gets the sum of branch lengths in the subtree of descendents of the given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the lengths of all branches of the
           subtree which starts at the given node
 Returns : number
 Args    : none

=cut

sub children_branch_length {
    my ($self) = @_;
    return 0 if $self->is_Leaf;
    my $sum = 0;
    for ($self->get_all_Descendents) {
        $sum += $_->branch_length || 0;
    }
    return $sum;
}

-----------------------------------

=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this node,
           counted in branches: the maximum number of branches on a
           path from this node down to a tip.
 Returns : The longest distance to a leaf, in nodes
 Args    : none

=cut

sub height_nodes {
    my ($self) = @_;
    return 0 if $self->is_Leaf;
    my $max = 0;
    foreach my $subnode ($self->each_Descendent) {
        my $s = $subnode->height_nodes + 1;
        $max = $s if $s > $max;
    }
    return $max;
}

----------------------------------

=head2 get_all_Descendent_Leaves

 Title   : get_all_Descendent_Leaves($sortby)
 Usage   : my @nodes = $node->get_all_Descendent_Leaves;
 Function: Recursively fetch all the descendents of this node,
           selecting only leaves
           *NOTE* This is different from each_Descendent
 Returns : Array of Bio::Tree::NodeI objects
 Args    : $sortby [optional] "height", "creation" or coderef to be used
           to sort the order of children nodes.

=cut

sub get_all_Descendent_Leaves {
    my ($self, $sortby) = @_;
    $sortby ||= 'height';
    my @nodes;
    foreach my $node ($self->each_Descendent($sortby)) {
        if ($node->is_Leaf) {
            push @nodes, $node;
        } else {
            # recurse with this same method so that internal nodes
            # are never added to the result
            push @nodes, $node->get_all_Descendent_Leaves($sortby);
        }
    }
    return @nodes;
}

=====================================================
To Tree:

=head2 total_internal_branch_length

 Title   : total_internal_branch_length
 Usage   : my $size = $tree->total_internal_branch_length
 Function: Returns the sum of the lengths of all branches, excluding
           branches leading to leaves
 Returns : number
 Args    : none

=cut

sub total_internal_branch_length {
    my ($self) = @_;
    my $sum = 0;
    if (defined $self->get_root_node) {
        for ($self->get_root_node->get_Descendents()) {
            unless ($_->is_Leaf) {   # YB: THIS IS ALL I ADDED
                $sum += $_->branch_length || 0;
            }
        }
    }
    return $sum;
}

=================================================
To TreeFunctionsI:

=head2 distance_nodes

 Title   : distance_nodes
 Usage   : distance_nodes(-nodes => \@nodes )
 Function: returns the distance between two given nodes in numbers of nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test

=cut

# YB: distance_nodes is very similar to the distance method in TreeFunctionsI,
# except that it estimates distances between nodes in numbers of nodes
# (e.g., 1 between mother and daughter, 2 between two sisters, etc.)

sub distance_nodes {
    my ($self, @args) = @_;
    my ($nodes) = $self->_rearrange([qw(NODES)], @args);
    if (!defined $nodes) {
        $self->warn("Must supply -nodes parameter to distance_nodes() method");
        return undef;
    }
    my ($node1, $node2) = $self->_check_two_nodes($nodes);

    # algorithm:
    # Find lca: Start with first node, find and save every node from it
    # to root, saving cumulative distance. Then start with second node;
    # for it and each of its ancestor nodes, check to see if it's in
    # the first node's ancestor list - if so it is the lca. Return sum
    # of (cumul. distance from node1 to lca) and (cumul. distance from
    # node2 to lca)

    # find and save every ancestor of node1 (including itself)
    my %node1_ancestors;   # keys are internal ids, values are objects
    my %node1_cumul_dist;  # keys are internal ids, values are cumulative
                           # distance from node1 to given node
    my $place = $node1;    # start at node1
    my $cumul_dist = 0;
    while ($place) {
        $node1_ancestors{$place->internal_id}  = $place;
        $node1_cumul_dist{$place->internal_id} = $cumul_dist;
        $cumul_dist++;  # YB
        #YB if ($place->branch_length) {
        #YB     $cumul_dist += $place->branch_length; # include current branch
        #YB                                           # length in next iteration
        #YB }
        $place = $place->ancestor;
    }

    # now climb up node2, for each node checking whether
    # it's in node1_ancestors
    $place = $node2;  # start at node2
    $cumul_dist = 0;
    while ($place) {
        foreach my $key (keys %node1_ancestors) {  # ugh
            if ($place->internal_id == $key) {     # we're at lca
                return $node1_cumul_dist{$key} + $cumul_dist;
            }
        }
        # include current branch length in next iteration
        #YB $cumul_dist += $place->branch_length || 0;
        $cumul_dist++;  # YB
        $place = $place->ancestor;
    }
    $self->warn("Could not find distance!");  # should never execute,
                                              # if so, there's a problem
    return undef;
}

From golharam at umdnj.edu Wed Oct 26 16:50:37 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed Oct 26 17:48:44 2005
Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq In-Reply-To: <435F9CE5.1070601@bioanalysis.org> Message-ID: <008501c5da6e$ebc01920$4722db82@GOLHARMOBILE1> Use length of the database: 12,254,801,043. BLAST will adjust the number to get the effective length... -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Waibhav Tembe Sent: Wednesday, October 26, 2005 11:13 AM To: bioperl-l Subject: [Bioperl-l] Setting Theoretical Database size for bl2seq Hello List, This is not a BioPerl question, but I could not find a satisfactory answer from other sources and would appreciate any help. I am trying to use bl2seq for comparing query "q" and another genome "g". Now, for "q" I already have blastall output from an nt database containing >2 million sequences. I understand that to get compatible e values, I need to set -d parameter for bl2seq to the theoretical data size of that nt database. Which number from the following 4 (taken from blastall output) should be used for -d ? length of database: 12,254,801,043 effective length of database: 12,167,805,299 effective search space: 48671221196 effective search space used: 48671221196 Any pointers/website/docs will be appreciated. Thank you. Tembe _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From bmoore at genetics.utah.edu Wed Oct 26 23:10:14 2005 From: bmoore at genetics.utah.edu (Barry Moore) Date: Wed Oct 26 23:18:31 2005 Subject: [Bioperl-l] perl memory problem Message-ID: Zaigang- Tie::File will allow you to tie a file to an array. You can then step through the array, modifying it as you go if you like, and only the portion of the file that you are working on is actually current in memory. It's easy to use and has worked well for me in similar situations. 
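A minimal sketch of the Tie::File approach Barry describes, using the module that ships with Perl 5.8. The helper name `mean_of_file` and the one-number-per-line file layout are hypothetical. The `memory` option caps Tie::File's record cache in bytes, but the module also keeps a table of line offsets, so memory use still grows with the number of lines (far more slowly than holding the data itself). A DB_File-tied array, as Jason suggests, is another option not shown here.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Hypothetical helper: average of numbers stored one per line in $file.
# The file is never slurped; Tie::File reads records on demand.
sub mean_of_file {
    my ($file) = @_;
    tie my @values, 'Tie::File', $file,
        mode   => O_RDONLY,      # read-only access
        memory => 10_000_000     # cap the record cache at ~10 MB
        or die "Cannot tie $file: $!";
    my $n   = scalar @values;    # counting lines scans the file once
    my $sum = 0;
    for my $i (0 .. $n - 1) {
        $sum += $values[$i];     # records come back without the "\n"
    }
    untie @values;
    return $n ? $sum / $n : undef;
}
```

Each `$values[$i]` reads from disk (or the cache) as needed, so the loop looks like ordinary array code while only a bounded slice of the data is resident at any time.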
Barry > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Jason Stajich > Sent: Wednesday, October 26, 2005 11:30 AM > To: Zaigang Liu > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] perl memory problem > > > On Oct 26, 2005, at 11:55 AM, Zaigang Liu wrote: > > > > > > > > > > > Hello, everyone, > > > > I am new to this list and I am looking for help for one problem I met. > > > > I am trying to run a perl program doing calculation on large data > > set. It > > uses an array to store the input data and the array size will reach 8G > > after all the data is loaded by my estimation. The process always > > stops and > > gives a "out of memory" error when the memory usage reached about > > 2G. I > > used a solaris machine with 40G memory so it is not really run out of > > memory when the program stopped. > > > there is probably a limit on how much memory perl can address in the > version you have compiled. > > Have you considered using a data structure which does not require the > entire array to be in memory at once. the DB_File module is very > useful for this as you can tie your array to a flatfile Berkeley DB. > You can also use something like SQLite or a full fledged relational > database instead if you are crunching such a large number of items. > > > > Does anyone know where could this program come from and how to fix it. > > > > Thanks in advance, > > > > Zaigang > > > > This communication is for use by the intended recipient and contains > > information that may be Privileged, confidential or copyrighted under > > applicable law. If you are not the intended recipient, you are hereby > > formally notified that any use, copying or distribution of this e- > > mail, > > in whole or in part, is strictly prohibited. Please notify the > > sender by > > return e-mail and delete this e-mail from your system. 
Unless > > explicitly > > and conspicuously designated as "E-Contract Intended", this e-mail > > does > > not constitute a contract offer, a contract amendment, or an > > acceptance > > of a contract offer. This e-mail does not constitute a consent to the > > use of sender's contact information for direct marketing purposes > > or for > > transfers of data to third parties. > > > > Francais Deutsch Italiano Espanol Portugues Japanese Chinese > > Korean > > > > http://www.DuPont.com/corp/email_disclaimer.html > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Thu Oct 27 08:25:05 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Oct 27 09:48:14 2005 Subject: [Bioperl-l] FW: [Bug 1754] Bio::Location::Fuzzy.pm -> SwissProt FeatureTable locationvalue error (undef values!) In-Reply-To: <4360BA3D.30808@mendel.imp.univie.ac.at> Message-ID: Hi Brian, I finally took some time to check your test cases! They are fine and test OK on our installation as well. You wrote that you were not sure whether you are covering all cases. There are two more possible cases that would need verification:

"?..?"   => [$fuzzy_impl, undef, undef, "UNCERTAIN", undef, undef, "UNCERTAIN", "UNCERTAIN", 1, 1],
"12..?1" => [$fuzzy_impl, 1, 1, "UNCERTAIN", 12, 12, "EXACT", "EXACT", 1, 1]

The first (?..?) really occurs in SP entries - they have annotations which do not map to an exact location... :S. For the second case I'm no longer 100% sure that I found it in SP, but your parser definitely should be/is capable of coping with such a location string. Also, these two cases tested fine on my machine as well.
So for my part this bug is CLOSED... Another problem on the horizon: are you aware of the new annotations coming up from the SP team in the database? Have you had a look at them? Do you know if your parser is ready for them? If you need help, I'm available - we will need to verify the parser is working for these when the new annotations are integrated. I had a talk with some guys from the SP team at ECCB'05, so I guess I could receive some "test records" if I ask very nicely ;). br,flo -- Florian Leitner Institute for Molecular Pathology, Vienna (www.imp.univie.ac.at) Eisenhaber Bioinformatics Group (mendel.imp.univie.ac.at) e-mail: mobile: +43-(0)650-4567 321 ------ End of Forwarded Message From brian_osborne at cognia.com Thu Oct 27 09:55:03 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Oct 27 09:59:32 2005 Subject: [Bioperl-l] Re: [Bug 1754] Bio::Location::Fuzzy.pm -> SwissProt FeatureTable locationvalue error (undef values!) In-Reply-To: <4360BA3D.30808@mendel.imp.univie.ac.at> Message-ID: Florian, By "12..?1" you mean the complement? Shouldn't the strand value be -1? Brian O. On 10/27/05 7:30 AM, "Florian Leitner" wrote: > "12..?1" => [$fuzzy_impl, > 1, 1, "UNCERTAIN", 12, 12, "EXACT", "EXACT", 1, 1] From boris.steipe at utoronto.ca Thu Oct 27 14:32:50 2005 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Thu Oct 27 14:37:34 2005 Subject: [Bioperl-l] Fwd: Please take the Gene Ontology survey References: Message-ID: My apologies in case this reaches anyone more than once. Context: the GO grant is up for competitive renewal in the new year, and the volume of responses to this survey will help GO demonstrate the degree to which it has been adopted by the community. And, as you know, government support for computational biology infrastructure has been insufficient in the recent past.
Boris Begin forwarded message: > From: Jane Lomax > Date: 27 October 2005 13:20:14 GMT-04:00 > To: boris.steipe@utoronto.ca > Subject: Please take the Gene Ontology survey > > > Hello, > > The Gene Ontology (GO) is a system for functional annotation of > genes and > gene products. It enables classification of gene products according to > molecular function, biological process, and cellular location of > action. > > Please help us by taking part in our survey. > > The results of this survey will help us improve our services > to our user community, and help direct our resources more effectively. > > It's a very straightforward set of questions, which should take a > maximum > of 10 minutes to complete. There's no requirement to submit your name > or email address. To complete the survey, go to: > > http://www.AdvancedSurvey.com/default.asp?SurveyID=32355 > > Please pass on to any friends or colleagues not on these lists. > > Many thanks for your time, > > The GO Consortium > From jeremy_just at netcourrier.com Thu Oct 27 15:11:59 2005 From: jeremy_just at netcourrier.com (Jérémy JUST) Date: Thu Oct 27 15:30:48 2005 Subject: [Bioperl-l] perl memory problem In-Reply-To: References: Message-ID: <20051027211159.000025ac@pearson.infobiogen.fr> On Wed, 26 Oct 2005 11:55:23 -0400 Zaigang Liu wrote: > The process always stops and gives a "out of memory" error when the > memory usage reached about 2G. I used a solaris machine with 40G memory > so it is not really run out of memory when the program stopped.

Are you using a 32-bit Perl binary? To check, use:

$ file `which perl`

If you get something like:

/usr/bin/perl: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped

then you should compile another binary, in 64 bits.
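The same check can be made from inside Perl with the core Config module: `ptrsize` is the size of a C pointer in bytes for the running perl, so a value of 4 means a 32-bit binary whose process cannot address more than 4 GB, however much RAM the machine has. The helper name `perl_pointer_bits` is illustrative only.

```perl
use strict;
use warnings;
use Config;

# Number of bits in a pointer for the running perl binary:
# 32 means at most 4 GB of address space for the whole process.
sub perl_pointer_bits {
    return 8 * $Config{ptrsize};
}

printf "pointer size: %d bits; use64bitall: %s\n",
    perl_pointer_bits(),
    defined $Config{use64bitall} ? $Config{use64bitall} : 'undef';
```

From the shell, `perl -V:ptrsize` prints the same configuration value without writing a script.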
-- Jérémy JUST

From govind.chandra at bbsrc.ac.uk Fri Oct 28 09:17:17 2005 From: govind.chandra at bbsrc.ac.uk (Govind Chandra) Date: Fri Oct 28 09:21:16 2005 Subject: [Bioperl-l] Bio::SeqFeature::Generic tag trouble. Message-ID: <1130505437.3361.21.camel@pc25231.jic.bbsrc.ac.uk>

Hi,

The script below is giving me the output shown further below. Could someone please point out what is it that I am doing wrong.

Perl is 5.8.7. BioPerl version is 1.5.0
Linux is RedHat 8.0

Thanks

Govind
Microbiology
John Innes Centre
Norwich NR4 7UH
UK

### script begins ###

use Bio::SeqIO;

$seqout = Bio::SeqIO->new('-fh' => \*STDOUT, '-format' => 'embl');

$outobj = Bio::Seq->new(
    '-seq' => 'cccgcggagcgggtaccacatcgctgcgcgatgtgcaagcgaacacccgcgctgc');

$nft = Bio::SeqFeature::Generic->new(
    '-start'   => 10,
    '-end'     => 25,
    '-strand'  => -1,
    '-primary' => 'CDS',
    '-tag'     => {'locus_tag' => 'something',
                   'gene'      => 'hoaX'}
);

$outobj->add_SeqFeature($nft);

$seqout->write_seq($outobj);
### script ends ###

### output begins ###
ID   standard; DNA; UNK; 55 BP.
XX
AC   unknown;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   CDS             complement(10..25)
FT                   /locus_tag="Bio::Annotation::SimpleValue=HASH(0x85daeac)"
FT                   /gene="Bio::Annotation::SimpleValue=HASH(0x85e1adc)"
XX
SQ   Sequence 55 BP; 10 A; 21 C; 18 G; 6 T; 0 other;
     cccgcggagc gggtaccaca tcgctgcgcg atgtgcaagc gaacacccgc gctgc              55
//
### output ends ###

From jason.stajich at duke.edu Fri Oct 28 09:45:11 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Oct 28 09:42:45 2005 Subject: [Bioperl-l] Bio::SeqFeature::Generic tag trouble. In-Reply-To: <1130505437.3361.21.camel@pc25231.jic.bbsrc.ac.uk> References: <1130505437.3361.21.camel@pc25231.jic.bbsrc.ac.uk> Message-ID: <68B87571-5DB1-4CDE-84BB-5F9B4F4DF670@duke.edu> Upgrade to bioperl 1.5.1. This is an oft-reported and described bug on this mailing list.... -jason On Oct 28, 2005, at 9:17 AM, Govind Chandra wrote: > Hi, > > The script below is giving me the output shown further below.
Could > someone please point out what is it that I am doing wrong. > > Perl is 5.8.7. BioPerl version is 1.5.0 > Linux is RedHat 8.0 > > Thanks > > Govind > Microbiology > John Innes Centre > Norwich NR4 7UH > UK > > > ### script begins ### > > use Bio::SeqIO; > > $seqout=Bio::SeqIO->new('-fh' => \*STDOUT, '-format' => 'embl'); > > $outobj=Bio::Seq->new( > '-seq' => 'cccgcggagcgggtaccacatcgctgcgcgatgtgcaagcgaacacccgcgctgc'); > > > $nft=Bio::SeqFeature::Generic->new( > '-start' => 10, > '-end' => 25, > '-strand' => -1, > '-primary' => 'CDS', > '-tag' => {'locus_tag' => 'something', > 'gene' => 'hoaX'} > ); > > $outobj->add_SeqFeature($nft); > > $seqout->write_seq($outobj); > ### script ends ### > > > > ### output begins ### > ID standard; DNA; UNK; 55 BP. > XX > AC unknown; > XX > XX > FH Key Location/Qualifiers > FH > FT CDS complement(10..25) > FT /locus_tag="Bio::Annotation::SimpleValue=HASH(0x85daeac)" > FT /gene="Bio::Annotation::SimpleValue=HASH(0x85e1adc)" > XX > SQ Sequence 55 BP; 10 A; 21 C; 18 G; 6 T; 0 other; > cccgcggagc gggtaccaca tcgctgcgcg atgtgcaagc gaacacccgc gctgc 55 > // > ### output ends ### > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12

From kellert at ohsu.edu Fri Oct 28 15:55:02 2005 From: kellert at ohsu.edu (Thomas J Keller) Date: Fri Oct 28 16:03:20 2005 Subject: [Bioperl-l] RemoteBlast problem Message-ID: <316D6B2B-809A-4A4B-A8D3-76BF4384458B@ohsu.edu>

Greetings,

I'm using perl 5.8.6 and the fink installation of bioperl 1.4.5 on an Apple G5 running OS X 10.4.2, running my script with a simple FASTA DNA sequence file:

$ bp_remote_blast2.pl -p blastx -d nr -i test.fa

Here's the error:

------------- EXCEPTION -------------
MSG: no data for midline Query 78 HRRPSFSACRCVLSASSVFPSRLGNNYITAAGAQVLAEGLRGNTSLQFLG 227
STACK Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel /Users/kellert/Sandbox/Perlscripts/bp_remote_blast2.pl:90

It looks like Bio::SearchIO::blast is choking on the result from Bio::Tools::Run::RemoteBlast. I lifted this from Jason's example bp_remote_blast but modified it to use the SearchIO method instead of BPLite:

use warnings;
use strict;
use vars qw($USAGE);
use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Getopt::Long;

$USAGE = "remote_blast.pl [-h] [-p prog] [-d db] [-e expect] [-f seqformat] -i seqfile\n";

my ($prog, $db, $expect);
my ($sequencefile, $sequenceformat, $help) = (undef, 'fasta', undef);

GetOptions('prog|p=s'   => \$prog,
           'db|d=s'     => \$db,
           'expect|e=s' => \$expect,
           'input|i=s'  => \$sequencefile,
           'format|f=s' => \$sequenceformat,
           'help|h'     => \$help,
          );

if ($help) {
    exec('perldoc', $0);
    die;
}
if (!defined $prog) {
    die($USAGE . "\n\tMust specify a valid program name ([t]blast [pxn])\n");
}
if (!defined $db) {
    die($USAGE . "\n\tMust specify a db (e.g. \'nr\') to search\n");
}
if (!defined $sequencefile) {
    die($USAGE . "\n\tMust specify a path to a sequence file.\n");
}

my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'       => $prog,
                                                '-data'       => $db,
                                                '-expect'     => $expect,
                                                '-readmethod' => 'SearchIO', # use SearchIO to parse
                                               );

# submit_blast can currently only handle fasta format files, so I'll
# preprocess outside of the module, but I'd rather be sure here
my $input;
if ($sequenceformat !~ /fasta/) {
    my @seqs;
    my $seqio = Bio::SeqIO->new('-format' => $sequenceformat,
                                '-file'   => $sequencefile);
    while (my $seq = $seqio->next_seq()) {
        push @seqs, $seq;
    }
    $input = \@seqs;
} else {
    $input = $sequencefile;
}

my $r = $factory->submit_blast($input);
my $v = 1;  ## set to 0 to turn off sanity check messages

print STDERR "retrieving blasts for $input ...\n" if ($v > 0);

while (my @rids = $factory->each_rid) {
    foreach my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);
        if (!ref($rc)) {
            if ($rc < 0) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ($v > 0);
            sleep 5;
        } else {
            my $result = $rc->next_result();
            next unless ($result);
            print "\nQuery Name: ", $result->query_name(), "\n";
            # save the output
            my $filename = $result->query_name() . "\.out";
            print STDERR "Saving result to $filename.\n" if ($v > 0);
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while (my $hit = $result->next_hit) {
                next unless ($v > 0);
                print "\thit name is ", $hit->name, "\n";
                while (my $hsp = $hit->next_hsp) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

I'm guessing that either NCBI has changed its BLAST format and bioperl 1.4 no longer works, or I'm missing something that should be obvious. Help much appreciated.

Tom K

Thomas J. Keller, Ph.D.
Director, MMI Core Facility
Oregon Health & Science University
3181 SW Sam Jackson Park Rd.
Portland, OR, USA, 97239

http://www.ohsu.edu/research/core

From smarkel at scitegic.com  Fri Oct 28 16:53:03 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Fri Oct 28 17:04:06 2005
Subject: [Bioperl-l] RemoteBlast problem
In-Reply-To: <316D6B2B-809A-4A4B-A8D3-76BF4384458B@ohsu.edu>
References: <316D6B2B-809A-4A4B-A8D3-76BF4384458B@ohsu.edu>
Message-ID: <43628FAF.2040709@scitegic.com>

Tom,

I haven't seen your exact problem, but I recently saw something similar.
In my case (blastn, NM_021267, Chromosome DB, default parameters), one
of the hits has a comment about the sequence features.

===============================================
>gi|23618947|ref|NC_004331.1| Download subject sequence spanning the HSP
Plasmodium falciparum 3D7 chromosome 13, *** SEQUENCING IN PROGRESS ***, 33 ordered pieces
Length=2732359

 Features in this part of subject sequence:
   hypothetical protein

 Score = 46.1 bits (23),  Expect = 0.27
 Identities = 23/23 (100%), Gaps = 0/23 (0%)
 Strand=Plus/Minus

Query  830      ACATCCCCTTCTACTTCTTCTTC  852
                |||||||||||||||||||||||
Sbjct  1369466  ACATCCCCTTCTACTTCTTCTTC  1369444
===============================================

BioPerl's parser is choking on the extra lines.
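If the extra annotation lines are what trips an older parser, one way to test that diagnosis is to remove them from a saved report before parsing it. Below is a minimal pre-filter sketch in plain Perl, assuming (from the excerpt above) that each block starts with the "Features in this part of subject sequence:" line and ends at the next blank line. The helper name strip_feature_blocks is hypothetical, not part of BioPerl, and this is not a substitute for a parser that understands the newer report layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_feature_blocks: hypothetical helper, not part of BioPerl.
# Removes each "Features in this part of subject sequence:" block from
# a plain-text BLAST report. Assumption: a block starts at that line
# and ends at the next blank line (the blank line is dropped as well).
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    # A limit of -1 keeps trailing empty fields, so a final newline survives.
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*Features in this part of subject sequence/) {
            $skipping = 1;    # start of an annotation block
            next;
        }
        if ($skipping) {
            $skipping = 0 if $line =~ /^\s*$/;    # blank line ends the block
            next;
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Command-line use: perl strip_features.pl saved_report.out > cleaned.out
if (@ARGV) {
    local $/;    # slurp mode: read the whole file at once
    my $text = <>;
    print strip_feature_blocks($text);
}
```

The cleaned file can then be handed to Bio::SearchIO->new(-format => 'blast', -file => 'cleaned.out') to see whether the exception goes away.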
Scott

Thomas J Keller wrote:
> Greetings,
> I'm using perl 5.8.6 and the fink installation of bioperl 1.4.5 on an
> Apple G5 running OS X 10.4.2
> [...]

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect   email:  smarkel@scitegic.com
SciTegic Inc.                        mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401     voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                  fax:    +1 858 279 8804
USA                                  web:    http://www.scitegic.com

From Marc.Logghe at DEVGEN.com  Fri Oct 28 17:09:52 2005
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri Oct 28 17:20:43 2005
Subject: [Bioperl-l] RemoteBlast problem
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F56C1@ANTARESIA.be.devgen.com>

Hi Tom,

What a coincidence. I had exactly the same problem a few hours ago.
I did a few things with conditionals in the script, but in the end I had
to change the size check in B::T::R::RemoteBlast::retrieve_blast().
In my case it worked when the size was set to 2000 instead of 1000:

    ## if proper reply
    my $size = -s $tempfile;
    if( $size > 2000 ) {

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
> Thomas J Keller
> Sent: Friday, October 28, 2005 9:55 PM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] RemoteBlast problem
>
> Greetings,
> I'm using perl 5.8.6 and the fink installation of bioperl
> 1.4.5 on an Apple G5 running OS X 10.4.2
> [...]

From jason.stajich at duke.edu  Fri Oct 28 16:22:08 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Oct 28 17:46:15 2005
Subject: [Bioperl-l] RemoteBlast problem
In-Reply-To: <316D6B2B-809A-4A4B-A8D3-76BF4384458B@ohsu.edu>
References: <316D6B2B-809A-4A4B-A8D3-76BF4384458B@ohsu.edu>
Message-ID: <78AC302F-19D6-4D1C-BD3B-D38384EE9532@duke.edu>

Fixes to handle both changes in BLAST report format and CGI-server at NCBI.
All this and more in bioperl 1.5.1.

You can install it locally without messing with your fink installation
(leave 1.4 installed).

Just download bioperl 1.5.1 from bioperl, uncompress it and move it
somewhere (like $HOME/src/bioperl-1.5.1 or wherever you want), then
update your PERL5LIB variable by prepending this path to it.

So add this to your .bash_profile or .bashrc, depending on how you are
set up:

export PERL5LIB=$HOME/src/bioperl-1.5.1:$PERL5LIB

(after the "test -r /sw/bin/init.sh && . /sw/bin/init.sh" line)

When fink supports 1.5.1 you can just remove this line and you'll be
back to normal.

-jason

On Oct 28, 2005, at 3:55 PM, Thomas J Keller wrote:

> Greetings,
> I'm using perl 5.8.6 and the fink installation of bioperl 1.4.5 on
> an Apple G5 running OS X 10.4.2
> [...]

-- 
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From kellert at ohsu.edu  Fri Oct 28 20:47:54 2005
From: kellert at ohsu.edu (Thomas J Keller)
Date: Fri Oct 28 20:47:16 2005
Subject: [Bioperl-l] RemoteBlast problem
In-Reply-To: <78AC302F-19D6-4D1C-BD3B-D38384EE9532@duke.edu>
References: <316D6B2B-809A-4A4B-A8D3-76BF4384458B@ohsu.edu> <78AC302F-19D6-4D1C-BD3B-D38384EE9532@duke.edu>
Message-ID: <083E97CF-B38D-4483-9CB8-AD11C54E9D29@ohsu.edu>

Jason,

1.5.1 rocks!! ... and solved my particular problem.

Thanks for the responses.

Tom Keller, Ph.D.
http://www.ohsu.edu/research/core
kellert@ohsu.edu
503-494-2442

On Oct 28, 2005, at 1:22 PM, Jason Stajich wrote:

> Fixes to handle both changes in BLAST report format and CGI-server
> at NCBI. All this and more in bioperl 1.5.1
> [...]

From bli1 at bcm.tmc.edu  Sun Oct 30 20:35:52 2005
From: bli1 at bcm.tmc.edu (Bingshan Li)
Date: Tue Nov 1 16:35:33 2005
Subject: [Bioperl-l] Genbank parser
Message-ID: <865D7AB5-B2F4-4555-9365-62B881F9F8ED@bcm.tmc.edu>

Hi All,

I am new to Bioperl and recently found it has a lot of useful modules. I
wrote a perl script to extract all protein sequences and the corresponding
coding sequences from Genbank files. But I found some bugs, and it's hard
to make sure my script is bug-free after several revisions.
I am wondering if there are some modules and specific functions I can use
to fulfill my requirement. Does anybody have similar code to share? Most
bugs come from coding sequences with multiple segments, and some CDS
sequences are on the minus strand.

Thanks a lot!

From bli1 at bcm.tmc.edu  Mon Oct 31 17:54:13 2005
From: bli1 at bcm.tmc.edu (Bingshan Li)
Date: Tue Nov 1 16:48:16 2005
Subject: [Bioperl-l] Bio::Align::DNAStatistics ka/ks calculation bugs?
Message-ID: <732BF5F9-3ADE-4BEF-9B14-E0C3369B47B8@bcm.tmc.edu>

Hi all,

I used the Bio::Align::DNAStatistics module to calculate the ka/ks ratio,
but I found the result is very different from the output of yn00 in PAML
3.14. I applied the "calc_KaKs_pair" function to two sequences to test it.
The difference is pretty big (in one case it gave ka/ks=1.35 while yn00
gave 0.37). Has anybody had the same experience? Are there any other
methods I can use to calculate ka/ks ratios? I will also use it to
calculate ka/ks in sliding windows.

Thanks a lot!

From cy200 at gen.cam.ac.uk  Mon Oct 31 09:19:26 2005
From: cy200 at gen.cam.ac.uk (Chihiro Yamada)
Date: Tue Nov 1 21:25:33 2005
Subject: [Bioperl-l] noob Bio::DB::Biblio problem
Message-ID: <1130768366.12915.37.camel@leona.gen.cam.ac.uk>

Hiya folks,

I'm trying to use the Bio::DB::Biblio::biofetch module to get references
by ID, but I'm having a problem...

If I run this code...

use Bio::Biblio;

my $biblio = new Bio::Biblio (-access => 'biofetch');
my $ref = $biblio->get_by_id('15519282');

I get the error message:

------------- EXCEPTION -------------
MSG: retrieval type pipeline unsupported
STACK Bio::DB::Biblio::biofetch::get_seq_stream /opt/csw/share/perl/site_perl/Bio/DB/Biblio/biofetch.pm:237
STACK Bio::DB::DBFetch::get_Stream_by_id /opt/csw/share/perl/site_perl/Bio/DB/DBFetch.pm:194
STACK Bio::DB::Biblio::biofetch::get_by_id /opt/csw/share/perl/site_perl/Bio/DB/Biblio/biofetch.pm:153
STACK toplevel bbt4.pl:7
--------------------------------------

I've tried this on 1.3 (on Solaris) and 1.5.1 (using OS X) and get the
same error message. Can anyone tell me what I'm doing wrong?

Thanks

Chihiro

-- 
----------------------------------------------------------------------
Chihiro Yamada.                                   FlyBase (Cambridge), http://www.FlyBase.org/
Department of Genetics, University of Cambridge,  email: c.yamada@gen.cam.ac.uk
Downing Street,                                   Tel : 01223-333963
Cambridge, CB2 3EH,                               FAX : 01223-333992
United Kingdom.

Memes don't exist. Spread the Word.
----------------------------------------------------------------------

From jdw at ou.edu  Mon Oct 31 14:55:48 2005
From: jdw at ou.edu (James D. White)
Date: Tue Nov 1 23:54:31 2005
Subject: [Bioperl-l] perl memory problem
References: <200510282122.j9SLMZnq022732@portal.open-bio.org> <436651B8.2060809@ou.edu>
Message-ID: <436676C3.B7E7FFD1@ou.edu>

Sorry about the repost. I forgot to fix the Subject line the first time.

Jim White

"James D. White" wrote:

> Compiling a 64-bit version of Perl on Solaris is easy. The hard part is
> getting 64-bit versions of all of the prerequisites built properly, but
> it can be done. Check out for some notes I wrote about building a 64-bit
> version of Perl 5.8.5 under Solaris 9 with gcc. Once you get the 64-bit C
> libraries and the 64-bit non-pure Perl modules built, the main Bioperl
> installs easily.
>
> Jim White
>
> >Date: Thu, 27 Oct 2005 21:11:59 +0200
> >From: Jérémy JUST
> >Subject: Re: [Bioperl-l] perl memory problem
> >To: bioperl-l@portal.open-bio.org
> >Message-ID: <20051027211159.000025ac@pearson.infobiogen.fr>
> >Content-Type: text/plain; charset=ISO-8859-15
> >
> >On Wed, 26 Oct 2005 11:55:23 -0400
> >Zaigang Liu wrote:
> >
> >>> The process always stops and gives a "out of memory" error when the
> >>> memory usage reached about 2G. I used a solaris machine with 40G memory
> >>> so it is not really run out of memory when the program stopped.
> >
> >    Are you using a 32 bits Perl binary? To check, use:
> >
> >$ file `which perl`
> >
> >    If you get something like:
> >
> >/usr/bin/perl: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
> >
> >then you should compile another binary, in 64 bits.
> >
> >-- Jérémy JUST
> >------------------------------

-- 
James D. White   (jdw@ou.edu)
Director of Bioinformatics
Department of Chemistry and Biochemistry/ACGT
University of Oklahoma
101 David L. Boren Blvd., SRTC 2100
Norman, OK  73019
Phone: (405) 325-4912,  FAX: (405) 325-7762

From jdw at ou.edu  Mon Oct 31 12:17:44 2005
From: jdw at ou.edu (James D. White)
Date: Wed Nov 2 05:18:32 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 30, Issue 11
In-Reply-To: <200510282122.j9SLMZnq022732@portal.open-bio.org>
References: <200510282122.j9SLMZnq022732@portal.open-bio.org>
Message-ID: <436651B8.2060809@ou.edu>

Compiling a 64-bit version of Perl on Solaris is easy. The hard part is
getting 64-bit versions of all of the prerequisites built properly, but
it can be done. Check out for some notes I wrote about building a 64-bit
version of Perl 5.8.5 under Solaris 9 with gcc. Once you get the 64-bit C
libraries and the 64-bit non-pure Perl modules built, the main Bioperl
installs easily.
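Besides running `file` on the binary, perl can report how it was built through the core Config module. A minimal sketch, assuming only core Perl (the helper name perl_is_64bit is illustrative): a pointer size of 8 bytes indicates a 64-bit build, which is the property Jérémy's check is looking for.

```perl
use strict;
use warnings;
use Config;

# Returns 1 if this perl was built with 8-byte pointers (a 64-bit
# binary), 0 otherwise. $Config{ptrsize} is the pointer size in bytes
# recorded when perl was configured.
sub perl_is_64bit {
    return $Config{ptrsize} == 8 ? 1 : 0;
}

# use64bitall is 'define' when perl was configured with -Duse64bitall.
my $all64 = defined $Config{use64bitall} ? $Config{use64bitall} : 'undef';

printf "ptrsize=%d use64bitall=%s => %s perl\n",
    $Config{ptrsize}, $all64, perl_is_64bit() ? '64-bit' : '32-bit';
```

The same value is available from the shell with `perl -V:ptrsize`, which prints a line such as `ptrsize='8';`.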
Jim White

>Date: Thu, 27 Oct 2005 21:11:59 +0200
>From: Jérémy JUST
>Subject: Re: [Bioperl-l] perl memory problem
>To: bioperl-l@portal.open-bio.org
>Message-ID: <20051027211159.000025ac@pearson.infobiogen.fr>
>Content-Type: text/plain; charset=ISO-8859-15
>
>On Wed, 26 Oct 2005 11:55:23 -0400
>Zaigang Liu wrote:
>
>>> The process always stops and gives a "out of memory" error when the
>>> memory usage reached about 2G. I used a solaris machine with 40G memory
>>> so it is not really run out of memory when the program stopped.
>
>    Are you using a 32 bits Perl binary? To check, use:
>
>$ file `which perl`
>
>    If you get something like:
>
>/usr/bin/perl: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
>
>then you should compile another binary, in 64 bits.
>
>-- Jérémy JUST
>------------------------------

From avilella at gmail.com  Mon Oct 31 02:19:02 2005
From: avilella at gmail.com (Albert Vilella)
Date: Wed Nov 2 18:18:20 2005
Subject: [Bioperl-l] PAML wrappers
Message-ID: <1130743142.8199.17.camel@localhost.localdomain>

Hi Sergios,

(I'm forwarding your email to the bioperl mailing list)

> i noticed there are wrappers for the ctl's of PAML's codeml. do u
> have for baseml, too? do u have them tested on OSX? can u email them
> to me, pls? zip them, preferably (no compression).

The wrappers you mention are part of bioperl-run. They should work under
OSX with bioperl installed:

Download http://bioperl.org/DIST/current_core_unstable.zip
Download http://bioperl.org/DIST/current_run_unstable.zip

unzip current_run_unstable.zip
unzip current_core_unstable.zip
export PERL5LIB="$HOME/bioperl-1.5.1:$HOME/bioperl-run-1.5.1"
tar zxf paml3.14b.OSX_G5.tar.gz
export PAMLDIR="$HOME/paml3.14/src"
cd $PAMLDIR
make

-----

I believe that the codeml wrapper is the most tested and used, but the
baseml wrapper has a problem right now with the input files: PAML's baseml
is pickier than a hungry three-year-old in an expensive French bistro
about the format of the sequence file and the tree file, so the example in
the synopsis for bioperl-run-1.5.1/Bio/Tools/Run/Phylo/PAML/Baseml.pm is
not working right now (at least for me).

You will find more information on how to run the PAML wrappers in the
documentation. For example, in this howto:

http://bioperl.org/HOWTOs/html/PAML.html

Cheers,

Albert.

From anst at kvl.dk  Fri Oct 28 10:33:58 2005
From: anst at kvl.dk (Anders Stegmann)
Date: Thu Nov 3 14:25:30 2005
Subject: [Bioperl-l] install binary bioperl-1.4 on windows
Message-ID:

Hi BioPerl!

I have downloaded the binary Windows BioPerl-1.4 version, unzipped it and
run

perl Makefile.PL

It tells me everything looks good and that I should be able to use it
after typing:

make install

but I get:

'make' is not recognized as an internal or external command,
operable program or batch file

I guess I can't use make, but what then? I have tried a lot of different
commands with the same result.
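The error means there is no program called `make` on the PATH. One quick check is to ask perl which make utility it was configured with, since Makefiles written by `perl Makefile.PL` (ExtUtils::MakeMaker) are generally meant for that tool. A minimal core-Perl sketch (the helper name configured_make is illustrative). On ActivePerl for Windows the answer is typically `nmake`, which may need to be installed separately; verify that on your own installation.

```perl
use strict;
use warnings;
use Config;

# The make program recorded when this perl was configured. Makefiles
# generated by "perl Makefile.PL" are normally written for this tool,
# so it is the command to try in place of a missing "make".
sub configured_make {
    return $Config{make};
}

print "This perl expects: ", configured_make(), "\n";
```

From a command prompt, `perl -V:make` prints the same setting.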