From cjfields1 at gmail.com  Fri Jul  1 21:29:51 2011
From: cjfields1 at gmail.com (Christopher Fields)
Date: Fri, 1 Jul 2011 20:29:51 -0500
Subject: [Bioperl-l] New perl CUDA bindings
Message-ID: 

All,

I know there are a number of devs who might be running tasks on GPUs
these days, just wanted to point out that there are now CUDA bindings
in Perl available from David Mertens (I'll likely be using them locally
:)

These are not present in CPAN yet, but they are on github:

https://github.com/run4flat/perl-CUDA-Minimal

Also some demos, latest one here:

http://blogs.perl.org/users/david_mertens/2011/07/cudaminimal-and-error-handling.html

chris


From carandraug+dev at gmail.com  Fri Jul  1 22:36:12 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Sat, 2 Jul 2011 03:36:12 +0100
Subject: [Bioperl-l] How (from where) to retrieve FieldInfo objects?
In-Reply-To: 
References: 
	<58D258F0-7A80-4CEC-ACF6-99110AA623A4@verizon.net>
	<18DF7D20DFEC044098A1062202F5FFF3396074D238@exchsth.agresearch.co.nz>
Message-ID: 

2011/6/30 Carnë Draug :
> On 29 June 2011 23:30, Smithies, Russell
> wrote:
>> How about just returning ASN.1 then parsing that?
>> There's far more data in that format than any of the others.
>>
>> my $factory = Bio::DB::EUtilities->new(-eutil      => 'esearch',
>>                                       -term       => 'h2afx[sym] AND human[organism]',
>>                                       -db         => 'gene',
>>                                       -usehistory => 'y');
>>
>>
>> my $hist  = $factory->next_History || die "No history data returned";
>>
>> $factory->set_parameters(-eutil   => 'efetch',-history => $hist);
>>
>> print Dumper $factory->get_Response;
>
> When I do this, I get XML with the ASN.1 inside the tag pre. Is it
> supposed to be this way? Should I extract it myself? Shouldn't the
> method do this? It's nice that I can get so much information but
> wouldn't it be lighter on the NCBI server if I could ask only for the
> info that I need rather than the whole record?

After much work, I've done this and as such I'm sharing back the code
in case someone comes across it. Basically, get_Response returns a
HTML::Message object. Since I couldn't find a method to get it pretty,
I used HTML::Parser to do it. It seems that the ASN.1/entrezgene are
all inside the <pre> tag. Also, if there's more than one gene, all
genes are inside the same <pre> tag. Here's the code I used.

use Bio::DB::EUtilities;
use HTML::Parser;

my @ids = qw(9555 3014);
my $factory = Bio::DB::EUtilities->new(
                                      -eutil   => 'efetch',
                                      -db      => 'gene',
                                      -id      => \@ids,
                                      -retmode => 'asn1',
                                      );
my $html = $factory->get_Response->content;

my $parser = HTML::Parser->new(
                                api_version => 3,
                                start_h     => [\&handle_start],
                                end_h       => [\&handle_end],
                                text_h      => [\&handle_text, 'dtext'],
                                report_tags => qw(pre),
                              );
my $seq;
{
  my $inside_tag = 0;
  sub handle_start {
    $inside_tag = 1;
  }
  sub handle_text {
    $seq = $_[0] if $inside_tag;
    return 4;
  }
  sub handle_end {
    $inside_tag = 0;
  }
}
$parser->parse($html);

After running parse, $seq holds a sequence file that can be opened
with Bio::SeqIO or written to disk.

Carnë


From cjfields at illinois.edu  Sat Jul  2 09:22:52 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 2 Jul 2011 08:22:52 -0500
Subject: [Bioperl-l] How (from where) to retrieve FieldInfo objects?
In-Reply-To: 
References: 
	<58D258F0-7A80-4CEC-ACF6-99110AA623A4@verizon.net>
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D238@exchsth.agresearch.co.nz>
	
	
Message-ID: <2D2B0BB3-5BDE-43A9-BA13-28843528FAC5@illinois.edu>

Just a few notes on this thread (just got back from vacation):

On Jul 1, 2011, at 9:36 PM, Carnë Draug wrote:

> 2011/6/30 Carnë Draug :
>> On 29 June 2011 23:30, Smithies, Russell
>>  wrote:
>>> How about just returning ASN.1 then parsing that?
>>> There's far more data in that format than any of the others.
>>> 
>>> my $factory = Bio::DB::EUtilities->new(-eutil      => 'esearch',
>>>                                       -term       => 'h2afx[sym] AND human[organism]',
>>>                                       -db         => 'gene',
>>>                                                   -usehistory => 'y');
>>> 
>>> 
>>> my $hist  = $factory->next_History || die "No history data returned";
>>> 
>>> $factory->set_parameters(-eutil   => 'efetch',-history => $hist);
>>> 
>>> print Dumper $factory->get_Response;
>> 
>> When I do this, I get XML with the ASN.1 inside the tag pre. Is it
>> supposed to be this way? Should I extract it myself? Shouldn't the
>> method do this? It's nice that I can get so much information but
>> wouldn't it be lighter on the NCBI server if I could ask only for the
>> info that I need rather than the whole record?

No, Bio::DB::EUtilities was intentionally designed only to process XML data related specifically to eutil operations, and decouple it from any other format (specifically those from efetch) as there are just too many.  It does not parse any end-point XML/text/ASN.1 like GenBank, Gene XML, ASN.1, etc.  Those should be handled by outside parsers.

There is a Gene ASN.1 parser, by the way: Bio::SeqIO::entrezgene.

> After much work, I've done this and as such I'm sharing back the code
> in case someone comes across it. Basically, get_Response returns a
> HTML::Message object. Since I couldn't find a method to get it pretty,
> I used HTML::Parser to do it. It seems that the ASN.1/entrezgene are
> all inside the <pre> tag. Also, if there's more than one gene, all
> genes are inside the same <pre> tag. Here's the code I used.

More specifically, get_Response returns an HTTP::Response object, hence the name of the method.  The base class is HTTP::Message, not HTML::Message (very important difference, the message doesn't have to be HTML but can be XML, HTML, plain text, etc).

> use Bio::DB::EUtilities;
> use HTML::Parser;
> 
> my @ids = qw(9555 3014);
> my $factory = Bio::DB::EUtilities->new(
>                                      -eutil   => 'efetch',
>                                      -db      => 'gene',
>                                      -id      => \@ids,
>                                      -retmode => 'asn1',
>                                      );
> my $html = $factory->get_Response->content;
> 
> my $parser = HTML::Parser->new(
>                                api_version => 3,
>                                start_h     => [\&handle_start],
>                                end_h       => [\&handle_end],
>                                text_h      => [\&handle_text, 'dtext'],
>                                report_tags => qw(pre),
>                              );
> my $seq;
> {
>  my $inside_tag = 0;
>  sub handle_start {
>    $inside_tag = 1;
>  }
>  sub handle_text {
>    $seq = $_[0] if $inside_tag;
>    return 4;
>  }
>  sub handle_end {
>    $inside_tag = 0;
>  }
> }
> $parser->parse($html);
> 
> After running parse, $seq holds a sequence file that can be opened
> with Bio::SeqIO or written to disk.
> 
> Carnë

The interaction with eutils is supposed to be fairly low-level (in fact, there is a long-overdue refactoring of the internals that simplifies this somewhat; I just haven't had time to work on it).  In general, the rettype and retmode params should *both* be set just to be on the safe side.

For instance, to get plain text ASN.1 (not ASN.1 with HTML tags) use '-rettype' => 'asn1' and '-retmode' => 'text'.  Yeah, one would think it should be easier than that but NCBI has it set up this way.  See the following link for more:

http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html

chris
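
The advice above can be sketched as follows. This is a minimal sketch, not code from the thread: it reuses the gene IDs and efetch parameters already shown, adds -rettype => 'asn1' and -retmode => 'text' as suggested, and only contacts NCBI when BioPerl is installed and an environment variable is set. The variable name RUN_EFETCH and the output file name genes.asn1 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Parameters as suggested in the reply: set rettype and retmode together
# so that efetch returns plain-text ASN.1 rather than ASN.1 wrapped in HTML.
my %efetch_args = (
    -eutil   => 'efetch',
    -db      => 'gene',
    -id      => [qw(9555 3014)],
    -rettype => 'asn1',
    -retmode => 'text',
);

# Only talk to NCBI when explicitly asked to (RUN_EFETCH is a hypothetical
# switch for this sketch) and when Bio::DB::EUtilities can be loaded.
if ($ENV{RUN_EFETCH} && eval { require Bio::DB::EUtilities; 1 }) {
    my $factory = Bio::DB::EUtilities->new(%efetch_args);
    my $asn1    = $factory->get_Response->content;

    # Save the raw record; the file name is illustrative.
    open my $out, '>', 'genes.asn1' or die "Cannot write genes.asn1: $!";
    print {$out} $asn1;
    close $out or die "Cannot close genes.asn1: $!";
}
```

With these parameters there should be no <pre> wrapper to strip, and the saved file can be handed to an outside parser such as Bio::SeqIO::entrezgene, mentioned earlier in this message.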



From Daniel.Lang at biologie.uni-freiburg.de  Sun Jul  3 05:48:22 2011
From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Sun, 03 Jul 2011 11:48:22 +0200
Subject: [Bioperl-l] scores in Bio::DB::BigBed
Message-ID: <4E103AE6.2030003@biologie.uni-freiburg.de>

Hi,

quick question about the BigBed adaptor: Is it correct that the bin and
summary functions only return statistics about the number of features in
the defined intervals?
I was expecting them to deliver statistics about the score if the
respective bb file has a defined score field.
If this is true, does this also mean that I cannot plot the distribution
of scores in BigBed files in gbrowse?

This is the first time I'm using BigBed, maybe I'm doing something wrong...

I had some trouble formatting the bed files correctly in order to see
the score in the features returned by the Bio::DB::BigBed::features()
routine. It seems the bigbed entries will only have a correctly assigned
score field if you also provide a non-empty name field. Initially I
thought that the order of columns is irrelevant if you use an .as file
in the bedToBigBed call, but that doesn't seem to be the case.

Best,
Daniel
-- 

Dr. Daniel Lang
University of Freiburg, Plant Biotechnology
Schaenzlestr. 1, D-79104 Freiburg
fax:        +49 761 203 6945
phone:      +49 761 203 6989
homepage:   http://www.plant-biotech.net/
            http://www.cosmoss.org/
e-mail:     daniel.lang at biologie.uni-freiburg.de

#################################################
My software never has bugs.
It just develops random features.
#################################################




From Russell.Smithies at agresearch.co.nz  Sun Jul  3 17:40:00 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 4 Jul 2011 09:40:00 +1200
Subject: [Bioperl-l] New perl CUDA bindings
In-Reply-To: 
References: 
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3396074D258@exchsth.agresearch.co.nz>

Bit of a coincidence, I'm currently looking at GPU stuff as we're putting in a load of them as a new compute cluster. All HP SL390-G7s so if anyone has any experience working with these, please get in touch as I think I'll need a few pointers!

For anyone looking for a good implementation of blast that runs on GPUs, I'd recommend this one: http://eudoxus.cheme.cmu.edu/gpublast/gpublast.html
Compiles easily and runs as it should.
In the past I've tried many of the applications from the "Tesla Bio Workbench" http://www.nvidia.com/object/tesla_bio_workbench.html  and had limited success getting anything to even compile.

--Russell




From sm468 at exeter.ac.uk  Fri Jul  1 09:34:54 2011
From: sm468 at exeter.ac.uk (Suhaib Mohammed)
Date: Fri, 01 Jul 2011 22:34:54 +0900
Subject: [Bioperl-l] Bioperl script execution
Message-ID: <1309527295.2336.21.camel@holeybox>

Dear Sir/Madam,
I am a PhD student using BioPerl for comparative genomics. I
have downloaded BioPerl 1.6 on Ubuntu 11.04 using the Synaptic package
manager. All the BioPerl scripts are in the /usr/bin folder. I have made
a folder named Bioperl under my home directory and placed a simple
BioPerl script called perl4-1.pl in it. When I execute it with this command

perl perl4-1.pl 

The following error is flagged.

Can't locate Bio/.pm in @INC (@INC
contains: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at perl4-1.pl line 3.
BEGIN failed--compilation aborted at perl4-1.pl line 3.

I would appreciate it if anyone could help me debug this issue.

Thanks
Suhaib
-------------- next part --------------
A non-text attachment was scrubbed...
Name: perl4-1.pl
Type: application/x-perl
Size: 153 bytes
Desc: not available
URL: 

From Timothy.Parnell at hci.utah.edu  Sun Jul  3 16:10:25 2011
From: Timothy.Parnell at hci.utah.edu (Timothy Parnell)
Date: Sun, 3 Jul 2011 14:10:25 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed
In-Reply-To: <4E103AE6.2030003@biologie.uni-freiburg.de>
Message-ID: 

Hi Daniel,

You are correct about the bin and summary function of the BigBed adaptor
working only with the number of features and not the individual scores.

There is a workaround, albeit not as efficient as the statistical method.
In the conf stanza, you'll need to use the region feature, and then use
the older xyplot glyph. This glyph will iterate through all the bed
features, calling the score method on each, and then draw an xyplot with
those collected scores. Be sure to set the group_on function to tie them
all into one graph. Here is an example.

[bigbed_score_graph]
database     = bigbed_db
feature      = region
glyph        = xyplot
graph_type   = line
group_on     = type

As for the BED format, per the format definition from UCSC, the first
three columns (chromosome, start, stop) are required, and any additional
higher number columns must have the lower columns filled. So to include a
score (5th column), you need to also fill the name (4th) column.

If your features don't have names, then I would recommend using the BigWig
format instead. You can load a bedgraph file (chromosome, start, stop,
score) into a BigWig database. You'll also have access to the fast summary
statistical functions that work on the scores.

Hope that helps.
Tim
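
The column rule above can be illustrated with a short Perl sketch. It is not code from the thread: the helper name bed5_line and the placeholder naming scheme feat_N are assumptions, chosen only to show that a name has to be written into column 4 before a score can go into column 5.

```perl
use strict;
use warnings;

# Build a five-column BED line. BED columns are positional, so the name
# (4th) column must be filled before a score (5th) can be given.
# The placeholder names "feat_1", "feat_2", ... are purely illustrative.
my $counter = 0;

sub bed5_line {
    my ($chrom, $start, $end, $score, $name) = @_;
    $counter++;
    $name = "feat_$counter" unless defined $name && length $name;
    return join("\t", $chrom, $start, $end, $name, $score);
}

print bed5_line('chr1', 100, 200, 350), "\n";           # name is generated
print bed5_line('chr1', 300, 400, 720, 'geneA'), "\n";  # name is kept
```

If the names carry no meaning, the BigWig route described above avoids the placeholder column altogether.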






From carandraug+dev at gmail.com  Sun Jul  3 19:32:17 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Mon, 4 Jul 2011 00:32:17 +0100
Subject: [Bioperl-l] Bioperl script execution
In-Reply-To: <1309527295.2336.21.camel@holeybox>
References: <1309527295.2336.21.camel@holeybox>
Message-ID: 

Hi Suhaib,

your code is wrong. You're not loading any module, for two reasons:

1 - there's no module Bio:: Perl
2 - even if there was, you shouldn't have a space between Perl and the
double colon

I see many errors in your code, such as a space between $ and the
variable name. I'd recommend you learn a bit of Perl first. Learning
Perl by O'Reilly is very good if you have no programming background
(but it doesn't cover object-oriented programming, which BioPerl
uses). If you already know how to program, just not in Perl, I
recommend Modern Perl, which is free (as in freedom):
http://onyxneon.com/books/modern_perl/index.html

There's also a small introduction to Perl and BioPerl on the wiki:
http://www.bioperl.org/wiki/HOWTO:Beginners

Carnë


From carandraug+dev at gmail.com  Sun Jul  3 19:37:30 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Mon, 4 Jul 2011 00:37:30 +0100
Subject: [Bioperl-l] Bioperl script execution
In-Reply-To: 
References: <1309527295.2336.21.camel@holeybox>
	
Message-ID: 

I take back what I said. It turns out that there is a Bio::Perl module
for people who don't know objects. In that case, your problem is the
stray space between the double colon and Perl, and the spaces between
$ and your variable names.

Carnë
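
The error message in the original post, "Can't locate Bio/.pm", is consistent with this diagnosis. When Perl loads a module by name it turns each :: into a directory separator and appends .pm, so a name that stops at Bio:: maps to Bio/.pm. The helper below, module_to_path (a hypothetical name, not part of Perl or BioPerl), imitates that mapping in plain Perl:

```perl
use strict;
use warnings;

# Imitate how a module name is turned into a file path that is then
# searched for in @INC: "::" becomes "/" and ".pm" is appended.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

print module_to_path('Bio::Perl'), "\n";   # Bio/Perl.pm
print module_to_path('Bio::'), "\n";       # Bio/.pm, as in the error message
```

With the space removed, the line reads "use Bio::Perl;" and Perl looks for Bio/Perl.pm in the @INC directories.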


From Daniel.Lang at biologie.uni-freiburg.de  Mon Jul  4 01:51:53 2011
From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Mon, 04 Jul 2011 07:51:53 +0200
Subject: [Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed
In-Reply-To: 
References: 
Message-ID: <4E1154F9.2000706@biologie.uni-freiburg.de>

Hi Timothy,

thanks for your immediate and thorough response! That helps me a lot!
I'll try it out directly.

I didn't correctly connect the dots with BigWig vs BigBed. So if I don't
want to identify individual features, I should better use that...

Best,
Daniel







From lincoln.stein at gmail.com  Mon Jul  4 09:04:50 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 4 Jul 2011 09:04:50 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed
In-Reply-To: <4E103AE6.2030003@biologie.uni-freiburg.de>
References: <4E103AE6.2030003@biologie.uni-freiburg.de>
Message-ID: 

Hi Dan,

The documentation for BigBed is scanty; all I know about what the bigbed
library provides comes from Jim Kent's bigbed.h include file. I had
thought that the scores in BED files would come through into the summary
statistics like those in BigWig, but now I'm looking at the example data
provided in Jim's source code, and see that the BigBed example source file
has scores of "0".

I'll investigate whether there is an issue in the Perl layer, but it could
easily be a limitation in the library itself. Have you considered using a
BedGraph file and indexing it with bedGraphToBigWig? I know that the
Bio::DB::BigWig interface works perfectly to retrieve and summarize the
scores.

Lincoln




-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa 

From lincoln.stein at gmail.com  Mon Jul  4 09:22:15 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 4 Jul 2011 09:22:15 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed
In-Reply-To: 
References: <4E103AE6.2030003@biologie.uni-freiburg.de>
	
Message-ID: 

I had a look at the output of bigBedSummary, which is from Jim Kent's source
tree (no Perl involved), and it appears that the statistics it provides are
limited to coverage; so I don't think you can do anything with the scores if
you're using BigBed indexing. Have a look at BedGraph=>BigWig and see if it
meets your needs.

Lincoln
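
The BedGraph=>BigWig route suggested above needs the scores in a four-column bedGraph file first. A minimal sketch of that conversion, assuming tab-separated BED input with the score in column 5; the helper name bed_to_bedgraph is hypothetical and not part of BioPerl or the UCSC tools:

```perl
use strict;
use warnings;

# Convert one BED line (chrom, start, end, name, score, ...) into a
# bedGraph line (chrom, start, end, score). Returns nothing for header
# lines and for lines that have no score column.
sub bed_to_bedgraph {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^(?:track|browser|#)/;
    my @fields = split /\t/, $line;
    return if @fields < 5;
    return join("\t", @fields[0, 1, 2, 4]);
}

for my $bed ("chr1\t100\t200\tgeneA\t350\n", "chr1\t300\t400\tgeneB\t720\n") {
    my $bedgraph = bed_to_bedgraph($bed);
    print "$bedgraph\n" if defined $bedgraph;
}
```

The output, sorted by chromosome and start position, can then be indexed with bedGraphToBigWig, which also takes a chromosome sizes file and the name of the BigWig file to write.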





From cjfields at illinois.edu  Mon Jul  4 12:10:43 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jul 2011 11:10:43 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed
In-Reply-To: 
References: <4E103AE6.2030003@biologie.uni-freiburg.de>
	
	
Message-ID: 

I generally follow these rules where I want a common set of possibly volatile features (e.g. specific transcriptome analysis) separate from my main 'stable' feature database (e.g. gene models):

1) BigBed - lightweight bundle of simple features where the ranges may overlap, but I'm not concerned about score.  I have found BED/BigBed scores to be of limited use in most cases unless I scale the data (since they must be integer values from 0 to 1000).  Document it very well if you do any scaling! YMMV

2) SAM/BAM - bundle of (possibly overlapping) features where summary stats are needed.  I've seen these used for BLAST/BLAT runs, etc.

3) BigWig - quantitative data of fixed or varying ranges covering entire genome, ranges can't overlap

4) BedGraph - quantitative sparse data, ranges can't overlap (these are converted over to BigWig for GBrowse, though)

5) Of course, one can also set up separate DB::SF::Store databases as well depending on your needs (I have used both the SQLite and MySQL adaptors for this).

I think this is almost begging for a 'best practices' chart/table somewhere, maybe a GBrowse 'cookbook' of common data representation cases.

chris
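
The scaling mentioned under rule 1 could look something like the sketch below. It assumes a simple linear rescaling onto the 0-1000 integer range with clamping; that choice, and the helper name scale_to_bed_score, are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

# Map a raw value onto the 0-1000 integer range used for BED scores.
# $min and $max are the bounds of the raw data; values outside them are
# clamped. Linear scaling is only one possible choice, so record which
# one was used.
sub scale_to_bed_score {
    my ($value, $min, $max) = @_;
    return 0 if $max == $min;    # degenerate range, avoid division by zero
    my $clamped = $value < $min ? $min
                : $value > $max ? $max
                :                 $value;
    return int(1000 * ($clamped - $min) / ($max - $min) + 0.5);
}

print scale_to_bed_score(2.5, 0, 10), "\n";   # 250
print scale_to_bed_score(12,  0, 10), "\n";   # 1000 (clamped)
```

Keeping $min and $max alongside the BED file makes it possible to map the scores back to the original values later.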

On Jul 4, 2011, at 8:22 AM, Lincoln Stein wrote:

> I had a look at the output of bigBedSummary, which is from Jim Kent's source
> tree (no Perl involved), and it appears that the statistics it provides are
> limited to coverage; so I don't think you can do anything with the scores if
> you're using BigBed indexing. Have a look at BedGraph=>BigWig and see if it
> meets your needs.
> 
> Lincoln
> 
> On Mon, Jul 4, 2011 at 9:04 AM, Lincoln Stein wrote:
> 
>> Hi Dan,
>> 
>> The documentation for BigBed is scanty; all I know about it is what is
>> provided by the bigbed library is in Jim Kent's bigbed.h include file. I had
>> thought that the scores in BED files would come through into the summary
>> statistics like those in BigWig, but now I'm looking at the example data
>> provided in Jim's source code, and see that the BigBed example source file
>> has scores of "0".
>> 
>> I'll investigate whether there is an issue in the Perl layer, but it could
>> easily be a limitation in the library itself. Have you considered using a
>> BedGraph file and indexing it with bedGraphToBigWig? I know that the
>> Bio::DB::BigWig interface works perfectly to retrieve and summarize the
>> scores.
>> 
>> Lincoln
>> 
>> 
>> On Sun, Jul 3, 2011 at 5:48 AM, Daniel Lang <
>> Daniel.Lang at biologie.uni-freiburg.de> wrote:
>> 
>>> Hi,
>>> 
>>> quick question about the BigBed adaptor: Is it correct that the bin and
>>> summary functions only return statistics about the number of features in
>>> the defined intervals?
>>> I was expecting them to deliver statistics about the score if the
>>> respective bb file has a defined score field.
>>> If this is true, does this also mean that I cannot plot the distribution
>>> of scores in BigBed files in gbrowse?
>>> 
>>> This is the first time I'm using BigBed, maybe I'm doing something
>>> wrong...
>>> 
>>> I had some trouble formatting the bed files correctly in order to see
>>> the score in the features returned by the Bio::DB::BigBed::features()
>>> routine. It seems the bigbed entries will only have a correctly assigned
>>> score field if you also provide a non-empty name field. Initially I
>>> thought that the order of columns is irrelevant if you use an .as file
>>> in the bedToBigBed call, but that doesn't seem to be the case.
>>> 
>>> Best,
>>> Daniel
>>> --
>>> 
>>> Dr. Daniel Lang
>>> University of Freiburg, Plant Biotechnology
>>> Schaenzlestr. 1, D-79104 Freiburg
>>> fax:        +49 761 203 6945
>>> phone:      +49 761 203 6989
>>> homepage:   http://www.plant-biotech.net/
>>>           http://www.cosmoss.org/
>>> e-mail :
>>> daniel.lang at biologie.uni-freiburg.de
>>> 
>>> #################################################
>>> My software never has bugs.
>>> It just develops random features.
>>> #################################################
>>> 
>>> 
>> 



From carandraug+dev at gmail.com  Sun Jul  3 22:38:36 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Mon, 4 Jul 2011 03:38:36 +0100
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: 
References: 
Message-ID: 

Hi

I've been trying to get some data from an ASN.1 entrezgene file.
However, I can't seem to access some of the data on the file. I've
read the Feature-annotations page on the wiki (even fixed a bug in
there) but still nothing. So I used Data::Dumper to look at the Seq
and Annotation objects and couldn't see it in there at all although
it's on the original file (attached).

The data I want from the sequence are the ids "NM_002105" and
"NP_002096" which show up several times on the file. However, when I
do this:

use Data::Dump;
use Bio::SeqIO;
my $file = $ARGV[0];
my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'entrezgene');
my $seq_object = $seqio_object->next_seq;
print Dumper($seq_object);

I can't find 002105 or 002096 anywhere on the output.

Am I doing something wrong? How can I solve this?

Thanks in advance,
Carnë Draug
-------------- next part --------------
A non-text attachment was scrubbed...
Name: entrezgene
Type: application/octet-stream
Size: 219779 bytes
Desc: not available
URL: 

From tzhu at mail.bnu.edu.cn  Tue Jul  5 06:50:11 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Tue, 05 Jul 2011 18:50:11 +0800
Subject: [Bioperl-l] questions on analysising clustalw alignment result files
Message-ID: <4E12EC63.7090807@mail.bnu.edu.cn>

I've created an alignment file using clustalw 2.0.12. This file 
"test.aln" is attached in the mail.

I want to analyze it using Bio::AlignIO, so I wrote
###########################################
use Bio::AlignIO;

my $catch_obj = Bio::AlignIO->new(-file=>'test.aln',
                                   -format=>'clustalw');

while ( my $align_obj = $catch_obj->next_aln() )
{
     for my $seq_obj ($align_obj->each_seq())
     {
         my $name=$seq_obj->display_id;
         my $seq=$seq_obj->seq;
         print "$name\t$seq\n";
     }
}
############################################

But the script prints nothing.

I've tried to inspect the objects $catch_obj and $align_obj using Data::Dumper.
The object $catch_obj dumps as
$VAR1 = bless( {
                  '_line_length' => 60,
                  '_root_cleanup_methods' => [
                                               sub { "DUMMY" }
                                             ],
                  '_flush_on_write' => 1,
                  '_filehandle' => \*Symbol::GEN0,
                  '_file' => 'test.aln',
                  '_root_verbose' => 0
                }, 'Bio::AlignIO::clustalw' );
Obviously we've read the file correctly!
However the object $align_obj prints nothing. So probably there's 
something wrong with the method next_aln() called by $catch_obj. Why?

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.aln
Type: application/x-wine-extension-aln
Size: 748 bytes
Desc: not available
URL: 

From p.j.a.cock at googlemail.com  Tue Jul  5 09:50:57 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Jul 2011 14:50:57 +0100
Subject: [Bioperl-l] questions on analysising clustalw alignment result
	files
In-Reply-To: <4E12EC63.7090807@mail.bnu.edu.cn>
References: <4E12EC63.7090807@mail.bnu.edu.cn>
Message-ID: 

On Tue, Jul 5, 2011 at 11:50 AM, Tao Zhu  wrote:
> I've created an alignment file using clustalw 2.0.12. This file "test.aln"
> is attached in the mail.
>
> I want to analysis it using Bio::AlignIO, ...

The Biopython clustal parser doesn't like it either - I think the extra sequence
numbers are the problem, I don't recall seeing those with clustalw 2.0.10.

I just checked and there is already a clustalw 2.1 release, and a new
release, Clustal Omega 1.0.2 (which curiously isn't called clustalw v3).

Peter

From cjfields at illinois.edu  Tue Jul  5 10:45:49 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 5 Jul 2011 09:45:49 -0500
Subject: [Bioperl-l] questions on analysising clustalw alignment result
	files
In-Reply-To: 
References: <4E12EC63.7090807@mail.bnu.edu.cn>
	
Message-ID: 


On Jul 5, 2011, at 8:50 AM, Peter Cock wrote:

> On Tue, Jul 5, 2011 at 11:50 AM, Tao Zhu  wrote:
>> I've created an alignment file using clustalw 2.0.12. This file "test.aln"
>> is attached in the mail.
>> 
>> I want to analysis it using Bio::AlignIO, ...
> 
> The Biopython clustal parser doesn't like it either - I think the extra sequence
> numbers are the problem, I don't recall seeing those with clustalw 2.0.10.
> 
> I just checked and there is already a clustalw 2.1 release, and a new
> release Clustalw Omega 1.0.2 (which curiously isn't called clustalw v3).
> 
> Peter

The best thing to do in this case is to file a bug for tracking (might be a good thing to have cross-lang bugs in this case, maybe?).  Anyway, it's very likely as Peter says, the parser is choking on something and doesn't recognize the file.

chris

From p.j.a.cock at googlemail.com  Tue Jul  5 10:51:28 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Jul 2011 15:51:28 +0100
Subject: [Bioperl-l] questions on analysising clustalw alignment result
	files
In-Reply-To: 
References: <4E12EC63.7090807@mail.bnu.edu.cn>
	
	
Message-ID: 

On Tue, Jul 5, 2011 at 3:45 PM, Chris Fields  wrote:
>
> The best thing to do in this case is to file a bug for tracking
> (might be a good thing to have cross-lang bugs in this case, maybe?).
> Anyway, it's very likely as Peter says, the parser is choking on something
> and doesn't recognize the file.
>
> chris

Good plan Chris.

Tao, please go to http://redmine.open-bio.org/projects/bioperl to file
a bug. As well as the example output file from your original email,
could you also include the input file for the alignment, and the
clustalw2 command line used to create it?

Thanks,

Peter

From cjfields at illinois.edu  Tue Jul  5 10:57:48 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 5 Jul 2011 09:57:48 -0500
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: 
References: 
	
Message-ID: <09FE9936-5C4E-4826-BF30-57336D9C2A79@illinois.edu>

Carne,

Using the latest Bio::ASN1::EntrezGene parser and bioperl-live from github, and changing the dumper module to Data::Dumper, I can sort of repeat this, but I see the accession attached to a URL only (in the ASN1 output it is in several places).  The Bio::Seq is populated with data, however, which makes me think these tags are not parsed for some reason, or the SeqIO parser is not catching them.  

Can you file this as a bug and attach the problematic EntrezGene data?  Not sure if it is in the ASN1 parser itself or the bioperl SeqIO parser.

chris

On Jul 3, 2011, at 9:38 PM, Carnë Draug wrote:

> Hi
> 
> I've been trying to get some data from an ASN.1 entrezgene file.
> However, I can't seem to access some of the data on the file.  I've
> read the Feature-annotations page on the wiki (even fixed a bug in
> there) but still nothing. So I used Data::Dumper to look at the Seq
> and Annotation objects and couldn't see it in there at all although
> it's on the original file (attached).
> 
> The data I want from the sequence are the ids "NM_002105" and
> "NP_002096" which show up several times on the file. However, when I
> do this:
> 
> use Data::Dump;
> use Bio::SeqIO;
> my $file = $ARGV[0];
> my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'entrezgene');
> my $seq_object = $seqio_object->next_seq;
> print Dumper($seq_object);
> 
> I can't find 002105 or 002096 anywhere on the output.
> 
> Am I doing something wrong? How can I solve this?
> 
> Thanks in advance,
> Carnë Draug



From Russell.Smithies at agresearch.co.nz  Tue Jul  5 17:55:19 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 6 Jul 2011 09:55:19 +1200
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: 
References: 
	
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>

It is in there, just takes a bit of getting at.
Frequent use of Data::Dumper to work out where you are helps.



use warnings;
use strict;
use Bio::ASN1::EntrezGene;
use Data::Dumper;

my $parser = Bio::ASN1::EntrezGene->new('file' => "entrezgene.asn");
while(my $result = $parser->next_seq){
    $result = $result->[0] if(ref($result) eq 'ARRAY');
    foreach my $l (@{$result->{locus}}){
        foreach my $p (@{$l->{products}}){

          my $nuc_gi = $p->{seqs}->[0]->{whole}->[0]->{gi};
          my $nuc_acc = $p->{accession};

          my $prot_gi = $p->{products}->[0]->{seqs}->[0]->{whole}->[0]->{gi};
          my $prot_acc = $p->{products}->[0]->{accession};

          print "$nuc_gi, $nuc_acc\t$prot_gi, $prot_acc \n";
        }
    }
}
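
The same traversal can be written so that records lacking a level (no products, no protein under a transcript) do not trigger "uninitialized value" warnings. This is only a sketch: the hash layout is the one used in the script above, while the function name and the hand-built mock record are hypothetical and stand in for what the parser returns.

```perl
use strict;
use warnings;

# Walk one parsed record (the hash the loop above calls $result) and return
# [transcript_accession, protein_accession] pairs. Missing levels are
# skipped rather than dereferenced.
sub refseq_pairs {
    my ($record) = @_;
    my @pairs;
    for my $locus ( @{ $record->{locus} || [] } ) {
        for my $rna ( @{ $locus->{products} || [] } ) {
            my $rna_acc = $rna->{accession};
            next unless defined $rna_acc;
            my @proteins = @{ $rna->{products} || [] };
            if (@proteins) {
                push @pairs, [ $rna_acc, $_->{accession} ] for @proteins;
            }
            else {
                push @pairs, [ $rna_acc, undef ];    # transcript without a protein
            }
        }
    }
    return @pairs;
}

# Hand-built stand-in for a parsed record, using the accessions from this thread.
my $mock = {
    locus => [
        {
            products => [
                {
                    accession => 'NM_002105',
                    products  => [ { accession => 'NP_002096' } ],
                }
            ],
        }
    ],
};
printf "%s\t%s\n", $_->[0], $_->[1] // 'NA' for refseq_pairs($mock);
```

With the real parser, the sub would be called on each $result inside the while loop.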



--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Carnë Draug
> Sent: Monday, 4 July 2011 2:39 p.m.
> To: bioperl mailing list
> Subject: [Bioperl-l] parsing entrezgene file (lost data)
> 
> Hi
> 
> I've been trying to get some data from an ASN.1 entrezgene file.
> However, I can't seem to access some of the data on the file. I've
> read the Feature-annotations page on the wiki (even fixed a bug in
> there) but still nothing. So I used Data::Dumper to look at the Seq and
> Annotation objects and couldn't see it in there at all although it's on
> the original file (attached).
> 
> The data I want from the sequence are the ids "NM_002105" and
> "NP_002096" which show up several times on the file. However, when I do
> this:
> 
> use Data::Dump;
> use Bio::SeqIO;
> my $file = $ARGV[0];
> my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'entrezgene');
> my $seq_object = $seqio_object->next_seq;
> print Dumper($seq_object);
> 
> I can't find 002105 or 002096 anywhere on the output.
> 
> Am I doing something wrong? How can I solve this?
> 
> Thanks in advance,
> Carnë Draug
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From carandraug+dev at gmail.com  Tue Jul  5 18:16:07 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 5 Jul 2011 23:16:07 +0100
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>
References: 
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>
Message-ID: 

On 5 July 2011 22:55, Smithies, Russell
 wrote:
> It is in there, just takes a bit of getting at.
> Frequent use of Data::Dumper to work out where you are helps.
>
>
>
> use warnings;
> use strict;
> use Bio::ASN1::EntrezGene;
> use Data::Dumper;
>
> my $parser = Bio::ASN1::EntrezGene->new('file' => "entrezgene.asn");
> while(my $result = $parser->next_seq){
>     $result = $result->[0] if(ref($result) eq 'ARRAY');
>     foreach my $l (@{$result->{locus}}){
>         foreach my $p (@{$l->{products}}){
>
>           my $nuc_gi = $p->{seqs}->[0]->{whole}->[0]->{gi};
>           my $nuc_acc = $p->{accession};
>
>           my $prot_gi = $p->{products}->[0]->{seqs}->[0]->{whole}->[0]->{gi};
>           my $prot_acc = $p->{products}->[0]->{accession};
>
>           print "$nuc_gi, $nuc_acc\t$prot_gi, $prot_acc \n";
>         }
>     }
> }
>

Hmm.. I see it now but it's still not there when using the Bio::SeqIO
module (I just tried with Bio::ASN1::EntrezGene as in your example and
I can see it now). I thought that using the specific module was not
recommended.

I just cloned the bioperl repo but the modules code is too much for
me. It seems that Bio::SeqIO uses the Bio::SeqIO::entrezgene module
instead of Bio::ASN1::EntrezGene . But then Bio::SeqIO::entrezgene
does use Bio::ASN1::EntrezGene on the initializing method (this is the
line from the module code)

    $self->{_parser} = Bio::ASN1::EntrezGene->new( file => $param{-file} );

So I have no idea what's wrong. Still, it's nice to have a workaround
for now. Thank you,

Carnë


From cjfields at illinois.edu  Tue Jul  5 18:22:25 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 5 Jul 2011 17:22:25 -0500
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: 
References: 
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>
	
Message-ID: <57A22BC4-9A76-45B8-B403-EC193D47F36B@illinois.edu>

On Jul 5, 2011, at 5:16 PM, Carnë Draug wrote:

> On 5 July 2011 22:55, Smithies, Russell
>  wrote:
>> It is in there, just takes a bit of getting at.
>> Frequent use of Data::Dumper to work out where you are helps.
>> 
>> 
>> 
>> use warnings;
>> use strict;
>> use Bio::ASN1::EntrezGene;
>> use Data::Dumper;
>> 
>> my $parser = Bio::ASN1::EntrezGene->new('file' => "entrezgene.asn");
>> while(my $result = $parser->next_seq){
>>    $result = $result->[0] if(ref($result) eq 'ARRAY');
>>    foreach my $l (@{$result->{locus}}){
>>        foreach my $p (@{$l->{products}}){
>> 
>>          my $nuc_gi = $p->{seqs}->[0]->{whole}->[0]->{gi};
>>          my $nuc_acc = $p->{accession};
>> 
>>          my $prot_gi = $p->{products}->[0]->{seqs}->[0]->{whole}->[0]->{gi};
>>          my $prot_acc = $p->{products}->[0]->{accession};
>> 
>>          print "$nuc_gi, $nuc_acc\t$prot_gi, $prot_acc \n";
>>        }
>>    }
>> }
>> 
> 
> Hmm.. I see it now but it's still not there when using the Bio::SeqIO
> module (I just tried with Bio::ASN1::EntrezGene as in your example and
> I can see it now). I thought that using the specific module was not
> recommended.

Not that; in general the data should go into the Bio::Seq, but in this case it's being missed.

> I just cloned the bioperl repo but the modules code is too much for
> me. It seems that Bio::SeqIO uses the Bio::SeqIO::entrezgene module
> instead of Bio::ASN1::EntrezGene . But then Bio::SeqIO::entrezgene
> does use Bio::ASN1::EntrezGene on the initializing method (this is the
> line from the module code)
> 
>    $self->{_parser} = Bio::ASN1::EntrezGene->new( file => $param{-file} );
> 
> So I have no idea what's wrong. Still, it's nice to have a workaround
> for now. Thank you,
> 
> Carnë

My guess is lack of implementation from the bioperl end on grabbing this specific data.  

chris



From Russell.Smithies at agresearch.co.nz  Tue Jul  5 18:26:41 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 6 Jul 2011 10:26:41 +1200
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: 
References: 
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>
	
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3396074D275@exchsth.agresearch.co.nz>

Bio::ASN1::EntrezGene is not the easiest to work with but you can access everything if you try hard enough.
I used it last year for transforming ASN.1 gene records from NCBI into fully annotated Wiki pages and it was very successful, though I got sick of typing so many curly brackets ;-)


--Russell


> -----Original Message-----
> From: carandraug at gmail.com [mailto:carandraug at gmail.com] On Behalf Of
> Carnë Draug
> Sent: Wednesday, 6 July 2011 10:16 a.m.
> To: Smithies, Russell
> Cc: bioperl mailing list
> Subject: Re: [Bioperl-l] parsing entrezgene file (lost data)
> 
> On 5 July 2011 22:55, Smithies, Russell
>  wrote:
> > It is in there, just takes a bit of getting at.
> > Frequent use of Data::Dumper to work out where you are helps.
> >
> >
> >
> > use warnings;
> > use strict;
> > use Bio::ASN1::EntrezGene;
> > use Data::Dumper;
> >
> > my $parser = Bio::ASN1::EntrezGene->new('file' => "entrezgene.asn");
> > while(my $result = $parser->next_seq){
> >     $result = $result->[0] if(ref($result) eq 'ARRAY');
> >     foreach my $l (@{$result->{locus}}){
> >         foreach my $p (@{$l->{products}}){
> >
> >           my $nuc_gi = $p->{seqs}->[0]->{whole}->[0]->{gi};
> >           my $nuc_acc = $p->{accession};
> >
> >           my $prot_gi = $p->{products}->[0]->{seqs}->[0]->{whole}->[0]->{gi};
> >           my $prot_acc = $p->{products}->[0]->{accession};
> >
> >           print "$nuc_gi, $nuc_acc\t$prot_gi, $prot_acc \n";
> >         }
> >     }
> > }
> >
> 
> Hmm.. I see it now but it's still not there when using the Bio::SeqIO
> module (I just tried with Bio::ASN1::EntrezGene as in your example and
> I can see it now). I thought that using the specific module was not
> recommended.
> 
> I just cloned the bioperl repo but the modules code is too much for
> me. It seems that Bio::SeqIO uses the Bio::SeqIO::entrezgene module
> instead of Bio::ASN1::EntrezGene . But then Bio::SeqIO::entrezgene
> does use Bio::ASN1::EntrezGene on the initializing method (this is the
> line from the module code)
> 
>     $self->{_parser} = Bio::ASN1::EntrezGene->new( file => $param{-file} );
> 
> So I have no idea what's wrong. Still, it's nice to have a workaround
> for now. Thank you,
> 
> Carnë


From carandraug+dev at gmail.com  Tue Jul  5 18:38:08 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 5 Jul 2011 23:38:08 +0100
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3396074D275@exchsth.agresearch.co.nz>
References: 
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D275@exchsth.agresearch.co.nz>
Message-ID: 

Well, I've updated the bug report with what you found, thank you.

2011/7/5 Smithies, Russell :
> Bio::ASN1::EntrezGene is not the easiest to work with but you can access everything if you try hard enough.
> I used it last year from transforming ASN.1 gene records from NCBI into fully annotated Wiki pages and it was very successful though I got sick of typing so many curly brackets ;-)

You mean I should access the data "manually" rather than using
methods? It will have to do for now, although that's kind of the
opposite of what objects are meant for (I think; I'm no programmer).

My plan is to make an application that can be reused by other people,
hence trying to do it in a nice, maintainable way without too many
hacks, which is also why I can't just parse the gene2refseq file.

Since what I want is to get the transcripts and proteins given a gene
UID, I can see two options.
  1 - parse the ASN1 file and access the data 'manually' until this is
fixed (and then fix the code to use the methods)
  2 - use elink from EUtilities. But since it fails around half the
time, I'd have to check if it's a pseudogene first. If it's not, it
should link to at least one place in the nucleotide database, so I'd
wrap the connection in an eval block and retry until an id is returned.

I think I'll go for the first option but opinions are welcome.
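
For reference, the eval-and-retry idea in option 2 could look roughly like the sketch below. It is deliberately independent of Bio::DB::EUtilities: the request is passed in as a code reference, and both the helper name and the simulated request are hypothetical. A retry limit is added so that a gene with no links at all cannot loop forever.

```perl
use strict;
use warnings;

# Call $fetch until it returns a non-empty list or $max_tries is used up.
# $fetch is any code reference; exceptions it throws are trapped by eval.
sub retry_fetch {
    my ( $fetch, $max_tries, $delay ) = @_;
    $max_tries ||= 5;
    $delay     ||= 0;
    for my $attempt ( 1 .. $max_tries ) {
        my @ids = eval { $fetch->() };
        return @ids if @ids;
        warn "attempt $attempt failed: ", ( $@ || "no ids returned\n" );
        sleep $delay if $delay && $attempt < $max_tries;
    }
    return;
}

# Stand-in for the real elink request: fails twice, then succeeds.
my $calls = 0;
my $flaky = sub {
    die "simulated network error\n" if ++$calls < 3;
    return ('NM_002105');
};
my @ids = retry_fetch( $flaky, 5 );
print "got @ids after $calls calls\n";
```

In real use the code reference would wrap the elink call, and a non-zero delay between attempts would be kinder to the NCBI servers.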

Carnë


From tzhu at mail.bnu.edu.cn  Tue Jul  5 20:46:41 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Wed, 06 Jul 2011 08:46:41 +0800
Subject: [Bioperl-l] questions on analysising clustalw alignment result
 files
In-Reply-To: 
References: <4E12EC63.7090807@mail.bnu.edu.cn>		
	
Message-ID: <4E13B071.5040002@mail.bnu.edu.cn>

Thank you, everyone. In fact I've solved the problem. It was mainly
due to the blanks in the sequence names. In the file "test.aln", the two
sequences were named "3R	21150113-21154664" and "3RHet
2076433-2080968", both of which contain a blank. If I delete the
blanks, the script works!
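
For anyone hitting the same thing, a small filter along these lines can patch an existing .aln file before handing it to Bio::AlignIO. It is a heuristic sketch, not part of BioPerl: it assumes the residue block itself contains no whitespace and that an optional residue count may follow it. Renaming the sequences in the input FASTA is probably the cleaner fix.

```perl
use strict;
use warnings;

# Replace whitespace inside the name column of one Clustal alignment line with
# underscores. Header, blank and consensus lines (which start with whitespace)
# pass through untouched.
sub fix_clustal_name {
    my ($line) = @_;
    return $line if $line eq '' || $line =~ /^CLUSTAL/ || $line =~ /^\s/;
    chomp( my $text = $line );

    # name (may contain blanks), gap, residues, optional trailing residue count
    if ( $text =~ /^(\S.*?)(\s+)(\S+)((?:\s+\d+)?)\s*$/ ) {
        my ( $name, $gap, $residues, $count ) = ( $1, $2, $3, $4 );
        $name =~ s/\s/_/g;
        return "$name$gap$residues$count\n";
    }
    return $line;
}

print fix_clustal_name("3R 21150113-21154664      ACGT--ACGT 8\n");
```

Applied line by line (for example while copying test.aln to a new file), this leaves the sequence columns where they were, because each blank in the name is replaced by exactly one underscore.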

On 2011-07-05 22:51, Peter Cock wrote:
> On Tue, Jul 5, 2011 at 3:45 PM, Chris Fields  wrote:
>>
>> The best thing to do in this case is to file a bug for tracking
>> (might be a good thing to have cross-lang bugs in this case, maybe?).
>> Anyway, it's very likely as Peter says, the parser is choking on something
>> and doesn't recognize the file.
>>
>> chris
>
> Good plan Chris.
>
> Tao, please goto http://redmine.open-bio.org/projects/bioperl to file
> a bug. As well as the example output file from your original email,
> could you also include the input file for the alignment, and the
> clustalw2 command line used to create it?
>
> Thanks,
>
> Peter
>


-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn


From cjfields at illinois.edu  Tue Jul  5 21:00:57 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 5 Jul 2011 20:00:57 -0500
Subject: [Bioperl-l] parsing entrezgene file (lost data)
In-Reply-To: 
References: 
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D274@exchsth.agresearch.co.nz>
	
	<18DF7D20DFEC044098A1062202F5FFF3396074D275@exchsth.agresearch.co.nz>
	
Message-ID: <50FD4B4C-1EE4-4916-A594-898E973281A6@illinois.edu>

On Jul 5, 2011, at 5:38 PM, Carnë Draug wrote:

> Well, I update the bug report with what you found, thank you.
> 
> 2011/7/5 Smithies, Russell :
>> Bio::ASN1::EntrezGene is not the easiest to work with but you can access everything if you try hard enough.
>> I used it last year from transforming ASN.1 gene records from NCBI into fully annotated Wiki pages and it was very successful though I got sick of typing so many curly brackets ;-)
> 
> You mean I should access the data "manually" rather than using
> methods? It will have to do by now although that's kind of the
> opposite of objects are meant to (I think, I'm no programmer).
> 
> My plan is to make an application that can be reused by other people
> hence trying to do it in a nice maintainable way without too many
> hacks and why I can't just parse the gene2refseq file.
> 
> Since what I want is to get the transcripts and proteins given a gene
> UID, I can see two options.
>  1 - parse the ASN1 file and access the data 'manually' until this is
> fixed (and then fix the code to use the methods)
>  2 - use elink from EUtilities. But since it fails around half the
> times, I'd have to check if it's a pseudo gene first. If it's not it
> should link to at least one place in the nucleotide database so I'd
> have the connection on an eval block until an id is returned.
> 
> I think I'll go for the first option but opinions are welcome.
> 
> Carnë

I assume the problem is that not every piece of data has been mapped to relevant BioPerl classes to be stored in the Bio::Seq, thus the lack of support for these.  Bio::ASN1::EntrezGene is a fairly generic parser, though (as Russell points out the data is there but hasn't been mapped to objects).  Maybe someone with a bit more experience with this parser can chip in, though?

chris

From Daniel.Lang at biologie.uni-freiburg.de  Wed Jul  6 03:54:57 2011
From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed, 06 Jul 2011 09:54:57 +0200
Subject: [Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed
In-Reply-To: 
References: <4E103AE6.2030003@biologie.uni-freiburg.de>
	
	
	
Message-ID: <4E1414D1.4010006@biologie.uni-freiburg.de>

Hi all,

thanks a lot for your input on this!

I want to explore the repeat structure of our model genome derived by
lastz self-alignments (using %id as score).
Since this is a HUGE file and I initially wanted to have the ability to
access the information for individual repeat regions also in gbrowse, I
wanted to use BigBed. Having the data in hand, it seems not to be such a
good idea anyway since the resulting repeat graph is much more complex
than I expected. So summarizing using the score and/or coverage will do
just fine;-)

But as they are repeats they're overlapping. So if I see it correctly
BigWig/BedGraph aren't an option. Due to the size limitations, I have
not stored individual CIGAR strings that I could use to generate full-
blown SAM files. Or can I use BAM without sequence/qual data?

Or is there an existing tool that would allow me to collapse overlapping
ranges with average scores for use in BigWig?
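
In case no ready-made tool turns up, the collapsing step itself is short. The sketch below is an assumption about what "average score" should mean here: for every stretch covered by the same set of features, it reports the unweighted mean of their scores. Coordinates are taken as 0-based, half-open (BED style), the function name is made up, and features must be grouped per chromosome by the caller.

```perl
use strict;
use warnings;

# Turn overlapping [start, end, score] features on ONE chromosome into
# non-overlapping [start, end, mean_score] segments with a sweep over the
# sorted start/end positions.
sub flatten_intervals {
    my (@features) = @_;
    my @events;
    for my $f (@features) {
        my ( $start, $end, $score ) = @$f;
        next unless $end > $start;    # skip empty or inverted ranges
        push @events, [ $start, +1, $score ], [ $end, -1, $score ];
    }
    @events = sort { $a->[0] <=> $b->[0] } @events;

    my ( $depth, $sum, $prev ) = ( 0, 0, undef );
    my @segments;
    my $i = 0;
    while ( $i < @events ) {
        my $pos = $events[$i][0];
        # close the segment that ends here, if anything was covering it
        push @segments, [ $prev, $pos, $sum / $depth ] if $depth > 0;
        # apply every start/end event at this position before moving on
        while ( $i < @events && $events[$i][0] == $pos ) {
            $depth += $events[$i][1];
            $sum   += $events[$i][1] * $events[$i][2];
            $i++;
        }
        $prev = $pos;
    }
    return @segments;
}

# Two overlapping repeats with %id as score; 'chr1' is a placeholder name.
print join( "\t", 'chr1', @$_ ), "\n"
  for flatten_intervals( [ 0, 10, 80 ], [ 5, 15, 100 ] );
```

The printed lines have the four-column chrom/start/end/value layout of a BedGraph file, so the output could then go through bedGraphToBigWig as suggested earlier in the thread. With non-integer scores the running sum can pick up small rounding errors, which may or may not matter for a browser track.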

Otherwise, I'll have to live with the coverage graphs for visualization
in gbrowse and use Bio::DB::BigBed::features to look at conservation
score at individual loci.

Chris, the proposed BP page would be extremely helpful :-D

Again, thanks a lot!

Best,
Daniel

On 04.07.2011 18:10, Chris Fields wrote:
> I generally follow these rules where I want a common set of possibly volatile features (e.g. specific transcriptome analysis) separate from my main 'stable' feature database (e.g. gene models):
> 
> 1) BigBed - lightweight bundle of simple features where the ranges may overlap, but I'm not concerned about score.  I have found BED/BigBed scores of limited use in most cases to me unless I scale the data (since they must be 0-1000 integer values).  Document it very well if you do any scaling! YMMV
> 
> 2) SAM/BAM - bundle of (possibly overlapping) features where summary stats are needed.  I've seen these used for BLAST/BLAT runs, etc.
> 
> 3) BigWig - quantitative data of fixed or varying ranges covering entire genome, ranges can't overlap
> 
> 4) BedGraph - quantitative sparse data, ranges can't overlap (these are converted over to BigWig for GBrowse, though)
> 
> 5) Of course, one can also set up separate DB::SF::Store databases as well depending on your needs (I have used both the SQLite and MySQL adaptors for this).
> 
> I think this is almost begging for a 'best practices' chart/table somewhere, maybe a GBrowse 'cookbook' of common data representation cases.
> 
> chris
> 
> On Jul 4, 2011, at 8:22 AM, Lincoln Stein wrote:
> 
>> I had a look at the output of bigBedSummary, which is from Jim Kent's source
>> tree (no Perl involved), and it appears that the statistics it provides are
>> limited to coverage; so I don't think you can do anything with the scores if
>> you're using BigBed indexing. Have a look at BedGraph=>BigWig and see if it
>> meets your needs.
>>
>> Lincoln
>>
>> On Mon, Jul 4, 2011 at 9:04 AM, Lincoln Stein wrote:
>>
>>> Hi Dan,
>>>
>>> The documentation for BigBed is scanty; all I know about it is what is
>>> provided by the bigbed library is in Jim Kent's bigbed.h include file. I had
>>> thought that the scores in BED files would come through into the summary
>>> statistics like those in BigWig, but now I'm looking at the example data
>>> provided in Jim's source code, and see that the BigBed example source file
>>> has scores of "0".
>>>
>>> I'll investigate whether there is an issue in the Perl layer, but it could
>>> easily be a limitation in the library itself. Have you considered using a
>>> BedGraph file and indexing it with bedGraphToBigWig? I know that the
>>> Bio::DB::BigWig interface works perfectly to retrieve and summarize the
>>> scores.
>>>
>>> Lincoln
>>>
>>>
>>> On Sun, Jul 3, 2011 at 5:48 AM, Daniel Lang <
>>> Daniel.Lang at biologie.uni-freiburg.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> quick question about the BigBed adaptor: Is it correct that the bin and
>>>> summary functions only return statistics about the number of features in
>>>> the defined intervals?
>>>> I was expecting them to deliver statistics about the score if the
>>>> respective bb file has a defined score field.
>>>> If this is true, does this also mean that I cannot plot the distribution
>>>> of scores in BigBed files in gbrowse?
>>>>
>>>> This is the first time I'm using BigBed, maybe I'm doing something
>>>> wrong...
>>>>
>>>> I had some trouble formatting the bed files correctly in order to see
>>>> the score in the features returned by the Bio::DB::BigBed::features()
>>>> routine. It seems the bigbed entries will only have a correctly assigned
>>>> score field if you also provide a non-empty name field. Initially I
>>>> thought that the order of columns is irrelevant if you use an .as file
>>>> in the bedToBigBed call, but that doesn't seem to be the case.
>>>>
>>>> Best,
>>>> Daniel
>>>> --
>>>>
>>>> Dr. Daniel Lang
>>>> University of Freiburg, Plant Biotechnology
>>>> Schaenzlestr. 1, D-79104 Freiburg
>>>> fax:        +49 761 203 6945
>>>> phone:      +49 761 203 6989
>>>> homepage:   http://www.plant-biotech.net/
>>>>           http://www.cosmoss.org/
>>>> e-mail :
>>>> daniel.lang at biologie.uni-freiburg.de
>>>>
>>>> #################################################
>>>> My software never has bugs.
>>>> It just develops random features.
>>>> #################################################
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gmod-gbrowse mailing list
>>>> Gmod-gbrowse at lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>
>>>
>>>
>>>
>>> --
>>> Lincoln D. Stein
>>> Director, Informatics and Biocomputing Platform
>>> Ontario Institute for Cancer Research
>>> 101 College St., Suite 800
>>> Toronto, ON, Canada M5G0A3
>>> 416 673-8514
>>> Assistant: Renata Musa 
>>>
>>
>>
>>
>> -- 
>> Lincoln D. Stein
>> Director, Informatics and Biocomputing Platform
>> Ontario Institute for Cancer Research
>> 101 College St., Suite 800
>> Toronto, ON, Canada M5G0A3
>> 416 673-8514
>> Assistant: Renata Musa 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 

Dr. Daniel Lang
University of Freiburg, Plant Biotechnology
Schaenzlestr. 1, D-79104 Freiburg
fax:        +49 761 203 6945
phone:      +49 761 203 6989
homepage:   http://www.plant-biotech.net/
            http://www.cosmoss.org/
e-mail:     daniel.lang at biologie.uni-freiburg.de

#################################################
My software never has bugs.
It just develops random features.
#################################################




From p.j.a.cock at googlemail.com  Wed Jul  6 04:56:53 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 6 Jul 2011 09:56:53 +0100
Subject: [Bioperl-l] questions on analysising clustalw alignment result
	files
In-Reply-To: <4E13B071.5040002@mail.bnu.edu.cn>
References: <4E12EC63.7090807@mail.bnu.edu.cn>
	
	
	
	<4E13B071.5040002@mail.bnu.edu.cn>
Message-ID: 

On Wed, Jul 6, 2011 at 1:46 AM, Tao Zhu  wrote:
> Thank you, everyone. In fact I've solved the problem. It was mainly due to
> the blanks in the sequence names. In the file "test.aln", the two sequences
> were named "3R      21150113-21154664" and "3RHet 2076433-2080968", both
> of which had a blank inside. If I delete the blanks, the script works!

But how did the blanks get there? Perhaps you have found a problem
in clustalw - older versions didn't do that, the sequences would have
been called just "3R" and "3RHet".

Peter


From wrp at virginia.edu  Wed Jul  6 15:03:37 2011
From: wrp at virginia.edu (William Pearson)
Date: Wed, 6 Jul 2011 15:03:37 -0400
Subject: [Bioperl-l] Course application deadline: CSHL Computational and
	Comparative Genomics
References: <119AEEB3-E558-4728-A3FC-DBE729E31846@virginia.edu>
Message-ID: <31C735F5-DFDE-48DA-AD0A-1C247F0065AD@virginia.edu>


Course announcement:

Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS

November 9 - 15, 2011
Application Deadline: July 15, 2011

----------------------------------------------------------------

INSTRUCTORS:
William Pearson, University of Virginia, Charlottesville, VA
Lisa Stubbs, University of Illinois, Urbana, IL

This course presents a comprehensive overview of the theory and
practice of computational methods for the identification and
characterization of functional elements from DNA sequence data. The
course focuses on approaches for extracting the maximum amount of
information from protein and DNA sequence similarity through sequence
database searches, statistical analysis, and multiple sequence
alignment. Additional topics include:

* Alignment and analysis of "Next-Gen" sequence data
* The Galaxy environment for high-throughput analysis
* Identification of conserved signals in aligned and unaligned sequences
* Regulatory element and motif recognition
* Integration of genetic and sequence information in biological databases
* The ENSEMBL genome browser and BioMart

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course is designed for biologists seeking advanced
training in biological sequence and genome analysis, computational
biology core resource directors and staff, and for scientists in other
disciplines, such as computer science, who wish to survey current
research problems in biological sequence analysis.  Advanced
programming skills are not required.

The lecture/lab schedule for the 2010 course can be found at
http://fasta.bioch.virginia.edu/cshl

Speakers in the 2011 course will include:

Aaron Mackey, U. of Virginia, Next-Gen analysis pipelines

Bert Overduin, European Bioinformatics Institute, UK, ENSEMBL and BioMart

Frances Ouelette, Ontario Cancer Research Institute, Databases for
Biological Function

William Pearson, U. of Virginia, Similarity Searching, Multiple Alignment

Lisa Stubbs, U. of Illinois, Urbana, Genome browsing, Comparative genomics

James Taylor, Emory, Galaxy and genome analysis pipelines

The primary focus of the computational and comparative genomics course
is the theory and practice of algorithms used in computational
biology, with the goal of using current methods more effectively and
developing new algorithms.  Students more interested in the practical
aspects of software development are encouraged to apply to the course
on Programming for biology. Students who would like in-depth training
in the analysis of next-generation sequencing data (e.g., SNP calling
and the detection of structural variants) should apply to the course
on Advanced Sequencing Technologies & Applications.

----------------------------------------------------------------

To apply to the course, fill out and send in the form at:

http://meetings.cshl.edu/course/courseapp_instr.shtml




From ross at cuhk.edu.hk  Wed Jul  6 22:43:10 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Thu, 7 Jul 2011 10:43:10 +0800
Subject: [Bioperl-l] limit the number of blast output per query
In-Reply-To: <4DFE907D.1000204@gmail.com>
References: 	<01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie>			<004001cbe2d5$76598200$630c8600$@edu.hk>		<005301cbe31b$a3bee550$eb3caff0$@edu.hk>		<9CD1455E-88B4-4E2A-B3BC-398C10D5AAA9@tamu.edu>	<3E73745F-A687-4229-B71E-5C56B2D1FBAE@illinois.edu>		<009001cc2edc$09b80740$1d2815c0$@edu.hk>
	<4DFE907D.1000204@gmail.com>
Message-ID: <00d001cc3c4f$9bddd070$d3997150$@edu.hk>

I know this question should go to BLAST help, but they seem to be
overwhelmed by incoming emails already. I wonder whether any BioPerl users
happen to know how to limit the number of BLAST hits reported per query. For
example, with the human genome as the database, a single query can
generate 10,000+ hits. I have already supplied the -b 30 -v 30 flags, but
blastall from blast2.2.22 evidently does not "obey" my instruction.

The output files are usually larger than 100G, but the
final results that I want are usually under 10M. Is there any way to help
save our Earth (no exaggeration: energy is WASTED in a meaningless manner)?



From p.j.a.cock at googlemail.com  Thu Jul  7 05:03:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Jul 2011 10:03:34 +0100
Subject: [Bioperl-l] limit the number of blast output per query
In-Reply-To: <00d001cc3c4f$9bddd070$d3997150$@edu.hk>
References: 
	<01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie>
	
	
	<004001cbe2d5$76598200$630c8600$@edu.hk>
	
	<005301cbe31b$a3bee550$eb3caff0$@edu.hk>
	
	<9CD1455E-88B4-4E2A-B3BC-398C10D5AAA9@tamu.edu>
	<3E73745F-A687-4229-B71E-5C56B2D1FBAE@illinois.edu>
	
	<009001cc2edc$09b80740$1d2815c0$@edu.hk>
	<4DFE907D.1000204@gmail.com>
	<00d001cc3c4f$9bddd070$d3997150$@edu.hk>
Message-ID: 

On Thu, Jul 7, 2011 at 3:43 AM, Ross KK Leung  wrote:
> I know this question should submit to BLAST help but it seems they have
> already been overwhelmed by incoming emails. I wonder any bioperl users
> happen to know how to limit the number of blast output per query. For
> example, for human genome as a database to blast against, a single query can
> generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but
> obviously the blastall from blast2.2.22 does not "obey" my instruction.
>
> The output files generated are usually larger than 100G+ but indeed the
> final ones that I want usually are only of 10M-. Is there any way to help
> save our Earth (Not exaggerated, energy is WASTED in a meaningless
> manner)?

Why are you using such an old version? blastall 2.2.25 is out and may
have fixed this (I expect there is a changelog somewhere), or better yet
at some point you should switch to blast+ rather than continuing with
legacy blast.

BLAST+ gives you the limits -num_alignments, -num_descriptions and
-max_target_seqs (from memory, the first two only apply to the plain
text output).

Also, perhaps some of the hit property limits like e-value might be
relevant for limiting the number of results.

Peter
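
A minimal sketch of how the BLAST+ limits mentioned above could be put together from Perl. It only builds the argument list and prints it; nothing is run. The file and database names (queries.fa, human_genome, hits.tsv) and the default values are placeholders chosen for illustration, not values from this thread.

```perl
use strict;
use warnings;

# Build (but do not run) a blastn command line that caps the output per query.
# All file and database names passed in are placeholders.
sub build_blastn_cmd {
    my (%opt) = @_;
    return (
        'blastn',
        '-query',           $opt{query},
        '-db',              $opt{db},
        '-evalue',          $opt{evalue}      // 1e-10,  # stricter cutoff, fewer hits
        '-max_target_seqs', $opt{max_targets} // 30,     # cap on database sequences kept
        '-outfmt',          6,                           # tabular output, smaller files
        '-out',             $opt{out},
    );
}

my @cmd = build_blastn_cmd(
    query => 'queries.fa',
    db    => 'human_genome',
    out   => 'hits.tsv',
);
print join(' ', @cmd), "\n";
# To actually run it: system(@cmd) == 0 or die "blastn failed: $?";
```

Passing the command as a list to system() avoids shell quoting problems. Note that -max_target_seqs limits database sequences, not HSPs, so it may not help much when the database has only one sequence per chromosome.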

From hrh at fmi.ch  Thu Jul  7 05:02:47 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 07 Jul 2011 11:02:47 +0200
Subject: [Bioperl-l] limit the number of blast output per query
In-Reply-To: <00d001cc3c4f$9bddd070$d3997150$@edu.hk>
References: 	<01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie>			<004001cbe2d5$76598200$630c8600$@edu.hk>		<005301cbe31b$a3bee550$eb3caff0$@edu.hk>		<9CD1455E-88B4-4E2A-B3BC-398C10D5AAA9@tamu.edu>	<3E73745F-A687-4229-B71E-5C56B2D1FBAE@illinois.edu>		<009001cc2edc$09b80740$1d2815c0$@edu.hk>	<4DFE907D.1000204@gmail.com>
	<00d001cc3c4f$9bddd070$d3997150$@edu.hk>
Message-ID: <4E157637.3020408@fmi.ch>

Hi

just double checking: are you really talking about "10,000+ hits"? Or do
you mean "10,000+ HSPs" ('high-scoring segment pairs')?

I don't know what your genome database looks like, but assuming you have
one sequence per chromosome, you will get just 24 hits (i.e. one per
chromosome), and then, depending on your query, each hit will have a lot of
HSPs.

As far as I know, there is no way to limit the number of HSPs (you
might try playing with the E value).

You can try using the tabular output format (this will reduce the file
size), or maybe BLAST is not the right search tool for your task?


Regards, Hans
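
If BLAST itself cannot cap the HSPs, one option is to trim the tabular output afterwards. The sketch below keeps at most a given number of HSP lines per query/subject pair. It assumes the standard 12-column tabular layout (query id in column 1, subject id in column 2) and that the lines arrive with the best HSPs first; the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Keep at most $max HSP lines for each query/subject pair in tabular BLAST
# output. Lines are assumed to be in BLAST's own order, best HSPs first.
sub limit_hsps {
    my ($lines, $max) = @_;
    my (%seen, @kept);
    for my $line (@$lines) {
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines
        my ($query, $subject) = split /\t/, $line;
        push @kept, $line if ++$seen{$query}{$subject} <= $max;
    }
    return @kept;
}

# Made-up example rows: three HSPs for q1 on chr1, one for q1 on chr2.
my @rows = (
    join("\t", qw(q1 chr1 99.0 100 1 0 1 100 500 599 1e-50 180)),
    join("\t", qw(q1 chr1 95.0 100 5 0 1 100 900 999 1e-40 150)),
    join("\t", qw(q1 chr1 90.0 100 10 0 1 100 1500 1599 1e-30 120)),
    join("\t", qw(q1 chr2 98.0 100 2 0 1 100 700 799 1e-48 175)),
);

my @kept = limit_hsps(\@rows, 2);
print scalar(@kept), " of ", scalar(@rows), " HSP lines kept\n";
```

This does not save the compute time spent by BLAST, only the disk space, so a stricter E value is still worth trying first.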


On 07/07/2011 04:43 AM, Ross KK Leung wrote:
> I know this question should submit to BLAST help but it seems they have
> already been overwhelmed by incoming emails. I wonder any bioperl users
> happen to know how to limit the number of blast output per query. For
> example, for human genome as a database to blast against, a single query can
> generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but
> obviously the blastall from blast2.2.22 does not "obey" my instruction.
>
> The output files generated are usually larger than 100G+ but indeed the
> final ones that I want usually are only of 10M-. Is there any way to help
> save our Earth (Not exaggerated, energy is WASTED in a meaningless manner)?
>
>

From bernd.web at gmail.com  Thu Jul  7 11:03:25 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 7 Jul 2011 17:03:25 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
Message-ID: 

Hi,

I noticed Bio::DB::Taxonomy does not contain the root of the tree,
while the NCBI node file does.
For example, the lineage "root; cellular organisms; Bacteria" stops at
"cellular organisms", which means there is no parent node of
"cellular organisms". (see code below).  Also
$taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
for root  (using BioPerl 1.6.9).

Was there a reason not to include the node "root" in the index files
for Bio::DB::Taxonomy?


Kind regards,
Bernd

use strict;
use File::Spec;
use Bio::DB::Taxonomy;

my $prefix = '/scratch/taxonomy/';
my $taxdb = Bio::DB::Taxonomy->new
    (-source => 'flatfile',
     -directory => File::Spec->catfile($prefix,'idx'),
     -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
     -namesfile => File::Spec->catfile($prefix,'names.dmp')
     );


my $taxid = '2';
my $node = $taxdb->get_Taxonomy_Node($taxid);
$node = $taxdb->ancestor($node);
print $node->node_name, "\n"; #prints: cellular organisms
$node = $taxdb->ancestor($node);
print $node->node_name, "\n"; # error: Can't call method "node_name" on an undefined value at taxdb.pl line...
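
A minimal sketch of a guarded walk that avoids the error in the script above by stopping as soon as ancestor() returns undef. It relies only on the two calls used in that script, ancestor() on the database and node_name() on the node. The Demo::Node and Demo::DB packages are hypothetical stand-ins so the sketch runs without BioPerl or the NCBI dump files; with a real Bio::DB::Taxonomy object they would not be needed.

```perl
use strict;
use warnings;

# Collect node names from $node up to the top of its tree. Works with any
# database object offering ancestor($node) and nodes offering node_name().
sub lineage_names {
    my ($db, $node) = @_;
    my @names;
    while (defined $node) {
        unshift @names, $node->node_name;
        $node = $db->ancestor($node);    # undef once the top node is passed
    }
    return @names;
}

# Hypothetical stand-ins, for illustration only (not part of BioPerl).
{
    package Demo::Node;
    sub new       { my ($class, %arg) = @_; return bless {%arg}, $class }
    sub node_name { return $_[0]{name} }

    package Demo::DB;
    sub new      { return bless {}, $_[0] }
    sub ancestor { return $_[1]{parent} }
}

my $top      = Demo::Node->new(name => 'cellular organisms');
my $bacteria = Demo::Node->new(name => 'Bacteria', parent => $top);
print join('; ', lineage_names(Demo::DB->new, $bacteria)), "\n";
```

Because the loop tests for undef rather than for a particular name, it does not matter whether the tree ends at "root" or at one of the other top-level nodes.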

From bosborne11 at verizon.net  Thu Jul  7 11:16:13 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Jul 2011 11:16:13 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
Message-ID: <1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>

Bernd,

Yes, good question. Currently if you want to traverse up the tree from any given node you have to be aware that the tree may end at "cellular organisms" or "other sequences" or "unclassified sequences" or "Viruses" or "Viroids" but not at "root". This can make for awkward programming.

Brian O.

On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:

> Hi,
> 
> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
> while the NCBI node file does.
> For example, the lineage "root; cellular organisms; Bacteria" stops at
> "cellular organisms", which means there is no parent node of
> "cellular organisms". (see code below).  Also
> $taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
> for root  (using BioPerl 1.6.9).
> 
> Was there a reason not to include the node "root" in the index files
> for Bio::DB::Taxonomy?
> 
> 
> Kind regards,
> Bernd
> 
> use strict;
> use File::Spec;
> use Bio::DB::Taxonomy;
> 
> my $prefix = '/scratch/taxonomy/';
> my $taxdb = Bio::DB::Taxonomy->new
>    (-source => 'flatfile',
>     -directory => File::Spec->catfile($prefix,'idx'),
>     -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>     -namesfile => File::Spec->catfile($prefix,'names.dmp')
>     );
> 
> 
> my $taxid = '2';
> my $node = $taxdb->get_Taxonomy_Node($taxid);
> $node = $taxdb->ancestor($node);
> print $node->node_name, "\n"; #prints: cellular organisms
> $node = $taxdb->ancestor($node);
> print $node->node_name, "\n"; #error :Can't call method "node_name" on
> an undefined value at taxdb.pl line...



From jason.stajich at gmail.com  Thu Jul  7 12:22:21 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 7 Jul 2011 09:22:21 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: <1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
Message-ID: 

This is the code in the flatfile _build_index method that prevents the root from being indexed. I don't remember the logic behind it, but there was a reason for this code being put in.


  while (<NAMES>) {
      chomp;
      my ($taxid, $name, $unique_name, $class) = split(/\t\|\t/, $_);
      # don't include the fake root node 'root' or 'all' with id 1
      next if $taxid == 1;
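
For readers unfamiliar with the names.dmp layout, here is a rough standalone sketch of what that loop does. Fields are separated by tab, pipe, tab, and each record ends with tab, pipe. The sample lines are hand-written in that layout rather than copied from a real dump, and the $keep_root switch is a hypothetical parameter for illustration, not an existing BioPerl option.

```perl
use strict;
use warnings;

# Split one names.dmp record into its four fields.
sub parse_names_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\t\|$//;    # drop the record terminator
    my ($taxid, $name, $unique_name, $class) = split /\t\|\t/, $line;
    return {
        taxid       => $taxid,
        name        => $name,
        unique_name => $unique_name,
        class       => $class,
    };
}

# Index names by taxid. $keep_root is a hypothetical switch: when false,
# taxid 1 is skipped, mirroring the 'next if $taxid == 1' line above.
sub index_names {
    my ($lines, $keep_root) = @_;
    my %names;
    for my $line (@$lines) {
        my $rec = parse_names_line($line);
        next if $rec->{taxid} == 1 && !$keep_root;
        push @{ $names{ $rec->{taxid} } }, $rec->{name};
    }
    return \%names;
}

# Hand-written sample records in the names.dmp layout.
my @sample = (
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "2\t|\tBacteria\t|\t\t|\tscientific name\t|\n",
);

printf "%d taxid(s) without root, %d with root\n",
    scalar keys %{ index_names(\@sample) },
    scalar keys %{ index_names(\@sample, 1) };
```

Making the skip explicit like this shows that restoring the root is a one-line decision at index time, although existing indexes would presumably need to be rebuilt for any change to take effect.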

On Jul 7, 2011, at 8:16 AM, Brian Osborne wrote:

> Bernd,
> 
> Yes, good question. Currently if you want to traverse up the tree from any given node you have to be aware that the tree may end at "cellular organisms" or "other sequences" or "unclassified sequences" or "Viruses" or "Viroids" but not at "root", this can make for awkward programming.
> 
> Brian O.
> 
> On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:
> 
>> Hi,
>> 
>> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
>> while the NCBI node file does.
>> For example, the lineage "root; cellular organisms; Bacteria" stops at
>> "cellular organisms", which means there is no parent node of
>> "cellular organisms". (see code below).  Also
>> $taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
>> for root  (using BioPerl 1.6.9).
>> 
>> Was there a reason not to include the node "root" in the index files
>> for Bio::DB::Taxonomy?
>> 
>> 
>> Kind regards,
>> Bernd
>> 
>> use strict;
>> use File::Spec;
>> use Bio::DB::Taxonomy;
>> 
>> my $prefix = '/scratch/taxonomy/';
>> my $taxdb = Bio::DB::Taxonomy->new
>>   (-source => 'flatfile',
>>    -directory => File::Spec->catfile($prefix,'idx'),
>>    -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>>    -namesfile => File::Spec->catfile($prefix,'names.dmp')
>>    );
>> 
>> 
>> my $taxid = '2';
>> my $node = $taxdb->get_Taxonomy_Node($taxid);
>> $node = $taxdb->ancestor($node);
>> print $node->node_name, "\n"; #prints: cellular organisms
>> $node = $taxdb->ancestor($node);
>> print $node->node_name, "\n"; #error :Can't call method "node_name" on
>> an undefined value at taxdb.pl line...



From cjfields at illinois.edu  Thu Jul  7 12:28:48 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Jul 2011 11:28:48 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
Message-ID: 

Is it an arbitrary root, and not a real taxonomically relevant one?

chris

On Jul 7, 2011, at 11:22 AM, Jason Stajich wrote:

> This is the code in the flatfile _build_index method that prevents it from being done - I don't remember the logic behind it but there was a reason for this code being put in.
> 
> 
>  while () {
>            chomp;	    
>            my ($taxid, $name, $unique_name, $class) = split(/\t\|\t/,$_);
> 			# don't include the fake root node 'root' or 'all' with id 1
> 			next if $taxid == 1;
> 
> On Jul 7, 2011, at 8:16 AM, Brian Osborne wrote:
> 
>> Bernd,
>> 
>> Yes, good question. Currently if you want to traverse up the tree from any given node you have to be aware that the tree may end at "cellular organisms" or "other sequences" or "unclassified sequences" or "Viruses" or "Viroids" but not at "root", this can make for awkward programming.
>> 
>> Brian O.
>> 
>> On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:
>> 
>>> Hi,
>>> 
>>> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
>>> while the NCBI node file does.
>>> For example, the lineage "root; cellular organisms; Bacteria" stops at
>>> "cellular organisms", which means there is no parent node of
>>> "cellular organisms". (see code below).  Also
>>> $taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
>>> for root  (using BioPerl 1.6.9).
>>> 
>>> Was there a reason not to include the node "root" in the index files
>>> for Bio::DB::Taxonomy?
>>> 
>>> 
>>> Kind regards,
>>> Bernd
>>> 
>>> use strict;
>>> use File::Spec;
>>> use Bio::DB::Taxonomy;
>>> 
>>> my $prefix = '/scratch/taxonomy/';
>>> my $taxdb = Bio::DB::Taxonomy->new
>>>  (-source => 'flatfile',
>>>   -directory => File::Spec->catfile($prefix,'idx'),
>>>   -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>>>   -namesfile => File::Spec->catfile($prefix,'names.dmp')
>>>   );
>>> 
>>> 
>>> my $taxid = '2';
>>> my $node = $taxdb->get_Taxonomy_Node($taxid);
>>> $node = $taxdb->ancestor($node);
>>> print $node->node_name, "\n"; #prints: cellular organisms
>>> $node = $taxdb->ancestor($node);
>>> print $node->node_name, "\n"; #error :Can't call method "node_name" on
>>> an undefined value at taxdb.pl line...



From cjfields at illinois.edu  Thu Jul  7 12:34:26 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Jul 2011 11:34:26 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: <1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
Message-ID: 

It's probably a good idea to revisit this if there are any questions.  I ran 'git blame' and the changes to the code Jason pointed out were from Sendu, so they appear to be recent (though that could also be showing up from simple reformatting of the code).

chris

On Jul 7, 2011, at 10:16 AM, Brian Osborne wrote:

> Bernd,
> 
> Yes, good question. Currently if you want to traverse up the tree from any given node you have to be aware that the tree may end at "cellular organisms" or "other sequences" or "unclassified sequences" or "Viruses" or "Viroids" but not at "root", this can make for awkward programming.
> 
> Brian O.
> 
> On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:
> 
>> Hi,
>> 
>> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
>> while the NCBI node file does.
>> For example, the lineage "root; cellular organisms; Bacteria" stops at
>> "cellular organisms", which means there is no parent node of
>> "cellular organisms". (see code below).  Also
>> $taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
>> for root  (using BioPerl 1.6.9).
>> 
>> Was there a reason not to include the node "root" in the index files
>> for Bio::DB::Taxonomy?
>> 
>> 
>> Kind regards,
>> Bernd
>> 
>> use strict;
>> use File::Spec;
>> use Bio::DB::Taxonomy;
>> 
>> my $prefix = '/scratch/taxonomy/';
>> my $taxdb = Bio::DB::Taxonomy->new
>>   (-source => 'flatfile',
>>    -directory => File::Spec->catfile($prefix,'idx'),
>>    -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>>    -namesfile => File::Spec->catfile($prefix,'names.dmp')
>>    );
>> 
>> 
>> my $taxid = '2';
>> my $node = $taxdb->get_Taxonomy_Node($taxid);
>> $node = $taxdb->ancestor($node);
>> print $node->node_name, "\n"; #prints: cellular organisms
>> $node = $taxdb->ancestor($node);
>> print $node->node_name, "\n"; #error :Can't call method "node_name" on
>> an undefined value at taxdb.pl line...



From cjfields at illinois.edu  Thu Jul  7 12:40:43 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Jul 2011 11:40:43 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: <1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
Message-ID: 

Okay, to reanswer in a more definitive way, this appears to have been added by Sendu in relation to these bug reports:

https://redmine.open-bio.org/issues/2061
https://redmine.open-bio.org/issues/2047

The main one is bug 2061, where this is present:

Bio::DB::Taxonomy::flatfile
---------------------------

* API-CHANGES
get_Children_Taxids is deprecated - method no longer part of the DB::Taxonomy interface, and superseded by each_Descendent (which is actually implemented by all databases).
* Implementation changes
No longer includes the fake root node 'root'; there are multiple roots now (10239, 12884, 12908, 29384 and 131567). This means when getting the lineage you no longer have to remove the root node. This is now consistent with the results possible with entrez. 
NB: You have to delete your current indexes before you will notice the change.

chris

On Jul 7, 2011, at 10:16 AM, Brian Osborne wrote:

> Bernd,
> 
> Yes, good question. Currently if you want to traverse up the tree from any given node you have to be aware that the tree may end at "cellular organisms" or "other sequences" or "unclassified sequences" or "Viruses" or "Viroids" but not at "root", this can make for awkward programming.
> 
> Brian O.
> 
> On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:
> 
>> Hi,
>> 
>> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
>> while the NCBI node file does.
>> For example, the lineage "root; cellular organisms; Bacteria" stops at
>> "cellular organisms", which means there is no parent node of
>> "cellular organisms". (see code below).  Also
>> $taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
>> for root  (using BioPerl 1.6.9).
>> 
>> Was there a reason not to include the node "root" in the index files
>> for Bio::DB::Taxonomy?
>> 
>> 
>> Kind regards,
>> Bernd
>> 
>> use strict;
>> use File::Spec;
>> use Bio::DB::Taxonomy;
>> 
>> my $prefix = '/scratch/taxonomy/';
>> my $taxdb = Bio::DB::Taxonomy->new
>>   (-source => 'flatfile',
>>    -directory => File::Spec->catfile($prefix,'idx'),
>>    -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>>    -namesfile => File::Spec->catfile($prefix,'names.dmp')
>>    );
>> 
>> 
>> my $taxid = '2';
>> my $node = $taxdb->get_Taxonomy_Node($taxid);
>> $node = $taxdb->ancestor($node);
>> print $node->node_name, "\n"; #prints: cellular organisms
>> $node = $taxdb->ancestor($node);
>> print $node->node_name, "\n"; #error :Can't call method "node_name" on
>> an undefined value at taxdb.pl line...



From kellert at ohsu.edu  Thu Jul  7 12:54:01 2011
From: kellert at ohsu.edu (Tom Keller)
Date: Thu, 7 Jul 2011 09:54:01 -0700
Subject: [Bioperl-l] Message:2 limit the number of blast output per query
In-Reply-To: 
References: 
Message-ID: 

Set the E-value to a much lower number. 

Thomas (Tom) Keller, PhD
kellert at ohsu.edu
503.494.2442
6588 R Jones Hall (BSc/CROET)
www.ohsu.edu/xd/research/research-cores/dna-analysis/

On Jul 7, 2011, at 9:00 AM,  wrote:

> limit the number of blast output per query



From cjfields at illinois.edu  Thu Jul  7 15:30:36 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Jul 2011 14:30:36 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: <657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
Message-ID: 

I agree; I view this as a bug based on the principle of least surprise (I would expect the data to follow NCBI's without any underlying changes). Do you want to work on that? It might be interesting to see if anything else breaks...

chris

On Jul 7, 2011, at 2:22 PM, Brian Osborne wrote:

> All,
> 
> It's true that "root" is "fake" or non-existent. It's also true that having 5 trees instead of 1 is incorrect scientifically and awkward programmatically. Perhaps the most salient point is that "root" exists as a page in the NCBI Taxonomy and as an entry in the *dmp files, as Bernd says, and it has a tax id of 1. So if the goal is to be faithful to NCBI Taxonomy then it should be restored.
> 
> Brian O.
> 
> On Jul 7, 2011, at 12:40 PM, Chris Fields wrote:
> 
>> Okay, to reanswer in a more definitive way, this appears to have been added by Sendu in relation to these bug reports:
>> 
>> https://redmine.open-bio.org/issues/2061
>> https://redmine.open-bio.org/issues/2047
>> 
>> The main one is bug 2061, where this is present:
>> 
>> Bio::DB::Taxonomy::flatfile
>> ---------------------------
>> 
>> * API-CHANGES
>> get_Children_Taxids is deprecated - method no longer part of the DB::Taxonomy interface, and superseded by each_Descendent (which is actually implemented by all databases).
>> * Implementation changes
>> No longer includes the fake root node 'root'; there are multiple roots now (10239, 12884, 12908, 29384 and 131567). This means when getting the lineage you no longer have to remove the root node. This is now consistent with the results possible with entrez. 
>> NB: You have to delete your current indexes before you will notice the change.
>> 
>> chris
>> 
>> On Jul 7, 2011, at 10:16 AM, Brian Osborne wrote:
>> 
>>> Bernd,
>>> 
>>> Yes, good question. Currently if you want to traverse up the tree from any given node you have to be aware that the tree may end at "cellular organisms" or "other sequences" or "unclassified sequences" or "Viruses" or "Viroids" but not at "root", this can make for awkward programming.
>>> 
>>> Brian O.
>>> 
>>> On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
>>>> while the NCBI node file does.
>>>> For example, the lineage "root; cellular organisms; Bacteria" stops at
>>>> "cellular organisms", which means there is no parent node of
>>>> "cellular organisms". (see code below).  Also
>>>> $taxdb->get_Taxonomy_Node(1) would not return the Bio::Taxon object
>>>> for root  (using BioPerl 1.6.9).
>>>> 
>>>> Was there a reason not to include the node "root" in the index files
>>>> for Bio::DB::Taxonomy?
>>>> 
>>>> 
>>>> Kind regards,
>>>> Bernd
>>>> 
>>>> use strict;
>>>> use File::Spec;
>>>> use Bio::DB::Taxonomy;
>>>> 
>>>> my $prefix = '/scratch/taxonomy/';
>>>> my $taxdb = Bio::DB::Taxonomy->new
>>>> (-source => 'flatfile',
>>>>  -directory => File::Spec->catfile($prefix,'idx'),
>>>>  -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>>>>  -namesfile => File::Spec->catfile($prefix,'names.dmp')
>>>>  );
>>>> 
>>>> 
>>>> my $taxid = '2';
>>>> my $node = $taxdb->get_Taxonomy_Node($taxid);
>>>> $node = $taxdb->ancestor($node);
>>>> print $node->node_name, "\n"; #prints: cellular organisms
>>>> $node = $taxdb->ancestor($node);
>>>> print $node->node_name, "\n"; #error: Can't call method "node_name" on an undefined value at taxdb.pl line...
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 



From bosborne11 at verizon.net  Thu Jul  7 15:22:12 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Jul 2011 15:22:12 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
Message-ID: <657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>

All,

It's true that "root" is "fake" or non-existent. It's also true that having 5 trees instead of 1 is incorrect scientifically and awkward programmatically. Perhaps the most salient point is that "root" exists as a page in the NCBI Taxonomy and as an entry in the *dmp files, as Bernd says, and it has a tax id of 1. So if the goal is to be faithful to NCBI Taxonomy then it should be restored.

Brian O.

On Jul 7, 2011, at 12:40 PM, Chris Fields wrote:

> Okay, to reanswer in a more definitive way, this appears to have been added by Sendu in relation to these bug reports:
> 
> https://redmine.open-bio.org/issues/2061
> https://redmine.open-bio.org/issues/2047
> 
> The main one is bug 2061, where this is present:
> 
> Bio::DB::Taxonomy::flatfile
> ---------------------------
> 
> * API-CHANGES
> get_Children_Taxids is deprecated - method no longer part of the DB::Taxonomy interface, and superseded by each_Descendent (which is actually implemented by all databases).
> * Implementation changes
> No longer includes the fake root node 'root'; there are multiple roots now (10239, 12884, 12908, 29384 and 131567). This means when getting the lineage you no longer have to remove the root node. This is now consistent with the results possible with entrez. 
> NB: You have to delete your current indexes before you will notice the change.
> 
> chris



From bernd.web at gmail.com  Thu Jul  7 16:31:04 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 7 Jul 2011 22:31:04 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
	
Message-ID: 

I was indeed surprised to see these multiple trees.
The only exception for the root node is that its "parent" is also 1.
(nodes.dmp: 1	|	1	|	no rank	|	etc).
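
The self-parent convention can be checked directly against nodes.dmp-style rows. What follows is only a minimal sketch: it assumes the tab-pipe-tab field separator shown above, uses three made-up sample rows rather than the real dump file, and the helper names parse_nodes and root_ids are hypothetical, not part of BioPerl.

```perl
use strict;
use warnings;

# Parse nodes.dmp-style rows into a child => parent map.
# Only the first two fields (tax id, parent tax id) are used.
sub parse_nodes {
    my @lines = @_;
    my %parent_of;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($id, $parent) = split /\t\|\t?/, $copy;
        next unless defined $id && defined $parent;
        $parent_of{$id} = $parent;
    }
    return \%parent_of;
}

# A node is treated as a root when it is listed as its own parent.
sub root_ids {
    my ($parent_of) = @_;
    return sort { $a <=> $b } grep { $parent_of->{$_} == $_ } keys %$parent_of;
}

# Illustrative rows only (not read from the real nodes.dmp).
my @sample = (
    "1\t|\t1\t|\tno rank\t|\n",
    "131567\t|\t1\t|\tno rank\t|\n",
    "2\t|\t131567\t|\tsuperkingdom\t|\n",
);

my $parent_of = parse_nodes(@sample);
print join(',', root_ids($parent_of)), "\n";    # prints: 1
```

With the real file, the same test would show whether tax id 1 is the only self-parented entry.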

On Thu, Jul 7, 2011 at 9:30 PM, Chris Fields  wrote:
> I agree; I view this as a bug, based simply on the principle of least surprise (I would expect the data to follow NCBI's without any underlying changes).  Do you want to work on that? Might be interesting to see if anything else breaks...
>
> chris


From bosborne11 at verizon.net  Thu Jul  7 15:35:47 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Jul 2011 15:35:47 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
	
Message-ID: <9885C22C-A475-4102-A911-72E9F9059CF1@verizon.net>

OK, will take a look.

On Jul 7, 2011, at 3:30 PM, Chris Fields wrote:

> I agree; I view this as a bug, based simply on the principle of least surprise (I would expect the data to follow NCBI's without any underlying changes).  Do you want to work on that? Might be interesting to see if anything else breaks...
> 
> chris



From nadel at nabsys.com  Thu Jul  7 17:04:37 2011
From: nadel at nabsys.com (Mark Nadel)
Date: Thu, 7 Jul 2011 17:04:37 -0400
Subject: [Bioperl-l] position method in Bio::Restriction::Analysis
Message-ID: 

I am having trouble using the position method, and no sample code is
included in the documentation.

Here is my script:

use strict;
use Bio::Restriction::EnzymeCollection;
use Bio::Restriction::Analysis;
use Bio::DB::GenBank;
use Bio::Seq;
use Bio::SeqIO;
use Bio::Seq::RichSeq;
use Bio::Tools::SeqStats;

my $accension_number = 'M77815';  ##'U00096.2';

my $outputFile = "/Users/marknadel/Documents/UniqueCutters".$accension_number.".txt";

open OUT, ">$outputFile" or die "Can't open $outputFile";

my $db = Bio::DB::GenBank->new();

my $seq = $db->get_Seq_by_acc($accension_number);

print ">";
print $seq->desc();
print "\tThe sequence is circular:";
print $seq->is_circular();
print "\n";

my $ra = Bio::Restriction::Analysis->new(-seq=>$seq);
my $all_cutters = $ra->cutters;
my $uniqe = $ra->unique_cutters;

foreach my $enz ($uniqe->each_enzyme()){
    print $enz->name();
    print OUT $enz->name();
    print "\t";
    print OUT "\t";
    my @cutpoint = $enz->position();
    #print $cutpoint;
    # print OUT $cutpoint;
}

print "\n";
print OUT "\n";
close OUT;

---------------------


and here is the output:


>M13mp18 phage cloning vector. The sequence is circular:1

AasI Can't locate object method "position" via package
"Bio::Restriction::Enzyme"
at /Users/marknadel/Documents/workspace/adHoc/unique_cutters.pl line 42,
 line 532.


I had a similar problem before with another method  in this package and
someone was kind enough to give me the exact syntax.

Thanks in advance,

Mark

-- 
Mark Nadel
Principal Scientist
NABsys Inc.
60 Clifford Street
Providence, RI  02903

Phone   401-276-9100 x204
Fax 401-276-9122

From Kevin.M.Brown at asu.edu  Thu Jul  7 17:26:10 2011
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 7 Jul 2011 14:26:10 -0700
Subject: [Bioperl-l] position method in Bio::Restriction::Analysis
References: 
Message-ID: <1A4207F8295607498283FE9E93B775B46B8D61@EX02.asurite.ad.asu.edu>

Try using the Deobfuscator. Not sure where you got the information that Restriction::Enzyme has a position method, but according to the docs for Bioperl-live it doesn't.

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ARestriction%3A%3AEnzyme&sort_order=by+method&search_string=Bio%3A%3ARestriction

I think you're wanting the overhang_seq method which returns a Bio::Locatable object.
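
One alternative worth checking against the module's POD: Bio::Restriction::Analysis (rather than Bio::Restriction::Enzyme) documents a positions method that takes an enzyme name and returns the cut positions on the analysed sequence. The following is a minimal sketch under that assumption. It requires BioPerl to be installed and uses a made-up 24-base sequence in place of the GenBank record, so the enzymes it lists will differ from those for M77815.

```perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Restriction::Analysis;

# Small made-up sequence containing a single EcoRI site (GAATTC).
my $seq = Bio::PrimarySeq->new(
    -id       => 'example',
    -seq      => 'ATGCATGCAGAATTCTTGACCGTA',
    -alphabet => 'dna',
);

my $ra = Bio::Restriction::Analysis->new(-seq => $seq);

# Ask the analysis object, not the enzyme object, where each unique cutter cuts.
for my $enz ($ra->unique_cutters->each_enzyme) {
    my @cut_points = $ra->positions($enz->name);
    print join("\t", $enz->name, @cut_points), "\n";
}
```

The enzyme object describes the recognition site in general; the cut positions belong to a particular sequence, which may be why the lookup lives on the analysis object.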






From hlapp at drycafe.net  Thu Jul  7 18:28:21 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jul 2011 18:28:21 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: <657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
Message-ID: <46BDE3BF-4F3C-4C13-BFD3-FA0797B63EDB@drycafe.net>


On Jul 7, 2011, at 3:22 PM, Brian Osborne wrote:

> It's also true that having 5 trees instead of 1 is incorrect  
> scientifically

That's a strong statement and I'm not sure I agree with this - let's  
keep in mind that these are taxonomies, not phylogenetic trees of all  
of life. Not every taxonomy has a node for "all of life" or for LUCA.  
For example, ITIS, one of the most widely used taxonomies if you're  
not dealing strictly with molecular data, does not - there is one  
"tree" for each kingdom of life. (Not that I want to recommend that as  
a good thing.)

I agree with your programming awkwardness argument, though I would
add that looking for a specific label of a node to identify the root
is always a bad (because fragile) thing to do. A better way to
identify a root node would be parent undefined, or being the same as
the node itself. If the code did that for each of the 5 or so children of
'root', then the fake root could be removed.

At the very least code in Bio::DB::Taxonomy or anywhere else should  
not assume that there is a single root for a taxonomy. Because there  
are taxonomies for which this really isn't the case.

	-hilmar

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================
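
The label-free root test described above (parent undefined, or parent identical to the node itself) can be sketched in a few lines of plain Perl. This is an illustration only: the parent map is hypothetical toy data, and is_root and lineage are made-up helper names, not Bio::DB::Taxonomy methods.

```perl
use strict;
use warnings;

# Hypothetical child => parent map. 'A' is its own parent (NCBI style),
# while 'X' has no parent entry at all (a taxonomy without a shared root).
my %parent_of = (
    A => 'A',
    B => 'A',
    C => 'B',
    Y => 'X',
);

# True when the node has no parent or is its own parent.
sub is_root {
    my ($parent_of, $id) = @_;
    my $parent = $parent_of->{$id};
    return (!defined($parent) || $parent eq $id) ? 1 : 0;
}

# Walk from a node up to whichever root its tree ends at,
# returning the path ordered from root to node.
sub lineage {
    my ($parent_of, $id) = @_;
    my @path = ($id);
    my %seen = ($id => 1);
    until (is_root($parent_of, $id)) {
        $id = $parent_of->{$id};
        last if $seen{$id}++;    # guard against cycles in bad data
        push @path, $id;
    }
    return reverse @path;
}

print join('; ', lineage(\%parent_of, 'C')), "\n";    # prints: A; B; C
print join('; ', lineage(\%parent_of, 'Y')), "\n";    # prints: X; Y
```

Because the test never looks at a node's name, the same traversal works whether the data has one root or several.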





From hlapp at drycafe.net  Thu Jul  7 23:05:53 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jul 2011 23:05:53 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
	<46BDE3BF-4F3C-4C13-BFD3-FA0797B63EDB@drycafe.net>
	
Message-ID: 

Right, but do you agree that looking for the label 'root' shouldn't be  
the only way to identify it?

	-hilmar

On Jul 7, 2011, at 10:17 PM, Brian Osborne wrote

> Hilmar,
>
> Instead of addressing the side issues address what I named as the  
> most salient issue: if we base this module's behaviour on NCBI's  
> taxonomy - and all data says this module should mirror NCBI's  
> taxonomy - then a node called "root" should exist since it exists in  
> NCBI's data files. Right?
>
> BIO

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================





From bosborne11 at verizon.net  Thu Jul  7 22:17:52 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Jul 2011 22:17:52 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: <46BDE3BF-4F3C-4C13-BFD3-FA0797B63EDB@drycafe.net>
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
	<46BDE3BF-4F3C-4C13-BFD3-FA0797B63EDB@drycafe.net>
Message-ID: 

Hilmar,

Instead of addressing the side issues address what I named as the most salient issue: if we base this module's behaviour on NCBI's taxonomy - and all data says this module should mirror NCBI's taxonomy - then a node called "root" should exist since it exists in NCBI's data files. Right?

BIO

On Jul 7, 2011, at 6:28 PM, Hilmar Lapp wrote:

> 
> On Jul 7, 2011, at 3:22 PM, Brian Osborne wrote:
> 
>> It's also true that having 5 trees instead of 1 is incorrect scientifically
> 
> That's a strong statement and I'm not sure I agree with this - let's keep in mind that these are taxonomies, not phylogenetic trees of all of life. Not every taxonomy has a node for "all of life" or for LUCA. For example, ITIS, one of the most widely used taxonomies if you're not dealing strictly with molecular data, does not - there is one "tree" for each kingdom of life. (Not that I want to recommend that as a good thing.)
> 
> I agree with your programming awkwardness argument, though I would add that looking for a specific label of a node to identify the root is always a bad (because fragile) thing to do. A better way to identify a root node would be parent undefined, or being the same as the node itself. If the code did that for each of the 5 or so children of 'root', then the fake root could be removed.
> 
> At the very least code in Bio::DB::Taxonomy or anywhere else should not assume that there is a single root for a taxonomy. Because there are taxonomies for which this really isn't the case.
> 
> 	-hilmar
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at illinois.edu  Thu Jul  7 23:22:34 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Jul 2011 22:22:34 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
	<46BDE3BF-4F3C-4C13-BFD3-FA0797B63EDB@drycafe.net>
	
	
Message-ID: 

That is a good point, actually (as well as your earlier one re: how to determine whether a node is root or not), and is likely the reasoning Sendu used for doing this in the first place.  At the very least this behavior should be documented, though, as it is a little unexpected.

chris

On Jul 7, 2011, at 10:05 PM, Hilmar Lapp wrote:

> Right, but do you agree that looking for the label 'root' shouldn't be the only way to identify it?
> 
> 	-hilmar



From Daniel.Lang at biologie.uni-freiburg.de  Fri Jul  8 04:18:59 2011
From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri, 08 Jul 2011 10:18:59 +0200
Subject: [Bioperl-l] [Gmod-gbrowse]   scores in Bio::DB::BigBed
In-Reply-To: 
References: 
Message-ID: <4E16BD73.1040903@biologie.uni-freiburg.de>

Hi Timothy,

thanks a lot for sharing this great tool! It worked as you said:-)

Best,
Daniel

Am 06.07.2011 22:45, schrieb Timothy Parnell:
> Hi Daniel,
> 
> Since you have a need to collapse your data into useable genomic bins, I
> may have a tool that might help you. Have a look at this program
> http://code.google.com/p/biotoolbox/wiki/Pod_get_datasets
> (Disclosure: I am the author) This is normally used for data analysis, but
> you can also use it to collapse data into single value bins.
> 
> You can collect scores from a BigBed file over genomic intervals and the
> scores will be combined in your favorite manner (mean, median, min, max,
> etc). For example, to take the median score value from all bed features in
> 500 bp windows across the genome, the command would look like this
> 
> get_datasets.pl --new --db chromosomes.gff3 --feature genome --win 500
> --method median --dataf my_data_file.bb --out output.txt
> 
> where chromosomes.gff3 is just a simple GFF3 file containing the
> chromosomes or contigs, and my_data_file.bb is your BigBed file. The other
> options simply tell the program to make a new genomic interval data file
> across the genome.
> 
> Once you have your data file, you can then convert it to a wig or bigWig
> file using data2wig.pl, found in the same biotoolbox collection.
> 
> Hope that helps you
> Tim
> 
> 
> On 7/6/11 1:54 AM, "Daniel Lang" 
> wrote:
> 
>> Hi all,
>>
>> thanks a lot for your input on this!
>>
>> I want to explore the repeat structure of our model genome derived by
>> lastz self-alignments (using %id as score).
>> Since this is a HUGE file and I initially wanted to have the ability to
>> access the information for individual repeat regions also in gbrowse, I
>> wanted to use BigBed. Having the data in hand, it seems not to be such a
>> good idea anyway since the resulting repeat graph is much more complex
>> than I expected. So summarizing using the score and/or coverage will do
>> just fine;-)
>>
>> But as they are repeats they're overlapping. So if I see it correctly
>> BigWig/BedGraph aren't an option. Due to the size limitations, I have
>> not stored individual CIGAR strings that I could use to generate full-
>> blown SAM files. Or can I use BAM without sequence/qual data?
>>
>> Or is there an existing tool that would allow me to collapse overlapping
>> ranges with average scores for use in BigWig?
>>
>> Otherwise, I'll have to live with the coverage graphs for visualization
>> in gbrowse and use Bio::DB::BigBed::features to look at conservation
>> score at individual loci.
>>
>> Chris, the proposed BP page would be extremely helpful :-D
>>
>> Again, thanks a lot!
>>
>> Best,
>> Daniel
>>
>> Am 04.07.2011 18:10, schrieb Chris Fields:
>>> I generally follow these rules where I want a common set of possibly
>>> volatile features (e.g. specific transcriptome analysis) separate from
>>> my main 'stable' feature database (e.g. gene models):
>>>
>>> 1) BigBed - lightweight bundle of simple features where the ranges may
>>> overlap, but I'm not concerned about score.  I have found BED/BigBed
>>> scores of limited use in most cases to me unless I scale the data (since
>>> they must be 0-1000 integer values).  Document it very well if you do
>>> any scaling! YMMV
>>>
>>> 2) SAM/BAM - bundle of (possibly overlapping) features where summary
>>> stats are needed.  I've seen these used for BLAST/BLAT runs, etc.
>>>
>>> 3) BigWig - quantitative data of fixed or varying ranges covering
>>> entire genome, ranges can't overlap
>>>
>>> 4) BedGraph - quantitative sparse data, ranges can't overlap (these are
>>> converted over to BigWig for GBrowse, though)
>>>
>>> 5) Of course, one can also set up separate DB::SF::Store databases as
>>> well depending on your needs (I have used both the SQLite and MySQL
>>> adaptors for this).
>>>
>>> I think this is almost begging for a 'best practices' chart/table
>>> somewhere, maybe a GBrowse 'cookbook' of common data representation
>>> cases.
>>>
>>> chris
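
On the scaling point in (1), a small sketch of one way to map an arbitrary score range onto the 0-1000 integers mentioned there. Linear rescaling with clamping is an assumption on my part, not something the format prescribes, and the function name and the lo/hi arguments are only illustrative. Input is assumed to be tab-separated BED with the score in column 5.

```shell
# scale_bed_scores LO HI: rescale column 5 linearly so that LO maps to 0 and
# HI maps to 1000, rounding to the nearest integer and clamping outliers.
scale_bed_scores() {
    awk -v lo="$1" -v hi="$2" '
        BEGIN {
            FS = OFS = "\t"
            if (hi <= lo) {
                print "scale_bed_scores: HI must be greater than LO" > "/dev/stderr"
                exit 1
            }
        }
        {
            s = ($5 - lo) / (hi - lo) * 1000
            if (s < 0)    s = 0
            if (s > 1000) s = 1000
            $5 = int(s + 0.5)               # round to nearest integer
            print
        }
    '
}

# Example for percent-identity scores (file names are placeholders):
# scale_bed_scores 0 100 < percent_identity.bed > scaled.bed
```

Whatever LO and HI are used should be recorded next to the file, since the original values cannot be recovered from the scaled scores without them.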
>>>
>>> On Jul 4, 2011, at 8:22 AM, Lincoln Stein wrote:
>>>
>>>> I had a look at the output of bigBedSummary, which is from Jim Kent's source
>>>> tree (no Perl involved), and it appears that the statistics it provides are
>>>> limited to coverage; so I don't think you can do anything with the scores if
>>>> you're using BigBed indexing. Have a look at BedGraph=>BigWig and see if it
>>>> meets your needs.
>>>>
>>>> Lincoln
>>>>
>>>> On Mon, Jul 4, 2011 at 9:04 AM, Lincoln Stein
>>>> wrote:
>>>>
>>>>> Hi Dan,
>>>>>
>>>>> The documentation for BigBed is scanty; all I know about it is what is
>>>>> provided by the bigbed library in Jim Kent's bigbed.h include file. I had
>>>>> thought that the scores in BED files would come through into the summary
>>>>> statistics like those in BigWig, but now I'm looking at the example data
>>>>> provided in Jim's source code, and see that the BigBed example source file
>>>>> has scores of "0".
>>>>>
>>>>> I'll investigate whether there is an issue in the Perl layer, but it could
>>>>> easily be a limitation in the library itself. Have you considered using a
>>>>> BedGraph file and indexing it with bedGraphToBigWig? I know that the
>>>>> Bio::DB::BigWig interface works perfectly to retrieve and summarize the
>>>>> scores.
>>>>>
>>>>> Lincoln
>>>>>
>>>>>
>>>>> On Sun, Jul 3, 2011 at 5:48 AM, Daniel Lang <
>>>>> Daniel.Lang at biologie.uni-freiburg.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> quick question about the BigBed adaptor: Is it correct that the bin and
>>>>>> summary functions only return statistics about the number of features in
>>>>>> the defined intervals?
>>>>>> I was expecting them to deliver statistics about the score if the
>>>>>> respective bb file has a defined score field.
>>>>>> If this is true, does this also mean that I cannot plot the distribution
>>>>>> of scores in BigBed files in gbrowse?
>>>>>>
>>>>>> This is the first time I'm using BigBed, maybe I'm doing something wrong...
>>>>>>
>>>>>> I had some trouble formatting the bed files correctly in order to see
>>>>>> the score in the features returned by the Bio::DB::BigBed::features()
>>>>>> routine. It seems the bigbed entries will only have a correctly assigned
>>>>>> score field if you also provide a non-empty name field. Initially I
>>>>>> thought that the order of columns is irrelevant if you use an .as file
>>>>>> in the bedToBigBed call, but that doesn't seem to be the case.
>>>>>>
>>>>>> Best,
>>>>>> Daniel
>>>>>> --
>>>>>>
>>>>>> Dr. Daniel Lang
>>>>>> University of Freiburg, Plant Biotechnology
>>>>>> Schaenzlestr. 1, D-79104 Freiburg
>>>>>> fax:        +49 761 203 6945
>>>>>> phone:      +49 761 203 6989
>>>>>> homepage:   http://www.plant-biotech.net/
>>>>>>           http://www.cosmoss.org/
>>>>>> e-mail :
>>>>>> daniel.lang at biologie.uni-freiburg.de
>>>>>>
>>>>>> #################################################
>>>>>> My software never has bugs.
>>>>>> It just develops random features.
>>>>>> #################################################
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lincoln D. Stein
>>>>> Director, Informatics and Biocomputing Platform
>>>>> Ontario Institute for Cancer Research
>>>>> 101 College St., Suite 800
>>>>> Toronto, ON, Canada M5G0A3
>>>>> 416 673-8514
>>>>> Assistant: Renata Musa 
>>>>>
>>>>
>>>>
>>>>
>>>
>>
> 

-- 

Dr. Daniel Lang
University of Freiburg, Plant Biotechnology
Schaenzlestr. 1, D-79104 Freiburg
fax:        +49 761 203 6945
phone:      +49 761 203 6989
homepage:   http://www.plant-biotech.net/
            http://www.cosmoss.org/
e-mail:     daniel.lang at biologie.uni-freiburg.de

#################################################
My software never has bugs.
It just develops random features.
#################################################




From Kevin.M.Brown at asu.edu  Fri Jul  8 10:56:05 2011
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 8 Jul 2011 07:56:05 -0700
Subject: [Bioperl-l] position method in Bio::Restriction::Analysis
References: <1A4207F8295607498283FE9E93B775B46B8D61@EX02.asurite.ad.asu.edu>
	
Message-ID: <1A4207F8295607498283FE9E93B775B46B8D62@EX02.asurite.ad.asu.edu>

Please keep replies on the list.

Yes, that is the wrong place to look as that is BioPerl v1.4 (notice the path) whereas the software you are trying to use is v1.6 (hopefully 1.6.9). So, those docs aren't "out of date" so much as they belong to a different version of the software.

-----Original Message-----
From: Mark Nadel [mailto:nadel at nabsys.com]
Sent: Thu 7/7/2011 2:50 PM
To: Kevin Brown
Subject: Re: [Bioperl-l] position method in Bio::Restriction::Analysis
 
I got the information from
http://doc.bioperl.org/releases/bioperl-1.4/Bio/Restriction/Analysis.html#POD5.
Is this the wrong place to be looking? I accidentally said position instead of
positions, but that is not the problem.

On Thu, Jul 7, 2011 at 5:26 PM, Kevin Brown  wrote:

> Try using the Deobfuscator. Not sure where you got the information that
> Restriction::Enzyme has a position method, but according to the docs for
> Bioperl-live it doesn't.
>
>
> http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ARestriction%3A%3AEnzyme&sort_order=by+method&search_string=Bio%3A%3ARestriction
>
> I think you're wanting the overhang_seq method which returns a
> Bio::Locatable object.
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org on behalf of Mark Nadel
> Sent: Thu 7/7/2011 2:04 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] position method in Bio::Restriction::Analysis
>
> I am having trouble using the position method, and no sample code is
> included in the documentation.
>
> Here is my script:
>
> use strict;
> use Bio::Restriction::EnzymeCollection;
> use Bio::Restriction::Analysis;
> use Bio::DB::GenBank;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::Seq::RichSeq;
> use Bio::Tools::SeqStats;
>
> my $accension_number = 'M77815';  ##'U00096.2';
>
> my $outputFile = "/Users/marknadel/Documents/UniqueCutters".$accension_number.".txt";
>
> open OUT, ">$outputFile" or die "Can't open $outputFile";
>
> my $db = Bio::DB::GenBank->new();
> my $seq = $db->get_Seq_by_acc($accension_number);
>
> print ">";
> print $seq->desc();
> print "\tThe sequence is circular:";
> print $seq->is_circular();
> print "\n";
>
> my $ra = Bio::Restriction::Analysis->new(-seq=>$seq);
> my $all_cutters = $ra->cutters;
> my $uniqe = $ra->unique_cutters;
>
> foreach my $enz ($uniqe->each_enzyme()){
>     print $enz->name();
>     print OUT $enz->name();
>     print "\t";
>     print OUT "\t";
>     my @cutpoint = $enz->position();
>     #print $cutpoint;
>     #print OUT $cutpoint;
> }
>
> print "\n";
> print OUT "\n";
> close OUT;
>
> ---------------------
>
>
> and here is the output:
>
>
> >M13mp18 phage cloning vector. The sequence is circular:1
>
> AasI Can't locate object method "position" via package
> "Bio::Restriction::Enzyme"
> at /Users/marknadel/Documents/workspace/adHoc/unique_cutters.pl line 42,
>  line 532.
>
>
> I had a similar problem before with another method  in this package and
> someone was kind enough to give me the exact syntax.
>
> Thanks in advance,
>
> Mark
>
> *
>
> --
> *Mark Nadel*
> *Principal Scientist
> *
> NABsys Inc.
> 60 Clifford Street
> Providence, RI  02903
>
> Phone   401-276-9100 x204
> Fax 401-276-9122
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



-- 
*Mark Nadel*
*Principal Scientist
*
NABsys Inc.
60 Clifford Street
Providence, RI  02903

Phone   401-276-9100 x204
Fax 401-276-9122





From cjfields at illinois.edu  Fri Jul  8 11:34:49 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 Jul 2011 10:34:49 -0500
Subject: [Bioperl-l] position method in Bio::Restriction::Analysis
In-Reply-To: <1A4207F8295607498283FE9E93B775B46B8D62@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B46B8D61@EX02.asurite.ad.asu.edu>
	
	<1A4207F8295607498283FE9E93B775B46B8D62@EX02.asurite.ad.asu.edu>
Message-ID: <5ED7BD23-6B07-46B0-87DE-D95C211EB523@illinois.edu>

One can also look here:

http://doc.bioperl.org/bioperl-live/

I think that syncs to GitHub.  Though, truthfully, I tend to simply perldoc it, or use search.cpan.org/search.metacpan.org (Chrome makes this easy).  MetaCPAN is nice but the search is still a little flaky.

http://search.cpan.org/perldoc?Bio::Restriction::Analysis
http://search.metacpan.org/#/showpod/Bio::Restriction::Analysis

chris




From ravi.devani89 at gmail.com  Sat Jul  9 13:28:32 2011
From: ravi.devani89 at gmail.com (Ravi Devani)
Date: Sat, 9 Jul 2011 22:58:32 +0530
Subject: [Bioperl-l] please help: bp_genbank2gff3.pl problems
Message-ID: 

I tried to convert the .gbk file of the Hydra genome from the NCBI FTP site to
GFF3 format for display in GBrowse. The .gbk file is 1.4 GB, but the GFF3
output had grown to over 255 GB when my hard disk filled up and the conversion
stopped. When I examined the GFF3 file, I found that the same feature was
repeated many times. How can I solve this problem? The .gbk file for the Hydra
mitochondrial genome converted successfully using the same script. I have
attached part of the GFF3 file, which shows the same lines repeating many
times.
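
A quick way to quantify the repetition on a sample of the output, sketched in shell/awk. This does not fix bp_genbank2gff3.pl and says nothing about why it repeats features; it only shows how bad the duplication is and gives a stop-gap filter. Both function names are made up, and the first one holds every distinct line in memory, so it is meant for a `head` sample rather than the whole file.

```shell
# count_duplicate_lines: list every non-comment line seen more than once,
# as "count<TAB>line", most frequent first.
count_duplicate_lines() {
    awk '
        /^#/ { next }
        { count[$0]++ }
        END {
            for (line in count)
                if (count[line] > 1) printf "%d\t%s\n", count[line], line
        }
    ' | sort -k1,1nr
}

# drop_consecutive_repeats: pass the input through, skipping any line that is
# identical to the line directly before it (constant memory).
drop_consecutive_repeats() {
    awk 'NR == 1 || ($0 "") != (prev "") { print } { prev = $0 }'
}

# Example (file name is a placeholder):
# head -n 100000 wgs.gff | count_duplicate_lines | head
```

If the counts come out in the hundreds or thousands per feature, that points at the conversion itself rather than at genuinely repeated annotations.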
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wgs.head50.gff
Type: application/octet-stream
Size: 6010 bytes
Desc: not available
URL: 

From bosborne11 at verizon.net  Sun Jul 10 13:38:18 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sun, 10 Jul 2011 13:38:18 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy root not present
In-Reply-To: 
References: 
	<1394A6A5-105F-4A74-AE18-3834126806EA@verizon.net>
	
	<657E799C-3744-415F-A1E2-215DB70463F4@verizon.net>
	<46BDE3BF-4F3C-4C13-BFD3-FA0797B63EDB@drycafe.net>
	
	
Message-ID: <50C9866F-1309-4006-88CF-D60AADAB24C2@verizon.net>


is_root(), or the equivalent?
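
For what it's worth, the test Hilmar describes in the quoted text below (parent undefined, or parent the same as the node itself) is easy to state outside of BioPerl. Here is a sketch in shell/awk against an NCBI-style nodes.dmp, where fields are separated by tab, pipe, tab and the first two columns are the node ID and its parent ID. The name find_roots is made up, and this is not a proposal for how Bio::DB::Taxonomy should implement it.

```shell
# find_roots: read nodes.dmp-style lines on stdin and print the ID of every
# node whose parent field is empty or equal to its own ID.
find_roots() {
    awk '
        BEGIN { FS = "\t[|]\t" }
        {
            sub(/\t[|]$/, "")               # strip the trailing field terminator
            id = $1; parent = $2
            if (parent == "" || parent == id) print id
        }
    '
}

# Example (path is a placeholder):
# find_roots < nodes.dmp
```

In the NCBI dump this should report a single node, since node 1 lists itself as its parent; for a taxonomy with one tree per kingdom it would report several, which is the case the calling code should not rule out.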


On Jul 7, 2011, at 11:05 PM, Hilmar Lapp wrote:

> Right, but do you agree that looking for the label 'root' shouldn't be the only way to identify it?
> 
> 	-hilmar
> 
> On Jul 7, 2011, at 10:17 PM, Brian Osborne wrote
> 
>> Hilmar,
>> 
>> Instead of addressing the side issues address what I named as the most salient issue: if we base this module's behaviour on NCBI's taxonomy - and all data says this module should mirror NCBI's taxonomy - then a node called "root" should exist since it exists in NCBI's data files. Right?
>> 
>> BIO
>> 
>> On Jul 7, 2011, at 6:28 PM, Hilmar Lapp wrote:
>> 
>>> 
>>> On Jul 7, 2011, at 3:22 PM, Brian Osborne wrote:
>>> 
>>>> It's also true that having 5 trees instead of 1 is incorrect scientifically
>>> 
>>> That's a strong statement and I'm not sure I agree with this - let's keep in mind that these are taxonomies, not phylogenetic trees of all of life. Not every taxonomy has a node for "all of life" or for LUCA. For example, ITIS, one of the most widely used taxonomies if you're not dealing strictly with molecular data, does not - there is one "tree" for each kingdom of life. (Not that I want to recommend that as a good thing.)
>>> 
>>> I agree with your programming awkwardness argument, though I would add that looking for a specific label of a node to identify the root is always a bad (because fragile) thing to do. A better way to identify a root node would be to check whether its parent is undefined or is the node itself. If the code did that for each of the 5 or so children of 'root', then the fake root could be removed.
>>> 
>>> At the very least code in Bio::DB::Taxonomy or anywhere else should not assume that there is a single root for a taxonomy. Because there are taxonomies for which this really isn't the case.
>>> 
>>> 	-hilmar
>>> 
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>> ===========================================================
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From nestor at linuxmail.org  Mon Jul 11 04:47:03 2011
From: nestor at linuxmail.org (Nestor Zaburannyi)
Date: Mon, 11 Jul 2011 10:47:03 +0200
Subject: [Bioperl-l] Fastq - is it flush, or not?
Message-ID: <1787809581.20110711104703@linuxmail.org>

Dear All.

I need to reverse-complement a huge *.fastq file. I try to

while (my $seq = $in->next_seq)
    {
    $out->write_seq($seq->revcom);
    }

However, it stops with:

------------- EXCEPTION -------------
MSG: Can not get a reverse complement. The object is not flush.
STACK Bio::Seq::Meta::Array::revcom /usr/share/perl5/Bio/Seq/Meta/Array.pm:648


Adding

$seq->force_flush('1');

helps, but is it necessary and proper way? Sequences seem to be "flush".

Sincerely yours
Nestor
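
As a cross-check that does not go through Bio::SeqIO at all, here is a shell/awk sketch for plain four-line FASTQ. It does not explain the "not flush" exception; it only reverse-complements the sequence line and reverses the quality line of each record. Assumptions: records are not line-wrapped, and only A/C/G/T/N (upper or lower case) are complemented, with any other character passed through unchanged. The name revcomp_fastq is illustrative.

```shell
# revcomp_fastq: read four-line FASTQ on stdin, write it back with each
# sequence reverse-complemented and each quality string reversed.
revcomp_fastq() {
    awk '
        BEGIN {
            # complement table; anything not listed is passed through unchanged
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
            comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"
            comp["N"] = "N"; comp["n"] = "n"
        }
        function reverse(s,    i, out) {
            out = ""
            for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
            return out
        }
        function complement(s,    i, c, out) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            return out
        }
        NR % 4 == 2 { print complement(reverse($0)); next }   # sequence line
        NR % 4 == 0 { print reverse($0); next }               # quality line
        { print }                                             # header and "+" lines
    '
}

# Example (file names are placeholders):
# revcomp_fastq < reads.fastq > reads.rc.fastq
```

Comparing a few records of this output with what the BioPerl loop writes after force_flush would at least show whether the forced result is sensible.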


From cjfields at illinois.edu  Mon Jul 11 08:17:07 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jul 2011 07:17:07 -0500
Subject: [Bioperl-l] Fastq - is it flush, or not?
In-Reply-To: <1787809581.20110711104703@linuxmail.org>
References: <1787809581.20110711104703@linuxmail.org>
Message-ID: <1BB8B87F-37EA-499C-BA13-9F5CE470FD05@illinois.edu>

The error appears to be in revcom(), so my guess is something with clipping of the quality data?  Hard to say for sure, though, w/o seeing the data and testing it.

chris




From sheena.scroggins at gmail.com  Mon Jul 11 13:28:27 2011
From: sheena.scroggins at gmail.com (Sheena Scroggins)
Date: Mon, 11 Jul 2011 10:28:27 -0700
Subject: [Bioperl-l] Bioperl reorganization update
Message-ID: 

The project is off to a great start. Google Summer of Code is at the
midpoint, read about the progress at http://techomics.com/

Sheena

From statonse at uga.edu  Mon Jul 11 14:33:19 2011
From: statonse at uga.edu (Evan Staton)
Date: Mon, 11 Jul 2011 14:33:19 -0400
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
Message-ID: 

Hi bioperl-l,

I have been experimenting with Bio::Index::Fastq, and I realize from previous
conversations on the list that this module will not work efficiently with the
scale of Fastq files being produced by modern sequencing instruments. Based on
that conversation, it appears that the module uses DB_File currently and it was
mentioned that AnyDBM_File could be used to allow the use of SQLite. Has any
progress been made towards this goal? I am aware there are other indexing
solutions, and I would offer to help out but it appears this transition has
already been worked out in detail on the HOWTO page. However, I don't see the
changes reflected in github for this module. I was curious if the performance
improvements did not justify making changes for this module or if there were
other explanations.

Sorry if I missed relevant conversations on the topic, but I'd like to help
with this if I can.

Thanks,

Evan

From cjfields at illinois.edu  Mon Jul 11 14:44:06 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jul 2011 13:44:06 -0500
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
In-Reply-To: 
References: 
Message-ID: 

Evan,

I don't think it was performance-related, just that no one has had time to really work on it (the main dev behind the push for this, Mark Jensen, is still around but has been very busy with his new job).  I do know (from my own attempts at implementing this) a few roadblocks were the use of DB_File-specific constants in some modules.

If you or anyone else wants to work on this feel free, it would be more than welcome.

chris

I fully support a move to either AnyDBM_File or DBD::SQLite specifically.



From cjfields at illinois.edu  Mon Jul 11 14:48:41 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jul 2011 13:48:41 -0500
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
In-Reply-To: 
References: 
	
Message-ID: 

Forgot to mention, for more info on this, see:

http://www.bioperl.org/wiki/HOWTO:SQLite_for_BioPerl_indexing

This is also filed in Redmine as a bug I believe.

chris




From p.j.a.cock at googlemail.com  Mon Jul 11 15:47:49 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Jul 2011 20:47:49 +0100
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
In-Reply-To: 
References: 
	
	
Message-ID: 

On Mon, Jul 11, 2011 at 7:48 PM, Chris Fields  wrote:
> Forgot to mention, for more info on this, see:
>
> http://www.bioperl.org/wiki/HOWTO:SQLite_for_BioPerl_indexing
>
> This is also filed in Redmine as a bug I believe.
>
> chris

We've got SQLite based indexing for Biopython's SeqIO module,
and I'd be keen to try and share the same format (table names,
column names etc). I view this as an extension to the existing
flat file and BDB backends for OBDA - what I did for Biopython
with SQLite was very much inspired by that design.

http://obda.open-bio.org/
http://lists.open-bio.org/pipermail/open-bio-l/2009-August/000561.html

If any BioPerl folk are at the CodeFest this week before BOSC,
this could be a good project:

http://www.open-bio.org/wiki/Codefest_2011

Peter
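
To make concrete what a sequence-file index has to store, whatever the backend, here is a toy flat-file version in shell/awk. It is not the OBDA layout, not what Bio::Index::Fastq writes, and not the Biopython SQLite schema; it only assumes that the index maps a read ID to the byte offset of its record, and that records are four lines each. Both function names are made up.

```shell
# build_fastq_index FILE: print "id<TAB>byte_offset" for every record.
build_fastq_index() {
    LC_ALL=C awk '
        NR % 4 == 1 {
            id = substr($1, 2)              # drop "@", keep text up to first blank
            printf "%s\t%d\n", id, offset
        }
        { offset += length($0) + 1 }        # +1 for the newline
    ' "$1"
}

# fetch_fastq_record FILE INDEX ID: print the four lines of record ID,
# or return non-zero if ID is not in the index.
fetch_fastq_record() {
    off=$(awk -F '\t' -v id="$3" '($1 "") == (id "") { print $2; exit }' "$2")
    [ -n "$off" ] || return 1
    tail -c +"$((off + 1))" "$1" | head -n 4
}

# Example (file names are placeholders):
# build_fastq_index reads.fastq > reads.idx
# fetch_fastq_record reads.fastq reads.idx some_read_id
```

The lookup here is a linear scan of a text file, which is exactly the part a DB_File or SQLite table would replace with a keyed query. For multi-gigabyte inputs it is worth checking that the local awk prints large integer offsets exactly.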

From nestor at linuxmail.org  Mon Jul 11 17:28:44 2011
From: nestor at linuxmail.org (Nestor Zaburannyi)
Date: Mon, 11 Jul 2011 23:28:44 +0200
Subject: [Bioperl-l] Fastq - is it flush, or not?
In-Reply-To: <1BB8B87F-37EA-499C-BA13-9F5CE470FD05@illinois.edu>
References: <1787809581.20110711104703@linuxmail.org>
	<1BB8B87F-37EA-499C-BA13-9F5CE470FD05@illinois.edu>
Message-ID: <3910164964.20110711232844@linuxmail.org>

Chris,

cat myfile.fastq | grep -v ^@ | grep -v ^+$ | awk '{ print $0 " = " length($0) }' | grep -v 50

gives empty string and that means every sequence AND every quality string has exactly 50 characters. So, no clipping present, that's for sure. Any other ideas anyone?

Sincerely yours
Nestor
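
One caution about the pipeline above: `grep -v ^@` also drops any quality line that happens to start with "@" (a valid quality character), and `grep -v 50` drops every line containing "50" anywhere, including a reported length such as 150 or a quality string with "50" in it. An empty result is therefore weaker evidence than it looks. A position-based check, sketched here under the assumption of unwrapped four-line records, compares the two lengths record by record; the function name is made up.

```shell
# check_fastq_lengths: read four-line FASTQ on stdin and report every record
# whose sequence and quality strings differ in length, then a summary count.
check_fastq_lengths() {
    awk '
        NR % 4 == 1 { id = $0 }                 # header line (@name)
        NR % 4 == 2 { seqlen = length($0) }     # sequence line
        NR % 4 == 0 {                           # quality line
            if (length($0) != seqlen) {
                printf "%s seq=%d qual=%d\n", id, seqlen, length($0)
                bad++
            }
        }
        END { printf "%d mismatched record(s)\n", bad + 0 }
    '
}

# Example (file name is a placeholder):
# check_fastq_lengths < myfile.fastq
```

If this reports zero mismatches, the lengths really do agree record by record, and the exception would have to come from something other than clipped quality data.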






-- 
 Nestor                            mailto:nestor at linuxmail.org



From cjfields at illinois.edu  Mon Jul 11 18:19:21 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jul 2011 17:19:21 -0500
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
In-Reply-To: 
References: 
	
	
	
Message-ID: <6DCE569E-1F02-4D67-B33C-936BE7B5EE31@illinois.edu>

On Jul 11, 2011, at 2:47 PM, Peter Cock wrote:

> On Mon, Jul 11, 2011 at 7:48 PM, Chris Fields  wrote:
>> Forgot to mention, for more info on this, see:
>> 
>> http://www.bioperl.org/wiki/HOWTO:SQLite_for_BioPerl_indexing
>> 
>> This is also filed in Redmine as a bug I believe.
>> 
>> chris
> 
> We've got SQLite based indexing for Biopython's SeqIO module,
> and I'd be keen to try and share the same format (table names,
> column names etc). I view this as an extension to the existing
> flat file and BDB backends for OBDA - what I did for Biopython
> with SQLite was very much inspired by that design.
> 
> http://obda.open-bio.org/
> http://lists.open-bio.org/pipermail/open-bio-l/2009-August/000561.html

I think this is a good plan to follow, at least makes the end-point index less reliant on a specific implementation.

> If any BioPerl folk are at the CodeFest this week before BOSC,
> this could be a good project:
> 
> http://www.open-bio.org/wiki/Codefest_2011
> 
> Peter

Unfortunately I'm not going this year, but maybe open an IRC channel or something similar so we can participate?  (though the time difference may limit that somewhat).  I am making active plans for attending next year, though.

chris



From j_martin at lbl.gov  Mon Jul 11 18:24:12 2011
From: j_martin at lbl.gov (Joel Martin)
Date: Mon, 11 Jul 2011 15:24:12 -0700
Subject: [Bioperl-l] Fastq - is it flush, or not?
In-Reply-To: <3910164964.20110711232844@linuxmail.org>
References: <1787809581.20110711104703@linuxmail.org>
	<1BB8B87F-37EA-499C-BA13-9F5CE470FD05@illinois.edu>
	<3910164964.20110711232844@linuxmail.org>
Message-ID: 

I'd use the fastx toolkit (fastx_reverse_complement), or at least you
could validate the bug with that.  Also, your grep patterns can exclude
valid sequence or quality lines.  This finds a problematic line:

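#!/usr/bin/perl -w
# Assumes plain four-line FASTQ records (no wrapped sequence lines).
use strict;
my $slen;
while ( <> ) {            # line 1 of the record: @header
  $_=<>;                  # line 2: sequence
  $slen=length;           # length still includes the trailing newline
  <>;$_=<>;               # skip line 3 (+), then read line 4: quality
  if( length != $slen ) {
    print "quality length != ", $slen - 1, " at line $.\nbad->$_";
  }
}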
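
For the reverse-complement step itself, here is a minimal BioPerl-free sketch, not a replacement for Bio::SeqIO. It assumes plain four-line records and only the bases A, C, G, T and N; the sub name revcom_fastq_record and the command-line handling are made up for illustration. It refuses any record whose sequence and quality lengths differ, which is the same condition the check above reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement one FASTQ record given as (header, sequence, quality).
# Returns (header, reverse-complemented sequence, reversed quality).
# Dies if the sequence and quality strings differ in length.
# Only A, C, G, T and N are complemented; IUPAC ambiguity codes are not handled.
sub revcom_fastq_record {
    my ($header, $seq, $qual) = @_;
    die "sequence/quality length mismatch in $header\n"
        if length($seq) != length($qual);
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
    return ($header, $rc, scalar reverse $qual);
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "truncated record at line $.\n" unless defined $qual;
        chomp($header, $seq, $plus, $qual);
        my (undef, $rc, $rq) = revcom_fastq_record($header, $seq, $qual);
        print "$header\n$rc\n+\n$rq\n";
    }
    close $in;
}
```

Run as `perl revcom_fastq.pl myfile.fastq > myfile.rc.fastq` (the script name is a placeholder). The quality string is reversed but not otherwise changed, so each score stays attached to its base.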


2011/7/11 Nestor Zaburannyi :
> Chris,
>
> cat myfile.fastq | grep -v ^@ | grep -v ^+$ | awk '{ print $0 " = " length($0) }' | grep -v 50
>
> gives empty string and that means every sequence AND every quality string has exactly 50 characters. So, no clipping present, that's for sure. Any other ideas anyone?
>
> Sincerely yours
> Nestor
>
>
>
> Monday, July 11, 2011, 2:17:07 PM, you wrote:
>
>> The error appears to be in revcom(), so my guess is something with clipping of the quality data?  Hard to say for sure, though, w/o seeing the data and testing it.
>
>> chris
>
>> On Jul 11, 2011, at 3:47 AM, Nestor Zaburannyi wrote:
>
>>> Dear All.
>
>>> I need to reverse-complement a huge *.fastq file. I try to
>
>>> while (my $seq = $in->next_seq)
>>>    {
>>>    $out->write_seq($seq->revcom);
>>>    }
>
>>> However, it stops with:
>
>>> ------------- EXCEPTION -------------
>>> MSG: Can not get a reverse complement. The object is not flush.
>>> STACK Bio::Seq::Meta::Array::revcom /usr/share/perl5/Bio/Seq/Meta/Array.pm:648
>
>
>>> Adding
>
>>> $seq->force_flush('1');
>
>>> helps, but is it necessary and proper way? Sequences seem to be "flush".
>
>>> Sincerely yours
>>> Nestor
>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> --
>  Nestor                            mailto:nestor at linuxmail.org
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From p.j.a.cock at googlemail.com  Mon Jul 11 18:43:10 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Jul 2011 23:43:10 +0100
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
In-Reply-To: <6DCE569E-1F02-4D67-B33C-936BE7B5EE31@illinois.edu>
References: 
	
	
	
	<6DCE569E-1F02-4D67-B33C-936BE7B5EE31@illinois.edu>
Message-ID: 

On Mon, Jul 11, 2011 at 11:19 PM, Chris Fields  wrote:
> On Jul 11, 2011, at 2:47 PM, Peter Cock wrote:
>>
>> We've got SQLite based indexing for Biopython's SeqIO module,
>> and I'd be keen to try and share the same format (table names,
>> column names etc). I view this as an extension to the existing
>> flat file and BDB backends for OBDA - what I did for Biopython
>> with SQLite was very much inspired by that design.
>>
>> http://obda.open-bio.org/
>> http://lists.open-bio.org/pipermail/open-bio-l/2009-August/000561.html
>
> I think this is a good plan to follow, at least makes the end-point
> index less reliant on a specific implementation.

Feel free to ping me if this goes quiet - I had been meaning to
write up the Biopython SQlite SeqIO index table design, but had
not got round to it.

>> If any BioPerl folk are at the CodeFest this week before BOSC,
>> this could be a good project:
>>
>> http://www.open-bio.org/wiki/Codefest_2011
>>
>> Peter
>
> Unfortunately I'm not going this year, but maybe open an IRC
> channel or something similar so we can participate?  (though the
> time difference may limit that somewhat).  I am making active
> plans for attending next year, though.

We can look into that (CC'ing Brad who is organising the
CodeFest 2011 event).

Looking ahead to next year, ISMB 2012 will be July 15-17, at
Long Beach, California  (pre-conference SIGs like BOSC
etc July 13-14). I would hope to attend, but will have to see
about budgets - I was hoping it would be in Europe again ;)

Peter


From lrg_ml at gmx.net  Mon Jul 11 20:33:30 2011
From: lrg_ml at gmx.net (Lutz Gehlen)
Date: Tue, 12 Jul 2011 12:33:30 +1200
Subject: [Bioperl-l] Getting started with bioperl-db
Message-ID: <201107121233.30696.lrg_ml@gmx.net>

Hello everybody,
I have a local BioSQL database and would like to extract data using 
the bioperl-db distribution. However, I am new to BioPerl and am 
really struggling with it. I hope that someone can point me to the 
right documentation, because it seems pretty hard to find.

My first question is about the installation of bioperl-db. The docu 
says that you need to install the BioSQL package first. What is the 
BioSQL package? Is it a Perl module? The docu refers to biosql.org, 
but the only thing it seems to offer for download is the BioSQL 
schema. Is this what is meant with the BioSQL package? The 
http://bioperl.org/wiki/Bioperl-db page refers to
http://obda.open-bio.org/ for the "BioSQL package", but this page also only refers to 
the BioSQL schema at biosql.org. Plus, this page has not been 
updated since 2002. So I'm a bit lost here. If I call Build.PL of 
bioperl-db and just claim that I have the BioSQL package, test suite 
and installation seem to work without errors, but am I really set up 
correctly?

The next problems come when using bioperl-db. As I said, I already 
have a BioSQL database. My questions are of the type like "Is there 
a gene that overlaps with the region 15000-17000 on chromosome XY?" 
or "What are the coordinates of gene ABC123?" I haven't found any 
documentation on bioperl-db that would help me do that. From my 
search I rather got the impression that I have to understand a 
substantial part of BioPerl first before I can get going. However, 
it is very difficult to identify which parts of the HUGE BioPerl 
project I have to work through. Plus I haven't found any 
comprehensive documentation of BioPerl at all.

I would be very thankful if someone could give me some hints on 
where to start.

Thank you very much for your help
Lutz

From cjfields at illinois.edu  Mon Jul 11 22:04:30 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jul 2011 21:04:30 -0500
Subject: [Bioperl-l] Getting started with bioperl-db
In-Reply-To: <201107121233.30696.lrg_ml@gmx.net>
References: <201107121233.30696.lrg_ml@gmx.net>
Message-ID: <36D3A446-3E6C-4DF9-9611-21FA85DAC25F@illinois.edu>


On Jul 11, 2011, at 7:33 PM, Lutz Gehlen wrote:

> Hello everybody,
> I have a local BioSQL database and would like to extract data using 
> the bioperl-db distribution. However, I am new to BioPerl and am 
> really struggling with it. I hope that someone can point me to the 
> right documentation, because it seems pretty hard to find.
> 
> My first question is about the installation of bioperl-db. The docu 
> says that you need to install the BioSQL package first. What is the 
> BioSQL package? Is it a Perl module?

No, it is the BioSQL schema.

> The docu refers to biosql.org, 
> but the only thing it seems to offer for download is the BioSQL 
> schema. Is this what is meant with the BioSQL package?

Yes.

> The 
> http://bioperl.org/wiki/Bioperl-db page refers to
> http://obda.open-bio.org/ for the "BioSQL package", but this page also only refers to 
> the BioSQL schema at biosql.org. Plus, this page has not been 
> updated since 2002.

That's correct (both the BioSQL website and the OBDA page).  It should probably be redirected to point to the biosql.org wiki.

> So I'm a bit lost here. If I call Build.PL of 
> bioperl-db and just claim that I have the BioSQL package, test suite 
> and installation seem to work without errors, but am I really set up 
> correctly?

BioSQL is distributed separately, mainly b/c it is supposed to be Bio*-agnostic.  The BioSQL distribution itself comes with some basic documentation to help; I think the biosql.org site also has information.  If you want the latest (developer) version of BioSQL it is now available on github (there are some small updates I think):

https://github.com/biosql/biosql

Installation instructions are here:

https://github.com/biosql/biosql/blob/master/INSTALL

> The next problems come when using bioperl-db. As I said, I already 
> have a BioSQL database. My questions are of the type like "Is there 
> a gene that overlaps with the region 15000-17000 on chromosome XY?" 
> or "What are the coordinates of gene ABC123?" I haven't found any 
> documentation on bioperl-db that would help me do that. From my 
> search I rather got the impression that I have to understand a 
> substantial part of BioPerl first before I can get going. However, 
> it is very difficult to identify which parts of the HUGE BioPerl 
> project I have to work through.

It does involve some overhead.

> Plus I haven't found any 
> comprehensive documentation of BioPerl at all.

I find that a bit hard to believe (the 'at all' part).  Yes, some parts are less-than-optimally documented, but the wiki has quite a bit.  See here:

http://www.bioperl.org/wiki/Main_Page

under 'Documentation'.  The various HOWTO's are a good place to start.  Also, in general all modules come with some synopsis code.

BioPerl is not simple, but neither are the types of data that are represented.  In some cases it's probably over-engineered, but most users find it works for them.

> I would be very thankful if someone could give me some hints on 
> where to start.
> 
> Thank you very much for your help
> Lutz

The best place to start is the HOWTO's, and this mail list (the archives on Gmane are searchable and typically have many answers to questions).

chris

From statonse at uga.edu  Tue Jul 12 10:59:05 2011
From: statonse at uga.edu (Evan Staton)
Date: Tue, 12 Jul 2011 10:59:05 -0400
Subject: [Bioperl-l] Bio::Index::Fastq indexing options
In-Reply-To: <763c4d6af35e4a62982aae7c701504bd@CH1PRD0202HT024.namprd02.prod.outlook.com>
References: 
	
	
	
	<6DCE569E-1F02-4D67-B33C-936BE7B5EE31@illinois.edu>
	<763c4d6af35e4a62982aae7c701504bd@CH1PRD0202HT024.namprd02.prod.outlook.com>
Message-ID: 

Hi,

Thank you Peter and Chris for your comments. It is encouraging to see
interest in this topic and the potential to move towards a standard.
Unfortunately, I'm not really familiar with OBDA or Biopython's SQLite based
indexing. I will try to make some progress on understanding these formats,
and watch the list to see if there has been any progress on this topic from
the Codefest or elsewhere.

Thanks,

Evan

On Mon, Jul 11, 2011 at 6:43 PM, Peter Cock wrote:

> On Mon, Jul 11, 2011 at 11:19 PM, Chris Fields 
> wrote:
> > On Jul 11, 2011, at 2:47 PM, Peter Cock wrote:
> >>
> >> We've got SQLite based indexing for Biopython's SeqIO module,
> >> and I'd be keen to try and share the same format (table names,
> >> column names etc). I view this as an extension to the existing
> >> flat file and BDB backends for OBDA - what I did for Biopython
> >> with SQLite was very much inspired by that design.
> >>
> >> http://obda.open-bio.org/
> >> http://lists.open-bio.org/pipermail/open-bio-l/2009-August/000561.html
> >
> > I think this is a good plan to follow, at least makes the end-point
> > index less reliant on a specific implementation.
>
> Feel free to ping me if this goes quiet - I had been meaning to
> write up the Biopython SQlite SeqIO index table design, but had
> not got round to it.
>
> >> If any BioPerl folk are at the CodeFest this week before BOSC,
> >> this could be a good project:
> >>
> >> http://www.open-bio.org/wiki/Codefest_2011
> >>
> >> Peter
> >
> > Unfortunately I'm not going this year, but maybe open an IRC
> > channel or something similar so we can participate?  (though the
> > time difference may limit that somewhat).  I am making active
> > plans for attending next year, though.
>
> We can look into that (CC'ing Brad who is organising the
> CodeFest 2011 event).
>
> Looking ahead to next year, ISMB 2012 will be July 15-17, at
> Long Beach, California  (pre-conference SIGs like BOSC
> etc July 13-14). I would hope to attend, but will have to see
> about budgets - I was hoping it would be in Europe again ;)
>
> Peter
>
>
>

From awitney at sgul.ac.uk  Tue Jul 12 14:04:21 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 12 Jul 2011 19:04:21 +0100
Subject: [Bioperl-l] how to find SeqFeature at specific sequence location
Message-ID: 

Hi,

Is there an easy way of finding the SeqFeatures at a specific base location on a Bio::Seq? I guess I can do it by calling get_all_SeqFeatures and testing start/stop coordinates, but just wondered if there was a better way.

Thanks

Adam
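
The approach described above (call get_all_SeqFeatures and test the start and end coordinates) can be sketched as follows. This is a minimal illustration, not a BioPerl API: features_at is a made-up helper name, and it only assumes an object that provides get_all_SeqFeatures() and features that provide start() and end(), as Bio::Seq and Bio::SeqFeatureI do.

```perl
use strict;
use warnings;

# Return the features of $seq whose span covers base $pos (1-based, inclusive).
# $seq is assumed to be a Bio::Seq-like object: anything providing
# get_all_SeqFeatures(), whose features provide start() and end().
# For a split location (e.g. a join), start()..end() is the outer span,
# so a position between the sub-locations still counts as covered.
sub features_at {
    my ($seq, $pos) = @_;
    return grep { $_->start <= $pos && $pos <= $_->end }
           $seq->get_all_SeqFeatures;
}

# Usage sketch (file name and position are placeholders):
#   use Bio::SeqIO;
#   my $seq = Bio::SeqIO->new(-file   => 'example.gbk',
#                             -format => 'genbank')->next_seq;
#   printf "%s\t%d..%d\n", $_->primary_tag, $_->start, $_->end
#       for features_at($seq, 15000);
```

This is a linear scan over every feature, which is usually fine for a single sequence record. For repeated queries against a whole genome, an indexed store is likely the better fit.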

From Russell.Smithies at agresearch.co.nz  Tue Jul 12 19:12:20 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 13 Jul 2011 11:12:20 +1200
Subject: [Bioperl-l] how to find SeqFeature at specific sequence location
In-Reply-To: 
References: 
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3396074D2CB@exchsth.agresearch.co.nz>

I've done it before by taking a segment of the sequence and looking for features in that.

my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',-dsn=> 'hapmap:gbrowsemysql')or die "Can't open database:",Bio::DB::GFF->error,"\n";
my $segment = $db->segment(-class=>'Chromosome',-name=> "Chr1", -start=>$start, -end=>$end);
my @repeats = $segment->features(-types=> ['match:UCSC_REPEATMASK']);

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Adam Witney
> Sent: Wednesday, 13 July 2011 6:04 a.m.
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] how to find SeqFeature at specific sequence
> location
> 
> Hi,
> 
> Is there an easy way of finding the SeqFeatures at a specific base
> location on a Bio::Seq? I guess I can do it by calling
> get_all_SeqFeatures and testing start/stop coordinates, but just
> wondered if there was a better way.
> 
> Thanks
> 
> Adam
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Tue Jul 12 20:06:25 2011
From: cjfields at illinois.edu (Chris UI)
Date: Tue, 12 Jul 2011 19:06:25 -0500
Subject: [Bioperl-l] how to find SeqFeature at specific sequence location
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3396074D2CB@exchsth.agresearch.co.nz>
References: 
	<18DF7D20DFEC044098A1062202F5FFF3396074D2CB@exchsth.agresearch.co.nz>
Message-ID: <38E4AD0A-5D35-4B1F-8E0D-5696B07AFD77@illinois.edu>

One should be able to do this via Bio::DB::SeqFeature::Store.  However, going from a Bio::Seq to that is the tricky part.

chris

Sent from my iPad

On Jul 12, 2011, at 6:12 PM, "Smithies, Russell"  wrote:

> I've done it before by taking a segment of the sequence and looking for features in that.
> 
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',-dsn=> 'hapmap:gbrowsemysql')or die "Can't open database:",Bio::DB::GFF->error,"\n";
> my $segment = $db->segment(-class=>'Chromosome',-name=> "Chr1", -start=>$start, -end=>$end);
> my @repeats = $segment->features(-types=> ['match:UCSC_REPEATMASK']);
> 
> --Russell
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Adam Witney
>> Sent: Wednesday, 13 July 2011 6:04 a.m.
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] how to find SeqFeature at specific sequence
>> location
>> 
>> Hi,
>> 
>> Is there an easy way of finding the SeqFeatures at a specific base
>> location on a Bio::Seq? I guess I can do it by calling
>> get_all_SeqFeatures and testing start/stop coordinates, but just
>> wondered if there was a better way.
>> 
>> Thanks
>> 
>> Adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lrg_ml at gmx.net  Wed Jul 13 01:50:51 2011
From: lrg_ml at gmx.net (Lutz Gehlen)
Date: Wed, 13 Jul 2011 17:50:51 +1200
Subject: [Bioperl-l] Getting started with bioperl-db
In-Reply-To: <36D3A446-3E6C-4DF9-9611-21FA85DAC25F@illinois.edu>
References: <201107121233.30696.lrg_ml@gmx.net>
	<36D3A446-3E6C-4DF9-9611-21FA85DAC25F@illinois.edu>
Message-ID: <201107131750.51914.lrg_ml@gmx.net>

Hello Chris,
thank you for your reply.

On Tuesday, July 12, 2011 14:04:30 Chris Fields wrote:
> On Jul 11, 2011, at 7:33 PM, Lutz Gehlen wrote:
> > The next problems come when using bioperl-db. As I said, I
> > already have a BioSQL database. My questions are of the type
> > like "Is there a gene that overlaps with the region
> > 15000-17000 on chromosome XY?" or "What are the coordinates of
> > gene ABC123?" I haven't found any documentation on bioperl-db
> > that would help me do that. From my search I rather got the
> > impression that I have to understand a substantial part of
> > BioPerl first before I can get going. However, it is very
> > difficult to identify which parts of the HUGE BioPerl project
> > I have to work through.
> 
> It does involve some overhead.
> 
> > Plus I haven't found any
> > comprehensive documentation of BioPerl at all.
> 
> I find that a bit hard to believe (the 'at all' part).  Yes, some
> parts are less-than-optimally documented, but the wiki has quite
> a bit.  See here:
> 
> http://www.bioperl.org/wiki/Main_Page

I would like to apologize for that criticism. It was inaccurate and 
likely to offend which was not my intention at all. You are right, 
there is a lot of documentation (not for bioperl-db, though, as far 
as I know); for the inexperienced user, it is just very hard to get 
an overview.

Please don't get me wrong. It might sound as if I just came here to 
complain. This is not the case. I appreciate the massive work that 
has been done and I am well aware that the developers have no 
obligation at all to make it easier for other people to get into 
BioPerl.

However, in my case, I have abandoned the attempt for now. For the 
simple scenarios that I have the cost-benefit ratio is just too bad. 
I will rather query the BioSQL database directly.

Thanks again for your help
Lutz
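
A direct query of the kind mentioned above ("is there a gene overlapping 15000-17000 on chromosome XY?") might look like the following sketch using DBI. It makes several assumptions: the standard BioSQL tables bioentry, seqfeature, term and location; each chromosome loaded as one bioentry that can be found by its accession; and placeholder values for the DSN, user and password. The sub name overlap_query is made up for illustration. Looking up a gene by name (e.g. ABC123) would probably need a further join on the qualifier tables, depending on how the data were loaded.

```perl
use strict;
use warnings;

# Build SQL and bind values for "features of type $type on the entry
# $accession that overlap the region $from..$to". Two intervals overlap
# when each one starts no later than the other one ends.
sub overlap_query {
    my ($accession, $type, $from, $to) = @_;
    my $sql = <<'SQL';
SELECT sf.seqfeature_id, sf.display_name, l.start_pos, l.end_pos, l.strand
FROM   seqfeature sf
JOIN   bioentry b ON b.bioentry_id   = sf.bioentry_id
JOIN   term     t ON t.term_id       = sf.type_term_id
JOIN   location l ON l.seqfeature_id = sf.seqfeature_id
WHERE  b.accession = ?
AND    t.name      = ?
AND    l.start_pos <= ?
AND    l.end_pos   >= ?
ORDER  BY l.start_pos
SQL
    # Bind order follows the placeholders: start_pos <= $to, end_pos >= $from.
    return ($sql, $accession, $type, $to, $from);
}

# Only runs when called as: script.pl ACCESSION FROM TO
if (@ARGV == 3) {
    require DBI;
    my ($accession, $from, $to) = @ARGV;
    # DSN, user and password are placeholders for a local installation.
    my $dbh = DBI->connect('dbi:mysql:database=biosql', 'user', 'password',
                           { RaiseError => 1 });
    my ($sql, @bind) = overlap_query($accession, 'gene', $from, $to);
    for my $row (@{ $dbh->selectall_arrayref($sql, undef, @bind) }) {
        print join("\t", map { defined $_ ? $_ : '' } @$row), "\n";
    }
    $dbh->disconnect;
}
```

A feature with a split location has several rows in the location table, so the same seqfeature_id can appear more than once in the result.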

From galbo at digitus.itk.ppke.hu  Wed Jul 13 06:58:03 2011
From: galbo at digitus.itk.ppke.hu (=?utf-8?B?R2FsYsOhdHMgQm9yaXN6?=)
Date: Wed, 13 Jul 2011 12:58:03 +0200 (CEST)
Subject: [Bioperl-l] Bioperl 1.6.1 problem
Message-ID: 

Dear Sir or Madam!

I'm new to Bioperl and have two probably very obvious problems (but I can't
find the solution). I installed bioperl 1.6.1 to Win XP (ActivePerl 5.12).
I hope I wrote to the appropriate e-mail address to discuss my problem.

I copy my code here and the error message:

#!/usr/bin/perl

use warnings;
use Bio::Perl;

$seq = get_sequence('swissprot', 'P43780');
write_sequence(">P43.fasta",'fasta',$seq);


-------------------------(run the script)----------------------

C:\Documents and Settings\Galbáts Borisz>perl biotest.perl

------------- EXCEPTION -------------
MSG: WebDBSeqI Error - check query sequences!

STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:508
STACK Bio::DB::WebDBSeqI::get_Stream_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:314
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:186
STACK Bio::Perl::get_sequence C:/Perl/site/lib/Bio/Perl.pm:520
STACK toplevel biotest.perl:11
-------------------------------------

(if I use EMBL database, it works well)





My second problem is the following (script):

#!/usr/bin/perl
use warnings;
use Bio::Perl;

$blast_result=blast_sequence(CTTCCCTTTACCTGTGCACCACTCCCTAATAAATTCATCTCCATTGGGAAA);
write_blast(">bl.txt",$blast_result);

-------------------------(run the script)----------------------

...........................ck to sign
in"onclick="MyNCBI_auto_submit('http://www.ncbi.nlm.nih.gov/sites/myn
cbi/?back_url=http%3A//blast.ncbi.nlm.nih.gov/Blast.cgi%3FALIGNMENTS%3D50%26ALIG
NMENT%5FVIEW%3DPairwise%26CMD%3DPut%26COMPOSITION%5FBASED%5FSTATISTICS%3Doff%26D
ATABASE%3Dnr%26DESCRIPTIONS%3D100%26ERROR%3DMessage%2BID%252324%2BError%253A%2BF
ailed%2Bto%2Bread%2Bthe%2BBlast%2Bquery%253A%2BNucleotide%2BFASTA%2Bprovided%2Bf
or%2Bprotein%2Bsequence%26EXPECT%3D1e%2D10%26FILTER%3DL%26FORMAT%5FOBJECT%3DAlig
v/" title="National Library of
Me............................................. (quite long part)
dicine">NLM |      NIH |      DHHS   

Copyright | Disclaimer | Privacy | Accessibility | Contact | Send feedback

---------------------------------------------------
Submitted Blast for [blast-sequence-temp-id]



It seems it works but the output file (bl.txt) is always empty, whatever
the input sequence is. The problem is I have no idea what I should expect
when I run the script.



Thank you for your help in advance.

Yours sincerely, Borisz Galbáts (Hungary)



From awitney at sgul.ac.uk  Wed Jul 13 18:03:28 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 13 Jul 2011 23:03:28 +0100
Subject: [Bioperl-l] how to find SeqFeature at specific sequence location
In-Reply-To: <38E4AD0A-5D35-4B1F-8E0D-5696B07AFD77@illinois.edu>
References: <18DF7D20DFEC044098A1062202F5FFF3396074D2CB@exchsth.agresearch.co.nz>
	<38E4AD0A-5D35-4B1F-8E0D-5696B07AFD77@illinois.edu>
Message-ID: <118215D0-A837-4D72-BA8F-548A2CF5CAF6@sgul.ac.uk>


Thanks for your replies guys. I was trying to do it from the BioSQL schema and Bio::DB::BioDB but after some further investigation it looks like Bio::DB::GFF or Bio::DB::SeqFeature::Store are the only ways to do it.

Thanks

adam


On 13 Jul 2011, at 01:06, Chris UI wrote:

> One should be able to do this via Bio::DB::SeqFeature::Store.  However, going from a Bio::Seq to that is the tricky part.
> 
> chris
> 
> Sent from my iPad
> 
> On Jul 12, 2011, at 6:12 PM, "Smithies, Russell"  wrote:
> 
>> I've done it before by taking a segment of the sequence and looking for features in that.
>> 
>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',-dsn=> 'hapmap:gbrowsemysql')or die "Can't open database:",Bio::DB::GFF->error,"\n";
>> my $segment = $db->segment(-class=>'Chromosome',-name=> "Chr1", -start=>$start, -end=>$end);
>> my @repeats = $segment->features(-types=> ['match:UCSC_REPEATMASK']);
>> 
>> --Russell
>> 
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Adam Witney
>>> Sent: Wednesday, 13 July 2011 6:04 a.m.
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] how to find SeqFeature at specific sequence
>>> location
>>> 
>>> Hi,
>>> 
>>> Is there an easy way of finding the SeqFeatures at a specific base
>>> location on a Bio::Seq? I guess I can do it by calling
>>> get_all_SeqFeatures and testing start/stop coordinates, but just
>>> wondered if there was a better way.
>>> 
>>> Thanks
>>> 
>>> Adam
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =======================================================================
>> Attention: The information contained in this message and/or attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or privileged
>> material. Any review, retransmission, dissemination or other use of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by AgResearch
>> Limited. If you have received this message in error, please notify the
>> sender immediately.
>> =======================================================================
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Jul 13 23:05:12 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 13 Jul 2011 22:05:12 -0500
Subject: [Bioperl-l] Bioperl 1.6.1 problem
In-Reply-To: 
References: 
Message-ID: <6231395D-0F59-43ED-953F-ACF6591D046D@illinois.edu>

This works for me using the latest release on CPAN (1.6.901).  I believe the error is a change in the URL which was fixed post-1.6.1.

chris

On Jul 13, 2011, at 5:58 AM, Galbáts Borisz wrote:

> Dear Sir or Madam!
> 
> I'm new to Bioperl and have two probably very obvious problems (but I can't
> find the solution). I installed bioperl 1.6.1 to Win XP (ActivePerl 5.12).
> I hope I wrote to the appropriate e-mail address to discuss my problem.
> 
> I copy my code here and the error message:
> 
> #!/usr/bin/perl
> 
> use warnings;
> use Bio::Perl;
> 
> $seq = get_sequence('swissprot', 'P43780');
> write_sequence(">P43.fasta",'fasta',$seq);
> 
> 
> -------------------------(run the script)----------------------
> 
> C:\Documents and Settings\Galbáts Borisz>perl biotest.perl
> 
> ------------- EXCEPTION -------------
> MSG: WebDBSeqI Error - check query sequences!
> 
> STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:508
> STACK Bio::DB::WebDBSeqI::get_Stream_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:314
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:186
> STACK Bio::Perl::get_sequence C:/Perl/site/lib/Bio/Perl.pm:520
> STACK toplevel biotest.perl:11
> -------------------------------------
> 
> (if I use EMBL database, it works well)
> 
> 
> 
> 
> 
> My second problem is the following (script):
> 
> #!/usr/bin/perl
> use warnings;
> use Bio::Perl;
> 
> $blast_result=blast_sequence(CTTCCCTTTACCTGTGCACCACTCCCTAATAAATTCATCTCCATTGGGAAA);
> write_blast(">bl.txt",$blast_result);
> 
> -------------------------(run the script)----------------------
> 
> ...........................ck to sign
> in"onclick="MyNCBI_auto_submit('http://www.ncbi.nlm.nih.gov/sites/myn
> cbi/?back_url=http%3A//blast.ncbi.nlm.nih.gov/Blast.cgi%3FALIGNMENTS%3D50%26ALIG
> NMENT%5FVIEW%3DPairwise%26CMD%3DPut%26COMPOSITION%5FBASED%5FSTATISTICS%3Doff%26D
> ATABASE%3Dnr%26DESCRIPTIONS%3D100%26ERROR%3DMessage%2BID%252324%2BError%253A%2BF
> ailed%2Bto%2Bread%2Bthe%2BBlast%2Bquery%253A%2BNucleotide%2BFASTA%2Bprovided%2Bf
> or%2Bprotein%2Bsequence%26EXPECT%3D1e%2D10%26FILTER%3DL%26FORMAT%5FOBJECT%3DAlig
> v/" title="National Library of
> Me............................................. (quite long part)
> dicine">NLM | NIH | DHHS
> 
> Copyright | Disclaimer | Privacy | Accessibility | Contact | Send feedback
> 
> ---------------------------------------------------
> Submitted Blast for [blast-sequence-temp-id]
> 
> 
> 
> It seems it works but the output file (bl.txt) is always empty, whatever
> the input sequence is. The problem is I have no idea what I should expect
> when I run the script.
> 
> 
> 
> Thank you for your help in advance.
> 
> Yours sincerely, Borisz Galbáts (Hungary)
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Thu Jul 14 04:05:07 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 14 Jul 2011 02:05:07 -0600
Subject: [Bioperl-l] Fwd: URGENT! Update of Bio::DB::SwissProt
References: <4E1E9F93.7050005@isb-sib.ch>
Message-ID: <67267DFC-4EE5-4E1C-A7AC-7AC2B94D8975@gmail.com>

Okay, I am forwarding this to the mailing list; I guess no one has updated the module in a while to reflect this change.

Am I right in thinking the right way to get a Swissprot record by accession is now like this? Perhaps a developer can refactor the code to use this URL base soon.

http://www.uniprot.org/uniprot/P22815.txt

Thanks.
Jason

Begin forwarded message:

> From: Severine Duvaud 
> Date: July 14, 2011 1:49:39 AM MDT
> To: jason at bioperl.org
> Subject: URGENT! Update of Bio::DB::SwissProt
> 
> Dear Jason,
> 
> We have just received an e-mail from one of our users complaining about the fact that sprot-retrieve-list.pl is missing on the new ExPASy:
> =====================================
> For example, BioPerl's Bio::DB::SwissProt module keeps querying
> 'http://**.expasy.org/cgi-bin/sprot-retrieve-list.pl' which seems to
> be now a dead link.
> =====================================
> sprot-retrieve-list.pl has been deprecated since 2008.
> The UniProtKB entry retrieval is now supported by the UniProt website only:
> http://www.uniprot.org/batch/
> Could you please update your module?
> Many thanks for your help.
> Best regards, > > Severine Duvaud, ExPASy web team / UniProt Consortium. From fs5 at sanger.ac.uk Thu Jul 14 11:08:46 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 14 Jul 2011 16:08:46 +0100 Subject: [Bioperl-l] limit the number of blast output per query In-Reply-To: <4E157637.3020408@fmi.ch> References: <01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie> <004001cbe2d5$76598200$630c8600$@edu.hk> <005301cbe31b$a3bee550$eb3caff0$@edu.hk> <9CD1455E-88B4-4E2A-B3BC-398C10D5AAA9@tamu.edu> <3E73745F-A687-4229-B71E-5C56B2D1FBAE@illinois.edu> <009001cc2edc$09b80740$1d2815c0$@edu.hk> <4DFE907D.1000204@gmail.com> <00d001cc3c4f$9bddd070$d3997150$@edu.hk> <4E157637.3020408@fmi.ch> Message-ID: <1310656126.26243.158.camel@deskpro15336.internal.sanger.ac.uk> -b limits the number of output alignments, not sure why it isn't working for you. How many HSPs do you actually get? Is there a chance that the output is just so large because of a large number of queries and you are not seeing 10k+ HSPs but results? Can you post the entire blastall command? The large number of hits looks like you are blasting short sequences, there are better algorithms for short sequence alignments, so if that's what you are doing I could give you some hints for alternative software to try. Frank On Thu, 2011-07-07 at 11:02 +0200, Hans-Rudolf Hotz wrote: > Hi > > just double checking: are you really talking abut "10,000+ hits"? or do > you mean "10,000+ HSPs" ('high-scoring segment pairs')? > > I don't know how your genome database looks like, but assuming you have > one sequence per chromosome, then you will get just 24 hits (ie each > chromosome) and then depending on your query each hit will have a lot of > HSPs. > > As far as as I know, there is no way to limit the number of HSPs (you > might try playing with the E value). > > You can try using the tabular output format (this will reduce the file > size) - or may be BLAST is not the right search tool for your task? 
> > > Regards, Hans > > > On 07/07/2011 04:43 AM, Ross KK Leung wrote: > > I know this question should submit to BLAST help but it seems they have > > already been overwhelmed by incoming emails. I wonder any bioperl users > > happen to know how to limit the number of blast output per query. For > > example, for human genome as a database to blast against, a single query can > > generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but > > obviously the blastall from blast2.2.22 does not "obey" my instruction. > > > > The output files generated are usually larger than 100G+ but indeed the > > final ones that I want usually are only of 10M-. Is there any way to help > > save our Earth (Not exaggerated, energy is WASTED in a meaningless manner)? > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From hartzell at alerce.com Thu Jul 14 19:05:48 2011 From: hartzell at alerce.com (George Hartzell) Date: Thu, 14 Jul 2011 16:05:48 -0700 Subject: [Bioperl-l] help with Bio::SearchIO::hmmer bug Message-ID: <19999.30284.535898.685294@gargle.gargle.HOWL> Hi All, I just filed https://redmine.open-bio.org/issues/3264, which describes a problem with the hmmer output parser, when used in conjunction with Bio::Tools::Run::Hmmer wrapper. It's not a crisis, but it's one of the things that's keeping me from doing automated builds of things that use bioperl. A fix would be great. Thanks, g. 
From avilella at gmail.com Thu Jul 14 21:02:40 2011
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 15 Jul 2011 02:02:40 +0100
Subject: [Bioperl-l] fwding question blast vs blast+
Message-ID:

http://biostar.stackexchange.com/questions/10295/bioperl-has-different-behaviours-in-parsing-blast-and-blast-result

From bi_my_heart at hotmail.com Thu Jul 14 03:54:14 2011
From: bi_my_heart at hotmail.com (jmi k)
Date: Thu, 14 Jul 2011 03:54:14 -0400
Subject: [Bioperl-l] can't retrieve description using Bio::DB::EntrezGene
Message-ID:

Hi,

I'm trying to retrieve the description of a gene's file using Bio::DB::EntrezGene. Here is the relevant code from my program:

    use Bio::DB::EntrezGene;
    use Bio::ASN1::EntrezGene;   # I think I have to include this

    my $gb = Bio::DB::EntrezGene->new;
    my $geneobj = Bio::Seq->new();
    ...
    $geneobj = $gb->get_Seq_by_id($geneid);
    my $genedesc = $geneobj->desc();
    print $genedesc;             # but nothing shows up :(

I've found a similar discussion at http://old.nabble.com/Bio::DB::EntrezGene-or-Bio::DB::Query::GenBank-to-obtain-sequence-metadata-without-sequence-td25816381.html but I don't understand why they can't use Bio::DB::EntrezGene directly.

Thanks in advance!
Regards,Jamie From ross at cuhk.edu.hk Thu Jul 14 19:05:27 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Fri, 15 Jul 2011 07:05:27 +0800 Subject: [Bioperl-l] limit the number of blast output per query In-Reply-To: <1310656126.26243.158.camel@deskpro15336.internal.sanger.ac.uk> References: <01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie> <004001cbe2d5$76598200$630c8600$@edu.hk> <005301cbe31b$a3bee550$eb3caff0$@edu.hk> <9CD1455E-88B4-4E2A-B3BC-398C10D5AAA9@tamu.edu> <3E73745F-A687-4229-B71E-5C56B2D1FBAE@illinois.edu> <009001cc2edc$09b80740$1d2815c0$@edu.hk> <4DFE907D.1000204@gmail.com> <00d001cc3c4f$9bddd070$d3997150$@edu.hk> <4E157637.3020408@fmi.ch> <1310656126.26243.158.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <004701cc427a$85187e40$8f497ac0$@edu.hk> blastall -p blastn -F F -b 50 -v 50 -i query.fna -d genomes.fna -e 1e-30 -o output.tab -m 8 -a 8 as what Hans-Rudolf Hotz commented, I guess there are just too many HSPs for hitting human genome and all my downstream analysis programs rely on blastall m8 output, now I just don't know how to adapt to other new formats shortly. -----Original Message----- From: Frank Schwach [mailto:fs5 at sanger.ac.uk] Sent: 2011??7??14?? 23:09 To: Hans-Rudolf Hotz Cc: Ross KK Leung; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] limit the number of blast output per query -b limits the number of output alignments, not sure why it isn't working for you. How many HSPs do you actually get? Is there a chance that the output is just so large because of a large number of queries and you are not seeing 10k+ HSPs but results? Can you post the entire blastall command? The large number of hits looks like you are blasting short sequences, there are better algorithms for short sequence alignments, so if that's what you are doing I could give you some hints for alternative software to try. 
Frank On Thu, 2011-07-07 at 11:02 +0200, Hans-Rudolf Hotz wrote: > Hi > > just double checking: are you really talking abut "10,000+ hits"? or do > you mean "10,000+ HSPs" ('high-scoring segment pairs')? > > I don't know how your genome database looks like, but assuming you have > one sequence per chromosome, then you will get just 24 hits (ie each > chromosome) and then depending on your query each hit will have a lot of > HSPs. > > As far as as I know, there is no way to limit the number of HSPs (you > might try playing with the E value). > > You can try using the tabular output format (this will reduce the file > size) - or may be BLAST is not the right search tool for your task? > > > Regards, Hans > > > On 07/07/2011 04:43 AM, Ross KK Leung wrote: > > I know this question should submit to BLAST help but it seems they have > > already been overwhelmed by incoming emails. I wonder any bioperl users > > happen to know how to limit the number of blast output per query. For > > example, for human genome as a database to blast against, a single query can > > generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but > > obviously the blastall from blast2.2.22 does not "obey" my instruction. > > > > The output files generated are usually larger than 100G+ but indeed the > > final ones that I want usually are only of 10M-. Is there any way to help > > save our Earth (Not exaggerated, energy is WASTED in a meaningless manner)? 
From carandraug+dev at gmail.com Thu Jul 14 22:48:03 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Fri, 15 Jul 2011 03:48:03 +0100
Subject: [Bioperl-l] can't retrieve description using Bio::DB::EntrezGene
In-Reply-To:
References:
Message-ID:

On 14 July 2011 08:54, jmi k wrote:
>
> Hi,
> I'm trying to retrieve the description of a gene's file using Bio::DB::EntrezGene. Here is the relevant code from my program:
> use Bio::DB::EntrezGene;use Bio::ASN1::EntrezGene; # I think I have to include this
> I've found a similar discussion at http://old.nabble.com/Bio::DB::EntrezGene-or-Bio::DB::Query::GenBank-to-obtain-sequence-metadata-without-sequence-td25816381.html but I don't understand why they can't use Bio::DB::EntrezGene directly.
> Thanks in advance!
> Regards,Jamie

Hi Jamie,

Your code works fine for me. Please always paste ALL of your code, not only the part where you think the error is. It doesn't take long to do so, you get the answer to your problem faster, and everyone wastes less time in the end. If it's a long piece of code, use pastebin http://pastebin.com/ Also, please paste it properly formatted, not all on one line.

Although it works, you're doing some unnecessary things, such as creating a Bio::Seq object that you then overwrite using the get_Seq_by_id method.
This should already return the sequence object, so there is no need to create it first.

Anyway, here's how to do it with Bio::DB::EUtilities (it's not the answer to your question, but if you're having trouble with EntrezGene and don't mind which module you use to get the job done...)

    use feature 'say';   # 'say' needs perl 5.10 or later
    use Bio::DB::EUtilities;

    my $eutil = Bio::DB::EUtilities->new(
        -eutil => 'esummary',
        -db    => 'gene',
        -id    => $ref_to_array_with_uids,   # reference to an array of gene UIDs
    );

    while (my $docsum = $eutil->next_DocSum) {
        my ($description) = $docsum->get_contents_by_name('Description');
        my ($summary)     = $docsum->get_contents_by_name('Summary');
        say $description;
        say $summary;
    }

You can also use the to_string method on $docsum to get a nice view of it and of the contents you have retrieved. The get_Item_by_name method is also handy. Also, if you're not using an up-to-date version of perl, replace say with print and a newline, or upgrade your version of perl ;)

Carnë

From cjfields at illinois.edu Thu Jul 14 23:56:42 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jul 2011 22:56:42 -0500
Subject: [Bioperl-l] fwding question blast vs blast+
In-Reply-To:
References:
Message-ID: <9901366B-52C3-4506-88E8-84F23C5C2D0B@illinois.edu>

Having a bug report and example data helps tremendously.

chris

On Jul 14, 2011, at 8:02 PM, Albert Vilella wrote:

> http://biostar.stackexchange.com/questions/10295/bioperl-has-different-behaviours-in-parsing-blast-and-blast-result

From Russell.Smithies at agresearch.co.nz Fri Jul 15 00:07:42 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 15 Jul 2011 16:07:42 +1200
Subject: [Bioperl-l] fwding question blast vs blast+
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3396074D2FF@exchsth.agresearch.co.nz>

Works fine for me parsing blast+ 2.2.24 and bioperl 1.61.
Have you done a diff on the output from the 2 versions of blast+ to see what's changed? I suspect NCBI has changed the output slightly and broken our parser. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Albert Vilella > Sent: Friday, 15 July 2011 1:03 p.m. > To: bioperl-l > Subject: [Bioperl-l] fwding question blast vs blast+ > > http://biostar.stackexchange.com/questions/10295/bioperl-has-different- > behaviours-in-parsing-blast-and-blast-result > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From fs5 at sanger.ac.uk Fri Jul 15 04:26:17 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Fri, 15 Jul 2011 09:26:17 +0100
Subject: [Bioperl-l] limit the number of blast output per query
In-Reply-To: <004701cc427a$85187e40$8f497ac0$@edu.hk>
References: <01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie> <004001cbe2d5$76598200$630c8600$@edu.hk> <005301cbe31b$a3bee550$eb3caff0$@edu.hk> <9CD1455E-88B4-4E2A-B3BC-398C10D5AAA9@tamu.edu> <3E73745F-A687-4229-B71E-5C56B2D1FBAE@illinois.edu> <009001cc2edc$09b80740$1d2815c0$@edu.hk> <4DFE907D.1000204@gmail.com> <00d001cc3c4f$9bddd070$d3997150$@edu.hk> <4E157637.3020408@fmi.ch> <1310656126.26243.158.camel@deskpro15336.internal.sanger.ac.uk> <004701cc427a$85187e40$8f497ac0$@edu.hk>
Message-ID: <1310718378.26243.170.camel@deskpro15336.internal.sanger.ac.uk>

So how many HSPs do you get? I noticed that your database file is called "genomes"; is that a collection of many genomes, then? Could you break that up? I guess you are mapping short sequences, possibly repetitive ones too, so the question is whether you would gain anything from limiting the number of reported hits if you still couldn't say where in the genome your sequence belongs.

Try one of the many short-read alignment programs out there, something like SSAHA, SMALT, BWA or bowtie (there are many more...), and make use of BAM files for storing large amounts of alignments in compressed form. There are the "samtools" to work with these files, and there is a BioPerl wrapper for them called Bio::DB::Sam. It should not be too difficult to adapt your scripts, and you can always ask for help here.
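(An aside on the tabular-output problem in this thread, not taken from any poster's message.) If the downstream scripts must keep consuming blastall -m 8 output, one option is to cap the number of HSP lines per query after the fact. The following is a minimal plain-Perl sketch under stated assumptions: the input is the standard 12-column tabular format with the bit score in the last column, all lines for one query are contiguous, and the function names cap_hsps and flush_block are made up for illustration.

```perl
use strict;
use warnings;

# Print at most $max lines from one query's block, highest bit score first.
sub flush_block {
    my ($block, $max, $out) = @_;
    my @sorted = sort { $b->[0] <=> $a->[0] } @$block;
    $#sorted = $max - 1 if @sorted > $max;   # truncate to the best $max
    print {$out} "$_->[1]\n" for @sorted;
}

# Stream tabular lines from $in to $out, keeping at most $max HSPs per query.
# Assumes all lines belonging to one query are contiguous in the input.
sub cap_hsps {
    my ($in, $out, $max) = @_;
    my ($current, @block);
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '' || $line =~ /^#/;    # skip blanks and comments
        my @f = split /\t/, $line;
        next unless @f >= 12;                    # not a 12-column tabular line
        if (defined $current && $f[0] ne $current) {
            flush_block(\@block, $max, $out);    # query changed: emit the block
            @block = ();
        }
        $current = $f[0];
        push @block, [ $f[11], $line ];          # column 12 is the bit score
    }
    flush_block(\@block, $max, $out) if @block;
}

# Tiny demonstration on made-up lines (fields 3-11 are dummy zeros).
my $demo = join '', map { join("\t", @$_) . "\n" } (
    [ 'q1', 'chr1', (0) x 9, 50 ],
    [ 'q1', 'chr1', (0) x 9, 90 ],
    [ 'q1', 'chr2', (0) x 9, 70 ],
    [ 'q2', 'chr1', (0) x 9, 10 ],
);
open my $in, '<', \$demo or die "Cannot read demo data: $!";
cap_hsps($in, \*STDOUT, 2);
```

To filter on the fly, a script could call cap_hsps(\*STDIN, \*STDOUT, 50) and read blastall's tabular output from a pipe, so the full result never has to be written to disk. A single query with millions of HSPs is still held in memory while its block is sorted, so this only helps when the per-query volume is manageable.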
Cheers, Frank On Fri, 2011-07-15 at 07:05 +0800, Ross KK Leung wrote: > blastall -p blastn -F F -b 50 -v 50 -i query.fna -d genomes.fna -e 1e-30 -o > output.tab -m 8 -a 8 > > as what Hans-Rudolf Hotz commented, I guess there are just too many HSPs for > hitting human genome and all my downstream analysis programs rely on > blastall m8 output, now I just don't know how to adapt to other new formats > shortly. > > -----Original Message----- > From: Frank Schwach [mailto:fs5 at sanger.ac.uk] > Sent: 2011?7?14? 23:09 > To: Hans-Rudolf Hotz > Cc: Ross KK Leung; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] limit the number of blast output per query > > -b limits the number of output alignments, not sure why it isn't working > for you. How many HSPs do you actually get? Is there a chance that the > output is just so large because of a large number of queries and you are > not seeing 10k+ HSPs but results? > Can you post the entire blastall command? The large number of hits looks > like you are blasting short sequences, there are better algorithms for > short sequence alignments, so if that's what you are doing I could give > you some hints for alternative software to try. > Frank > > > > On Thu, 2011-07-07 at 11:02 +0200, Hans-Rudolf Hotz wrote: > > Hi > > > > just double checking: are you really talking abut "10,000+ hits"? or do > > you mean "10,000+ HSPs" ('high-scoring segment pairs')? > > > > I don't know how your genome database looks like, but assuming you have > > one sequence per chromosome, then you will get just 24 hits (ie each > > chromosome) and then depending on your query each hit will have a lot of > > HSPs. > > > > As far as as I know, there is no way to limit the number of HSPs (you > > might try playing with the E value). > > > > You can try using the tabular output format (this will reduce the file > > size) - or may be BLAST is not the right search tool for your task? 
> > > > > > Regards, Hans > > > > > > On 07/07/2011 04:43 AM, Ross KK Leung wrote: > > > I know this question should submit to BLAST help but it seems they have > > > already been overwhelmed by incoming emails. I wonder any bioperl users > > > happen to know how to limit the number of blast output per query. For > > > example, for human genome as a database to blast against, a single query > can > > > generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but > > > obviously the blastall from blast2.2.22 does not "obey" my instruction. > > > > > > The output files generated are usually larger than 100G+ but indeed the > > > final ones that I want usually are only of 10M-. Is there any way to > help > > > save our Earth (Not exaggerated, energy is WASTED in a meaningless > manner)? > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From tzhu at mail.bnu.edu.cn Sun Jul 17 03:46:07 2011 From: tzhu at mail.bnu.edu.cn (Tao Zhu) Date: Sun, 17 Jul 2011 15:46:07 +0800 Subject: [Bioperl-l] Bug or special design of the 'length' method for Bio::Seq ? Message-ID: <4E22933F.4010607@mail.bnu.edu.cn> Hi,everyone Suppose a protein sequence like: >Protein MAASEHRCVGCGFRVKSLF* Do you think the length of such sequence is 19 or 20? 
In my opinion, the star "*" is only a terminal symbol of a protein sequence, so it shouldn't be counted into protein length. But in fact the "length" method of Bio::Seq results in length of 20. -- Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China Email: tzhu at mail.bnu.edu.cn From p.j.a.cock at googlemail.com Sun Jul 17 09:40:40 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 17 Jul 2011 14:40:40 +0100 Subject: [Bioperl-l] Bug or special design of the 'length' method for Bio::Seq ? In-Reply-To: <4E22933F.4010607@mail.bnu.edu.cn> References: <4E22933F.4010607@mail.bnu.edu.cn> Message-ID: This is deliberately giving the length of the string (Biopython does the same). Have you considered what would you expect for this example sequence? i.e. Where you translate a whole sequence including all the stop codons? >Translation MAASEHRCVGCGFRVKSLF*AMKLMNO*P It is a practical decision to give the length including the stop symbols, so that the sequence behaves like a Perl string. Peter On 7/17/11, Tao Zhu wrote: > Hi,everyone > Suppose a protein sequence like: > > >Protein > MAASEHRCVGCGFRVKSLF* > > Do you think the length of such sequence is 19 or 20? In my opinion, the > star "*" is only a terminal symbol of a protein sequence, so it > shouldn't be counted into protein length. But in fact the "length" > method of Bio::Seq results in length of 20. > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Sun Jul 17 10:20:48 2011 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 17 Jul 2011 09:20:48 -0500 Subject: [Bioperl-l] Bug or special design of the 'length' method for Bio::Seq ? 
In-Reply-To: References: <4E22933F.4010607@mail.bnu.edu.cn> Message-ID: <5C692A10-1CA8-4173-819A-E5F6738EDE99@illinois.edu> length() is defined in BioPerl as 'Get the length of the sequence in number of symbols (bases or amino acids)'. We count '*' as a translated codon and as part of length() for the reasons Peter mentions. One can also set the length for a 'virtual' sequence (no actual sequence present), but if a sequence is present it's not supposed to lie either (e.g. you can't just set it to anything). chris On Jul 17, 2011, at 8:40 AM, Peter Cock wrote: > This is deliberately giving the length of the string (Biopython does the same). > > Have you considered what would you expect for this example sequence? > i.e. Where you translate a whole sequence including all the stop > codons? > >> Translation > MAASEHRCVGCGFRVKSLF*AMKLMNO*P > > It is a practical decision to give the length including the stop > symbols, so that the sequence behaves like a Perl string. > > Peter > > On 7/17/11, Tao Zhu wrote: >> Hi,everyone >> Suppose a protein sequence like: >> >>> Protein >> MAASEHRCVGCGFRVKSLF* >> >> Do you think the length of such sequence is 19 or 20? In my opinion, the >> star "*" is only a terminal symbol of a protein sequence, so it >> shouldn't be counted into protein length. But in fact the "length" >> method of Bio::Seq results in length of 20. 
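For the length question in this thread, the two conventions can be compared side by side. This is a minimal sketch on ordinary Perl strings rather than Bio::Seq objects; the helper names symbol_length and residue_count are invented for illustration and are not BioPerl methods.

```perl
use strict;
use warnings;

# Length as a plain string: every symbol counts, including '*'.
sub symbol_length {
    my ($seq) = @_;
    return length $seq;
}

# Residue count that ignores stop symbols ('*') wherever they occur.
sub residue_count {
    my ($seq) = @_;
    my $stops = ($seq =~ tr/*//);   # tr with an empty replacement only counts
    return length($seq) - $stops;
}

my $protein = 'MAASEHRCVGCGFRVKSLF*';
printf "%d symbols, %d residues\n",
    symbol_length($protein), residue_count($protein);
```

Counting symbols keeps the result consistent with string operations such as substr, while a caller who wants the biological length can subtract the stop symbols explicitly, which also handles translations with internal stops.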
>> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nestor at linuxmail.org Sun Jul 17 13:33:24 2011 From: nestor at linuxmail.org (Nestor Zaburannyi) Date: Sun, 17 Jul 2011 19:33:24 +0200 Subject: [Bioperl-l] Fastq quality values as strings Message-ID: <1752783251.20110717193324@linuxmail.org> Dear all, I want to read quality strings from *.fastq file just as they are, without any conversion to numeric values. Right now, i am reading every 4-th line of the file, but this could break easily. Is there any way to do this within SeqIO parser? Sincerely yours Nestor From carandraug+dev at gmail.com Sun Jul 17 16:38:33 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Sun, 17 Jul 2011 21:38:33 +0100 Subject: [Bioperl-l] can't retrieve description using Bio::DB::EntrezGene In-Reply-To: References: Message-ID: Hi you should use pastebin http://pastebin.com/ for such long pieces of code. Well, there's many ways to go around your problem. Depending on the amount of info you need from the gene entry and how much you really need to use the Bio::DB::EntrezGene (note that Bio::DB::EntrezGene doesn't actual return the DNA sequence. You''ll still need Bio::DB::Eutilities or Bio::DB::Genbank to get them) 1) If the module you use is not important, you only part of the info from the record (gene coordinates and contig accession number, name, etc...) and will need later to actually fetch the DNA sequences, use esummary from eutilities. This will make the code faster (since it downloads less information from the database) and easier to read. 
If you want to know what info is available from there, run the following code, which shows what you get for gene 3014.

    use feature 'say';   # 'say' needs perl 5.10 or later
    use Bio::DB::EUtilities;

    my $eutil = Bio::DB::EUtilities->new(
        -eutil => 'esummary',
        -db    => 'gene',
        -id    => [3014],
    );
    say $eutil->next_DocSum->to_string;

For genes whose UID has been replaced (such as 724021), there will be a value for 'CurrentID' and 'Status' will be 1. You can also use this module to get the actual sequences if you need them later: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F

2) If you really want to use Bio::DB::EntrezGene and the name is all you want to extract from the record (this is the answer to your original question), then here's how to fix your code. The field you're looking for is in the Bio::Seq object (in $gene, not in $uncaptured as you thought). It is, however, in the annotations of the sequence. So you need to get the annotations, then use the right key to get the annotation you want. It will come in a hash tree, so again you'll need a key. The following code should work. It's quite confusing, but take a look at this: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_the_Annotations

    my $gene        = $genein->next_seq;
    my $annotations = $gene->annotation;
    # ask the annotation collection for the entry with this key
    my ($anno)      = $annotations->get_Annotations('Official Full Name');
    my $hash_ref    = $anno->hash_tree;
    my ($key)       = keys %{$hash_ref};
    my $name        = $hash_ref->{$key};
    say $name;

If you ever get confused, use the Data::Dumper module to see where things are. For example, in the code above you could do the following:

    my $gene = $genein->next_seq;
    use Data::Dumper;
    print Dumper $gene;

3) Also, if you want to use Bio::DB::EntrezGene but want to extract more, I wouldn't look into the Bio::Seq object but into the Bio::ASN1::EntrezGene object directly. It's still a mess, so use Data::Dumper to find your way if you ever get lost in it.
Here's the code to do it that way ($response here is a string with all of the ASN1 entrezgene file, newlines and everything):

    use Bio::ASN1::EntrezGene;

    ## This use of the open function requires perl 5.8.0 or later
    open(my $seq_fh, "<", \$response)
        or die "Could not open sequences string for reading: $!";
    my $parser = Bio::ASN1::EntrezGene->new(-fh => $seq_fh);

    my ($name, $symbol);
    while (my $result = $parser->next_seq) {
        $result = $result->[0] if (ref($result) eq 'ARRAY');
        ## Data::Dumper can be used to look into the structure and find where things are
        # use Data::Dumper;
        # print Dumper ($result);
        foreach my $p (@{$result->{'properties'}}) {
            $p = $p->[0] if (ref($p) eq 'ARRAY');
            next unless ($p->{'label'} && $p->{'label'} eq 'Nomenclature');
            foreach my $pp (@{$p->{'properties'}}) {
                $pp = $pp->[0] if (ref($pp) eq 'ARRAY');
                $name   = $pp->{'text'} if ($pp->{'label'} && $pp->{'label'} eq 'Official Full Name');
                $symbol = $pp->{'text'} if ($pp->{'label'} && $pp->{'label'} eq 'Official Symbol');
            }
        }
    }

I use this piece of code in the program that I'm currently writing. I also extract a bunch of other things from the entrezgene file. Take a look at http://pastebin.com/LVB7QpxZ if you want to make some sense of it. This part is between lines 381 and 399. Part of what you're writing I have already written there, so there's no need to reinvent the wheel. Hopefully it is commented well enough for you to understand.

Carnë

From cjfields at illinois.edu Sun Jul 17 17:41:45 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jul 2011 16:41:45 -0500
Subject: [Bioperl-l] Fastq quality values as strings
In-Reply-To: <1752783251.20110717193324@linuxmail.org>
References: <1752783251.20110717193324@linuxmail.org>
Message-ID: <59FDB568-DAF8-4C82-8DEA-F63E39A4F71B@illinois.edu>

If you are using the latest SeqIO parser, I believe the FASTQ string is returned via next_dataset() only (the raw qual string is passed in to the constructor but is ignored).
This should be pretty fast as no objects are created (the returned data is a hashref), and the qual string key is '-raw_quality'.

chris

On Jul 17, 2011, at 12:33 PM, Nestor Zaburannyi wrote:

> Dear all,
>
> I want to read quality strings from *.fastq file just as they are, without any conversion to numeric values. Right now, i am reading every 4-th line of the file, but this could break easily. Is there any way to do this within SeqIO parser?
>
> Sincerely yours
> Nestor

From carandraug+dev at gmail.com Sun Jul 17 21:57:33 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Mon, 18 Jul 2011 02:57:33 +0100
Subject: [Bioperl-l] new bioperl script
Message-ID:

Hi

I've been working on a program to download sequences. I saw the scripts in bioperl and I thought it could be a good addition to them. The idea behind it is that, given a list of queries, it searches the gene database and then downloads the gene sequence and its associated products (it only downloads the reference sequences, although I may add support for alternates later).

There are options to add extra upstream or downstream base pairs, and to control the naming of the sequence files (such as using the gene name rather than the UID, or using the UIDs or accessions of related sequences). It's still not completely finished. Currently it only searches the entrezgene database, but I'll be extending it to Ensembl this week.

Would you be interested in something like this? I pasted it here http://pastebin.com/D3sY7hLb

Carnë
Draug

From carandraug+dev at gmail.com Mon Jul 18 07:21:00 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Mon, 18 Jul 2011 12:21:00 +0100
Subject: [Bioperl-l] new bioperl script
In-Reply-To: <7220e2ad1a4.4e242c37@ucd.ie>
References: <7220e2ad1a4.4e242c37@ucd.ie>
Message-ID:

2011/7/18 Jun Yin :
> Hi,
>
> See this page for existing BioPerl modules retrieving sequences and
> annotations from NCBI database.
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>
> Ensembl also has BioPerl like APIs for batch downloading sequences and
> annotations. So there is no need to implement it again.
> http://www.ensembl.org/info/data/api.html
>
> Cheers,
> Jun

Hi Jun

Yes, I'm using EUtilities. Still, I didn't find it that straightforward unless you already have the accession or ID. This script performs the search as if you were on the website (using eutilities), gets all the info for the genes, including gene products, and then downloads those as well (again, using the BioPerl modules).

Carnë

From hlapp at drycafe.net Mon Jul 18 07:36:15 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 18 Jul 2011 07:36:15 -0400
Subject: [Bioperl-l] Getting started with bioperl-db
In-Reply-To: <201107131750.51914.lrg_ml@gmx.net>
References: <201107121233.30696.lrg_ml@gmx.net> <36D3A446-3E6C-4DF9-9611-21FA85DAC25F@illinois.edu> <201107131750.51914.lrg_ml@gmx.net>
Message-ID: <92EE41A0-2342-4493-B46F-EC304288388E@drycafe.net>

Hi Lutz,

Yes, indeed there is significantly less documentation for BioPerl-db, though there is quite a bit in the perldoc of the modules. The best starting points are probably Bio::DB::BioDB, Bio::DB::BioSQL::DBAdaptor, and Bio::DB::BioSQL::BasePersistenceAdaptor. There is also the linked 2003 slideshow, which gives a basic overview.

Thanks for pointing out the errors on the wiki page; I've fixed them now.

And finally, querying BioSQL directly is a perfectly fine solution. I've done that many times.

-hilmar

Sent with a tap.
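For the kind of question raised in this thread ("is there a gene that overlaps region 15000-17000 on chromosome XY?"), the core of a direct query is a simple interval test. The following plain-Perl sketch shows that test on hypothetical in-memory records; the gene names, coordinates, and the helper names overlaps and genes_in_region are all made up for illustration. Against a BioSQL database the same predicate would presumably go into a WHERE clause over the start and end columns of the location table, but no database access is attempted here.

```perl
use strict;
use warnings;

# Two closed intervals [s1,e1] and [s2,e2] overlap when each one starts
# no later than the other one ends.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

# Return the records on $chrom that overlap the region [$start, $end].
sub genes_in_region {
    my ($genes, $chrom, $start, $end) = @_;
    return grep {
        $_->{chrom} eq $chrom
            && overlaps($_->{start}, $_->{end}, $start, $end)
    } @$genes;
}

# Hypothetical records standing in for rows fetched from a database.
my @genes = (
    { name => 'geneA', chrom => 'XY', start => 12000, end => 15500 },
    { name => 'geneB', chrom => 'XY', start => 16000, end => 16500 },
    { name => 'geneC', chrom => 'XY', start => 17001, end => 18000 },
    { name => 'geneD', chrom => 'Z',  start => 15000, end => 16000 },
);

print "$_->{name}\n" for genes_in_region(\@genes, 'XY', 15000, 17000);
```

The test treats both ends as inclusive, so a gene ending exactly at position 15000 would count as overlapping; whether that matches the coordinate convention of a given database is something to check before relying on it.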
On Jul 13, 2011, at 1:50 AM, Lutz Gehlen wrote: > Hello Chris, > thank you for your reply. > > On Tuesday, July 12, 2011 14:04:30 Chris Fields wrote: >> On Jul 11, 2011, at 7:33 PM, Lutz Gehlen wrote: >>> The next problems come when using bioperl-db. As I said, I >>> already have a BioSQL database. My questions are of the type >>> like "Is there a gene that overlaps with the region >>> 15000-17000 on chromosome XY?" or "What are the coordinates of >>> gene ABC123?" I haven't found any documentation on bioperl-db >>> that would help me do that. From my search I rather got the >>> impression that I have to understand a substantial part of >>> BioPerl first before I can get going. However, it is very >>> difficult to identify which parts of the HUGE BioPerl project >>> I have to work through. >> >> It does involve some overhead. >> >>> Plus I haven't found any >>> comprehensive documentation of BioPerl at all. >> >> I find that a bit hard to believe (the 'at all' part). Yes, some >> parts are less-than-optimally documented, but the wiki has quite >> a bit. See here: >> >> http://www.bioperl.org/wiki/Main_Page > > I would like to apologize for that criticism. It was inaccurate and > likely to offend which was not my intention at all. You are right, > there is a lot of documentation (not for bioperl-db, though, as far > as I know), for the inexperienced user, it is just very hard to get > an overview. > > Please don't get me wrong. It might sound that I just came here to > complain. This is not the case. I appreciate the massive work that > has been done and I am well aware that the developers have no > obligation at all to make it easier for other people to get into > BioPerl. > > However, in my case, I have abandoned the attempt for now. For the > simple scenarios that I have the cost-benefit ratio is just too bad. > I will rather query the BioSQL database directly. 
> > Thanks again for your help > Lutz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jun.yin at ucd.ie Mon Jul 18 06:51:03 2011 From: jun.yin at ucd.ie (Jun Yin) Date: Mon, 18 Jul 2011 12:51:03 +0200 Subject: [Bioperl-l] new bioperl script In-Reply-To: References: Message-ID: <7220e2ad1a4.4e242c37@ucd.ie> Hi, See this page for existing BioPerl modules retrieving sequences and annotations from NCBI database. http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook Ensembl also has BioPerl like APIs for batch downloading sequences and annotations. So there is no need to implement it again. http://www.ensembl.org/info/data/api.html Cheers, Jun ----- Original Message ----- From: Carn? Draug Date: Monday, July 18, 2011 4:26 am Subject: [Bioperl-l] new bioperl script To: bioperl-l at bioperl.org > Hi > > I've been working on a program to download sequences. I saw the > scripts in bioperl and I thought it could be a good addition to them. > The idea behind it is that given a list of queries, it searches the > gene database and then downloads the gene sequence and it's associated > products (only downloads the reference sequences although I may add > support for alternates later). > > There's options to have extra upstream or downstream base pairs, and > the naming of the sequence files (such as using the gene name rather > than the UID. Or use the UIDs or accessions of related sequences). > It's still not completely finished. Currently it only searches on > entrezgene database but I'll be extending it to Ensembl this week. > > Would you be interested on something like this? I pasted it here > http://pastebin.com/D3sY7hLb > > Carn? 
Draug > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tiffanie.moss at gmail.com Mon Jul 18 12:20:10 2011 From: tiffanie.moss at gmail.com (Tiffanie Moss) Date: Mon, 18 Jul 2011 12:20:10 -0400 Subject: [Bioperl-l] newby Q: fetch sequence from one database and blast against another and retrieve coordinates Message-ID: i have two fasta files I want to use as databases - a contig file and a scaffold file (from the assembled contigs). For the sequences I am interested in, I have the coordinates to the contig database and I need the corresponding coordinates to the scaffold database. The contigs in the contig file are identified as partials of a scaffold (ie. scaffold1000-3, scaffold1000-4, etc) and the scaffolds are listed in the scaffold file in singletons (ie. scaffold1000, scaffold 1002, etc). I want a script that can use the coordinates to the contig file to fetch the sequence and then blast that sequence against the corresponding scaffold in the scaffold file and provide the coordinates (start and stop). Then I want to compare these coordinates to another file containing EST scaffold location coordinates in order to determine if my sequences of interest are located in these regions.Can anyone guide me as to where I can find a perl or bioperl script that I can manipulate to do this. I've started a script that will compare the scaffold coordinates of the two files, but first I need to extract the sequences from the contig file and get the corresponding Scaffold location coordinates. Many thanks in advance. 
--
Tiffanie Yael Moss
PhD candidate
Case Western Reserve University
Department of Biology
2080 Adelbert Road, Millis 127
Cleveland, Ohio 44106-7080
Fax: (216) 368-4672
Ph: (216) 368-5301

From bi_my_heart at hotmail.com Sun Jul 17 02:35:56 2011
From: bi_my_heart at hotmail.com (jmi k)
Date: Sun, 17 Jul 2011 02:35:56 -0400
Subject: [Bioperl-l] can't retrieve description using Bio::DB::EntrezGene
In-Reply-To:
References: , ,
Message-ID:

Carnë,

Thank you for your reply. Actually, I had pasted my code with proper formatting, but it seems Hotmail didn't do a good job of keeping the newlines :P I was afraid the code for my whole program would be too long to read. Anyway, sorry for any inconvenience caused.

After reading your reply, I tried my code with a different gene id (2), and realized that desc() does retrieve something, but it's not what I expected. Specifically, I want to retrieve the gene description from this entrezgene record, along with other genes from M. smegmatis (hope it shows properly... or please see http://www.ncbi.nlm.nih.gov/gene?term=4535615):

dnaN DNA polymerase III subunit beta [Mycobacterium smegmatis str. MC2 155]
Gene ID: 4535615, created on 30-Nov-2006

SUMMARY
-------------------------------------------------------------------------------------------------
Gene description: DNA polymerase III subunit beta
Locus tag: MSMEG_0001
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mycobacterium smegmatis str. MC2 155
Lineage: Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium

(In my previous email, I used gene id 1234567 because I substituted a variable to avoid confusion, as I didn't paste the entire code.)

I discovered by using gene id 2 that desc() retrieves the Summary entry (under the bigger SUMMARY heading) of entrezgene records:

SUMMARY
-------------------------------------------------------------------------------------------------
Official Symbol: A2M (provided by HGNC)
Official full name: alpha-2-macroglobulin (provided by HGNC)
Primary source: HGNC:7
See related: Ensembl:ENSG00000175899; HPRD:00072; MIM:103950
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Homo sapiens
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Also known as: CPAMD5; FWP007; S863-7; DKFZp779B086
Summary: Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. [provided by RefSeq]

but what I wanted is the Gene description (DNA polymerase III subunit beta).

So I modified my code to look like this. (Now this is the code for my entire program... please ignore the commented-out code near the end... I am painstakingly working on it. The code most relevant to my question is in green.)

#!/usr/bin/perl
#mutation-finder
#by: jamie
use warnings;
use strict;
use lib "/usr/share/perl5/";
use Bio::Seq;
use Bio::SeqIO;
use Getopt::Long;
use Bio::DB::EntrezGene;
use Bio::ASN1::EntrezGene; #not sure if I have to include this

my %options;
GetOptions(\%options, 'i=s', 'o=s', 'headline', 'cutoff=i');
# declare the perl command line flags
print "What is the threshold for detecting mutations? Enter a fraction (0-1): ";
my $threshold = <STDIN>;
my ($Filein, $Fileout, $Flag, $cutoff);
my $headline = '';
if ($options{i}) {
    if ($options{i} =~ /(.*)/) {# input file: /home/jmi/Documents/E5_NC_008596.complete.pileup
        open IN, $+ or die "Can't open the file: $!";
    }
}
my $seqin = Bio::SeqIO->new(-format => 'GenBank', -file=>'/home/jmi/Documents/NC_008596.gb');
my $seqobj = $seqin->next_seq();
my $dbentrezgene = Bio::DB::EntrezGene->new();
my @feat = ($seqobj->get_SeqFeatures()); #get feature objects NOT subfeatures...tag?
if ($options{o}) {
    if ($options{o} =~ /(.*)/) {# output file: /home/jmi/Documents/E5_NC_008596.complete.pileup.output.?percent
        die "Can't open the file: $!" unless open OUT, ">$+";
    }
}
if ($options{cutoff}) {
    if ($options{cutoff} =~ /(.*)/) {
        $cutoff = $+;
        print OUT "Cutoff: ",$cutoff,"\t";
    }
}
if ($options{headline}) {
    print OUT "Threshold (0-1): ",$threshold,"\nCoordinate\tDepth\t\tReference Base\t# of ref. base\t% ref. base\tTop 1 base\t# of top 1 base\t% top 1 base\tTop 2 base\t# of top 2 base\t% top 2 base\tGene name\tGene ID\t\tGene full name\tStrand\t\tGene begin\tGene end\n";
}
my ($refbase, $refbasecount, $mutbase, $mut2base, $coordinate, $vMax,$v2Max, $k, $v, $length, $tbd, $top1percentage, $top2percentage, $refpercentage, $start, $end, @name, @db_xref, $strand, $strandsign, @genefullname, $isgene);
my $readbases = "";
my $geneid = 0; #preset for programming reasons
while (my $in = <IN>) {
#store coordinate
    if ($in =~ /\|\t([0-9]*)\t/) {
        $coordinate = $+;
        if (defined($cutoff)) {
            last if $coordinate > $cutoff;
        }
    }
#store reference base in $refbase
    if ($in =~ /\t([A-Z])\t/) {
        $refbase = $+;
    }
# store and process read bases in $readbases
    if ($in =~ /\t[ATCG]\t[0-9]*\t([[:punct:][:alnum:]]*)\t/) {
        $readbases = $+;
        $readbases =~ s/\^.|\*|\$//g;
        while ($readbases =~ m/(?<=[\+\-])\d+/g) {
            my $pos = pos($readbases);
            $tbd = substr $readbases, $pos, $&;
            $readbases =~ s/[\+\-]\d+$tbd//;
        }
    }
    my $count1 = $readbases =~ tr/,.//;
    $length = length($readbases);
    my $count3 = $length - $count1;
    $readbases = uc($readbases);
    unless (($count3 <= $threshold*$length) || ($length <= 1)) { # if reach threshold, identify mutation
        $readbases =~ s/[,\.]/$refbase/g;
        my @readbasesarray = split //,$readbases;
        my %count2 = ();
        map{ $count2{$_}++ } @readbasesarray;
        $vMax = 0;
        $refbasecount = 0;
        while(($k,$v) = each %count2) { #store base in key and count in value
            if ($k eq $refbase) {
                $refbasecount = $v;
                $refpercentage = $refbasecount/$length*100;
                $refpercentage = sprintf '%.2f', $refpercentage;
            }
            if ($v > $vMax) {
                $mutbase = $k;
                $vMax = $v; #number of occurences of top 1 base
            }
            $top1percentage = $vMax/$length*100;
            $top1percentage = sprintf '%.2f', $top1percentage;
        }
        $v2Max = 0;
        while(($k,$v) = each %count2) {
            if ($v > $v2Max && $k ne $mutbase) {
                $mut2base = $k;
                $v2Max = $v; #number of occurences of top 2 base
            }
            $top2percentage = $v2Max/$length*100;
            $top2percentage = sprintf '%.2f', $top2percentage;
        }
        for my $f (@feat) {
            if ($f->primary_tag eq 'gene') {
                if ($coordinate>=$f->start && $coordinate<=$f->end) {
                    $isgene = 1;
                    @db_xref = $f->get_tag_values("db_xref");
                    if (($db_xref[-1] =~ /([0-9]+)/) && ($+ != $geneid)) {
                        $geneid = $+;
                        print $geneid, " yay got geneid \n";
#                       my $geneobj = $dbentrezgene->get_Seq_by_id($geneid); ## This is the code I showed you in my last email. I have replaced this with the following code in green.
#                       $genefullname = $geneobj->desc();
                        my $genein = $dbentrezgene->get_Stream_by_id([$geneid]) or die; #I used this function because I need a Bio::SeqIO object in between, in order...
                        my ($gene,$genestructure,$uncaptured) = $genein->next_seq; #to get all the data from the record, and I hope Gene description falls into $uncaptured. reference: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/SeqIO/entrezgene.pm#DESCRIPTION
                        @genefullname = @$uncaptured; #Please scroll to end of code.
#                       my @firstarray = $genefullname[1];
#                       print $firstarray[$_],"first!\n" for (@firstarray);
#                       my @secondarray = $genefullname[3];
#                       print $secondarray[$_],"second!\n" for (@secondarray);
#                       my @thirdarray = $genefullname[4];
#                       print $thirdarray[$_],"third!\n" for (@thirdarray);
#                       for my $i (@genefullname) {
#                           if (ref($i)) {
#                               if (ref($genefullname[$i]) eq 'SCALAR') {
#                                   print ${$i}," SCALAR ref\n";
#                               } elsif (ref($genefullname[$i]) eq 'ARRAY') {
#                                   for my $j(@{$genefullname[$i]}) {
#                                       print $j,"yayayay\n";
#                                   }
#                                   my @array = @{$_};
#                                   foreach my $a (@array) {
#                                       print length(@array)," LENGTH\n";
#                                       print $array[$a]," from ARRAY ref\n";
#                                   }
#                               } elsif (ref($_) eq 'HASH') {
#                                   my %hash = %{$genefullname[$_]};
#                                   foreach my $key(keys %hash) {
#                                       print $key," => ",$hash{$key}," from HASH ref\n";
#                                   }
#                               }
#                           } else {
#                               print $i," anything else\n";
#                           }
#                           print "going to top of loop again\n";
                        if ($f->has_tag("gene")) {
                            @name = $f->get_tag_values("gene");
                        }
                        $strand = $f->strand();
                        if ($strand == 1) {$strandsign = '+';} elsif ($strand == -1) {$strandsign = '-';}
                        $start = $f->start;
                        $end = $f->end;
                    }
                }
            }
        }
        unless ($isgene) {undef @name; undef @db_xref; undef $strand; undef $strandsign; undef $start; undef $end; $geneid = 1; undef @genefullname;} #don't undef $geneid!
# store results in output file
        print OUT "$coordinate\t\t$length\t\t$refbase\t\t$refbasecount\t\t$refpercentage\t\t$mutbase\t\t$vMax\t\t$top1percentage\t\t$mut2base\t\t$v2Max\t\t$top2percentage\t\t@name\t\t$geneid\t\t@genefullname\t$strandsign\t\t$start\t\t$end\n";
    }
}
# print "Common name: ",$seqobj->display_id,"\nUnique implementation key: ",$seqobj->primary_id,"\ndescription: ",$seqobj->desc,"\naccession #: ",$seqobj->accession_number,"\nalphabet: ",$seqobj->alphabet(),"\n";

The problem now is that I can't print the elements of @genefullname directly, because it is a mixed array:

    From genefullname:
    From genefullname: ARRAY(0x3ddd8a0)
    From genefullname: 1
    From genefullname: ARRAY(0x3ddbfd8)
    From genefullname: ARRAY(0x3ddbcc0)
    From genefullname: CP000480
    From genefullname: ARRAY(0x3ddd8a0)
    From genefullname: 1
    From genefullname: ARRAY(0x3ddbcf0)
    From genefullname: HASH(0x3de0808)
    From genefullname: ARRAY(0x3de4788)
    From genefullname: Bio::SeqIO::entrezgene=HASH(0x3db8b00)

Please, if you would let me know how to display the Gene description properly, I would appreciate it very much. And thanks for the help so far.

Regards,
Jamie

> From: carandraug+dev at gmail.com
> Date: Fri, 15 Jul 2011 03:48:03 +0100
> Subject: Re: [Bioperl-l] can't retrieve description using Bio::DB::EntrezGene
> To: bi_my_heart at hotmail.com
> CC: bioperl-l at bioperl.org
>
> On 14 July 2011 08:54, jmi k wrote:
> >
> > Hi,
> > I'm trying to retrieve the description of a gene's file using Bio::DB::EntrezGene. Here is the relevant code from my program:
> > use Bio::DB::EntrezGene;use Bio::ASN1::EntrezGene; # I think I have to include this
> > I've found a similar discussion at http://old.nabble.com/Bio::DB::EntrezGene-or-Bio::DB::Query::GenBank-to-obtain-sequence-metadata-without-sequence-td25816381.html but I don't understand why they can't use Bio::DB::EntrezGene directly.
> > Thanks in advance!
> > Regards,Jamie
>
> Hi Jamie,
>
> Your code works fine for me. Please always paste ALL of your code, not
> only the part where you think the error is. It doesn't take long to
> do so, you get the answer to your problem faster and everyone wastes
> less time in the end. If it's a long piece of code use pastebin
> http://pastebin.com/ Also, please paste it properly formatted, not all
> in one line.
>
> Despite the fact that it works, you're doing some unnecessary things such
> as creating a Bio::Seq object that you then overwrite with the result of
> the get_Seq_by_id method. This should already return the sequence object,
> no need to create it first.
>
> Anyway, here's how to do it with Bio::DB::EUtilities (it's not the
> answer to your question but if you're having trouble with EntrezGene
> and don't mind which module to use to get the job done...)
>
> use Bio::DB::EUtilities;
> my $eutil = Bio::DB::EUtilities->new(
>     -eutil => 'esummary',
>     -db    => 'gene',
>     -id    => $ref_to_array_with_uids
> );
>
> while (my $docsum = $eutil->next_DocSum) {
>     my ($description) = $docsum->get_contents_by_name('Description');
>     my ($summary) = $docsum->get_contents_by_name('Summary');
>     say $description;
>     say $summary;
> }
>
> You can also use the to_string method on $docsum to get a nice view
> of it and what contents you have retrieved. The get_Item_by_name
> method is also handy.
>
> Also, replace say with print and a newline if you're not using an up
> to date version of perl, or upgrade your version of perl ;)
>
> Carnë

From hollenbv at onid.orst.edu Mon Jul 18 19:48:21 2011
From: hollenbv at onid.orst.edu (Vicky Hollenbeck)
Date: Mon, 18 Jul 2011 16:48:21 -0700
Subject: [Bioperl-l] Error message installing BioPerl on Windows x86 with ActivePerl 5.12.4
Message-ID: <4E24C645.90104@onid.orst.edu>

Hello,

I am getting two error messages after following the instructions for installing BioPerl from the command line according to http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows. I have all the repositories listed in the table under 'Perl 5.10':

*Can't find any package that provides Convert::Binary::C for Bundle-BioPerl Core
*Can't find any package that provides DB_file:: for Bundle-BioPerl Core

I originally was doing the installation via the GUI, but when I got a few error messages, I went on to the command line version. I was able to fix the SOAP error message by following the instructions there, but can't seem to fix the other two listed above.

Thank you in advance.
Vicky Hollenbeck
USDA Agricultural Research Service
Corvallis, OR

From scott at scottcain.net Tue Jul 19 00:25:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Tue, 19 Jul 2011 00:25:07 -0400
Subject: [Bioperl-l] Error message installing BioPerl on Windows x86 with ActivePerl 5.12.4
In-Reply-To: <4E24C645.90104@onid.orst.edu>
References: <4E24C645.90104@onid.orst.edu>
Message-ID:

Hi Vicky,

Annoyingly, ActiveState doesn't include DB_File with their build of Perl anymore, even though it is a core perl module. I believe you can get it for perl 5.10 via a different repo. Try this on the command line:

ppm rep add trouchelle.com http://trouchelle.com/ppm10/

and then install DB_File with ppm. Please let us know if that works so we can update the wiki.

Good luck,
Scott

On Mon, Jul 18, 2011 at 7:48 PM, Vicky Hollenbeck wrote:
> Hello,
>
> I am getting two error messages after following the instructions for
> installing BioPerl from command line according to
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows. I have all the
> repositories listed in the table under 'Perl 5.10':
>
> *Can't find any package that provides Convert::Binary::C for Bundle-BioPerl
> Core
> *Can't find any package that provides DB_file:: for Bundle-BioPerl Core
>
> I originally was doing installation via the GUI, but when I got a few error
> messages, went on to the command line version. I was able to fix the SOAP
> error message by following the instructions there, but can't seem to fix the
> other two listed above.
>
> Thank you in advance.
>
> Vicky Hollenbeck
> USDA Agricultural Research Service
> Corvallis, OR
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Tue Jul 19 00:28:59 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jul 2011 23:28:59 -0500
Subject: [Bioperl-l] Error message installing BioPerl on Windows x86 with ActivePerl 5.12.4
In-Reply-To:
References: <4E24C645.90104@onid.orst.edu>
Message-ID:

We *really* need to switch to AnyDBM_File; it would solve a lot of headaches for Windows users.

chris

On Jul 18, 2011, at 11:25 PM, Scott Cain wrote:

> Hi Vicky,
>
> Annoyingly, ActiveState doesn't include DB_File with their build of
> Perl anymore, even though it is a core perl module. I believe you can
> get it for perl 5.10 via a different repo. Try this on the command
> line:
>
> ppm rep add trouchelle.com http://trouchelle.com/ppm10/
>
> and then install DB_File with ppm. Please let us know if that works
> so we can update the wiki.
>
> Good luck,
> Scott
>
>
> On Mon, Jul 18, 2011 at 7:48 PM, Vicky Hollenbeck
> wrote:
>> Hello,
>>
>> I am getting two error messages after following the instructions for
>> installing BioPerl from command line according to
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows. I have all the
>> repositories listed in the table under 'Perl 5.10':
>>
>> *Can't find any package that provides Convert::Binary::C for Bundle-BioPerl
>> Core
>> *Can't find any package that provides DB_file:: for Bundle-BioPerl Core
>>
>> I originally was doing installation via the GUI, but when I got a few error
>> messages, went on to the command line version. I was able to fix the SOAP
>> error message by following the instructions there, but can't seem to fix the
>> other two listed above.
>>
>> Thank you in advance.
>> >> Vicky Hollenbeck >> USDA Agricultural Research Service >> Corvallis, OR >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From liuli at ioz.ac.cn Tue Jul 19 04:34:39 2011 From: liuli at ioz.ac.cn (Li Liu) Date: Tue, 19 Jul 2011 16:34:39 +0800 Subject: [Bioperl-l] perl or bioperl script to calculate Kimura 2-parameter (K2P) distance of two dna sequences Message-ID: hello all, I am a beginner of bioperl. I have to calculate k2p distance of two dna sequences (mitochondrial CO1). I can't deal with this problem. Would anybody help me to write the script? Thanks a lot for your kindness. -- Beat regards Li Liu From cjfields at illinois.edu Tue Jul 19 08:52:39 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 19 Jul 2011 07:52:39 -0500 Subject: [Bioperl-l] perl or bioperl script to calculate Kimura 2-parameter (K2P) distance of two dna sequences In-Reply-To: References: Message-ID: <421746B7-FF5A-4DBD-AED1-11C99003C2CE@illinois.edu> You need to post a more BioPerl-specific question. Just an aside: it seems you just want someone to write the script for you; if not you should rephrase this. chris On Jul 19, 2011, at 3:34 AM, Li Liu wrote: > hello all, > > I am a beginner of bioperl. I have to calculate k2p distance of two > dna sequences (mitochondrial CO1). > > I can't deal with this problem. Would anybody help me to write the > script? Thanks a lot for your kindness. 
>
>
> --
>
> Best regards
>
> Li Liu

From eddr666 at gmail.com Tue Jul 19 09:38:25 2011
From: eddr666 at gmail.com (Eden Dr)
Date: Tue, 19 Jul 2011 16:38:25 +0300
Subject: [Bioperl-l] Bio::Index:SwissPfam file loading problem
Message-ID:

Hi

I'm trying to use the Bio::Index::SwissPfam module as in the documentation's example.

The file: swisspfam from the Pfam ftp, current_release: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release

The code:

use strict;
use Getopt::Long;
use File::Basename;
use Pg;
use sigtrap 'handler' => \&handle_interrupt, 'INT';
use Bio::SeqIO;
use Bio::Index::SwissPfam;

my $pfam_domains_fasta_file = $ARGV[0];

my $inx = Bio::Index::SwissPfam->new('-filename' => $pfam_domains_fasta_file);
my $seq = $inx->fetch('005L_IIV3'); # Returns stream
while( <$seq> ) {
    if(/^>/) {
        print;
        last;
    }
}
exit;

and I get the following error message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Can't open 'DB_File' dbm file 'swisspfam' : Inappropriate file type or format
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.1/Bio/Root/Root.pm:368
STACK: Bio::Index::Abstract::open_dbm /usr/local/lib/perl5/site_perl/5.10.1/Bio/Index/Abstract.pm:399
STACK: Bio::Index::Abstract::new /usr/local/lib/perl5/site_perl/5.10.1/Bio/Index/Abstract.pm:163
STACK: parse_proteins_file.pl:34
-----------------------------------------------------------

What am I doing wrong?

Thanks
Eden

From hollenbv at onid.orst.edu Tue Jul 19 18:56:27 2011
From: hollenbv at onid.orst.edu (Vicky Hollenbeck)
Date: Tue, 19 Jul 2011 15:56:27 -0700
Subject: [Bioperl-l] Error message installing BioPerl on Windows x86 with ActivePerl 5.12.4
In-Reply-To:
References: <4E24C645.90104@onid.orst.edu>
Message-ID: <4E260B9B.3020103@onid.orst.edu>

Hi Scott,

Thank you.
I added the trouchelle.com/ppm10/ as well as the trouchelle.com/ppm12/ repositories. When I tell ppm to install DB_File I get a reply that no packages are missing. Then I installed BioPerl again. I still get the error messages below. When doing a ppm list, DB_File shows in the list, but with nothing under the 'files' column. Still, it looks like BioPerl installed. I am interested in running scripts with Bio::SearchIO and I don't see any lines for use DB_File, so maybe it is a moot point for my purposes. Just a note that I'm running ActivePerl 5.12.4 as it is the 'Community' version on their website.

Vicky

Scott Cain wrote:
> Hi Vicky,
>
> Annoyingly, ActiveState doesn't include DB_File with their build of
> Perl anymore, even though it is a core perl module. I believe you can
> get it for perl 5.10 via a different repo. Try this on the command
> line:
>
> ppm rep add trouchelle.com http://trouchelle.com/ppm10/
>
> and then install DB_File with ppm. Please let us know if that works
> so we can update the wiki.
>
> Good luck,
> Scott
>
>
> On Mon, Jul 18, 2011 at 7:48 PM, Vicky Hollenbeck
> wrote:
>
>> Hello,
>>
>> I am getting two error messages after following the instructions for
>> installing BioPerl from command line according to
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows. I have all the
>> repositories listed in the table under 'Perl 5.10':
>>
>> *Can't find any package that provides Convert::Binary::C for Bundle-BioPerl
>> Core
>> *Can't find any package that provides DB_file:: for Bundle-BioPerl Core
>>
>> I originally was doing installation via the GUI, but when I got a few error
>> messages, went on to the command line version. I was able to fix the SOAP
>> error message by following the instructions there, but can't seem to fix the
>> other two listed above.
>>
>> Thank you in advance.
>> >> Vicky Hollenbeck >> USDA Agricultural Research Service >> Corvallis, OR >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > From cjfields at illinois.edu Tue Jul 19 21:22:32 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 19 Jul 2011 20:22:32 -0500 Subject: [Bioperl-l] Error message installing BioPerl on Windows x86 with ActivePerl 5.12.4 In-Reply-To: <4E260B9B.3020103@onid.orst.edu> References: <4E24C645.90104@onid.orst.edu> <4E260B9B.3020103@onid.orst.edu> Message-ID: The problem, truthfully, is that the number of our Windows users is much higher than our Windows developers. Just a quick question: have you tried Strawberry Perl? It's supposed to be much more UNIX-like in it's behavior than ActivePerl. chris On Jul 19, 2011, at 5:56 PM, Vicky Hollenbeck wrote: > Hi Scott, > > Thank you. I added the trouchelle.com/ppm10/ as well as trouchelle.com/ppm12/ repositories. When I say install ppm DB_file I get a reply that no packages are missing. Then I installed BioPerl over. I still get the error messages below. When doing a ppm list, the DB_file shows in the list, but nothing under the 'files' column. Still it looks like BioPerl installed. I am interested in running scripts with Bio:SearchIO and I don't see any lines for use DB_file, so maybe it is a mute point for my purposes. Just a note that I'm running ActivePerl 5.12.4 as it is the 'Community' version on their website. > > Vicky > > Scott Cain wrote: >> Hi Vicky, >> >> Annoyingly, ActiveState doesn't include DB_file with their build of >> Perl anymore, even though it is a core perl module. I believe you can >> get it for perl 5.10 via a different repo. Try this on the command >> line: >> >> ppm rep add trouchelle.com http://trouchelle.com/ppm10/ >> >> and then install DB_file with ppm. Please let us know if that works >> so we can update the wiki. 
>> >> Good luck, >> Scott >> >> >> On Mon, Jul 18, 2011 at 7:48 PM, Vicky Hollenbeck >> wrote: >> >>> Hello, >>> >>> I am getting two error messages after following the instructions for >>> installing BioPerl from command line according to >>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows. I have all the >>> repositories listed in the table under 'Perl 5.10': >>> >>> *Can't find any package that provides Convert::Binary::C for Bundle-BioPerl >>> Core >>> *Can't find any package that provides DB_file:: for Bundle-BioPerl Core >>> >>> I originally was doing installation via the GUI, but when I got a few error >>> messages, went on to the command line version. I was able to fix the SOAP >>> error message by following the instuctions there, but can't seem to fix the >>> other two listed above. >>> >>> Thank you in advance. >>> >>> Vicky Hollenbeck >>> USDA Agricultural Research Service >>> Corvallis, OR >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Tue Jul 19 21:22:45 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 19 Jul 2011 18:22:45 -0700 Subject: [Bioperl-l] Bio::Index:SwissPfam file loading problem In-Reply-To: References: Message-ID: <16C6E551-646B-4652-8991-3099176645A8@gmail.com> As the documentation says, you want to do this: my $inx = Bio::Index::SwissPfam->new('-filename' => $Index_File_Name, '-write_flag' => 'WRITE'); # $inx->make_index(@ARGV); $inx->make_index($swisspfam_file); # in your case do this Where $Index_File_Name is the name of the index file you want to create and $swisspfam_file is the name of the datafile you want to process. Hope that helps. 
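
Put together, the two steps (build the index once, then fetch from it) might look like the sketch below. The wrapper subs build_index and first_header and the command-line layout are made up for illustration; new, make_index and fetch are the calls shown in the Bio::Index::SwissPfam documentation. Actually running it needs BioPerl and a working DB_File, which is an assumption about the local install.

```perl
use strict;
use warnings;

# Step 1: build the index file from the flat swisspfam data file.
# This only needs doing once per data file.
sub build_index {
    my ($index_file, $swisspfam_file) = @_;
    require Bio::Index::SwissPfam;    # loaded lazily, only when called
    my $inx = Bio::Index::SwissPfam->new(
        '-filename'   => $index_file,
        '-write_flag' => 'WRITE',
    );
    $inx->make_index($swisspfam_file);
    return $inx;
}

# Step 2: reopen the index read-only and return the header line
# (the one starting with '>') of one entry.
sub first_header {
    my ($index_file, $id) = @_;
    require Bio::Index::SwissPfam;
    my $inx = Bio::Index::SwissPfam->new('-filename' => $index_file);
    my $fh  = $inx->fetch($id);    # a stream, not a sequence object
    while (my $line = <$fh>) {
        return $line if $line =~ /^>/;
    }
    return;
}

# Hypothetical invocation:
#   perl swisspfam_lookup.pl swisspfam.idx swisspfam 005L_IIV3
if (@ARGV == 3) {
    my ($index_file, $data_file, $id) = @ARGV;
    build_index($index_file, $data_file) unless -e $index_file;
    my $header = first_header($index_file, $id);
    print defined $header ? $header : "no header line found for $id\n";
}
```

The point of the split is that the name passed to new is always the index file; the swisspfam data file itself is only ever handed to make_index.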
Jason On Jul 19, 2011, at 6:38 AM, Eden Dr wrote: > Hi > > I'm trying to use Bio::Index::SwissPfam module as in the documentation's > example. > The file : swisspfam from pfam ftp, current_release: > ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release > the code: > use strict; > use Getopt::Long; > use File::Basename; > use Pg; > use sigtrap 'handler' => \&handle_interrupt, 'INT'; > use Bio::SeqIO; > use Bio::Index::SwissPfam; > > my $pfam_domains_fasta_file = $ARGV[0]; > > my $inx = Bio::Index::SwissPfam->new('-filename' => > $pfam_domains_fasta_file); > my $seq = $inx->fetch('005L_IIV3'); # Returns stream > while( <$seq> ) { > if(/^>/) { > print; > last; > } > } > exit; > > and I get the following error message: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Can't open 'DB_File' dbm file 'swisspfam' : Inappropriate file type or > format > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/lib/perl5/site_perl/5.10.1/Bio/Root/Root.pm:368 > STACK: Bio::Index::Abstract::open_dbm > /usr/local/lib/perl5/site_perl/5.10.1/Bio/Index/Abstract.pm:399 > STACK: Bio::Index::Abstract::new > /usr/local/lib/perl5/site_perl/5.10.1/Bio/Index/Abstract.pm:163 > STACK: parse_proteins_file.pl:34 > ----------------------------------------------------------- > > what do I do wrong? > > Thanks > Eden > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thomas.sharpton at gmail.com Tue Jul 19 22:45:42 2011 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 19 Jul 2011 19:45:42 -0700 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: Message-ID: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> Hi Scott, Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be. What you are observing is not the intended behavior. 
Oddly, it's not what I recall obtaining in my tests on this software,
though I was mostly interested in hmmsearch at the time and may have
been sloppier than I should have been when it came to hmmscan.

What version of HMMER3 are you using? There have been some small
formatting changes in the past that might be causing a burp in the
parser, though I'm doubting it.

Kai Blin wrote some test scripts (found here:
bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate
query/hit coordinates. It might be worth giving this a shot if you
haven't already.

Also, if you don't mind, I'm happy to run your code on your output file
on my end. It might help me diagnose the problem.

Sorry this is being a thorn in your side! I've cc'ed the list in case
anyone else has insight into this matter.

Best,
Thomas

On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:

> Hi Thomas,
>
> I'm using modules in the bioperl-hmmer3 git repository to parse
> hmmscan reports. When I parse the files and walk through the HSP's like:
>
> while (my $hit = $rslt->next_model) {
>
> while (my $domain = $hit->next_hsp) {
>
> And retrieve the "hit" coordinates like:
>
> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
> "\n";
>
> The coordinates returned correspond to what I would call the "query",
> since they are for the sequence I fed to hmmscan to search the profile
> database. Likewise, when retrieving the query coordinates like
> $domain->start('query'), I get what I consider the "hit" coordinates,
> since they are for the domain profile. Is this the intended behavior?
>
> Thanks.
>
> scott
>
> --
> Scott A.
Givan
> Associate Director
> Informatics Research Core Facility
> 240e Bond Life Sciences Center
> Research Assistant Professor
> Molecular Microbiology and Immunology
> University of Missouri, Columbia
>
> TEL 573-882-2948
> FAX 573-884-9676
> http://ircf.rnet.missouri.edu
>
>
>

From cjfields at illinois.edu Tue Jul 19 23:34:06 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Jul 2011 22:34:06 -0500
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
Message-ID: <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>

This might be a disconnect between the HMMER3 version in bioperl-live
and the one in Kai's bioperl-hmmer3 repo. I believe the one in
bioperl-live is newer. Scott, can you give that a try?

chris

On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:

> Hi Scott,
>
> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
>
> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
>
> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
>
> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
>
> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
>
> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
>
> Best,
> Thomas
>
> On Jul 19, 2011, at 10:43 AM, Givan, Scott A.
wrote:
>
>> Hi Thomas,
>>
>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
>> reports. When I parse the files and walk through the HSP's like:
>>
>> while (my $hit = $rslt->next_model) {
>>
>> while (my $domain = $hit->next_hsp) {
>>
>> And retrieve the "hit" coordinates like:
>>
>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
>> "\n";
>>
>> The coordinates returned correspond to what I would call the "query",
>> since they are for the sequence I fed to hmmscan to search the profile
>> database. Likewise, when retrieving the query coordinates like
>> $domain->start('query'), I get what I consider the "hit" coordinates,
>> since they are for the domain profile. Is this the intended behavior?
>>
>> Thanks.
>>
>> scott
>>
>> --
>> Scott A. Givan
>> Associate Director
>> Informatics Research Core Facility
>> 240e Bond Life Sciences Center
>> Research Assistant Professor
>> Molecular Microbiology and Immunology
>> University of Missouri, Columbia
>>
>> TEL 573-882-2948
>> FAX 573-884-9676
>> http://ircf.rnet.missouri.edu
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From givans at missouri.edu Wed Jul 20 00:23:20 2011
From: givans at missouri.edu (Givan, Scott A.)
Date: Tue, 19 Jul 2011 23:23:20 -0500
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
Message-ID:

I'll try the bioperl-live version. Thanks guys.

Scott Givan
541-740-4685
Sent from an iPhone (so expect typos).

On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote:

> This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer.
Scott, can you give that a try?
>
> chris
>
> On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
>
>> Hi Scott,
>>
>> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
>>
>> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
>>
>> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
>>
>> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
>>
>> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
>>
>> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
>>
>> Best,
>> Thomas
>>
>> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
>>
>>> Hi Thomas,
>>>
>>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
>>> reports. When I parse the files and walk through the HSP's like:
>>>
>>> while (my $hit = $rslt->next_model) {
>>>
>>> while (my $domain = $hit->next_hsp) {
>>>
>>> And retrieve the "hit" coordinates like:
>>>
>>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
>>> "\n";
>>>
>>> The coordinates returned correspond to what I would call the "query",
>>> since they are for the sequence I fed to hmmscan to search the profile
>>> database. Likewise, when retrieving the query coordinates like
>>> $domain->start('query'), I get what I consider the "hit" coordinates,
>>> since they are for the domain profile. Is this the intended behavior?
>>>
>>> Thanks.
>>>
>>> scott
>>>
>>> --
>>> Scott A. Givan
>>> Associate Director
>>> Informatics Research Core Facility
>>> 240e Bond Life Sciences Center
>>> Research Assistant Professor
>>> Molecular Microbiology and Immunology
>>> University of Missouri, Columbia
>>>
>>> TEL 573-882-2948
>>> FAX 573-884-9676
>>> http://ircf.rnet.missouri.edu
>>>
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at gmail.com Wed Jul 20 12:00:33 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Wed, 20 Jul 2011 09:00:33 -0700
Subject: [Bioperl-l] Bio::DB::SwissProt
In-Reply-To: <82C18DC9-CB8F-4D57-B032-322A100E3981@mol-med.uni-freiburg.de>
References: <82C18DC9-CB8F-4D57-B032-322A100E3981@mol-med.uni-freiburg.de>
Message-ID: <373C82CA-F319-48D7-A498-E6DFE798FA36@gmail.com>

It is just waiting for someone to code it up. It requires a different,
simpler approach.

I don't have any time to do this but we welcome volunteers.

The simple approach is just to open a file handle and use GET cmd to
open the URL with the accession number per the msg I sent before. That
filehandle goes to SeqIO and you don't need the Bio::DB module.

You can also get Swissprot data via genbank DB handle.

Jason Stajich

On Jul 20, 2011, at 4:23 AM, Oliver Schilling wrote:

> Dear Jason,
>
> as thankfully pointed out by you on http://old.nabble.com/Fwd%3A-URGENT%21-Update-of-Bio%3A%3ADB%3A%3ASwissProt-td32059436.html#a32059436,
>
> it seems that Bio::DB::SwissProt is not supported by expasy any more. In a perl script, I used to access Swissprot data and annotations by
>
> $sp = Bio::DB::SwissProt->new('-servertype' => 'expasy',
> '-hostlocation' => 'switzerland');
> $seq = $sp->get_Seq_by_id($id_swiss);
>
> This of course does not work any more. Will Bio::DB::SwissProt be adapted to the new expasy site in the near future?
>
> Thanks a lot!
>
> Oliver
>
>
> Dr.
Oliver Schilling
> Group Leader & Emmy-Noether Research Fellow
> Institute for Molecular Medicine and Cell Research
> University of Freiburg
> Stefan-Meier-Str. 17, Room 02 027
> D-79104 Freiburg, Germany
> Tel: +49 761 203 9615
> email: oliver.schilling at mol-med.uni-freiburg.de

From hollenbv at onid.orst.edu Wed Jul 20 12:25:59 2011
From: hollenbv at onid.orst.edu (Vicky Hollenbeck)
Date: Wed, 20 Jul 2011 09:25:59 -0700
Subject: [Bioperl-l] Error message installing BioPerl on Windows x86 with ActivePerl 5.12.4
In-Reply-To:
References: <4E24C645.90104@onid.orst.edu> <4E260B9B.3020103@onid.orst.edu>
Message-ID: <4E270197.3000700@onid.orst.edu>

Hi Chris,

I haven't tried Strawberry Perl. I thought about it, but I didn't find
any instructions on installing BioPerl with Strawberry Perl and frankly
I need the walkthrough. Perhaps you know of a site with instructions
for this?

Vicky

Chris Fields wrote:
> The problem, truthfully, is that the number of our Windows users is much higher than our Windows developers. Just a quick question: have you tried Strawberry Perl? It's supposed to be much more UNIX-like in its behavior than ActivePerl.
>
> chris
>
> On Jul 19, 2011, at 5:56 PM, Vicky Hollenbeck wrote:
>
>
>> Hi Scott,
>>
>> Thank you. I added the trouchelle.com/ppm10/ as well as trouchelle.com/ppm12/ repositories. When I run ppm install DB_File I get a reply that no packages are missing. Then I reinstalled BioPerl. I still get the error messages below. When doing a ppm list, DB_File shows in the list, but with nothing under the 'files' column. Still, it looks like BioPerl installed. I am interested in running scripts with Bio::SearchIO and I don't see any lines for use DB_File, so maybe it is a moot point for my purposes. Just a note that I'm running ActivePerl 5.12.4 as it is the 'Community' version on their website.
>>
>> Vicky
>>
>> Scott Cain wrote:
>>
>>> Hi Vicky,
>>>
>>> Annoyingly, ActiveState doesn't include DB_File with their build of
>>> Perl anymore, even though it is a core perl module. I believe you can
>>> get it for perl 5.10 via a different repo. Try this on the command
>>> line:
>>>
>>> ppm rep add trouchelle.com http://trouchelle.com/ppm10/
>>>
>>> and then install DB_File with ppm. Please let us know if that works
>>> so we can update the wiki.
>>>
>>> Good luck,
>>> Scott
>>>
>>>
>>> On Mon, Jul 18, 2011 at 7:48 PM, Vicky Hollenbeck
>>> wrote:
>>>
>>>
>>>> Hello,
>>>>
>>>> I am getting two error messages after following the instructions for
>>>> installing BioPerl from command line according to
>>>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows. I have all the
>>>> repositories listed in the table under 'Perl 5.10':
>>>>
>>>> *Can't find any package that provides Convert::Binary::C for Bundle-BioPerl
>>>> Core
>>>> *Can't find any package that provides DB_file:: for Bundle-BioPerl Core
>>>>
>>>> I originally was doing installation via the GUI, but when I got a few error
>>>> messages, went on to the command line version. I was able to fix the SOAP
>>>> error message by following the instructions there, but can't seem to fix the
>>>> other two listed above.
>>>>
>>>> Thank you in advance.
>>>>
>>>> Vicky Hollenbeck
>>>> USDA Agricultural Research Service
>>>> Corvallis, OR
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>
>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>

From jason.stajich at gmail.com Wed Jul 20 12:52:07 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Wed, 20 Jul 2011 09:52:07 -0700
Subject: [Bioperl-l] Bio::DB::SwissProt
In-Reply-To:
References: <82C18DC9-CB8F-4D57-B032-322A100E3981@mol-med.uni-freiburg.de> <373C82CA-F319-48D7-A498-E6DFE798FA36@gmail.com>
Message-ID: <352244D7-EF4B-4DA2-8B80-F04AD7F442CB@gmail.com>

Please keep asking your questions on the mailing list - I can't answer
everyone's questions and others learn from this exchange.

You do it just like it says in the documentation. Here's a loop showing
that you can retrieve the same sequence by two different IDs, the
swissprot ID and the accession number:

    #!/usr/bin/perl -w
    use strict;
    use Bio::DB::GenPept;
    use Bio::SeqIO;
    my $db = Bio::DB::GenPept->new;
    my $out = Bio::SeqIO->new(-format => 'swiss', -fh => \*STDOUT);
    for my $id ( qw(P22815 BOSS_DROME) ) {
        my $seq = $db->get_Seq_by_acc($id);
        if( $seq ) {
            $out->write_seq($seq);
        } else {
            warn("no seq $id\n");
        }
    }

Alternatively you can do this to get the sequence and turn it into an
object, but you have to use the accession number (curl, or GET as part
of libwww-perl, will work):

    # '-|' opens a pipe that reads the command's output
    open(my $fh, '-|', 'curl http://www.uniprot.org/uniprot/P22815.txt') || die $!;
    my $seqio = Bio::SeqIO->new(-fh => $fh, -format => 'swiss');
    while(my $seq = $seqio->next_seq ) {
        print $seq->id, "\n";
    }

-jason

On Jul 20, 2011, at 9:21 AM, Oliver Schilling wrote:

> Thanks a lot for your reply!
>
> Sorry for asking - but do you have an example on how to retrieve Swissprot data via a genbank DB handle?
>
> Thanks
>
> Oliver
>
>
> On Jul 20, 2011, at 18:00 PM, Jason Stajich wrote:
>
>> It is just waiting for someone to code it up. It requires a different, simpler approach.
>>
>> I don't have any time to do this but we welcome volunteers.
>>
>> The simple approach is just to open a file handle and use GET cmd to open the URL with the accession number per the msg I sent before. That filehandle goes to SeqIO and you don't need the Bio::DB module.
>>
>> You can also get Swissprot data via genbank DB handle.
>>
>> Jason Stajich
>>
>>
>> On Jul 20, 2011, at 4:23 AM, Oliver Schilling wrote:
>>
>>> Dear Jason,
>>>
>>> as thankfully pointed out by you on http://old.nabble.com/Fwd%3A-URGENT%21-Update-of-Bio%3A%3ADB%3A%3ASwissProt-td32059436.html#a32059436,
>>>
>>> it seems that Bio::DB::SwissProt is not supported by expasy any more. In a perl script, I used to access Swissprot data and annotations by
>>>
>>> $sp = Bio::DB::SwissProt->new('-servertype' => 'expasy',
>>> '-hostlocation' => 'switzerland');
>>> $seq = $sp->get_Seq_by_id($id_swiss);
>>>
>>> This of course does not work any more. Will Bio::DB::SwissProt be adapted to the new expasy site in the near future?
>>>
>>> Thanks a lot!
>>>
>>> Oliver
>>>
>>>
>>> Dr. Oliver Schilling
>>> Group Leader & Emmy-Noether Research Fellow
>>> Institute for Molecular Medicine and Cell Research
>>> University of Freiburg
>>> Stefan-Meier-Str.
17, Room 02 027
>>> D-79104 Freiburg, Germany
>>> Tel: +49 761 203 9615
>>> email: oliver.schilling at mol-med.uni-freiburg.de
>

From cjfields at illinois.edu Wed Jul 20 14:02:08 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 20 Jul 2011 13:02:08 -0500
Subject: [Bioperl-l] Bio::DB::SwissProt
In-Reply-To: <373C82CA-F319-48D7-A498-E6DFE798FA36@gmail.com>
References: <82C18DC9-CB8F-4D57-B032-322A100E3981@mol-med.uni-freiburg.de> <373C82CA-F319-48D7-A498-E6DFE798FA36@gmail.com>
Message-ID: <1D8DC1C9-9CE6-44BF-83EA-E6E563C348B2@illinois.edu>

We need to go ahead and remove the expasy code from SwissProt if it's
no longer supported. Any takers?

chris

On Jul 20, 2011, at 11:00 AM, Jason Stajich wrote:

> It is just waiting for someone to code it up. It requires a different, simpler approach.
>
> I don't have any time to do this but we welcome volunteers.
>
> The simple approach is just to open a file handle and use GET cmd to open the URL with the accession number per the msg I sent before. That filehandle goes to SeqIO and you don't need the Bio::DB module.
>
> You can also get Swissprot data via genbank DB handle.
>
> Jason Stajich
>
>
> On Jul 20, 2011, at 4:23 AM, Oliver Schilling wrote:
>
>> Dear Jason,
>>
>> as thankfully pointed out by you on http://old.nabble.com/Fwd%3A-URGENT%21-Update-of-Bio%3A%3ADB%3A%3ASwissProt-td32059436.html#a32059436,
>>
>> it seems that Bio::DB::SwissProt is not supported by expasy any more. In a perl script, I used to access Swissprot data and annotations by
>>
>> $sp = Bio::DB::SwissProt->new('-servertype' => 'expasy',
>> '-hostlocation' => 'switzerland');
>> $seq = $sp->get_Seq_by_id($id_swiss);
>>
>> This of course does not work any more. Will Bio::DB::SwissProt be adapted to the new expasy site in the near future?
>>
>> Thanks a lot!
>>
>> Oliver
>>
>>
>> Dr. Oliver Schilling
>> Group Leader & Emmy-Noether Research Fellow
>> Institute for Molecular Medicine and Cell Research
>> University of Freiburg
>> Stefan-Meier-Str.
17, Room 02 027
>> D-79104 Freiburg, Germany
>> Tel: +49 761 203 9615
>> email: oliver.schilling at mol-med.uni-freiburg.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From akki.coool2 at gmail.com Thu Jul 21 07:50:38 2011
From: akki.coool2 at gmail.com (Akash)
Date: Thu, 21 Jul 2011 12:50:38 +0100
Subject: [Bioperl-l] Perl
Message-ID:

HI

I am a student of University of Exeter and doing my master's in
Bioinformatics. I am working on perl and here I need to input the "fasta"
file in perl and then I have to remove the contigs which have more than
"200" base pairs.

So can you help me how to do this?

Looking for your positive reply

With Regards
Akash

From lmrodriguezr at gmail.com Thu Jul 21 08:04:48 2011
From: lmrodriguezr at gmail.com (=?ISO-8859-1?Q?Luis=2DMiguel_Rodr=EDguez_Rojas?=)
Date: Thu, 21 Jul 2011 14:04:48 +0200
Subject: [Bioperl-l] Perl
In-Reply-To:
References:
Message-ID:

Hello Akash,

That can be done with a very short script in bioperl (probably a
one-liner?). Take a look at the Bio::SeqIO and Bio::Seq documentation.

Regards,
LRR

--
Luis M. Rodriguez-R
[ http://thebio.me/lrr | lrr at cpan.org ]
---------------------------------
UMR Résistance des Plantes aux Bioagresseurs - Group effecteur/cible
Institut de Recherche pour le Développement, Montpellier, France
[ http://bioinfo-prod.mpl.ird.fr/xantho | Luismiguel.Rodriguez at ird.fr ]
+33 (0) 6.29.74.55.93

Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Bogotá, Colombia
[ http://lamfu.uniandes.edu.co | luisrodr at uniandes.edu.co ]
+57 (1) 3.39.49.49 ext 2777

On Thu, Jul 21, 2011 at 1:50 PM, Akash wrote:

> HI
>
> I am a student of University of Exeter and doing my master's in
> Bioinformatics. I am working on perl and here I need to input the "fasta"
> file in perl and then I have to remove the contigs which have more than
> "200" base pairs.
>
> So can you help me how to do this?
>
> Looking for your positive reply
>
> With Regards
> Akash
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From carandraug+dev at gmail.com Thu Jul 21 08:05:10 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Thu, 21 Jul 2011 13:05:10 +0100
Subject: [Bioperl-l] Perl
In-Reply-To:
References:
Message-ID:

Hi Akash

try to read http://www.bioperl.org/wiki/HOWTO:SeqIO

After reading the sequence, skip it if it's longer than a certain size?

Avoid hard coding

Carnë

From awitney at sgul.ac.uk Thu Jul 21 08:24:54 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 21 Jul 2011 13:24:54 +0100
Subject: [Bioperl-l] Perl
In-Reply-To:
References:
Message-ID:

On 21 Jul 2011, at 12:50, Akash wrote:

> I am a student of University of Exeter and doing my master's in
> Bioinformatics. I am working on perl and here I need to input the "fasta"
> file in perl and then I have to remove the contigs which have more than
> "200" base pairs.
>
> So can you help me how to do this?

take a look at Bio::SeqIO

http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/SeqIO.pm

regards

adam

From aminmom at hotmail.com Thu Jul 21 10:19:09 2011
From: aminmom at hotmail.com (Amin Momin)
Date: Thu, 21 Jul 2011 19:49:09 +0530
Subject: [Bioperl-l] BLAT psl file to GTF
Message-ID:

Hi ,

I have been trying to convert a .psl file from BLAT into a GTF file.
Is there a bioperl module capable of performing this? Or any tool that
can be used to perform a similar conversion?
Amin

From roy.chaudhuri at gmail.com Fri Jul 22 07:02:15 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 22 Jul 2011 12:02:15 +0100
Subject: [Bioperl-l] how to find SeqFeature at specific sequence location
In-Reply-To: <118215D0-A837-4D72-BA8F-548A2CF5CAF6@sgul.ac.uk>
References: <18DF7D20DFEC044098A1062202F5FFF3396074D2CB@exchsth.agresearch.co.nz> <38E4AD0A-5D35-4B1F-8E0D-5696B07AFD77@illinois.edu> <118215D0-A837-4D72-BA8F-548A2CF5CAF6@sgul.ac.uk>
Message-ID: <4E2958B7.9020703@gmail.com>

Hi Adam,

Sorry for the late reply. You can do this direct from BioSQL using a
BioQuery:

    #!/usr/bin/perl
    use warnings FATAL=>qw(all);
    use Modern::Perl;
    use Bio::DB::BioDB;
    use Bio::DB::Query::BioQuery;
    my ($accession, $start, $end)=qw(U00096 1000 1001);
    my $dbadap=Bio::DB::BioDB->new(-database=>'biosql', -dbname=>'mydatabase', -user=>'myuser', -pass=>'mypass', -driver=>'mysql');
    my @where=("entry.accession_number='$accession'",
               "location.start < $end",
               "location.end > $start");
    my $query=Bio::DB::Query::BioQuery->new(-datacollections=>['Bio::SeqFeatureI feat', 'Bio::Annotation::SimpleValue=>Bio::SeqFeatureI term::primary_tag',
                                                               'Bio::PrimarySeqI=>Bio::SeqFeatureI entry',
                                                               'Bio::SeqFeatureI=>Bio::LocationI'],
                                            -where=>\@where
                                           );
    my $result=$dbadap->get_object_adaptor('Bio::SeqFeatureI')->find_by_query($query);
    while (my $feat=$result->next_object) {
        say join "\t", $feat->primary_tag, $feat->get_tagset_values(qw(gene locus_tag product)), $feat->location->to_FTstring;
    }

Cheers,
Roy.

On 13/07/2011 23:03, Adam Witney wrote:
>
> Thanks for your replies guys.
>
> I was trying to do it from the BioSQL schema and Bio::DB::BioDB but
> after some further investigation it looks like Bio::DB::GFF or
> Bio::DB::SeqFeature::Store are the only ways to do it
>
> Thanks
>
> adam
>
> On 13 Jul 2011, at 01:06, Chris UI wrote:
>
>> One should be able to do this via Bio::DB::SeqFeature::Store.
>> However, going from a Bio::Seq to that is the tricky part.
>>
>> chris
>>
>> Sent from my iPad
>>
>> On Jul 12, 2011, at 6:12 PM, "Smithies,
>> Russell" wrote:
>>
>>> I've done it before by taking a segment of the sequence and
>>> looking for features in that.
>>>
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',-dsn=>
>>> 'hapmap:gbrowsemysql')or die "Can't open
>>> database:",Bio::DB::GFF->error,"\n"; my $segment =
>>> $db->segment(-class=>'Chromosome',-name=> "Chr1",
>>> -start=>$start, -end=>$end); my @repeats =
>>> $segment->features(-types=> ['match:UCSC_REPEATMASK']);
>>>
>>> --Russell
>>>
>>>> -----Original Message----- From:
>>>> bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Adam Witney Sent:
>>>> Wednesday, 13 July 2011 6:04 a.m. To: bioperl-l at bioperl.org
>>>> Subject: [Bioperl-l] how to find SeqFeature at specific
>>>> sequence location
>>>>
>>>> Hi,
>>>>
>>>> Is there an easy way of finding the SeqFeatures at a specific
>>>> base location on a Bio::Seq? I guess I can do it by calling
>>>> get_all_SeqFeatures and testing start/stop coordinates, but
>>>> just wondered if there was a better way.
>>>>
>>>> Thanks
>>>>
>>>> Adam _______________________________________________ Bioperl-l
>>>> mailing list Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> =======================================================================
>>>
>>> Attention: The information contained in this message and/or attachments
>>> from AgResearch Limited is intended only for the persons or
>>> entities to which it is addressed and may contain confidential
>>> and/or privileged material. Any review, retransmission,
>>> dissemination or other use of, or taking of any action in
>>> reliance upon, this information by persons or entities other than
>>> the intended recipients is prohibited by AgResearch Limited. If
>>> you have received this message in error, please notify the sender
>>> immediately.
>>> =======================================================================
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________ Bioperl-l mailing
> list Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From carandraug+dev at gmail.com Fri Jul 22 07:39:08 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Fri, 22 Jul 2011 12:39:08 +0100
Subject: [Bioperl-l] changing script filenames
Message-ID:

Hi

currently, when installing bioperl, the names of the scripts are changed
from 'something' to 'bp_something'. Some of them already have 'bp_' in
their name. After talking about it on #bioperl, I was told that it
indeed looked silly and it should be changed. As such, I
changed the filenames. I also changed their documentation to match the
correct name of the script. I made a pull request on github a few days
ago. Could someone take a look at it, please?

Thanks,
Carnë

From cjfields at illinois.edu Fri Jul 22 09:06:49 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jul 2011 08:06:49 -0500
Subject: [Bioperl-l] changing script filenames
In-Reply-To:
References:
Message-ID:

Agreed. Anyone know of any particular reason why this *shouldn't*
happen? If not I'll probably merge the fork request in on Sunday,

chris

On Jul 22, 2011, at 6:39 AM, Carnë Draug wrote:

> Hi
>
> currently, when installing bioperl, the names of the scripts are changed
> from 'something' to 'bp_something'. Some of them already have 'bp_' in
> their name. After talking about it on #bioperl, I was told that it
> indeed looked silly and it should be changed. As such, I
> changed the filenames. I also changed their documentation to match the
> correct name of the script. I made a pull request on github a few days
> ago.
Could someone take a look at it, please?
>
> Thanks,
> Carnë
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Fri Jul 22 10:56:48 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jul 2011 09:56:48 -0500
Subject: [Bioperl-l] BLAT psl file to GTF
In-Reply-To:
References:
Message-ID:

Bio::SearchIO::psl is one way, though I recall a psl2gff script (maybe
from Don Gilbert?) out there somewhere that might be more up-to-date
and faster.

chris

On Jul 21, 2011, at 9:19 AM, Amin Momin wrote:

>
> Hi ,
> I have been trying to convert a .psl file from BLAT into a GTF file. Is there a bioperl module capable of performing this. Or any tool that can be used to perform similar conversion.
>
> Amin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From awitney at sgul.ac.uk Fri Jul 22 11:03:16 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 22 Jul 2011 16:03:16 +0100
Subject: [Bioperl-l] how to find SeqFeature at specific sequence location
In-Reply-To: <4E2958B7.9020703@gmail.com>
References: <18DF7D20DFEC044098A1062202F5FFF3396074D2CB@exchsth.agresearch.co.nz> <38E4AD0A-5D35-4B1F-8E0D-5696B07AFD77@illinois.edu> <118215D0-A837-4D72-BA8F-548A2CF5CAF6@sgul.ac.uk> <4E2958B7.9020703@gmail.com>
Message-ID: <43C4DE80-FBEF-4984-B82F-3101BE6DD1A2@sgul.ac.uk>

Thanks Roy,

I had given up and switched to Bio::DB::SeqFeature::Store which works
fine (as suggested by Chris). But I may go back and give this a try to
see how it works.

Thanks again

adam

On 22 Jul 2011, at 12:02, Roy Chaudhuri wrote:

> Hi Adam,
>
> Sorry for the late reply.
You can do this direct from BioSQL using a BioQuery:
>
> #!/usr/bin/perl
> use warnings FATAL=>qw(all);
> use Modern::Perl;
> use Bio::DB::BioDB;
> use Bio::DB::Query::BioQuery;
> my ($accession, $start, $end)=qw(U00096 1000 1001);
> my $dbadap=Bio::DB::BioDB->new(-database=>'biosql', -dbname=>'mydatabase', -user=>'myuser', -pass=>'mypass', -driver=>'mysql');
> my @where=("entry.accession_number='$accession'",
> "location.start < $end",
> "location.end > $start");
> my $query=Bio::DB::Query::BioQuery->new(-datacollections=>['Bio::SeqFeatureI feat', 'Bio::Annotation::SimpleValue=>Bio::SeqFeatureI term::primary_tag',
> 'Bio::PrimarySeqI=>Bio::SeqFeatureI entry',
> 'Bio::SeqFeatureI=>Bio::LocationI'],
> -where=>\@where
> );
> my $result=$dbadap->get_object_adaptor('Bio::SeqFeatureI')->find_by_query($query);
> while (my $feat=$result->next_object) {
> say join "\t", $feat->primary_tag, $feat->get_tagset_values(qw(gene locus_tag product)), $feat->location->to_FTstring;
> }
>
> Cheers,
> Roy.
>
> On 13/07/2011 23:03, Adam Witney wrote:
>>
>> Thanks for your replies guys.
>>
>> I was trying to do it from the BioSQL schema and Bio::DB::BioDB but
>> after some further investigation it looks like Bio::DB::GFF or
>> Bio::DB::SeqFeature::Store are the only ways to do it
>>
>> Thanks
>>
>> adam
>>
>> On 13 Jul 2011, at 01:06, Chris UI wrote:
>>
>>> One should be able to do this via Bio::DB::SeqFeature::Store.
>>> However, going from a Bio::Seq to that is the tricky part.
>>>
>>> chris
>>>
>>> Sent from my iPad
>>>
>>> On Jul 12, 2011, at 6:12 PM, "Smithies,
>>> Russell" wrote:
>>>
>>>> I've done it before by taking a segment of the sequence and
>>>> looking for features in that.
>>>>
>>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',-dsn=>
>>>> 'hapmap:gbrowsemysql')or die "Can't open
>>>> database:",Bio::DB::GFF->error,"\n"; my $segment =
>>>> $db->segment(-class=>'Chromosome',-name=> "Chr1",
>>>> -start=>$start, -end=>$end); my @repeats =
>>>> $segment->features(-types=> ['match:UCSC_REPEATMASK']);
>>>>
>>>> --Russell
>>>>
>>>>> -----Original Message----- From:
>>>>> bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Adam Witney Sent:
>>>>> Wednesday, 13 July 2011 6:04 a.m. To: bioperl-l at bioperl.org
>>>>> Subject: [Bioperl-l] how to find SeqFeature at specific
>>>>> sequence location
>>>>>
>>>>> Hi,
>>>>>
>>>>> Is there an easy way of finding the SeqFeatures at a specific
>>>>> base location on a Bio::Seq? I guess I can do it by calling
>>>>> get_all_SeqFeatures and testing start/stop coordinates, but
>>>>> just wondered if there was a better way.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Adam _______________________________________________ Bioperl-l
>>>>> mailing list Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> =======================================================================
>>>>
>>>>
> Attention: The information contained in this message and/or attachments
>>>> from AgResearch Limited is intended only for the persons or
>>>> entities to which it is addressed and may contain confidential
>>>> and/or privileged material. Any review, retransmission,
>>>> dissemination or other use of, or taking of any action in
>>>> reliance upon, this information by persons or entities other than
>>>> the intended recipients is prohibited by AgResearch Limited. If
>>>> you have received this message in error, please notify the sender
>>>> immediately.
>>>> =======================================================================
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From aminmom at hotmail.com Fri Jul 22 12:16:58 2011
From: aminmom at hotmail.com (Amin Momin)
Date: Fri, 22 Jul 2011 21:46:58 +0530
Subject: [Bioperl-l] BLAT psl file to GTF
In-Reply-To:
References: , ,
Message-ID:

Thanks very much Jamie and Chris,

Some of these scripts (DAWG PAWS) and bioperl modules will make my future work much quicker.

Amin

Date: Fri, 22 Jul 2011 11:17:37 -0400
Subject: Re: [Bioperl-l] BLAT psl file to GTF
From: jamesestill at gmail.com
To: cjfields at illinois.edu
CC: aminmom at hotmail.com; bioperl-l at lists.open-bio.org

I wrote a blat2gff converter that worked the last time I used it ..

http://dawgpaws.svn.sourceforge.net/viewvc/dawgpaws/trunk/scripts/cnv_blat2gff.pl?revision=994&content-type=text%2Fplain

It is a simple/fast converter with minimal dependencies. Let me know if you try it and it works or does not work for you.

The command
cnv_blat2gff.pl --help
will list the basic commands, and
cnv_blat2gff.pl --man
will pull up the man page.

This is part of a larger set of conversion/annotation programs
http://dawgpaws.sourceforge.net/
with the source repository of scripts available at
http://dawgpaws.svn.sourceforge.net/viewvc/dawgpaws/trunk/scripts/

-- Jamie

On Fri, Jul 22, 2011 at 10:56 AM, Chris Fields wrote:

Bio::SearchIO::psl is one way, though I recall a psl2gff script (maybe from Don Gilbert?) out there somewhere that might be more up-to-date and faster.

chris

On Jul 21, 2011, at 9:19 AM, Amin Momin wrote:

>
> Hi ,
> I have been trying to convert a .psl file from BLAT into a GTF file.
> Is there a bioperl module capable of performing this. Or any tool that can be used to perform similar conversion.
>
> Amin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
-----------------------------------------
James C. Estill
JamesEstill at gmail.com
http://jestill.myweb.uga.edu
-----------------------------------------

From jamesestill at gmail.com Fri Jul 22 21:56:28 2011
From: jamesestill at gmail.com (Jamie Estill)
Date: Fri, 22 Jul 2011 21:56:28 -0400
Subject: [Bioperl-l] BLAT psl file to GTF
In-Reply-To:
References:
Message-ID:

I've never really needed to convert to GTF, but at one point bioperl supported gtf as gff 2.5

http://doc.bioperl.org/releases/bioperl-1.5.1/Bio/FeatureIO/gff.html

So once blat data are in gff3, a conversion from gff3 to gtf would work like

my $inGFF = Bio::FeatureIO->new( '-file'    => "$inFile",
                                 '-format'  => 'GFF',
                                 '-version' => 3 );
my $outGTF = Bio::FeatureIO->new( '-file'    => ">$outFile",
                                  '-format'  => 'GFF',
                                  '-version' => 2.5);

while (my $feature = $inGFF->next_feature() ) {
    $outGTF->write_feature($feature);
}

see discussion at
http://sunnyjoy.wikispaces.com/convert+GTF+to+gff3

On Fri, Jul 22, 2011 at 12:16 PM, Amin Momin wrote:

> Thanks very much Jamie and Chris, Some of these scripts (DAWG PAWS) and
> bioperl modules will make my future work much quicker.
>
> Amin
>
> ------------------------------
> Date: Fri, 22 Jul 2011 11:17:37 -0400
> Subject: Re: [Bioperl-l] BLAT psl file to GTF
> From: jamesestill at gmail.com
> To: cjfields at illinois.edu
> CC: aminmom at hotmail.com; bioperl-l at lists.open-bio.org
>
>
> I wrote a blat2gff converter that worked the last time I used it ..
>
>
> http://dawgpaws.svn.sourceforge.net/viewvc/dawgpaws/trunk/scripts/cnv_blat2gff.pl?revision=994&content-type=text%2Fplain
>
> It is a simple/fast converter with minimal dependencies. Let me know if you
> try it and it works or does not work for you.
>
> The command
> cnv_blat2gff.pl --help
> will list the basic commands, and
> cnv_blat2gff.pl --man
> will pull up the man page.
>
> This is part of a larger set of conversion/annotation programs
> http://dawgpaws.sourceforge.net/
> with the source repository of scripts available at
> http://dawgpaws.svn.sourceforge.net/viewvc/dawgpaws/trunk/scripts/
>
> -- Jamie
>
> On Fri, Jul 22, 2011 at 10:56 AM, Chris Fields wrote:
>
> Bio::SearchIO::psl is one way, though I recall a psl2gff script (maybe from
> Don Gilbert?) out there somewhere that might be more up-to-date and faster.
>
> chris
>
> On Jul 21, 2011, at 9:19 AM, Amin Momin wrote:
>
> >
> > Hi ,
> > I have been trying to convert a .psl file from BLAT into a GTF file.
> Is there a bioperl module capable of performing this. Or any tool that can
> be used to perform similar conversion.
> >
> > Amin
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> --
> -----------------------------------------
> James C. Estill
> JamesEstill at gmail.com
> http://jestill.myweb.uga.edu
> -----------------------------------------
>

--
-----------------------------------------
James C.
Estill JamesEstill at gmail.com http://jestill.myweb.uga.edu ----------------------------------------- From shachigahoimbi at gmail.com Sat Jul 23 06:36:25 2011 From: shachigahoimbi at gmail.com (Shachi Gahoi) Date: Sat, 23 Jul 2011 16:06:25 +0530 Subject: [Bioperl-l] error in Protparm module Message-ID: I am trying following script using Protparam.pm module ################################################################################### #!/usr/bin/perl use warnings; use Bio::SeqIO; use Bio::Tools::Protparam; $seqfile='SHP1_At.fasta'; $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); while( $seq = $seqio->next_seq() ) { my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); print "ID : ", $seq->display_id,"\n", "Amino acid number : ",$pp->amino_acid_number(),"\n", "Number of negative amino acids : ",$pp->num_neg(),"\n", "Number of positive amino acids : ",$pp->num_pos(),"\n", "Molecular weight : ",$pp->molecular_weight(),"\n", "Theoretical pI : ",$pp->theoretical_pI(),"\n", "Total number of atoms : ", $pp->total_atoms(),"\n", "Number of carbon atoms : ",$pp->num_carbon(),"\n", "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", "Half life : ", $pp->half_life(),"\n", "Instability Index : ", $pp->instability_index(),"\n", "Stability class : ", $pp->stability(),"\n", "Aliphatic_index : ",$pp->aliphatic_index(),"\n", "Gravy : ", $pp->gravy(),"\n", "Composition of A : ", $pp->AA_comp('A'),"\n", "Composition of R : ", $pp->AA_comp('R'),"\n", "Composition of N : ", $pp->AA_comp('N'),"\n", "Composition of D : ", $pp->AA_comp('D'),"\n", "Composition of C : ", $pp->AA_comp('C'),"\n", "Composition of Q : ", $pp->AA_comp('Q'),"\n", "Composition of E : ", $pp->AA_comp('E'),"\n", "Composition of G : ", $pp->AA_comp('G'),"\n", "Composition of H : ", $pp->AA_comp('H'),"\n", "Composition of I : ", 
$pp->AA_comp('I'),"\n", "Composition of L : ", $pp->AA_comp('L'),"\n", "Composition of K : ", $pp->AA_comp('K'),"\n", "Composition of M : ", $pp->AA_comp('M'),"\n", "Composition of F : ", $pp->AA_comp('F'),"\n", "Composition of P : ", $pp->AA_comp('P'),"\n", "Composition of S : ", $pp->AA_comp('S'),"\n", "Composition of T : ", $pp->AA_comp('T'),"\n", "Composition of W : ", $pp->AA_comp('W'),"\n", "Composition of Y : ", $pp->AA_comp('Y'),"\n", "Composition of V : ", $pp->AA_comp('V'),"\n", "Composition of B : ", $pp->AA_comp('B'),"\n", "Composition of Z : ", $pp->AA_comp('Z'),"\n", "Composition of X : ", $pp->AA_comp('X'),"\n"; } ######################################################################################### but when i am running this script, it is printing one error message... Can't call method "throw" without a package or object reference at /usr/share/perl5/Bio/Root/Root.pm line 368, line 1. Please help me to solve this problem. Thanks in advance. -- Regards, Shachi From jovel_juan at hotmail.com Sat Jul 23 13:49:02 2011 From: jovel_juan at hotmail.com (Juan Jovel) Date: Sat, 23 Jul 2011 17:49:02 +0000 Subject: [Bioperl-l] Where to find S. cerevisiae tRNA and mRNA sequences? In-Reply-To: References: Message-ID: Hello Everybody! Anybody knows where can I find the sequence for tRNAs and mRNAs from Saccharomyces cerevisiae? I was having a look at SGD, but could not find them. Thanks a lot in advance, JUAN From cjfields at illinois.edu Sat Jul 23 17:28:05 2011 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 23 Jul 2011 16:28:05 -0500 Subject: [Bioperl-l] Where to find S. cerevisiae tRNA and mRNA sequences? In-Reply-To: References: Message-ID: <21A0A61E-8708-406D-BA64-FE10248702D5@illinois.edu> Ensembl or Biomart should have these. chris On Jul 23, 2011, at 12:49 PM, Juan Jovel wrote: > > Hello Everybody! > Anybody knows where can I find the sequence for tRNAs and mRNAs from Saccharomyces cerevisiae? 
I was having a look at SGD, but could not find them. > Thanks a lot in advance, > JUAN > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Mon Jul 25 04:29:48 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 25 Jul 2011 09:29:48 +0100 Subject: [Bioperl-l] How is is_circular recorded in BioSQL (by BioPerl)? Message-ID: Hi all, I'm trying to check how (currently) BioSQL should be used to record if a sequence is circular or linear. I know this property is exposed in BioPerl as the boolean is_circular() method from Bio::PrimarySeq, and based on this old thread the value gets stored in BioSQL as a sequence level annotation: http://www.bioperl.org/pipermail/biosql-l/2005-June/000843.html http://www.bioperl.org/pipermail/biosql-l/2005-June/000846.html http://www.bioperl.org/pipermail/biosql-l/2005-June/000849.html http://www.bioperl.org/pipermail/biosql-l/2005-June/000859.html The term is_circular also matches nicely with GFF3, other than a possible difference in capitalisation: "Is_circular A flag to indicate whether a feature is circular." and: "For a circular genome, the landmark feature should include Is_circular=true in column 9." http://www.sequenceontology.org/gff3.shtml. Can anyone confirm how exactly the is_circular (or Is_circular?) annotation is used in BioSQL by BioPerl? I am guessing that it is in the standard bioentry_qualifier_value table, with the term_id referencing "is_circular" (check case), rank 0, and value of "true" or "false" (check case). 
I want to make Biopython's BioSQL usage consistent, see: https://redmine.open-bio.org/issues/2578 Thanks, Peter From shachigahoimbi at gmail.com Mon Jul 25 05:17:15 2011 From: shachigahoimbi at gmail.com (Shachi Gahoi) Date: Mon, 25 Jul 2011 14:47:15 +0530 Subject: [Bioperl-l] problem in using protparam.pm module Message-ID: Dear All, i am using protparam.pm module. but when i am running this script it is printing one error message "Can't call method "throw" without a package or object reference at /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." Kindly help me to solve this problem. Script is here---- ################################################################################### #!/usr/bin/perl use warnings; use Bio::SeqIO; use Bio::Tools::Protparam; $seqfile='test1.fasta'; $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); while( $seq = $seqio->next_seq() ) { my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); print "ID : ", $seq->display_id,"\n", "Amino acid number : ",$pp->amino_acid_number(),"\n", "Number of negative amino acids : ",$pp->num_neg(),"\n", "Number of positive amino acids : ",$pp->num_pos(),"\n", "Molecular weight : ",$pp->molecular_weight(),"\n", "Theoretical pI : ",$pp->theoretical_pI(),"\n", "Total number of atoms : ", $pp->total_atoms(),"\n", "Number of carbon atoms : ",$pp->num_carbon(),"\n", "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", "Half life : ", $pp->half_life(),"\n", "Instability Index : ", $pp->instability_index(),"\n", "Stability class : ", $pp->stability(),"\n", "Aliphatic_index : ",$pp->aliphatic_index(),"\n", "Gravy : ", $pp->gravy(),"\n", "Composition of A : ", $pp->AA_comp('A'),"\n", "Composition of R : ", $pp->AA_comp('R'),"\n", "Composition of N : ", $pp->AA_comp('N'),"\n", "Composition of D : ", $pp->AA_comp('D'),"\n", "Composition 
of C : ", $pp->AA_comp('C'),"\n", "Composition of Q : ", $pp->AA_comp('Q'),"\n", "Composition of E : ", $pp->AA_comp('E'),"\n", "Composition of G : ", $pp->AA_comp('G'),"\n", "Composition of H : ", $pp->AA_comp('H'),"\n", "Composition of I : ", $pp->AA_comp('I'),"\n", "Composition of L : ", $pp->AA_comp('L'),"\n", "Composition of K : ", $pp->AA_comp('K'),"\n", "Composition of M : ", $pp->AA_comp('M'),"\n", "Composition of F : ", $pp->AA_comp('F'),"\n", "Composition of P : ", $pp->AA_comp('P'),"\n", "Composition of S : ", $pp->AA_comp('S'),"\n", "Composition of T : ", $pp->AA_comp('T'),"\n", "Composition of W : ", $pp->AA_comp('W'),"\n", "Composition of Y : ", $pp->AA_comp('Y'),"\n", "Composition of V : ", $pp->AA_comp('V'),"\n", "Composition of B : ", $pp->AA_comp('B'),"\n", "Composition of Z : ", $pp->AA_comp('Z'),"\n", "Composition of X : ", $pp->AA_comp('X'),"\n"; } ################################################################################### -- Regards, Shachi From roy.chaudhuri at gmail.com Mon Jul 25 07:14:08 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 25 Jul 2011 12:14:08 +0100 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: References: Message-ID: <4E2D5000.30305@gmail.com> Hi Peter, As far as I understand, is_circular is not stored in BioSQL by default when using bp_load_seqdatabase.pl. As indicated in the thread you quoted, you can optionally store it as an annotation using a SequenceProcessor - I use a copy of Bio::Seq::BaseSeqProcessor modified with the following subroutine: sub process_seq { my ($self,$seq) = @_; my $value=Bio::Annotation::SimpleValue->new(-tagname=>'is_circular', -value=>$seq->is_circular); $seq->annotation->add_Annotation($value); return ($seq); } This SequenceProcessor module can be specified to bp_load_seqdatabase.pl using the --pipeline flag, and results in the value of is_circular being stored in the table bioentry_qualifier_value. 
Its value can be determined using SQL like: select q.value from bioentry e join bioentry_qualifier_value q using(bioentry_id) join term t using(term_id) where e.accession='U00096' and t.name='is_circular' In the thread you mentioned, it was suggested that a specific is_circular column be added to the BioSQL schema (in the biosequence table), but I don't think this has been implemented yet. Cheers, Roy. On 25/07/2011 09:29, Peter Cock wrote: > Hi all, > > I'm trying to check how (currently) BioSQL should be used to record > if a sequence is circular or linear. I know this property is exposed in > BioPerl as the boolean is_circular() method from Bio::PrimarySeq, > and based on this old thread the value gets stored in BioSQL as a > sequence level annotation: > > http://www.bioperl.org/pipermail/biosql-l/2005-June/000843.html > http://www.bioperl.org/pipermail/biosql-l/2005-June/000846.html > http://www.bioperl.org/pipermail/biosql-l/2005-June/000849.html > http://www.bioperl.org/pipermail/biosql-l/2005-June/000859.html > > The term is_circular also matches nicely with GFF3, other than a > possible difference in capitalisation: > > "Is_circular A flag to indicate whether a feature is circular." > > and: > > "For a circular genome, the landmark feature should include > Is_circular=true in column 9." > > http://www.sequenceontology.org/gff3.shtml. > > Can anyone confirm how exactly the is_circular (or Is_circular?) > annotation is used in BioSQL by BioPerl? I am guessing that > it is in the standard bioentry_qualifier_value table, with the > term_id referencing "is_circular" (check case), rank 0, and > value of "true" or "false" (check case). 
>
> I want to make Biopython's BioSQL usage consistent, see:
> https://redmine.open-bio.org/issues/2578
>
> Thanks,
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

From p.j.a.cock at googlemail.com Mon Jul 25 07:20:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 25 Jul 2011 12:20:42 +0100
Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?
In-Reply-To: <4E2D5000.30305@gmail.com>
References: <4E2D5000.30305@gmail.com>
Message-ID:

On Mon, Jul 25, 2011 at 12:14 PM, Roy Chaudhuri wrote:
> Hi Peter,
>
> As far as I understand, is_circular is not stored in BioSQL by default when
> using bp_load_seqdatabase.pl. As indicated in the thread you quoted, you can
> optionally store it as an annotation using a SequenceProcessor - I use a
> copy of Bio::Seq::BaseSeqProcessor modified with the following subroutine:
>
> sub process_seq {
>     my ($self,$seq) = @_;
>     my $value=Bio::Annotation::SimpleValue->new(-tagname=>'is_circular',
>                                                 -value=>$seq->is_circular);
>     $seq->annotation->add_Annotation($value);
>     return ($seq);
> }
>
> This SequenceProcessor module can be specified to bp_load_seqdatabase.pl
> using the --pipeline flag, and results in the value of is_circular being
> stored in the table bioentry_qualifier_value. Its value can be determined
> using SQL like:
>
> select q.value from bioentry e
> join bioentry_qualifier_value q using(bioentry_id)
> join term t using(term_id)
> where e.accession='U00096' and t.name='is_circular'
>

That's interesting - I hadn't realised this was optional in BioPerl.

Can you tell what value is actually put in the database? Presumably
whatever Perl defaults to as the string representation of a boolean?
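
On the question of how Perl stringifies a boolean, a minimal sketch in plain Perl may help. It assumes nothing about BioPerl or BioSQL: the helper name as_stored is made up for illustration, and whether is_circular returns exactly these values is not established at this point in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the string Perl
# produces when a value is interpolated, wrapped in quotes so an empty
# string stays visible, or the word 'undef' when there is no value.
sub as_stored {
    my ($value) = @_;
    return defined $value ? "'$value'" : 'undef';
}

my $true  = !!1;    # Perl's canonical true value
my $false = !!0;    # Perl's canonical false value

print "true  -> ", as_stored($true),  "\n";
print "false -> ", as_stored($false), "\n";
print "undef -> ", as_stored(undef),  "\n";
```

Canonical true interpolates as 1 and canonical false as the empty string; an undefined value is a separate case, which DBI binds as SQL NULL.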
> In the thread you mentioned, it was suggested that a specific is_circular > column be added to the BioSQL schema (in the biosequence table), but I don't > think this has been implemented yet. > > Cheers, > Roy. Confirmed, there is no such column in the biosequence table (yet). Thank you, Peter From roy.chaudhuri at gmail.com Mon Jul 25 07:27:16 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 25 Jul 2011 12:27:16 +0100 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: References: <4E2D5000.30305@gmail.com> Message-ID: <4E2D5314.5090107@gmail.com> > Can you tell what value is actually put in the database? Presumably > whatever Perl defaults to as the string representation of a boolean? The database value is either 1 or NULL (equivalent to 1 or undef in Perl). From p.j.a.cock at googlemail.com Mon Jul 25 07:30:17 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 25 Jul 2011 12:30:17 +0100 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: <4E2D5314.5090107@gmail.com> References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> Message-ID: On Mon, Jul 25, 2011 at 12:27 PM, Roy Chaudhuri wrote: >> Can you tell what value is actually put in the database? Presumably >> whatever Perl defaults to as the string representation of a boolean? > > The database value is either 1 or NULL (equivalent to 1 or undef in Perl). > Excellent - I can do the same in Biopython then. I don't suppose you happen to know where the molecule type goes (also in the GenBank/EMBL LOCUS/ID line, e.g. genomic DNA)? Thank you, Peter From jamesestill at gmail.com Fri Jul 22 11:17:37 2011 From: jamesestill at gmail.com (Jamie Estill) Date: Fri, 22 Jul 2011 11:17:37 -0400 Subject: [Bioperl-l] BLAT psl file to GTF In-Reply-To: References: Message-ID: I wrote a blat2gff converter that worked the last time I used it .. 
http://dawgpaws.svn.sourceforge.net/viewvc/dawgpaws/trunk/scripts/cnv_blat2gff.pl?revision=994&content-type=text%2Fplain

It is a simple/fast converter with minimal dependencies. Let me know if you try it and it works or does not work for you.

The command
cnv_blat2gff.pl --help
will list the basic commands, and
cnv_blat2gff.pl --man
will pull up the man page.

This is part of a larger set of conversion/annotation programs
http://dawgpaws.sourceforge.net/
with the source repository of scripts available at
http://dawgpaws.svn.sourceforge.net/viewvc/dawgpaws/trunk/scripts/

-- Jamie

On Fri, Jul 22, 2011 at 10:56 AM, Chris Fields wrote:

> Bio::SearchIO::psl is one way, though I recall a psl2gff script (maybe from
> Don Gilbert?) out there somewhere that might be more up-to-date and faster.
>
> chris
>
> On Jul 21, 2011, at 9:19 AM, Amin Momin wrote:
>
> >
> > Hi ,
> > I have been trying to convert a .psl file from BLAT into a GTF file.
> Is there a bioperl module capable of performing this. Or any tool that can
> be used to perform similar conversion.
> >
> > Amin
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
-----------------------------------------
James C. Estill
JamesEstill at gmail.com
http://jestill.myweb.uga.edu
-----------------------------------------

From seobiologi93 at gmail.com Mon Jul 25 05:11:13 2011
From: seobiologi93 at gmail.com (seo biologi)
Date: Mon, 25 Jul 2011 17:11:13 +0800
Subject: [Bioperl-l] BioPerl Installation
Message-ID:

Hi,

I am having some problem for the installation of BioPerl in my machine.
this is the details : a) BioPerl version : BioPerl-1.6.0.tar.gz b) Machine Linux version : Linux clcbio.crystal.um.edu.my 2.6.9-100.ELsmp #1 SMP Wed Feb 16 16:01:26 CST 2011 x86_64 x86_64 x86_64 GNU/Linux c) Perl version : v5.8.5 built for x86_64-linux-thread-multi d) ActivePerl 5.14 And im using this step of installation : 1. Run the command for extract the files : >tar xvfz BioPerl-1.6.1.tar.gz >cd BioPerl-1.6.1 2. Issue the build commands : >perl Build.PL >./Build test 2 pic attached is the error and linux version of the machine. Please advised. Thank u so much. -------------- next part -------------- A non-text attachment was scrubbed... Name: BioPerl Problem.jpg Type: image/jpeg Size: 282045 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Linux Version.jpg Type: image/jpeg Size: 108859 bytes Desc: not available URL: From roy.chaudhuri at gmail.com Mon Jul 25 08:03:56 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 25 Jul 2011 13:03:56 +0100 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> Message-ID: <4E2D5BAC.8020001@gmail.com> I don't think there's any specific handling, but (in GenBank files at least) mol_type is recorded as a tag in the source feature, so it will be stored in BioSQL like any other feature tag (in seqfeature_qualifier_value). On 25/07/2011 12:30, Peter Cock wrote: > On Mon, Jul 25, 2011 at 12:27 PM, Roy Chaudhuri wrote: >>> Can you tell what value is actually put in the database? Presumably >>> whatever Perl defaults to as the string representation of a boolean? >> >> The database value is either 1 or NULL (equivalent to 1 or undef in Perl). >> > > Excellent - I can do the same in Biopython then. > > I don't suppose you happen to know where the molecule type goes > (also in the GenBank/EMBL LOCUS/ID line, e.g. genomic DNA)? 
> > Thank you, > > Peter From p.j.a.cock at googlemail.com Mon Jul 25 08:57:51 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 25 Jul 2011 13:57:51 +0100 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: <4E2D5BAC.8020001@gmail.com> References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> Message-ID: On Mon, Jul 25, 2011 at 1:03 PM, Roy Chaudhuri wrote: > I don't think there's any specific handling, but (in GenBank files at least) > mol_type is recorded as a tag in the source feature, so it will be stored in > BioSQL like any other feature tag (in seqfeature_qualifier_value). I'd forgotten in my question this potential slight redundancy in the GenBank format! Consider this example, the molecule type is only in the LOCUS line (DNA), and incidentally there are two source features: http://biopython.org/SRC/biopython/Tests/GenBank/NT_019265.gb Likewise in the current version of the sample record on the NCBI website, the molecule type is only in the LOCUS line (in this case again just as DNA, but other values are mentioned), and not in the source feature: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.htm However in this third example, the molecule type is in the LOCUS line (as DNA) and in the source feature (as genomic DNA): http://biopython.org/SRC/biopython/Tests/GenBank/NC_000932.gb The GenBank/EMBL feature annotation is quite straightforward with mapping to BioSQL (and I'm pretty sure the Biopython and BioPerl are consistent here). Its all the header information that isn't as pinned down. Let me clarify that I'm interested in if and where BioPerl stores the molecule type from the GenBank LOCUS line in BioSQL (and I'm expecting this to go in bioentry_qualifier_value table under some tag name). Thanks again, Peter P.S. 
As as been discussed before, the BioSQL documentation would benefit from at least one worked example of a (small) GenBank file showing where each field ends up in the database. It would be a reasonable amount of work though - but could then be used for a basic compliance unit test by all the Bio* interfaces to BioSQL. From roy.chaudhuri at gmail.com Mon Jul 25 10:12:38 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 25 Jul 2011 15:12:38 +0100 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> Message-ID: <4E2D79D6.6020108@gmail.com> >> I don't think there's any specific handling, but (in GenBank files >> at least) mol_type is recorded as a tag in the source feature, so >> it will be stored in BioSQL like any other feature tag (in >> seqfeature_qualifier_value). > > I'd forgotten in my question this potential slight redundancy in the > GenBank format! No problem, I forgot in my answer that for some obscure reason people may be interested in looking at GenBank files that aren't bacterial genome sequences. > Let me clarify that I'm interested in if and where BioPerl stores > the molecule type from the GenBank LOCUS line in BioSQL (and I'm > expecting this to go in bioentry_qualifier_value table under some tag > name). As far as I can tell, the only fields stored by default in bioentry_qualifier_value are keyword, date_changed and secondary_accession (although my database only contains GenBank bacterial genomes). As with the is_circular hack, you could store the molecule type by adding it as an annotation in the SequenceProcessor (it's stored as $seq->molecule by BioPerl). Actually, when round-tripping a GenBank file through BioSQL, the LOCUS line molecule type ends up in lower case, which makes me wonder if it is coming from alphabet in the biosequence table. > P.S. 
> > As as been discussed before, the BioSQL documentation would benefit > from at least one worked example of a (small) GenBank file showing > where each field ends up in the database. It would be a reasonable > amount of work though - but could then be used for a basic compliance > unit test by all the Bio* interfaces to BioSQL. I agree that this would be very useful - the SearchIO HOWTO has a similar treatment of a BLAST report that I often refer to. From carandraug+dev at gmail.com Mon Jul 25 10:34:42 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Mon, 25 Jul 2011 15:34:42 +0100 Subject: [Bioperl-l] github -- pull request for adding bp_ to scripts Message-ID: Hi everyone I made a pull request to the bioperl repo a few days ago ( https://github.com/bioperl/bioperl-live/pull/17 ) but got no answer yet. The changes are on the scripts directory. During installation, the scripts have their names changed to bp_something. Some of them already have the bp_ prefix on them and those are not changed. I simply changed the filename of all of them so there's no need to make changes by the install script. Also, since after installation all of them are named bp_something, it makes sense that their documentation (and man pages generated from the POD) reflect this. As such, I also changed their names on the documentation. I asked at #bioperl back then and people seemed positive about this change: I noticed that the scripts have their names changed to bp_scriptname during install. Any reason why not to have the files already with bp? I noticed the man pages refer to the scripts without the bp. I forked it and was planning on fix that carandraug: i'm not sure what the logic is behind having it that way. pyrimidine is likely to be in the channel pretty soon, he would probably know carandraug: i agree with you that it's silly carandraug: yes, it is silly. I recall there being a reason for this at some point, but I slept since then. 
I think it's safe to go ahead and change them.

Could someone comment on the pull request?

Thanks,
Carnë

From skastu01 at students.poly.edu Mon Jul 25 11:27:09 2011
From: skastu01 at students.poly.edu (Lakshmi Kastury-Vennelaganti)
Date: Mon, 25 Jul 2011 15:27:09 +0000
Subject: [Bioperl-l] Where to find S. cerevisiae tRNA and mRNA sequences?
In-Reply-To: <21A0A61E-8708-406D-BA64-FE10248702D5@illinois.edu>
References: , , <21A0A61E-8708-406D-BA64-FE10248702D5@illinois.edu>
Message-ID:

Hi,

The following resource should have these:
http://www.yeastgenome.org/cgi-bin/blast-sgd.pl

The entire site is dedicated to S. cerevisiae gene/seq resources.

> From: cjfields at illinois.edu
> Date: Sat, 23 Jul 2011 16:28:05 -0500
> To: jovel_juan at hotmail.com
> CC: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Where to find S. cerevisiae tRNA and mRNA sequences?
>
> Ensembl or Biomart should have these.
>
> chris
>
> On Jul 23, 2011, at 12:49 PM, Juan Jovel wrote:
>
> >
> > Hello Everybody!
> > Anybody knows where can I find the sequence for tRNAs and mRNAs from Saccharomyces cerevisiae? I was having a look at SGD, but could not find them.
> > Thanks a lot in advance,
> > JUAN
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From p.j.a.cock at googlemail.com Mon Jul 25 11:39:38 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 25 Jul 2011 16:39:38 +0100
Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?
In-Reply-To: <4E2D79D6.6020108@gmail.com>
References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> <4E2D79D6.6020108@gmail.com>
Message-ID:

On Mon, Jul 25, 2011 at 3:12 PM, Roy Chaudhuri wrote:
>>> I don't think there's any specific handling, but (in GenBank files
>>> at least) mol_type is recorded as a tag in the source feature, so
>>> it will be stored in BioSQL like any other feature tag (in
>>> seqfeature_qualifier_value).
>>
>> I'd forgotten in my question this potential slight redundancy in the
>> GenBank format!
>
> No problem, I forgot in my answer that for some obscure reason people
> may be interested in looking at GenBank files that aren't bacterial genome
> sequences.

Sampling bias ;)

>> Let me clarify that I'm interested in if and where BioPerl stores
>> the molecule type from the GenBank LOCUS line in BioSQL (and I'm
>> expecting this to go in bioentry_qualifier_value table under some tag
>> name).
>
> As far as I can tell, the only fields stored by default in
> bioentry_qualifier_value are keyword, date_changed and secondary_accession
> (although my database only contains GenBank bacterial genomes). As with the
> is_circular hack, you could store the molecule type by adding it as an
> annotation in the SequenceProcessor (it's stored as $seq->molecule by
> BioPerl).

OK, that makes sense.

> Actually, when round-tripping a GenBank file through BioSQL, the LOCUS line
> molecule type ends up in lower case, which makes me wonder if it is coming
> from alphabet in the biosequence table.

If so, that may break for viral GenBank files where the LOCUS line may say
RNA, but the sequence is given using acgt (i.e. the DNA alphabet).

>> P.S.
>>
>> As has been discussed before, the BioSQL documentation would benefit
>> from at least one worked example of a (small) GenBank file showing
>> where each field ends up in the database.
>> It would be a reasonable
>> amount of work though - but could then be used for a basic compliance
>> unit test by all the Bio* interfaces to BioSQL.
>
> I agree that this would be very useful - the SearchIO HOWTO has a similar
> treatment of a BLAST report that I often refer to.

If only we could clone/fork bioinformaticians ;)

Peter

From cjfields at illinois.edu Mon Jul 25 12:24:19 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jul 2011 11:24:19 -0500
Subject: [Bioperl-l] github -- pull request for adding bp_ to scripts
In-Reply-To:
References:
Message-ID: <1A743058-86E4-4325-9306-52334E99DA6E@illinois.edu>

I responded to that:

http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035425.html

I haven't heard any arguments against it, will merge in today. We will need to ensure Build.PL is set up so the scripts are installed correctly.

chris

On Jul 25, 2011, at 9:34 AM, Carnë Draug wrote:

> Hi everyone
>
> I made a pull request to the bioperl repo a few days ago (
> https://github.com/bioperl/bioperl-live/pull/17 ) but got no answer
> yet. The changes are on the scripts directory.
>
> During installation, the scripts have their names changed to
> bp_something. Some of them already have the bp_ prefix on them and
> those are not changed. I simply changed the filename of all of them so
> there's no need to make changes by the install script. Also, since
> after installation all of them are named bp_something, it makes sense
> that their documentation (and man pages generated from the POD)
> reflect this. As such, I also changed their names on the
> documentation.
>
> I asked at #bioperl back then and people seemed positive about this change:
>
> I noticed that the scripts have their names changed to
> bp_scriptname during install. Any reason why not to have the files
> already with bp? I noticed the man pages refer to the scripts without
> the bp.
> I forked it and was planning on fix that
> carandraug: i'm not sure what the logic is behind having it
> that way. pyrimidine is likely to be in the channel pretty soon, he
> would probably know
> carandraug: i agree with you that it's silly
> carandraug: yes, it is silly. I recall there being a
> reason for this at some point, but I slept since then. I think it's
> safe to go ahead and change them.
>
> Could someone comment on the pull request?
>
> Thanks,
> Carnë
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From roy.chaudhuri at gmail.com Mon Jul 25 12:25:28 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 25 Jul 2011 17:25:28 +0100
Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?
In-Reply-To:
References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> <4E2D79D6.6020108@gmail.com>
Message-ID: <4E2D98F8.5010703@gmail.com>

>> Actually, when round-tripping a GenBank file through BioSQL, the LOCUS line
>> molecule type ends up in lower case, which makes me wonder if it is coming
>> from alphabet in the biosequence table.
>
> If so, that may break for viral GenBank files where the LOCUS line may say
> RNA, but the sequence is given using acgt (i.e. the DNA alphabet).

Just tried it and that does seem to be the case. It's not the only thing that breaks on round tripping, for example circular genomes become linear.

From p.j.a.cock at googlemail.com Mon Jul 25 12:29:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 25 Jul 2011 17:29:48 +0100
Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?
In-Reply-To: <4E2D98F8.5010703@gmail.com> References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> <4E2D79D6.6020108@gmail.com> <4E2D98F8.5010703@gmail.com> Message-ID: On Mon, Jul 25, 2011 at 5:25 PM, Roy Chaudhuri wrote: >>> Actually, when round-tripping a GenBank file through BioSQL, the LOCUS >>> line molecule type ends up in lower case, which makes me wonder if it is >>> coming from alphabet in the biosequence table. >> >> If so, that may break for viral GenBank files where the LOCUS line may say >> RNA, but the sequence is given using acgt (i.e. the DNA alphabet). > > Just tried it and that does seem to be the case. It's not the only thing > that breaks on round tripping, for example circular genomes become linear. > Sounds like one or two bug reports are needed, http://redmine.open-bio.org/projects/bioperl We already have one open on Biopython for this: https://redmine.open-bio.org/issues/2578 Peter From cjfields at illinois.edu Mon Jul 25 12:31:41 2011 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 25 Jul 2011 11:31:41 -0500 Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)? In-Reply-To: References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> <4E2D79D6.6020108@gmail.com> Message-ID: On Jul 25, 2011, at 10:39 AM, Peter Cock wrote: > On Mon, Jul 25, 2011 at 3:12 PM, Roy Chaudhuri wrote: >>>> I don't think there's any specific handling, but (in GenBank files >>>> at least) mol_type is recorded as a tag in the source feature, so >>>> it will be stored in BioSQL like any other feature tag (in >>>> seqfeature_qualifier_value). >>> >>> I'd forgotten in my question this potential slight redundancy in the >>> GenBank format! >> >> No problem, I forgot in my answer that for some obscure reason people >> may be interested in looking at GenBank files that aren't bacterial genome >> sequences. 
> > Sampling bias ;) > >>> Let me clarify that I'm interested in if and where BioPerl stores >>> the molecule type from the GenBank LOCUS line in BioSQL (and I'm >>> expecting this to go in bioentry_qualifier_value table under some tag >>> name). >> >> As far as I can tell, the only fields stored by default in >> bioentry_qualifier_value are keyword, date_changed and secondary_accession >> (although my database only contains GenBank bacterial genomes). As with the >> is_circular hack, you could store the molecule type by adding it as an >> annotation in the SequenceProcessor (it's stored as $seq->molecule by >> BioPerl). > > OK, that makes sense. > >> Actually, when round-tripping a GenBank file through BioSQL, the LOCUS line >> molecule type ends up in lower case, which makes me wonder if it is coming >> from alphabet in the biosequence table. > > If so, that may break for viral GenBank files where the LOCUS line may say > RNA, but the sequence is given using acgt (i.e. the DNA alphabet). Not sure, but that's worth checking on. Truthfully, our interest has typically been in favor more towards parsing data into the proper classes for downstream analysis than round-tripping sequence formats. Not that the latter isn't important, but that there is frankly more interest in doing something more than rote sequence format conversion. >>> P.S. >>> >>> As as been discussed before, the BioSQL documentation would benefit >>> from at least one worked example of a (small) GenBank file showing >>> where each field ends up in the database. It would be a reasonable >>> amount of work though - but could then be used for a basic compliance >>> unit test by all the Bio* interfaces to BioSQL. >> >> I agree that this would be very useful - the SearchIO HOWTO has a similar >> treatment of a BLAST report that I often refer to. 
>
> If only we could clone/fork bioinformaticians ;)
>
> Peter

:)

chris

From cjfields at illinois.edu Mon Jul 25 12:33:01 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jul 2011 11:33:01 -0500
Subject: [Bioperl-l] [BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?
In-Reply-To:
References: <4E2D5000.30305@gmail.com> <4E2D5314.5090107@gmail.com> <4E2D5BAC.8020001@gmail.com> <4E2D79D6.6020108@gmail.com> <4E2D98F8.5010703@gmail.com>
Message-ID: <464C3D4C-D0CD-4EA1-8A77-CE036841EB77@illinois.edu>

On Jul 25, 2011, at 11:29 AM, Peter Cock wrote:

> On Mon, Jul 25, 2011 at 5:25 PM, Roy Chaudhuri wrote:
>>>> Actually, when round-tripping a GenBank file through BioSQL, the LOCUS
>>>> line molecule type ends up in lower case, which makes me wonder if it is
>>>> coming from alphabet in the biosequence table.
>>>
>>> If so, that may break for viral GenBank files where the LOCUS line may say
>>> RNA, but the sequence is given using acgt (i.e. the DNA alphabet).
>>
>> Just tried it and that does seem to be the case. It's not the only thing
>> that breaks on round tripping, for example circular genomes become linear.
>>
>
> Sounds like one or two bug reports are needed,
> http://redmine.open-bio.org/projects/bioperl
>
> We already have one open on Biopython for this:
> https://redmine.open-bio.org/issues/2578
>
> Peter

If these reports have examples we can work on a fix.

chris

From carandraug+dev at gmail.com Mon Jul 25 13:21:31 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Mon, 25 Jul 2011 18:21:31 +0100
Subject: [Bioperl-l] github -- pull request for adding bp_ to scripts
In-Reply-To: <1A743058-86E4-4325-9306-52334E99DA6E@illinois.edu>
References: <1A743058-86E4-4325-9306-52334E99DA6E@illinois.edu>
Message-ID:

2011/7/25 Chris Fields :
> I responded to that:
>
> http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035425.html
>
> I haven't heard any arguments against it, will merge in today. We will need to ensure Build.PL is set up so the scripts are installed correctly.

Oh, you're right. I somehow missed it. Thank you.

Another thing that I found weird is the extension. Why the .PLS extension? It's the first time I see it for a perl script and actually confuses my desktop.
I understand that aside from the .pm and .t extensions, perl doesn't really care and there's no official rule. However, the canonical approach is to use .pl for scripts or nothing for applications (and these scripts end up being installed as individual applications anyway).

Carnë

From cjfields at illinois.edu Mon Jul 25 13:44:33 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jul 2011 12:44:33 -0500
Subject: [Bioperl-l] github -- pull request for adding bp_ to scripts
In-Reply-To:
References: <1A743058-86E4-4325-9306-52334E99DA6E@illinois.edu>
Message-ID: <47B7AA99-5ED3-4B1F-B512-DEB98FADECBC@illinois.edu>

(Lincoln, maybe you can add to the below?)

On Jul 25, 2011, at 12:21 PM, Carnë Draug wrote:

> 2011/7/25 Chris Fields :
>> I responded to that:
>>
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035425.html
>>
>> I haven't heard any arguments against it, will merge in today. We will need to ensure Build.PL is set up so the scripts are installed correctly.
>
> Oh, you're right. I somehow missed it. Thank you.
>
> Another thing that I found weird is the extension. Why the .PLS
> extension? It's the first time I see it for a perl script and actually
> confuses my desktop.

I recall there being an argument for this at some point; there is an old post here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/880
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/1150/focus=1216

The way I read that, I think the reasoning was to indicate the scripts are templates and the proper perl version would be affixed during build/installation. Maybe this is now outdated; we have since moved on from ExtUtils::MakeMaker to Module::Build; IIRC the templating system mentioned in the latter thread still works regardless of the file extension.

> I understand that aside from the .pm and .t extensions, perl doesn't really
> care and there's no official rule.
> However, the canonical approach is
> to use .pl for scripts or nothing for applications (and these scripts
> end up being installed as individual applications anyway).
>
> Carnë

Well, that's not completely true (the 'canonical' bit). In general it's considered best practice to use *.pl for scripts, *.pm for modules, and *.pod for doc-only (helps with editors, default launch apps, etc), but there isn't any real steadfast rule saying one must use that, or that not doing so will break things. Some installed scripts leave off .pl entirely and just make the script executable.

chris

From carandraug+dev at gmail.com Mon Jul 25 13:59:29 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Mon, 25 Jul 2011 18:59:29 +0100
Subject: [Bioperl-l] github -- pull request for adding bp_ to scripts
In-Reply-To: <47B7AA99-5ED3-4B1F-B512-DEB98FADECBC@illinois.edu>
References: <1A743058-86E4-4325-9306-52334E99DA6E@illinois.edu> <47B7AA99-5ED3-4B1F-B512-DEB98FADECBC@illinois.edu>
Message-ID:

2011/7/25 Chris Fields :
> (Lincoln, maybe you can add to the below?)
>
> On Jul 25, 2011, at 12:21 PM, Carnë Draug wrote:
>
>> 2011/7/25 Chris Fields :
>>> I responded to that:
>>>
>>> http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035425.html
>>>
>>> I haven't heard any arguments against it, will merge in today. We will need to ensure Build.PL is set up so the scripts are installed correctly.
>>
>> Oh, you're right. I somehow missed it. Thank you.
>>
>> Another thing that I found weird is the extension. Why the .PLS
>> extension? It's the first time I see it for a perl script and actually
>> confuses my desktop.
>
> I recall there being an argument for this at some point; there is an old post here:
>
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/880
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/1150/focus=1216
>
> The way I read that, I think the reasoning was to indicate the scripts are templates and the proper perl version would be affixed during build/installation. Maybe this is now outdated; we have since moved on from ExtUtils::MakeMaker to Module::Build; IIRC the templating system mentioned in the latter thread still works regardless of the file extension.

At least the script 'bp_das_server' has the .pl extension so I'm guessing it no longer makes any difference.

Carnë

From cjfields at illinois.edu Mon Jul 25 14:13:11 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jul 2011 13:13:11 -0500
Subject: [Bioperl-l] github -- pull request for adding bp_ to scripts
In-Reply-To:
References: <1A743058-86E4-4325-9306-52334E99DA6E@illinois.edu> <47B7AA99-5ED3-4B1F-B512-DEB98FADECBC@illinois.edu>
Message-ID:

On Jul 25, 2011, at 12:59 PM, Carnë Draug wrote:

> 2011/7/25 Chris Fields :
>> (Lincoln, maybe you can add to the below?)
>>
>> On Jul 25, 2011, at 12:21 PM, Carnë Draug wrote:
>>
>>> 2011/7/25 Chris Fields :
>>>> I responded to that:
>>>>
>>>> http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035425.html
>>>>
>>>> I haven't heard any arguments against it, will merge in today. We will need to ensure Build.PL is set up so the scripts are installed correctly.
>>>
>>> Oh, you're right. I somehow missed it. Thank you.
>>>
>>> Another thing that I found weird is the extension. Why the .PLS
>>> extension? It's the first time I see it for a perl script and actually
>>> confuses my desktop.
>>
>> I recall there being an argument for this at some point; there is an old post here:
>>
>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/880
>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/1150/focus=1216
>>
>> The way I read that, I think the reasoning was to indicate the scripts are templates and the proper perl version would be affixed during build/installation. Maybe this is now outdated; we have since moved on from ExtUtils::MakeMaker to Module::Build; IIRC the templating system mentioned in the latter thread still works regardless of the file extension.
>
> At least the script 'bp_das_server' has the .pl extension so I'm
> guessing it no longer makes any difference.
>
> Carnë

I think it'll be okay, but maybe we can submit that as a separate pull request just in case. I would like a bit more feedback from the GMOD side.

chris

From awitney at sgul.ac.uk Tue Jul 26 08:35:08 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 26 Jul 2011 13:35:08 +0100
Subject: [Bioperl-l] Perl
In-Reply-To:
References:
Message-ID: <419C31B6-32F3-4B2B-B59B-5DACD9825EF2@sgul.ac.uk>

[please keep the list cc'd into your emails]

Your script is mostly there, you just need to add your processing criteria, i.e. sequences longer than 200bp, see below:

On 26 Jul 2011, at 13:29, Akash wrote:

> Sir
>
> I am very thankful to you for your suggestion and I tried to work on the Bio-Perl but as it is very new for me, I am unable to understand in a very short period of time because the work I am doing, I have to show in next couple of days.
>
> So Please can you help in this? I want to remove the "contigs" from a "fasta" file which contain more than "200 base pairs".
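A note on the filter criterion: the request above can be read as dropping contigs longer than 200 bp, while the reply in this thread keeps them. The following is only a sketch of both readings in plain Perl, with no BioPerl dependency so it can be tried anywhere; the subroutine name filter_fasta and its 'long'/'short' argument are hypothetical and are not part of Bio::SeqIO.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bio::SeqIO method): split FASTA text into
# records and keep those whose sequence length passes the cutoff.
#   $keep eq 'long'  -> keep records longer than $cutoff
#   $keep eq 'short' -> keep records of $cutoff residues or fewer
sub filter_fasta {
    my ($fasta_text, $cutoff, $keep) = @_;
    my @kept;
    # Every record starts with '>' at the beginning of a line.
    for my $record (split /^>/m, $fasta_text) {
        next unless $record =~ /\S/;        # skip the empty leading chunk
        my ($header, @lines) = split /\n/, $record;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;                   # drop stray whitespace
        my $is_long = length($seq) > $cutoff;
        next if $keep eq 'long'  && !$is_long;
        next if $keep eq 'short' &&  $is_long;
        push @kept, ">$header\n$seq\n";
    }
    return @kept;
}

# Toy input, cutoff of 5 instead of 200 to keep the example short.
my $demo = ">contig1\nACGTACGT\n>contig2\nACG\nTT\n";
print filter_fasta($demo, 5, 'long');    # only the contig1 record
print filter_fasta($demo, 5, 'short');   # only the contig2 record
```

With Bio::SeqIO the same decision is a single test on $seq->length inside the next_seq loop; which direction of the comparison is wanted depends on how the original request is read.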
>
> With Regards
> Akash
>
> I am using this script right now
>
> use strict;
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-file=>"contigs.fa", '-format'=>'FASTA');
> my $out = Bio::SeqIO->new(-file=>">contigs1.fa", '-format'=>'FASTA');
>
> while( my $seq = $in->next_seq())
> {
    if ( $seq->length() > 200 ) { $out->write_seq($seq); }
> }

From carandraug+dev at gmail.com Tue Jul 26 08:46:31 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Tue, 26 Jul 2011 13:46:31 +0100
Subject: [Bioperl-l] Fwd: perl
In-Reply-To:
References:
Message-ID:

Please keep the conversation on the mailing list. Also, I won't be doing your homework. There's plenty of documentation for perl and bioperl.

Carnë

---------- Forwarded message ----------
From: Akash
Date: 26 July 2011 13:28
Subject: perl
To: carandraug+dev at gmail.com

Sir

I am very thankful to you for your suggestion and I tried to work on the Bio-Perl but as it is very new for me, I am unable to understand in a very short period of time because the work I am doing, I have to show in next couple of days.

So Please can you help in this? I want to remove the "contigs" from a "fasta" file which contain more than "200 base pairs".

With Regards
Akash

I am using this script right now

use strict;
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-file=>"contigs.fa", '-format'=>'FASTA');
my $out = Bio::SeqIO->new(-file=>">contigs1.fa", '-format'=>'FASTA');

while( my $Seq = $in->next_seq())
{
}

From tristan.lefebure at gmail.com Tue Jul 26 10:47:43 2011
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 26 Jul 2011 16:47:43 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Phylo::Phyml, tree_string
Message-ID: <201107261647.43652.tristan.lefebure@gmail.com>

Hi there,
I am not quite sure I understand why tree_string() from Bio::Tools::Run::Phylo::Phyml returns a string that looks like that (I removed the end of the tree):

Tree is BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,...
Why do we have this 'Tree is BIONJ' thing?

A quick look at the code in the _run() function gives:

    {
        open(my $FH_TREE, "<", $tree_file)
            || $self->throw("Phyml call ($command) did not give an output: $?");
        local $/;
        $self->{_tree} .= <$FH_TREE>;
    }

Why append something to $self->{_tree}? What about:

    $self->{_tree} = <$FH_TREE>;

I was about to file a bug report, but then I saw that in Phyml.t:

    is substr($factory->tree_string, 0, 9), 'BIONJ(SIN', 'tree_string()';

Well, I am lost. Any help much appreciated...

--
Tristan

From awitney at sgul.ac.uk Tue Jul 26 11:07:05 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 26 Jul 2011 16:07:05 +0100
Subject: [Bioperl-l] Error writing SequenceProcessor to associate GO terms in biosql database
Message-ID: <26C59A57-F54A-4237-8D97-4E7A77E55D59@sgul.ac.uk>

Hi,

I'm trying to write a SequenceProcessor for a genbank file to associate GO terms to the GO data preloaded in my biosql database. The command looks like this:

perl load_seqdatabase.pl --dbname=biosql --driver=Pg --host=myhost --port=5432 --dbuser=user --dbpass=pass -format genbank -namespace testing -pipeline 'GOSequenceProcessor' --debug S_sonnei.EB1_s_sonnei.dat

The SequenceProcessor process_seq looks like this:

sub process_seq{
    my ($self,$seq) = @_;

    my @features = $seq->get_SeqFeatures();
    foreach my $feat ( @features ) {
        if ( $feat->has_tag('db_xref') ) {
            my @db_xrefs = $feat->get_tag_values('db_xref');
            foreach my $db_xref (@db_xrefs) {
                if ( $db_xref =~ m/^GO:/ ) {
                    my $term = Bio::Annotation::OntologyTerm->new(-identifier => $db_xref, -ontology => 'Gene Ontology');
                    $feat->annotation->add_Annotation($term);
                }
            }
        }
    }
    return ($seq);
}

But this gives this error:

preparing INSERT statement: INSERT INTO seqfeature_qualifier_value (seqfeature_id, term_id, rank) VALUES (?, ?, ?)
TermAdaptor::add_assoc: binding column 1 to "935181" (FK to Bio::SeqFeature::Generic)
TermAdaptor::add_assoc: binding column 2 to "50253" (FK to Bio::Annotation::OntologyTerm)
TermAdaptor::add_assoc: binding column 3 to "1" (rank)

--------------------- WARNING ---------------------
MSG: TermAdaptor::add_assoc: unexpected failure of statement execution: ERROR: null value in column "value" violates not-null constraint
name: INSERT ASSOC [1] Bio::SeqFeature::Generic;Bio::Annotation::OntologyTerm
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::add_association /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:458
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::add_association /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:468
STACK Bio::DB::BioSQL::SeqFeatureAdaptor::store_children /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/SeqFeatureAdaptor.pm:304
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/Persistent/PersistentObject.pm:284
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/SeqAdaptor.pm:257
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /var/users/adam/BioPerl/bioperl-db/lib//Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) /var/users/adam/BioPerl/bioperl-db/scripts/biosql/load_seqdatabase.pl:630
STACK toplevel /var/users/adam/BioPerl/bioperl-db/scripts/biosql/load_seqdatabase.pl:612

As you can see it generates an INSERT against seqfeature_qualifier_value without including a 'value' field, which is of course defined as NOT NULL.

Firstly, is this the best way to achieve this? And secondly, where is the INSERT statement put together? I can't seem to find it in the object hierarchy.

Thanks

adam

From tristan.lefebure at gmail.com Tue Jul 26 11:14:10 2011
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 26 Jul 2011 17:14:10 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Phylo::Phyml, tree_string
In-Reply-To: <201107261647.43652.tristan.lefebure@gmail.com>
References: <201107261647.43652.tristan.lefebure@gmail.com>
Message-ID:

Oops, I found a typo in my post, it should read:

I am not quite sure I understand why tree_string() from Bio::Tools::Run::Phylo::Phyml returns a string that looks like that (I removed the end of the tree):

BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,...

On Tue, Jul 26, 2011 at 4:47 PM, Tristan Lefebure wrote:
> Hi there,
> I am not quite sure I understand why tree_string() from Bio::Tools::Run::Phylo::Phyml returns
> a string that looks like that (I removed the end of the tree):
>
> Tree is BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,...
>
> Why do we have this 'Tree is BIONJ' thing?
>
> A quick look at the code in the _run() function gives:
>
>     {
>         open(my $FH_TREE, "<", $tree_file)
>             || $self->throw("Phyml call ($command) did not give an output: $?");
>         local $/;
>         $self->{_tree} .= <$FH_TREE>;
>     }
>
> Why append something to $self->{_tree}? What about:
>     $self->{_tree} = <$FH_TREE>;
>
> I was about to file a bug report, but then I saw that in Phyml.t:
>
>     is substr($factory->tree_string, 0, 9), 'BIONJ(SIN', 'tree_string()';
>
> Well, I am lost. Any help much appreciated...
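As a stopgap on the caller's side, any label in front of the first opening parenthesis of the tree_string() result could be stripped before the string is handed to a Newick parser. This is a minimal sketch, assuming the label never contains a '(' itself; strip_newick_prefix is a hypothetical helper, not part of Bio::Tools::Run::Phylo::Phyml, and the input below is a toy tree rather than real PhyML output. It does not address why the label ends up in the output file.

```perl
use strict;
use warnings;

# Hypothetical helper: remove whatever precedes the first '(' of a
# Newick string, such as the "BIONJ" label quoted in this thread.
sub strip_newick_prefix {
    my ($tree) = @_;
    $tree =~ s/\A[^(]*//;    # delete leading non-parenthesis characters
    return $tree;
}

# Toy input for illustration only.
my $raw = "BIONJ((A:0.1,B:0.2):0.05,C:0.3);";
print strip_newick_prefix($raw), "\n";
```

A string that already starts with '(' passes through unchanged, so the helper is safe to apply unconditionally.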
> > -- > Tristan > From cjfields at illinois.edu Tue Jul 26 13:35:11 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 26 Jul 2011 12:35:11 -0500 Subject: [Bioperl-l] Fwd: [blast-announce] New SOAP based BLAST service References: Message-ID: <57EF46CA-06D4-40FB-A3AF-B2DC41FA3F80@illinois.edu> FYI, if anyone is interested in implementing something :) chris Begin forwarded message: > From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" > Subject: [blast-announce] New SOAP based BLAST service > Date: July 26, 2011 11:12:59 AM CDT > To: NLM/NCBI List blast-announce > > A SOAP based BLAST service is available. This service makes use of the Simple Object Access Protocol to submit and retrieve searches with the NCBI BLAST web server. The service can also query the server for other information. A simple ("Lite") interface is available that should be suitable for most projects. Documentation and links to the WSDL and sample clients are http://www.ncbi.nlm.nih.gov/books/NBK55699/ > > From cjfields at illinois.edu Tue Jul 26 15:43:15 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 26 Jul 2011 14:43:15 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Phylo::Phyml, tree_string In-Reply-To: References: <201107261647.43652.tristan.lefebure@gmail.com> Message-ID: That's an odd one. Could you file this on redmine? chris On Jul 26, 2011, at 10:14 AM, Tristan Lefebure wrote: > Ouups, I found a typo in my post, it should read: > > I am not quite sure I understand why tree_string() from > Bio::Tools::Run::Phylo::Phyml returns > a string that looks like that (I removed the end of the tree): > > BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,... 
> > On Tue, Jul 26, 2011 at 4:47 PM, Tristan Lefebure > wrote: >> Hi there, >> I am not quite sure I understand why tree_string() from Bio::Tools::Run::Phylo::Phyml returns >> a string that looks like that (I removed the end of the tree): >> >> Tree is BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,... >> >> Why do we have this 'Tree is BIONJ' thing? >> >> A quick look at the code in the _run() function gives : >> >> { >> open(my $FH_TREE, "<", $tree_file) >> || $self->throw("Phyml call ($command) did not give an output: $?"); >> local $/; >> $self->{_tree} .= <$FH_TREE>; >> } >> >> Why appending something to $self->{_tree}? What about? >> $self->{_tree} = <$FH_TREE>; >> >> I was about to fill a bug report, but then I saw that in Phyml.t: >> >> is substr($factory->tree_string, 0, 9), 'BIONJ(SIN', 'tree_string()'; >> >> Well, I am lost. Any help much appreciated... >> >> -- >> Tristan >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shachigahoimbi at gmail.com Tue Jul 26 23:58:54 2011 From: shachigahoimbi at gmail.com (Shachi Gahoi) Date: Wed, 27 Jul 2011 09:28:54 +0530 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: Message-ID: Dear All, i am using protparam.pm module. but when i am running this script it is printing one error message "Can't call method "throw" without a package or object reference at /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." Kindly help me to solve this problem. 
Script is here----
###################################################################################
#!/usr/bin/perl

use warnings;
use Bio::SeqIO;
use Bio::Tools::Protparam;


$seqfile='test1.fasta';

$seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta');


while( $seq = $seqio->next_seq() )
{

    my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq);

    print
    "ID : ", $seq->display_id,"\n",
    "Amino acid number : ",$pp->amino_acid_number(),"\n",
    "Number of negative amino acids : ",$pp->num_neg(),"\n",
    "Number of positive amino acids : ",$pp->num_pos(),"\n",
    "Molecular weight : ",$pp->molecular_weight(),"\n",
    "Theoretical pI : ",$pp->theoretical_pI(),"\n",
    "Total number of atoms : ", $pp->total_atoms(),"\n",
    "Number of carbon atoms : ",$pp->num_carbon(),"\n",
    "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n",
    "Number of nitrogen atoms : ",$pp->num_nitro(),"\n",
    "Number of oxygen atoms : ",$pp->num_oxygen(),"\n",
    "Number of sulphur atoms : ",$pp->num_sulphur(),"\n",
    "Half life : ", $pp->half_life(),"\n",
    "Instability Index : ", $pp->instability_index(),"\n",
    "Stability class : ", $pp->stability(),"\n",
    "Aliphatic_index : ",$pp->aliphatic_index(),"\n",
    "Gravy : ", $pp->gravy(),"\n",
    "Composition of A : ", $pp->AA_comp('A'),"\n",
    "Composition of R : ", $pp->AA_comp('R'),"\n",
    "Composition of N : ", $pp->AA_comp('N'),"\n",
    "Composition of D : ", $pp->AA_comp('D'),"\n",
    "Composition of C : ", $pp->AA_comp('C'),"\n",
    "Composition of Q : ", $pp->AA_comp('Q'),"\n",
    "Composition of E : ", $pp->AA_comp('E'),"\n",
    "Composition of G : ", $pp->AA_comp('G'),"\n",
    "Composition of H : ", $pp->AA_comp('H'),"\n",
    "Composition of I : ", $pp->AA_comp('I'),"\n",
    "Composition of L : ", $pp->AA_comp('L'),"\n",
    "Composition of K : ", $pp->AA_comp('K'),"\n",
    "Composition of M : ", $pp->AA_comp('M'),"\n",
    "Composition of F : ", $pp->AA_comp('F'),"\n",
    "Composition of P : ", $pp->AA_comp('P'),"\n",
    "Composition of S : ", $pp->AA_comp('S'),"\n",
    "Composition of T : ", $pp->AA_comp('T'),"\n",
    "Composition of W : ", $pp->AA_comp('W'),"\n",
    "Composition of Y : ", $pp->AA_comp('Y'),"\n",
    "Composition of V : ", $pp->AA_comp('V'),"\n",
    "Composition of B : ", $pp->AA_comp('B'),"\n",
    "Composition of Z : ", $pp->AA_comp('Z'),"\n",
    "Composition of X : ", $pp->AA_comp('X'),"\n";
}
###################################################################################

--
Regards,
Shachi

From cjfields at illinois.edu Wed Jul 27 01:35:54 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 27 Jul 2011 00:35:54 -0500
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To:
References:
Message-ID: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>

The web service appears to have changed, but it looks as if no tests have been written up for this module which would have caught this out. We can write some basic tests up to check for simple functionality.

chris

On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:

> Dear All,
>
> i am using protparam.pm module. but when i am running this script it is
> printing one error message
>
> "Can't call method "throw" without a package or object reference at
> /usr/share/perl5/Bio/Root/Root.pm line 368, line 1."
>
> Kindly help me to solve this problem.
> > > Script is here---- > ################################################################################### > #!/usr/bin/perl > > use warnings; > use Bio::SeqIO; > use Bio::Tools::Protparam; > > > $seqfile='test1.fasta'; > > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); > > > while( $seq = $seqio->next_seq() ) > { > > > my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); > > print > "ID : ", $seq->display_id,"\n", > "Amino acid number : ",$pp->amino_acid_number(),"\n", > "Number of negative amino acids : ",$pp->num_neg(),"\n", > "Number of positive amino acids : ",$pp->num_pos(),"\n", > "Molecular weight : ",$pp->molecular_weight(),"\n", > "Theoretical pI : ",$pp->theoretical_pI(),"\n", > "Total number of atoms : ", $pp->total_atoms(),"\n", > "Number of carbon atoms : ",$pp->num_carbon(),"\n", > "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", > "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", > "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", > "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", > "Half life : ", $pp->half_life(),"\n", > "Instability Index : ", $pp->instability_index(),"\n", > "Stability class : ", $pp->stability(),"\n", > "Aliphatic_index : ",$pp->aliphatic_index(),"\n", > "Gravy : ", $pp->gravy(),"\n", > "Composition of A : ", $pp->AA_comp('A'),"\n", > "Composition of R : ", $pp->AA_comp('R'),"\n", > "Composition of N : ", $pp->AA_comp('N'),"\n", > "Composition of D : ", $pp->AA_comp('D'),"\n", > "Composition of C : ", $pp->AA_comp('C'),"\n", > "Composition of Q : ", $pp->AA_comp('Q'),"\n", > "Composition of E : ", $pp->AA_comp('E'),"\n", > "Composition of G : ", $pp->AA_comp('G'),"\n", > "Composition of H : ", $pp->AA_comp('H'),"\n", > "Composition of I : ", $pp->AA_comp('I'),"\n", > "Composition of L : ", $pp->AA_comp('L'),"\n", > "Composition of K : ", $pp->AA_comp('K'),"\n", > "Composition of M : ", $pp->AA_comp('M'),"\n", > "Composition of F : ", $pp->AA_comp('F'),"\n", > "Composition of P : ", 
$pp->AA_comp('P'),"\n",
> "Composition of S : ", $pp->AA_comp('S'),"\n",
> "Composition of T : ", $pp->AA_comp('T'),"\n",
> "Composition of W : ", $pp->AA_comp('W'),"\n",
> "Composition of Y : ", $pp->AA_comp('Y'),"\n",
> "Composition of V : ", $pp->AA_comp('V'),"\n",
> "Composition of B : ", $pp->AA_comp('B'),"\n",
> "Composition of Z : ", $pp->AA_comp('Z'),"\n",
> "Composition of X : ", $pp->AA_comp('X'),"\n";
> }
> ###################################################################################
>
>
>
>
> --
> Regards,
> Shachi
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Wed Jul 27 04:21:45 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 27 Jul 2011 09:21:45 +0100
Subject: [Bioperl-l] Fwd: perl
In-Reply-To:
References:
Message-ID: <1311754905.9740.238.camel@deskpro15336.internal.sanger.ac.uk>

Has somebody pointed you to the BioPerl HOWTOs already? All you need is
in there and illustrated in example scripts.
http://www.bioperl.org/wiki/HOWTOs
In particular, check out the Beginner HOWTO and the SeqIO

Frank

On Tue, 2011-07-26 at 13:46 +0100, Carnë Draug wrote:
> Please keep the conversation on the mailing list. Also, I won't be
> doing your homework. There's plenty of documentation for perl and
> bioperl.
>
> Carnë
>
>
> ---------- Forwarded message ----------
> From: Akash
> Date: 26 July 2011 13:28
> Subject: perl
> To: carandraug+dev at gmail.com
>
>
> Sir
>
> I am very thankful to you for your suggestion and I tried to work on
> the Bio-Perl but as it is very new for me, I am unable to understand
> in a very short period of time because the work I am doing, I have to
> show in next couple of days.
>
> So Please can you help in this? I want to remove the "contigs" from a
> "fasta" file which contain more than "200 base pairs".
> > With Regards > Akash > > I am using this script right now > > use strict; > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-file=>"contigs.fa", '-format'=>'FASTA'); > my $out = Bio::SeqIO->new(-file=>">contigs1.fa", '-format'=>'FASTA'); > > while( my $Seq = $in->next_seq()) > { > > } > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From awitney at sgul.ac.uk Wed Jul 27 07:08:24 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 27 Jul 2011 12:08:24 +0100 Subject: [Bioperl-l] Perl In-Reply-To: References: <419C31B6-32F3-4B2B-B59B-5DACD9825EF2@sgul.ac.uk> Message-ID: <5471CF01-6722-4D69-AB05-223F59E89ECD@sgul.ac.uk> [again, please keep the list cc'd] Yes it should be very easy.... the sequence files are just text... however this looks like a homework exercise, so you should try doing it yourself first and then asking for help when your script does not quite work. This really is a basic perl task so give it a go.... people on the list will not write it for you. On 27 Jul 2011, at 12:00, Akash wrote: > Sir, > > The thing which you told me really works, but now my professor wants the script in Perl not using Bio-Perl. So is it possible to do this programming in perl? 
>
> Thanking You
>
> With Regards
> Akash

From tristan.lefebure at gmail.com Wed Jul 27 10:12:16 2011
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 27 Jul 2011 16:12:16 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Phylo::Phyml, tree_string
In-Reply-To:
References: <201107261647.43652.tristan.lefebure@gmail.com>
Message-ID:

done: https://redmine.open-bio.org/issues/3273

--
Tristan

On Tue, Jul 26, 2011 at 9:43 PM, Chris Fields wrote:
> That's an odd one. Could you file this on redmine?
>
> chris
>
> On Jul 26, 2011, at 10:14 AM, Tristan Lefebure wrote:
>
>> Ouups, I found a typo in my post, it should read:
>>
>> I am not quite sure I understand why tree_string() from
>> Bio::Tools::Run::Phylo::Phyml returns
>> a string that looks like that (I removed the end of the tree):
>>
>> BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,...
>>
>> On Tue, Jul 26, 2011 at 4:47 PM, Tristan Lefebure
>> wrote:
>>> Hi there,
>>> I am not quite sure I understand why tree_string() from Bio::Tools::Run::Phylo::Phyml returns
>>> a string that looks like that (I removed the end of the tree):
>>>
>>> Tree is BIONJ(((((((('92':0.0114354726,'12':0.0472591023)0.0000000000:0.0000005859,...
>>>
>>> Why do we have this 'Tree is BIONJ' thing?
>>>
>>> A quick look at the code in the _run() function gives :
>>>
>>>     {
>>>         open(my $FH_TREE, "<", $tree_file)
>>>             || $self->throw("Phyml call ($command) did not give an output: $?");
>>>         local $/;
>>>         $self->{_tree} .= <$FH_TREE>;
>>>     }
>>>
>>> Why appending something to $self->{_tree}? What about?
>>>         $self->{_tree} = <$FH_TREE>;
>>>
>>> I was about to fill a bug report, but then I saw that in Phyml.t:
>>>
>>>     is substr($factory->tree_string, 0, 9), 'BIONJ(SIN', 'tree_string()';
>>>
>>> Well, I am lost. Any help much appreciated...
>>> >>> -- >>> Tristan >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jsteele at caltech.edu Wed Jul 27 11:44:30 2011 From: jsteele at caltech.edu (Joshua Steele) Date: Wed, 27 Jul 2011 08:44:30 -0700 Subject: [Bioperl-l] Error for Bio::Graphics in the HOWTO script for rendering EMBL features Message-ID: Dear All, I'm getting an error (warning) message when I try to run examples 5 and 6 in the Graphics HOWTO. Error: Can't locate object method "has_tag" via package "Bio::Location::Simple" at /opt/local/lib/perl5/site_perl/5.12.3/Bio/Graphics/Glyph.pm line 704, line 323. This error occurs whether I'm using a genbank file or whether I'm using the suggested EMBL file ( http://www.bioperl.org/wiki/Factor7 ). The Bio::Graphics package seems to work fine to render BLAST outputs, I was able to get both the BLAST examples and a BLAST output to work. I'm using (macports)Perl 5.12.3, Bioperl 1.0069, and Bio::Graphics 2.24 on Mac OS 10.6.7. If there is something obvious that I could do to fix it I'd be quite grateful. 
Cheers,
Josh

Here is the script as I tried to run it:

#!/opt/local/bin/perl

# file: embl2picture.pl
# This is code example 5 in the Graphics-HOWTO
# Author: Lincoln Stein

use strict;
# double quotes, so that $ENV{HOME} is interpolated into the path
use lib "$ENV{HOME}/src/bioperl-live";
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

my $file = shift or die "provide a sequence file as the argument";
my $io = Bio::SeqIO->new(-file=>$file) or die "couldn't create Bio::SeqIO";
my $seq = $io->next_seq or die "couldn't find a sequence in the file";

# one feature spanning the whole sequence, used for the scale and title tracks
my $wholeseq = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -end          => $seq->length,
    -display_name => $seq->display_name
);
my $seqname = $seq->display_name;

my @features = $seq->get_SeqFeatures;

# partition features by their primary tags
my %sorted_features;
for my $f (@features) {
    my $tag = $f->primary_tag;
    push @{$sorted_features{$tag}},$f;
}

my $panel = Bio::Graphics::Panel->new(
    -length    => $seq->length,
    -key_style => 'between',
    -width     => 10000,
    -pad_left  => 50,
    -pad_right => 50,
);
$panel->add_track($wholeseq,
    -glyph  => 'arrow',
    -bump   => 0,
    -double => 1,
    -tick   => 2);

$panel->add_track($wholeseq,
    -glyph   => 'generic',
    -bgcolor => 'blue',
    -label   => 1,
);

# general case
my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
my $idx = 0;
for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
        -glyph       => 'generic',
        -bgcolor     => $colors[$idx++ % @colors],
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => "${tag}s",
        -bump        => +1,
        -height      => 8,
        -label       => 1,
        -description => 1,
    );
}

## print PNG
open OUT, ">$seqname.png" or die "couldn't write $seqname.png: $!";
print OUT $panel->png;
close OUT;
exit 0;

From nathan.watson-haigh at awri.com.au Tue Jul 26 20:37:45 2011
From: nathan.watson-haigh at awri.com.au (Nath)
Date: Tue, 26 Jul 2011 17:37:45 -0700 (PDT)
Subject: [Bioperl-l] Bug in Bio::SeqIO::fastq - seq length of 1 with quality of 0 (zero)
Message-ID: <4cba983e-cbb7-4b09-b2c7-920553fe109d@r5g2000prf.googlegroups.com>

It's been a while since I
was involved with Perl/BioPerl.....

I installed the latest version of BioPerl (1.6.901) through CPAN and
used it to parse some fastq files. I get an exception thrown when a
sequence has length = 1 and a quality of 0 (zero).

-- start test.pl --
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $seq_io = Bio::SeqIO->new(-file => 'test.fastq', -format => 'fastq');
while (my $seq = $seq_io->next_seq){ }
-- end test.pl --

-- start test.fastq --
@someID1.f
G
+
0
-- end test.fastq --

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Missing sequence and/or quality data; line: 4
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::fastq::next_dataset /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:97
STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:29
STACK: test.pl:7
-----------------------------------------------------------

-- start test.fastq --
@someID1.f
G
+
0
@someID2
AG
+
DD
-- end test.fastq --

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Quality string [0 at someID2] of length [9]
doesn't match length of sequence G
[1], line: 6
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::fastq::next_dataset /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:102
STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:29
STACK: test.pl:7
-----------------------------------------------------------

Note that the quality string parsed from the file for someID1.f also
includes the @someID2 sequence ID line. This may be related to Bug
#3068 that was apparently fixed about 1yr ago.
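
One plausible contributor to a report like this, offered as an assumption and not as a reading of the actual Bio::SeqIO::fastq source, is Perl's truthiness rule: the one-character string "0" is false, so a test written as `unless ($qual)` cannot tell a quality line of "0" from a missing one. The sketch below shows the difference with two hypothetical helpers (has_data_truthy and has_data_length are illustrative names, not BioPerl functions).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not taken from BioPerl.

# Truthiness test: treats the string "0" as "no data", because "0" is
# false in Perl (as are the empty string and undef).
sub has_data_truthy {
    my ($str) = @_;
    return $str ? 1 : 0;
}

# Defined-and-non-empty test: accepts "0" as a valid one-character string.
sub has_data_length {
    my ($str) = @_;
    return (defined $str && length $str) ? 1 : 0;
}

# Compare both checks on a quality line of "0", a normal one, and an empty one.
for my $qual ('0', 'DD', '') {
    printf "qual=[%s] truthy=%d length=%d\n",
        $qual, has_data_truthy($qual), has_data_length($qual);
}
```

Only the first case differs between the two checks: for a quality line of "0" the truthiness test reports no data while the length test reports data. Whether the parser's check has this shape is something to confirm against the module source.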
From cjfields at illinois.edu Wed Jul 27 16:25:56 2011 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 27 Jul 2011 15:25:56 -0500 Subject: [Bioperl-l] Bug in Bio::SeqIO::fastq - seq length of 1 with quality of 0 (zero) In-Reply-To: <4cba983e-cbb7-4b09-b2c7-920553fe109d@r5g2000prf.googlegroups.com> References: <4cba983e-cbb7-4b09-b2c7-920553fe109d@r5g2000prf.googlegroups.com> Message-ID: Huh, thought that one was fixed a while ago but apparently not completely. Just committed a fix and pushed to github, along with a test. Thanks for pointing this one out! chris On Jul 26, 2011, at 7:37 PM, Nath wrote: > It's been a while since I was involved with Perl/BioPerl..... > > I installed the latest version of BioPerl (1.6.901) through CPAN and > used it to parse some fastq files. I get an exception thrown when a > sequence has length = 1 and a quality of 0 (zero). > > -- start test.pl -- > #!/usr/bin/perl > use strict; > use warnings; > use Bio::SeqIO; > > my $seq_io = Bio::SeqIO->new(-file => 'test.fastq', -format => > 'fastq'); > while (my $seq = $seq_io->next_seq){ } > -- end test.pl -- > > -- start test.fastq -- > @someID1.f > G > + > 0 > -- end test.fastq -- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Missing sequence and/or quality data; line: 4 > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/ > Root.pm:472 > STACK: Bio::SeqIO::fastq::next_dataset /usr/lib/perl5/site_perl/5.8.8/ > Bio/SeqIO/fastq.pm:97 > STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/ > SeqIO/fastq.pm:29 > STACK: test.pl:7 > ----------------------------------------------------------- > > -- start test.fastq -- > @someID1.f > G > + > 0 > @someID2 > AG > + > DD > -- end test.fastq -- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Quality string [0 at someID2] of length [9] > doesn't match length of sequence G > [1], line: 6 > STACK: Error::throw > STACK: Bio::Root::Root::throw 
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/
> Root.pm:472
> STACK: Bio::SeqIO::fastq::next_dataset /usr/lib/perl5/site_perl/5.8.8/
> Bio/SeqIO/fastq.pm:102
> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/
> SeqIO/fastq.pm:29
> STACK: test.pl:7
> -----------------------------------------------------------
>
> Note that the quality string parsed from the file for someID1.f also
> includes the @someID2 sequence ID line. This may be related to Bug
> #3068 that was apparently fixed about 1yr ago.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Thu Jul 28 04:34:35 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 28 Jul 2011 09:34:35 +0100
Subject: [Bioperl-l] Perl
In-Reply-To: <5471CF01-6722-4D69-AB05-223F59E89ECD@sgul.ac.uk>
References: <419C31B6-32F3-4B2B-B59B-5DACD9825EF2@sgul.ac.uk> <5471CF01-6722-4D69-AB05-223F59E89ECD@sgul.ac.uk>
Message-ID: <1311842075.9740.269.camel@deskpro15336.internal.sanger.ac.uk>

and the answer is: yes, it is possible. What you need is a book such as
"Beginning Perl for Bioinformatics" (O'Reilly). Have a go at a script
and if it doesn't work, post it in a forum along with your specific
questions to get help. However, if you want a non-BioPerl solution, the
BioPerl mailing list might not be the best place for it (try Perlmonks
for example).

Frank



On Wed, 2011-07-27 at 12:08 +0100, Adam Witney wrote:
> [again, please keep the list cc'd]
>
> Yes it should be very easy.... the sequence files are just text... however this looks like a homework exercise, so you should try doing it yourself first and then asking for help when your script does not quite work.
>
> This really is a basic perl task so give it a go.... people on the list will not write it for you.
> > > On 27 Jul 2011, at 12:00, Akash wrote: > > > Sir, > > > > The thing which you told me really works, but now my professor wants the script in Perl not using Bio-Perl. So is it possible to do this programming in perl? > > > > Thanking You > > > > With Regards > > Akash > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From juettemann at gmail.com Thu Jul 28 05:21:47 2011 From: juettemann at gmail.com (Thomas Juettemann) Date: Thu, 28 Jul 2011 11:21:47 +0200 Subject: [Bioperl-l] Perl In-Reply-To: <1311842075.9740.269.camel@deskpro15336.internal.sanger.ac.uk> References: <419C31B6-32F3-4B2B-B59B-5DACD9825EF2@sgul.ac.uk> <5471CF01-6722-4D69-AB05-223F59E89ECD@sgul.ac.uk> <1311842075.9740.269.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: And you should also keep in mind that if you have a Perl Bioinformatics assignment, your Professor might be on this list as well. Thomas On Thu, Jul 28, 2011 at 10:34, Frank Schwach wrote: > and the answer is: yes, it is possible. What you need is a book such as > "Beginning Perl for Bioinformatics" (O'Reilly). Have a go at a script > and if it doesn't work, post it in a forum along with yuor specific > questions to get help. However, if you want a non-BioPerl solution, the > BioPerl mailing list might not be the best place for it (try Perlmonks > for example). > > Frank > > > > On Wed, 2011-07-27 at 12:08 +0100, Adam Witney wrote: >> [again, please keep the list cc'd] >> >> Yes it should be very easy.... the sequence files are just text... 
however this looks like a homework exercise, so you should try doing it yourself first and then asking for help when your script does not quite work.
>>
>> This really is a basic perl task so give it a go.... people on the list will not write it for you.
>>
>>
>> On 27 Jul 2011, at 12:00, Akash wrote:
>>
>> > Sir,
>> >
>> > The thing which you told me really works, but now my professor wants the script in Perl not using Bio-Perl. So is it possible to do this programming in perl?
>> >
>> > Thanking You
>> >
>> > With Regards
>> > Akash
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From shachigahoimbi at gmail.com Thu Jul 28 06:26:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Thu, 28 Jul 2011 15:56:33 +0530
Subject: [Bioperl-l] problem in using protparam.pm module
In-Reply-To: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu>
Message-ID:

Please help me to run protparam using the BioPerl module.

On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote:

> The web service appears to have changed, but it looks as if no tests have
> been written up for this module which would have caught this out. We can
> write some basic tests up to check for simple functionality.
>
> chris
>
> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote:
>
> > Dear All,
> >
> > i am using protparam.pm module.
but when i am running this script it is > > printing one error message > > > > "Can't call method "throw" without a package or object reference at > > /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." > > > > Kindly help me to solve this problem. > > > > > > Script is here---- > > > ################################################################################### > > #!/usr/bin/perl > > > > use warnings; > > use Bio::SeqIO; > > use Bio::Tools::Protparam; > > > > > > $seqfile='test1.fasta'; > > > > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); > > > > > > while( $seq = $seqio->next_seq() ) > > { > > > > > > my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); > > > > print > > "ID : ", $seq->display_id,"\n", > > "Amino acid number : ",$pp->amino_acid_number(),"\n", > > "Number of negative amino acids : ",$pp->num_neg(),"\n", > > "Number of positive amino acids : ",$pp->num_pos(),"\n", > > "Molecular weight : ",$pp->molecular_weight(),"\n", > > "Theoretical pI : ",$pp->theoretical_pI(),"\n", > > "Total number of atoms : ", $pp->total_atoms(),"\n", > > "Number of carbon atoms : ",$pp->num_carbon(),"\n", > > "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", > > "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", > > "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", > > "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", > > "Half life : ", $pp->half_life(),"\n", > > "Instability Index : ", $pp->instability_index(),"\n", > > "Stability class : ", $pp->stability(),"\n", > > "Aliphatic_index : ",$pp->aliphatic_index(),"\n", > > "Gravy : ", $pp->gravy(),"\n", > > "Composition of A : ", $pp->AA_comp('A'),"\n", > > "Composition of R : ", $pp->AA_comp('R'),"\n", > > "Composition of N : ", $pp->AA_comp('N'),"\n", > > "Composition of D : ", $pp->AA_comp('D'),"\n", > > "Composition of C : ", $pp->AA_comp('C'),"\n", > > "Composition of Q : ", $pp->AA_comp('Q'),"\n", > > "Composition of E : ", $pp->AA_comp('E'),"\n", > > "Composition of G : ", 
$pp->AA_comp('G'),"\n",
> > "Composition of H : ", $pp->AA_comp('H'),"\n",
> > "Composition of I : ", $pp->AA_comp('I'),"\n",
> > "Composition of L : ", $pp->AA_comp('L'),"\n",
> > "Composition of K : ", $pp->AA_comp('K'),"\n",
> > "Composition of M : ", $pp->AA_comp('M'),"\n",
> > "Composition of F : ", $pp->AA_comp('F'),"\n",
> > "Composition of P : ", $pp->AA_comp('P'),"\n",
> > "Composition of S : ", $pp->AA_comp('S'),"\n",
> > "Composition of T : ", $pp->AA_comp('T'),"\n",
> > "Composition of W : ", $pp->AA_comp('W'),"\n",
> > "Composition of Y : ", $pp->AA_comp('Y'),"\n",
> > "Composition of V : ", $pp->AA_comp('V'),"\n",
> > "Composition of B : ", $pp->AA_comp('B'),"\n",
> > "Composition of Z : ", $pp->AA_comp('Z'),"\n",
> > "Composition of X : ", $pp->AA_comp('X'),"\n";
> > }
> >
> ###################################################################################
> >
> >
> >
> >
> > --
> > Regards,
> > Shachi
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
Regards,
Shachi

From shachigahoimbi at gmail.com Fri Jul 29 00:46:44 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Fri, 29 Jul 2011 10:16:44 +0530
Subject: [Bioperl-l] protaparam
Message-ID:

Dear All,

If anybody knows how to run protparam using bioperl please let me know.

Thanks in advance

--
Regards,
Shachi

From cjfields at illinois.edu Fri Jul 29 17:35:01 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Jul 2011 16:35:01 -0500
Subject: [Bioperl-l] Bug in Bio::SeqIO::fastq - seq length of 1 with quality of 0 (zero)
In-Reply-To: <4cba983e-cbb7-4b09-b2c7-920553fe109d@r5g2000prf.googlegroups.com>
References: <4cba983e-cbb7-4b09-b2c7-920553fe109d@r5g2000prf.googlegroups.com>
Message-ID: <7A1056E8-CE1D-47C4-B345-68DD66AEA777@illinois.edu>

Nath,

Posted a fix for this to bioperl-live (forgot to respond).
chris On Jul 26, 2011, at 7:37 PM, Nath wrote: > It's been a while since I was involved with Perl/BioPerl..... > > I installed the latest version of BioPerl (1.6.901) through CPAN and > used it to parse some fastq files. I get an exception thrown when a > sequence has length = 1 and a quality of 0 (zero). > > -- start test.pl -- > #!/usr/bin/perl > use strict; > use warnings; > use Bio::SeqIO; > > my $seq_io = Bio::SeqIO->new(-file => 'test.fastq', -format => > 'fastq'); > while (my $seq = $seq_io->next_seq){ } > -- end test.pl -- > > -- start test.fastq -- > @someID1.f > G > + > 0 > -- end test.fastq -- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Missing sequence and/or quality data; line: 4 > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/ > Root.pm:472 > STACK: Bio::SeqIO::fastq::next_dataset /usr/lib/perl5/site_perl/5.8.8/ > Bio/SeqIO/fastq.pm:97 > STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/ > SeqIO/fastq.pm:29 > STACK: test.pl:7 > ----------------------------------------------------------- > > -- start test.fastq -- > @someID1.f > G > + > 0 > @someID2 > AG > + > DD > -- end test.fastq -- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Quality string [0 at someID2] of length [9] > doesn't match length of sequence G > [1], line: 6 > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/ > Root.pm:472 > STACK: Bio::SeqIO::fastq::next_dataset /usr/lib/perl5/site_perl/5.8.8/ > Bio/SeqIO/fastq.pm:102 > STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/ > SeqIO/fastq.pm:29 > STACK: test.pl:7 > ----------------------------------------------------------- > > Note that the quality string parsed from the file for someID1.f also > includes the @someID2 sequence ID line. This may be related to Bug > #3068 that was apparently fixed about 1yr ago. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From nadine.tatto at boku.ac.at Fri Jul 29 10:35:23 2011
From: nadine.tatto at boku.ac.at (Nadine Elpida Tatto)
Date: Fri, 29 Jul 2011 16:35:23 +0200
Subject: [Bioperl-l] Question to Bio::SearchIO::infernal.pm
Message-ID: <4E32E14B020000EE00004F57@gwia1.boku.ac.at>

Hi There!

I was wondering if you would or can help me.
I have an infernal report containing about 2000 CMs from an infernal run against Rfam.cm. To parse this report I wanted to use Bio::SearchIO::infernal.pm.
Unfortunately this turned out to be a problem for me, because "$parser->next_result" only delivers the result for the first CM in the report and nothing more.

My code:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::SearchIO;

my $infile = $ARGV[0]; # infernal report

my $parser = Bio::SearchIO->new(-format => 'Infernal', -file => $infile);
while( my $result = $parser->next_result ) {
    print $result->query_name . "\n";
}
exit;

The output:

ntatto:~$ ./infernalParser.pl infernal.output
5S_rRNA
ntatto:~$

I would expect the following (like parsing a blast report):

ntatto:~$ ./infernalParser.pl infernal.output
5S_rRNA
5_8S_rRNA
U1
...
ntatto:~$

I would be glad for help. Thank you in advance.

Best Regards
N Tatto