[Bioperl-guts-l] bioperl commit
Brian Osborne
bosborne at pub.open-bio.org
Thu May 20 08:44:20 EDT 2004
bosborne
Thu May 20 08:44:20 EDT 2004
Update of /home/repository/bioperl/bioperl-live/Bio/DB/Flat
In directory pub.open-bio.org:/tmp/cvs-serv4373
Modified Files:
BinarySearch.pm
Log Message:
Using single quotes simplifies use of regex, some other POD edits
bioperl-live/Bio/DB/Flat BinarySearch.pm,1.11,1.12
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/DB/Flat/BinarySearch.pm,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -r1.11 -r1.12
--- /home/repository/bioperl/bioperl-live/Bio/DB/Flat/BinarySearch.pm 2004/05/20 02:56:47 1.11
+++ /home/repository/bioperl/bioperl-live/Bio/DB/Flat/BinarySearch.pm 2004/05/20 12:44:20 1.12
@@ -35,8 +35,8 @@
Patterns have to be entered to define where the keys are to be indexed
and also where the start of each record. E.g. for fasta
- my $start_pattern = "^>";
- my $primary_pattern = "^>(\\S+)";
+ my $start_pattern = '^>';
+ my $primary_pattern = '^>(\S+)';
So the start of a record is a line starting with a E<gt> and the
primary key is all characters up to the first space after the E<gt>
@@ -60,7 +60,10 @@
The index is now ready to use. For large sequence files the perl way
of indexing takes a *long* time and a *huge* amount of memory. For
-indexing things like dbEST I recommend using the C indexer.
+indexing things like dbEST I recommend using the DB_File indexer, BDB.
+
+The formats currently supported by this module are fasta, Swissprot,
+and EMBL.
=head2 Creating indices with secondary keys
@@ -92,13 +95,13 @@
my %secondary_patterns;
- my $start_pattern = "^ID (\\S+)";
- my $primary_pattern = "^AC (\\S+)\;";
+ my $start_pattern = '^ID (\S+)';
+ my $primary_pattern = '^AC (\S+)\;';
- $secondary_patterns{"ID"} = "^ID (\\S+)";
+ $secondary_patterns{"ID"} = '^ID (\S+)';
my $index = new Bio::DB::Flat::BinarySearch(
- -directory => ".",
+ -directory => $index_directory,
-dbname => "ppp",
-write_flag => 1,
-verbose => 1,
@@ -109,8 +112,8 @@
$index->build_index($seqfile);
-Of course having secondary indices makes indexing slower and more
-of a memory hog.
+Of course having secondary indices makes indexing slower and use more
+memory.
=head2 Index reading
@@ -147,9 +150,12 @@
$index->secondary_namespaces("ID");
-Then the following calls can be used
+Then the following call can be used
my $seq = $index->get_Seq_by_secondary('ID','1433_CAEEL');
+
+These calls are not yet implemented
+
my $fh = $index->get_stream_by_secondary('ID','1433_CAEEL');
my $entry = $index->get_entry_by_secondary('ID','1433_CAEEL');
@@ -237,7 +243,8 @@
Function: create a new Bio::DB::Flat::BinarySearch object
Returns : new Bio::DB::Flat::BinarySearch
Args : -directory Root directory for index files
- -dbname Name of subdirectory containing indices for named database
+ -dbname Name of subdirectory containing indices
+ for named database
-write_flag Allow building index
-primary_pattern Regexp defining the primary id
-secondary_patterns A hash ref containing the secondary
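
The log message says that single quotes simplify the regex strings. A minimal sketch of why, in plain Perl without Bio::DB::Flat::BinarySearch: the fasta patterns are taken from the diff above, while the sample lines and the accession X00001 are invented for illustration.

```perl
use strict;
use warnings;

# Old and new spellings of the fasta primary pattern from the diff.
# In double quotes the backslash must be doubled; in single quotes it is literal.
my $double = "^>(\\S+)";
my $single = '^>(\S+)';
my $same   = $double eq $single ? 1 : 0;
print "fasta patterns identical: $same\n";

# Hypothetical fasta header line, made up for this example.
my $line = '>1433_CAEEL 14-3-3-like protein 1';

my $start_pattern   = '^>';
my $primary_pattern = '^>(\S+)';

my $key;
if ( $line =~ /$start_pattern/ ) {
    # The primary key is everything after '>' up to the first whitespace.
    ($key) = $line =~ /$primary_pattern/;
    print "primary key: $key\n";
}

# The AC pattern is a slightly different case: "\;" in double quotes
# collapses to ";", while '\;' keeps the backslash. The strings differ,
# but as regexes both match a literal semicolon.
my $old_ac = "^AC (\\S+)\;";
my $new_ac = '^AC (\S+)\;';
my $ac_line = 'AC X00001;';    # invented record line

my ($old_hit) = $ac_line =~ /$old_ac/;
my ($new_hit) = $ac_line =~ /$new_ac/;
print "AC strings differ, both capture: $old_hit / $new_hit\n";
```

The point of the change is readability only: the patterns behave the same before and after the commit, so existing indices should not be affected by it.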