[Bioperl-l] GEO SOFT Parser?

Gong Wuming gongwuming at hotmail.com
Sun May 30 22:47:26 EDT 2004

Hi Tex,
I asked the same question here a few days ago but got no response. That is 
a bit surprising, because I thought it would be a relatively common problem.
At first I planned to write a module for parsing the SOFT format under 
Bio::Expression::MicroarrayIO::, but I found that difficult because many 
important base classes in Bioperl-Microarray are not implemented yet, 
especially those for expression data. So I wrote a simple Perl script that 
reads the information in a SOFT file into a data structure. The code is 
below.

#!/usr/bin/perl
use strict;
use warnings;

my $hash = {};   # attributes of the entity currently being read
my $DATA = {};   # parsed result, keyed by domain
my ($last_domain, $this_domain, $last_mark, $this_mark);

# Read the file line by line.
while (<>){
  chomp; # Drop the newline so it does not end up in the stored values.

  $this_mark = substr($_, 0, 1); # Get line marker: '^', '!' or '#'

  if ($this_mark =~ /\^|\!/){ # If the line is headed by '^' or '!'.
    my @attr;

    # Extract the key-value pair ("key = value")
    my ($key, $value) = split (/\s+=\s+/, substr($_, 1), 2);
    ($this_domain, @attr) = split ("_", $key);
    my $attribute = join ('_', @attr) || 'id';

    # A '^' line starts a new entity: store the previous one first.
    if ($this_mark eq '^' and $last_domain) {
      my %attribute = %$hash;
      push (@{$DATA->{$last_domain}}, \%attribute);
      $hash = {};
    }
    $hash->{$attribute} = $value;
  }elsif ($this_mark eq '#'){ # Column description.
    my ($field, $desc) = /^#(.+?)\s+=\s+(.+)$/;
    my ($description, $src) = (split (/;*\s+.+?:\s+/, $desc))[1, 2];
    push (@{$DATA->{'data'}}, {'field'=>$field, 'description'=>$description,
                               'src'=>$src, 'value'=>[]});
  }else{ # Data field.
    next if /^ID_REF/;
    my $i = 0;
    # Append each tab-separated value to its column, in '#' line order.
    push (@{$DATA->{'data'}->[$i++]->{'value'}}, $_) for split (/\t/);
  }
  $last_domain = $this_domain;
  $last_mark = $this_mark;
}

# Store the last entity, which no following '^' line would flush.
push (@{$DATA->{$last_domain}}, $hash) if $last_domain;

The results are stored in a data structure like this:

      field => 
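
A minimal sketch of how the column data might be read back, assuming the 
shape implied by the script: $DATA->{'data'} is a list of columns, each a 
hash with 'field', 'description', 'src' and 'value' keys. The sample values 
are made up for illustration, and column_by_field is a hypothetical helper, 
not something that exists in Bioperl.

```perl
use strict;
use warnings;

# Hand-built stand-in for what the script would produce. All values are
# invented sample data, not taken from a real GEO record.
my $DATA = {
  data => [
    { field => 'ID_REF',  description => undef,     src => undef,
      value => ['probe_1', 'probe_2'] },
    { field => 'GSM0001', description => 'control', src => 'liver',
      value => ['10.5', '3.2'] },
  ],
};

# Hypothetical helper: return the column hash whose 'field' equals $name,
# or undef if there is no such column.
sub column_by_field {
  my ($data, $name) = @_;
  my ($col) = grep { $_->{field} eq $name } @{ $data->{data} };
  return $col;
}

# Print each probe ID next to its value in one sample column.
my $ids    = column_by_field($DATA, 'ID_REF')->{value};
my $sample = column_by_field($DATA, 'GSM0001');
for my $row (0 .. $#$ids) {
  print join("\t", $ids->[$row], $sample->{value}[$row]), "\n";
}
```

Because every column keeps its values in file order, the same row index can 
be used across columns to line up a probe with its measurements.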
Wuming Gong
College of Life Science, 
Wuhan University, China.
