[Bioperl-l] Suggestion for a new script: bp_repeat_mask_sequence.pl

Dan Bolser dan.bolser at gmail.com
Tue Jul 6 08:37:05 EDT 2010


I'd like to submit a script, 'bp_repeat_mask_sequence.pl'(?), to the
set of scripts in BioPerl. Below is what I have so far. The script
works by reading in a GFF of 'repeat_region's and a fasta file of
sequences. It outputs a fasta sequence file with the repeats replaced
by Xs.

The script clearly needs to be more configurable, but I thought I'd
send it along now to see if I'm working along the right lines, or if I
should be using a different approach.



#!/usr/bin/perl -w

use strict;
use Getopt::Long;

use Bio::SeqIO;
use Bio::FeatureIO;

## Set options

my $verbose = 0;
my $seq_file;
my $gff_file;

   "verbose" => \$verbose,
   "seq=s" => \$seq_file,
   "gff=s" => \$gff_file,
  or die "failed to parse command line options\n";

die "fail $gff_file : $!\n"
  unless -s $gff_file;

## Set up the BioPerl objects

my $seq_reader =
  Bio::SeqIO->new( -file => $seq_file,
		   -format => 'fasta'

my $seq_writer =
  Bio::SeqIO->new( -fh => \*STDOUT,
		   -format => 'fasta',
		   -width => 80

my $gff_reader =
  Bio::FeatureIO->new( -file => $gff_file,
		       -format => 'GFF',

#warn $seq_reader->width, "\n"; exit;

## Run

my (%repeats, $c);

while ( my $feature = $gff_reader->next_feature() ) {
      join("\t", #$feature,
	  ), "\n";

  if($feature->type->name eq 'repeat_region'){
    push @{$repeats{ $feature->seq_id }},

  # Debugging
  #last if $c > 100;

warn "read $c repeat_region features for ",
  scalar keys(%repeats), " sequences\n";


while(my $seq = $seq_reader->next_seq){
  my $id = $seq->id;
  my $sequence = $seq->seq;

  print $id, "\n"
    if $verbose > 0;

  print length($sequence), "\n"
    if $verbose > 0;

  for my $region (@{$repeats{ $id }}){
    my ($start, $end) = @$region;
    print "$start\t$end\n"
      if $verbose > 1;

    substr($sequence, $start, $end - $start, 'X' x ($end - $start));

  print length($sequence), "\n"
    if $verbose > 0;



  # Debugging;

warn "OK\n";

More information about the Bioperl-l mailing list