The question is:
Write a Perl program that scores an alignment using the scoring matrix in the file
"matrix.txt". Assume that the alignment is stored in FASTA format in the file
"dna.fasta". The program should print the alignment score as output.
The file matrix.txt contains this input:
10 -1 2 1
-1 9 1 2
2 1 11 -1
1 2 -1 12
The file dna.fasta contains this input:
>human
ACCGTTAAG
>mouse
--CGTTCA-
(The dashes '-' represent blanks, i.e. gaps in the alignment.)
I think what they are basically asking is for me to use the following labeled version of the matrix as the input in matrix.txt:
A C G T
A 10 -1 2 1
C -1 9 1 2
G 2 1 11 -1
T 1 2 -1 12
to score the alignment in dna.fasta and then print the alignment score. They also want this done in a subroutine.
I have the main code and the subroutines, but they seem to be combined all wrong, so the program can't perform the whole task.
@matrix=();
&getmatrix("matrix.txt",\@matrix);
@sequences=(); @names=();
&get_dna_score("dna.fasta", \@sequences, \@names);
$score=$score(\@sequences, \@matrix);
print $score;
@matrix= ( );
&getmatrix ("matrix.txt",\@matrix );
for(my$i=0; $i<4; $i++)
{
    for (my$j=0; $j<4; $j++)
    {
        print $matrix[$i][$j] ." ";
    }
    print "\n";
}
sub getmatrix
{
    open (IN, $_[0]);
    my $ref=$_[1];
    $i=0;
    while(<IN>)
    {
        @s=split(/\s+/, $_);
        for (my$j=0; $j<@s; $j++)
        {
            $$ref[$i][$j]=$s[$j];
        }
        $i++;
    }
    close(IN);
    return;
}
use strict;
use warnings;
open(IN, "dna.fasta");
my @file1=<IN>;
chomp($file1[1]);
chomp($file1[3]);
close IN;
sub get_dna_score {
    my @first_arr = split //, shift;
    my @second_arr = split //, shift;
    my $match = shift;
    my $miss = shift;
    my $gap = shift;
    my $score = 0;
    my ($i, $len);
    if (scalar @first_arr != scalar @second_arr) {
        die "Can't compare strings with different length!\n";
    }
    for ($i = 0, $len = scalar @first_arr; $i < $len; ++$i) {
        if ($first_arr[$i] eq $second_arr[$i]) {
            $score += $match;
        } elsif ($first_arr[$i] eq " " || $second_arr[$i] eq " ") {
            $score += $gap;
        } else {
            $score += $miss;
        }
    }
    return $score;
}
print get_dna_score($file1[1], $file1[3], 2, -1, -2), "\n";
Please Help!!!

Hey Ben.
Here’s the complete solution. You can find the formatted (and indented) source text at http://pastebin.com/f29757e3c .
#!/usr/bin/perl
use strict;
use warnings;
# config
my $MATRIX_FILE = "matrix.txt";
my $FASTA_FILE = "dna.fasta";
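# score added for every column that contains a gap; the assignment
# does not give a gap score, so 0 is an assumption
my $GAP_PENALTY = 0;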
# read files and compare
my $matrix = parse_matrix($MATRIX_FILE);
my $fasta = parse_fasta($FASTA_FILE);
my $score = get_dna_score($fasta, $matrix);
print "$fasta->{target_name}/$fasta->{test_name} match score is $score.\n";
# reads and parses the matrix file; expects the labeled format
# (a header row of bases, then one row per base starting with its label)
sub parse_matrix {
    my $file = shift;
    open(my $fh, '<', $file) or die "could not read file $file: $!\n";
    my $hdr = <$fh>;
    my @bases = ($hdr =~ /(\w)/g);
    my %matrix = ();
    while (my $line = <$fh>) {
        my @parts = ($line =~ /(\S+)/g);
        my($currbase, $i);
        for ($i = 0; $i < @parts; ++$i) {
            if (!$i) {
                # first field of a row is the base label
                $currbase = $parts[$i];
                $matrix{$currbase} = {};
            } else {
                $matrix{$currbase}->{$bases[$i - 1]} = $parts[$i];
            }
        }
    }
    close $fh;
    return \%matrix;
}
# reads and parses the .fasta file; expects exactly two records,
# each with its sequence on a single line
sub parse_fasta {
    my $file = shift;
    my %fasta = ();
    open(my $fh, '<', $file) or die "could not read file $file: $!\n";
    chomp($fasta{target_name} = <$fh>);
    chomp($fasta{target_dna} = <$fh>);
    chomp($fasta{test_name} = <$fh>);
    chomp($fasta{test_dna} = <$fh>);
    # strip the leading '>' from the record names
    $fasta{target_name} = substr $fasta{target_name}, 1;
    $fasta{test_name} = substr $fasta{test_name}, 1;
    close $fh;
    return \%fasta;
}
# compares two DNA base strings
sub get_dna_score {
    my($fasta, $matrix) = @_;
    my @target_arr = split //, $fasta->{target_dna};
    my @test_arr = split //, $fasta->{test_dna};
    my $score = 0;
    my($i, $len);
    if (@target_arr != @test_arr) {
        die "Can't compare DNA strings with different length!\n";
    }
    for ($i = 0, $len = @target_arr; $i < $len; ++$i) {
        if ($target_arr[$i] eq "-" or $test_arr[$i] eq "-") {
            $score += $GAP_PENALTY;
        } else {
            $score += int($matrix->{$target_arr[$i]}->{$test_arr[$i]});
        }
    }
    return $score;
}
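One note on the input format: parse_matrix expects the labeled matrix (header row plus a base letter at the start of each row). If matrix.txt really contains only the four rows of numbers, as in the assignment text, something like the sketch below might be closer to what is needed. It is only a sketch under stated assumptions: the row and column order A C G T is assumed (the assignment does not say), the names matrix_from_lines and score_alignment are made up for illustration, the mouse line is taken to be --CGTTCA- (two leading gaps, so both sequences have nine columns), and the gap score of 0 is again an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: the unlabeled rows and columns are in the order A C G T.
my @BASES = qw(A C G T);

# Hypothetical helper: builds the same hash-of-hashes structure as
# parse_matrix, but from rows that contain only numbers.
sub matrix_from_lines {
    my @lines = @_;
    my %matrix;
    my $row = 0;
    for my $line (@lines) {
        my @nums = split ' ', $line;
        next unless @nums;    # skip blank lines
        die "unexpected row: $line\n" if $row >= @BASES || @nums != @BASES;
        # hash slice: assign the whole row to the columns A, C, G, T
        @{ $matrix{ $BASES[$row] } }{@BASES} = @nums;
        $row++;
    }
    die "matrix needs " . scalar(@BASES) . " rows\n" unless $row == @BASES;
    return \%matrix;
}

# Hypothetical helper: sums the matrix score of every aligned column;
# a column containing '-' contributes $gap instead.
sub score_alignment {
    my ($matrix, $seq1, $seq2, $gap) = @_;
    die "sequences differ in length\n" unless length($seq1) == length($seq2);
    my $score = 0;
    for my $i (0 .. length($seq1) - 1) {
        my ($x, $y) = (substr($seq1, $i, 1), substr($seq2, $i, 1));
        if ($x eq '-' or $y eq '-') {
            $score += $gap;
        } else {
            die "no score for $x/$y\n" unless defined $matrix->{$x}{$y};
            $score += $matrix->{$x}{$y};
        }
    }
    return $score;
}

# Sample data from the question, inlined so the sketch runs without files.
my $matrix = matrix_from_lines("10 -1 2 1", "-1 9 1 2", "2 1 11 -1", "1 2 -1 12");
my $score  = score_alignment($matrix, "ACCGTTAAG", "--CGTTCA-", 0);
print "$score\n";    # prints 53
```

The 53 comes from the six gap-free columns: C/C 9, G/G 11, T/T 12, T/T 12, A/C -1 and A/A 10. To use the real file instead of the inlined rows, open matrix.txt and pass the lines it contains to matrix_from_lines.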
Just btw, if this is for a class, be sure to read and *understand* what you submit. Perl’s a great language (still my favorite after many years), and if you take it seriously, it will reward you.
cheers,
Zilk