
Problem Description
James Watson and Francis Crick teamed up at Cambridge in 1951 in a
collaboration that led to unlocking the secret of life: the famous double
helix, which explains how DNA copies itself and carries the instructions for
making proteins. All proteins are made from the twenty amino acids (alanine,
arginine, tyrosine, valine, etc.). Four building-block bases (adenine,
cytosine, guanine, and thymine), denoted by the letters A, C, G, and T,
encode those amino acids. The actual mechanism by which the bases specify
amino acids is more complex than one of Crick's brilliant early theories,
which has been called the "greatest wrong theory ever devised".
Assume, Crick thought, that each amino acid is defined by a 3-letter word.
(Two-letter words could produce only 16 amino acids, not enough; 3-letter
words could produce 64, too many.) But if we further assume that these words
are repeated without separators, how could nature avoid misreads caused by
starting at the wrong location? For example, the word ACG repeated as
ACGACGACG... could be misread as CGACGACGA... or GACGACGAC... if the initial
letters were skipped or missing. It turns out that if we eliminate the words
that could arise from such misreads, exactly 20 remain! Words such as AAA read
the same from any starting point, so all 4 of them go; the other 60 fall into
20 groups of three shifted readings (for example ACG, CGA, and GAC), and
keeping just one word from each group leaves exactly 20, the number of amino
acids.
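The count of 20 can be double-checked directly. The console program below is a sketch that is not part of the original project, and it swaps in a different test from the article's Boolean-array method: a word is kept only when it is alphabetically smaller than both of its shifted readings. The program name and the helpers Rotate, IsKept, and CountKept are illustrative, and it assumes a Delphi console target or Free Pascal in Delphi mode.

```pascal
program CountShiftGroups;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

const
  Letters = 'ACGT';

{ Shift a 3-letter word left by one position: ACG -> CGA }
function Rotate(const W: string): string;
begin
  Result := Copy(W, 2, 2) + W[1];
end;

{ Keep a word only if it is strictly smaller than both of its shifts.
  AAA, CCC, GGG and TTT equal their own shifts, so they are rejected;
  every other word has three distinct shifts, exactly one of them smallest. }
function IsKept(const W: string): Boolean;
var
  S1, S2: string;
begin
  S1 := Rotate(W);
  S2 := Rotate(S1);
  Result := (W < S1) and (W < S2);
end;

{ Count the kept words among all 4 * 4 * 4 = 64 candidates }
function CountKept: Integer;
var
  I, J, K: Integer;
begin
  Result := 0;
  for I := 1 to 4 do
    for J := 1 to 4 do
      for K := 1 to 4 do
        if IsKept(Letters[I] + Letters[J] + Letters[K]) then
          Inc(Result);
end;

begin
  { (64 words - 4 constant words) / 3 shifts per group = 20 groups }
  WriteLn('Groups expected: ', (64 - 4) div 3);
  WriteLn('Words kept:      ', CountKept);
end.
```

Run as is, it should print 20 on both lines. With the letters in ACGT order it should also keep the same 20 words as the form's elimination loop, since that loop meets the smallest word of each group first.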
This program illustrates the theory by eliminating "shifted" words from the
set of 64 and displaying the resulting set of 20 words.
Mother Nature's actual method turned out to be messier, but it has the
advantage of redundancy. All 64 words are used, with several words mapping
to most amino acids, and special "start" and "stop" words solve the
misreading problem.
Background & Techniques
The solution is implemented using a 3-dimensional Boolean array: each
dimension represents one letter position in a 3-letter word, and each of its
4 elements represents one possible letter choice. We'll initialize all 64
elements to "true", then make a pass through and set to "false" each position
that would represent a shifted version of a "true" word. That is, if
candidates[i,j,k] is true, then candidates[j,k,i] and candidates[k,i,j] must
both be eliminated. Because the pass visits the words in index order, the
first word met in each group of three survives and knocks out the other two,
while AAA, CCC, GGG, and TTT, whose shifts are themselves, knock themselves out.
Here's the essential Delphi code that builds and
displays the list:
--------------------------------------------------------------------------------------------------------------
const
  {other letter arrangements here will build different lists}
  letters: array[1..4] of char = 'ACGT';

procedure TForm1.ShowmeBtnClick(Sender: TObject);
var
  i, j, k, n: integer;
  candidates: array[1..4, 1..4, 1..4] of boolean;
begin
  {Initialize: all 64 words start out as candidates}
  listbox1.clear;
  for i := 1 to 4 do for j := 1 to 4 do for k := 1 to 4 do
    candidates[i, j, k] := true;
  {Eliminate shifted versions of words: a word still marked true knocks
   out its two shifts; AAA, CCC, GGG and TTT are their own shifts, so
   they knock themselves out}
  for i := 1 to 4 do for j := 1 to 4 do for k := 1 to 4 do
    if candidates[i, j, k] then
    begin
      candidates[j, k, i] := false;
      candidates[k, i, j] := false;
    end;
  {Display output (Format comes from SysUtils, already in a form unit's uses clause)}
  n := 0;
  for i := 1 to 4 do for j := 1 to 4 do for k := 1 to 4 do
    if candidates[i, j, k] then
    begin {build and display the word}
      inc(n);
      listbox1.items.add(format('%2d. ', [n]) + letters[i] + letters[j] + letters[k]);
    end;
end;
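To experiment away from the form, for instance with the letter-order comment at the top of the listing, the same two passes can be lifted into a console program. This is a hedged sketch assuming a Delphi console target or Free Pascal in Delphi mode; TCandidates, Eliminate, and SurvivorList are hypothetical names, and a returned string stands in for the ListBox.

```pascal
program ShiftedWordList;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  { Same shape as the candidates array in ShowmeBtnClick }
  TCandidates = array[1..4, 1..4, 1..4] of Boolean;

{ The article's two passes: mark all 64 words true, then let each word
  still marked true eliminate its two shifted versions }
procedure Eliminate(var C: TCandidates);
var
  I, J, K: Integer;
begin
  for I := 1 to 4 do
    for J := 1 to 4 do
      for K := 1 to 4 do
        C[I, J, K] := True;
  for I := 1 to 4 do
    for J := 1 to 4 do
      for K := 1 to 4 do
        if C[I, J, K] then
        begin
          C[J, K, I] := False;
          C[K, I, J] := False;
        end;
end;

{ Spell the survivors with the given 4-letter ordering, each word
  followed by a space (stands in for the ListBox) }
function SurvivorList(const Letters: string): string;
var
  C: TCandidates;
  I, J, K: Integer;
begin
  Eliminate(C);
  Result := '';
  for I := 1 to 4 do
    for J := 1 to 4 do
      for K := 1 to 4 do
        if C[I, J, K] then
          Result := Result + Letters[I] + Letters[J] + Letters[K] + ' ';
end;

begin
  WriteLn('ACGT: ', SurvivorList('ACGT'));
  WriteLn('TGCA: ', SurvivorList('TGCA'));
end.
```

Because the elimination works on array indices, a different letter order keeps the same index pattern but spells different words: with ACGT the list should begin AAC AAG AAT, while with TGCA the group {AAC, ACA, CAA} is represented by CAA instead. Either way, 20 words survive.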
Running/Exploring the Program
Suggestions for Further Explorations
Decode and verify the remainder of the human
genome ;>)