OPTIMAL ALIGNMENT AND LCS
In computational genetics and computational linguistics, one
of the basic problems is to find an optimal alignment
between two given sequences (texts), X
and Y
. This requires a scoring system
which can rank the alignments. Typically a substitution matrix
gives the score for each possible pair of letters.
The total score of an alignment is the sum of terms for each
aligned pair of residues, plus terms for each gap.
Sometimes we consider the special case where
the substitution matrix is equal to the identity matrix
and the gap-penalty is zero. In this case, the optimal score is equal
to the length of the Longest Common Subsequence (LCS) of the two texts
X
and Y
. (A common subsequence
of X
and Y
is a sequence which
is a subsequence of X
as well as of Y). We frequently refer to the length of the LCS of X
and Y
as the optimal LCS-score.
Fluctuations
A long open standing
problem in the LCS context is to determine the exact order of the fluctuation
of the LCS length. For this we take the text X
and Y
to be i.i.d. and independent of each other. We solved
the long open problem of the order of the fluctuation of the LCS and the OA
in several important cases. Here are some of our results:
Expected value
One of the main ideas which allowed all the subsequent
developements is contained in:
Large deviation approach to LCS.
The idea
is to identify each alignment
with an increasing sequence of
stopping times. Then, one tries to figure out a low-entropy class
of stopping times which is likely to contain the OA.
The rest is large deviation in order to control what happens along the
alignments from the class. With this approach we beat the existing bounds
for the LCS-constant. The same technique is applied to the
LCS curve in: The expected curve.
Structure of the optimal alignment path
We developed a set of techniques which
yield results on the microscopical
and macroscopical structure of the optimal path,
as well as the transversal and longitudinal fluctuations.
Furthermore, we
discovered deep relations between these
entities. Some of these results are contained in: