OPTIMAL ALIGNMENT AND LCS

In computational genetics and computational linguistics, one of the basic problems is to find an optimal alignment between two given sequences (texts), X and Y. This requires a scoring system which can rank the alignments. Typically a substitution matrix gives the score for each possible pair of letters. The total score of an alignment is the sum of terms for each aligned pair of residues, plus terms for each gap. Sometimes we consider the special case where the substitution matrix is equal to the identity matrix and the gap-penalty is zero. In this case, the optimal score is equal to the length of the Longest Common Subsequence (LCS) of the two texts X and Y. (A common subsequence of X and Y is a sequence which is a subsequence of X as well as of

Y).  We frequently refer to the length of the LCS of X and Y
as the  optimal LCS-score.
Fluctuations
A long open standing 
problem in the LCS context is to determine the exact order of the fluctuation 
of the LCS length.  For this we take the text X
and Y to be i.i.d. and independent of each other. We solved 
the long open problem of the order of the fluctuation of the LCS and the OA
in several important cases. Here are some of our results:

 One text periodic
 
Asymmetric case 
          of 2- and 3-letter alphabets.
 
One and zero have different probabilities
 
Longest Common Increasing Subsequence

Expected value
One of the main ideas which allowed all the subsequent
developements is contained in: 
Large deviation approach to  LCS.
The idea 
is to identify each alignment
with an increasing sequence of 
stopping times.  Then, one tries to figure out a low-entropy class
of stopping times which is likely to contain the OA.  
The rest is large deviation in order to control what happens along the 
alignments from the class. With this approach we beat the existing bounds
for the LCS-constant. The same technique is applied to the 
LCS curve in: The expected curve.
Structure of the optimal alignment path
We developed a set of techniques which
 yield results on the microscopical 
and macroscopical structure of the optimal path,
as well as the transversal  and longitudinal fluctuations.
Furthermore, we  
discovered deep relations between these
entities. Some of these results are contained in:

 Local uniqueness of alignments
 
Macroscopic non-uniqueness and transversal
fluctuation.