Longest Common Subsequence: Difference between revisions

From Rice Wiki
No edit summary
No edit summary
Line 11: Line 11:
<math>X</math> and <math>Y</math>.
<math>X</math> and <math>Y</math>.


==== Property 1 ====
If <math>x_m = y_n</math>, then <math>z_k = x_m = y_n</math> and
If <math>x_m = y_n</math>, then <math>z_k = x_m = y_n</math> and
<math>Z_{k-1}</math> is an LCS of <math>X_{m-1}</math> and
<math>Z_{k-1}</math> is an LCS of <math>X_{m-1}</math> and
<math>Y_{n-1}</math>.
<math>Y_{n-1}</math>.


==== Property 2 ====
If <math>x_m \neq y_n</math> and <math>Z_k \neq x_m</math>, then
If <math>x_m \neq y_n</math> and <math>Z_k \neq x_m</math>, then
<math>Z</math> is an LCS of <math>X_{m-1}</math> and
<math>Z</math> is an LCS of <math>X_{m-1}</math> and
<math>Y</math>.
<math>Y</math>.


==== Property 3 ====
If <math>x_m \neq y_n</math> and <math>Z_k \neq y_n</math>, then
If <math>x_m \neq y_n</math> and <math>Z_k \neq y_n</math>, then
<math>Z</math> is an LCS of <math>X</math> and
<math>Z</math> is an LCS of <math>X</math> and
<math>Y_{n-1}</math>.
<math>Y_{n-1}</math>.


==== Subproblems ====
=== Subproblems ===
 
It is clear from the above that there are three subproblems to examine:
# <math>LCS(X_{m-1}, Y_{n-1})</math>
# <math>LCS(X_{m}, Y_{n-1})</math>
# <math>LCS(X_{m-1}, Y_{n})</math>


Let us have <math>c[i,j]</math> be the length of the LCS for
Let us have <math>c[i,j]</math> be the length of the LCS for
<math>X_i</math> and <math>Y_j</math>. We have
<math>X_i</math> and <math>Y_j</math>. We have
==== Property 4 ====


<math>
<math>
c[i,j] = \begin{cases}
c[i,j] = \begin{cases}
     0 & i = 0 and j = 0 \\
     0 & i = 0 or j = 0 \\
     c[i-1, j-1] + 1 & x_i = y_i \\
     c[i-1, j-1] + 1 & x_i = y_i \\
     max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i
     max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i
\end{cases}
\end{cases}
</math>
</math>
The first line is trivial. The second line stems from property 1 and
makes enough sense to me.
The third line stems from property 3. This property tells us that
depending on the last <math>Z_k</math>, <math>Z</math> must either be
the LCS of <math>X_{m-1}</math> and <math>Y</math> or the LCS of
<math>X</math> and <math>Y_{n-1}</math>. We don't know which one it is,
so we simply compare them. Whichever is greater would be the LCS.
= Implementation =
Based on property 4, we have the following DP algorithm
<pre>
LCS-Length(X, Y):
    m = X.length
    n = Y.length
    let c[0...m, 0...n] be a new table  // cache
    // Initialize c (property 4.1)
    for i = 0 to n:
        c[0, i] = 0
    for i = 0 to m:
        c[i, 0] = 0
    let traceback[0...m, 0...n] be a new table
    // Construct c top -> down, left -> right
    for row = 1 to n:
        for col = 1 to m:
            if (X[col] == Y[row]):
                // Property 4.2
                c[col, row] = c[col - 1, row - 1] + 1
                traceback[col, row] = (col - 1, row - 1)
                continue
            else:
                // Property 4.3
                left = c[col - 1, row]
                top = c[col, row - 1]
                if (left > top):
                    c[col, row] = c[col - 1, row]
                    traceback[col, row] = (col - 1, row)
                else:
                    c[col, row] = c[col, row - 1]
                    traceback[col, row] = (col, row - 1)
    return (c, traceback)
</pre>

Revision as of 03:53, 4 March 2024

Problem description

A sequence Z is a subsequence of another sequence X if all the elements in Z also appear in order in X. Notably, the elements do not need to be consecutive.

In the longest-common-subsequence problem (LCS), we wish to find a maximum-length common subsequence of X and Y.

Approach: dynamic programming

Consider and , and let be an LCS of and .

Property 1

If , then and is an LCS of and .

Property 2

If and , then is an LCS of and .

Property 3

If and , then is an LCS of and .

Subproblems

Let us have be the length of the LCS for and . We have

Property 4

The first line is trivial. The second line stems from property 1 and makes enough sense to me.

The third line stems from property 3. This property tells us that depending on the last , must either be the LCS of and or the LCS of and . We don't know which one it is, so we simply compare them. Whichever is greater would be the LCS.


Implementation

Based on property 4, we have the following DP algorithm

LCS-Length(X, Y):
    m = X.length
    n = Y.length
    let c[0...m, 0...n] be a new table  // cache
    // Initialize c (property 4.1)
    for i = 0 to n:
        c[0, i] = 0
    for i = 0 to m:
        c[i, 0] = 0

    let traceback[0...m, 0...n] be a new table

    // Construct c top -> down, left -> right
    for row = 1 to n:
        for col = 1 to m:
            if (X[col] == Y[row]):
                // Property 4.2
                c[col, row] = c[col - 1, row - 1] + 1
                traceback[col, row] = (col - 1, row - 1)
                continue
            else:
                // Property 4.3
                left = c[col - 1, row]
                top = c[col, row - 1]
                if (left > top):
                    c[col, row] = c[col - 1, row]
                    traceback[col, row] = (col - 1, row)
                else:
                    c[col, row] = c[col, row - 1]
                    traceback[col, row] = (col, row - 1)

    return (c, traceback)