Longest Common Subsequence: Difference between revisions

From Rice Wiki
No edit summary
No edit summary
Line 11: Line 11:
<math>X</math> and <math>Y</math>.
<math>X</math> and <math>Y</math>.


==== Property 1 ====
If <math>x_m = y_n</math>, then <math>z_k = x_m = y_n</math> and
If <math>x_m = y_n</math>, then <math>z_k = x_m = y_n</math> and
<math>Z_{k-1}</math> is an LCS of <math>X_{m-1}</math> and
<math>Z_{k-1}</math> is an LCS of <math>X_{m-1}</math> and
<math>Y_{n-1}</math>.
<math>Y_{n-1}</math>.


==== Property 2 ====
If <math>x_m \neq y_n</math> and <math>Z_k \neq x_m</math>, then
If <math>x_m \neq y_n</math> and <math>Z_k \neq x_m</math>, then
<math>Z</math> is an LCS of <math>X_{m-1}</math> and
<math>Z</math> is an LCS of <math>X_{m-1}</math> and
<math>Y</math>.
<math>Y</math>.


==== Property 3 ====
If <math>x_m \neq y_n</math> and <math>Z_k \neq y_n</math>, then
If <math>x_m \neq y_n</math> and <math>Z_k \neq y_n</math>, then
<math>Z</math> is an LCS of <math>X</math> and
<math>Z</math> is an LCS of <math>X</math> and
<math>Y_{n-1}</math>.
<math>Y_{n-1}</math>.


==== Subproblems ====
=== Subproblems ===
 
It is clear from the above that there are three subproblems to examine:
# <math>LCS(X_{m-1}, Y_{n-1})</math>
# <math>LCS(X_{m}, Y_{n-1})</math>
# <math>LCS(X_{m-1}, Y_{n})</math>


Let us have <math>c[i,j]</math> be the length of the LCS for
Let us have <math>c[i,j]</math> be the length of the LCS for
<math>X_i</math> and <math>Y_j</math>. We have
<math>X_i</math> and <math>Y_j</math>. We have
==== Property 4 ====


<math>
<math>
c[i,j] = \begin{cases}
c[i,j] = \begin{cases}
     0 & i = 0 and j = 0 \\
     0 & i = 0 or j = 0 \\
     c[i-1, j-1] + 1 & x_i = y_i \\
     c[i-1, j-1] + 1 & x_i = y_i \\
     max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i
     max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i
\end{cases}
\end{cases}
</math>
</math>
The first line is trivial. The second line stems from property 1 and
makes enough sense to me.
The third line stems from property 3. This property tells us that
depending on the last <math>Z_k</math>, <math>Z</math> must either be
the LCS of <math>X_{m-1}</math> and <math>Y</math> or the LCS of
<math>X</math> and <math>Y_{n-1}</math>. We don't know which one it is,
so we simply compare them. Whichever is greater would be the LCS.
= Implementation =
Based on property 4, we have the following DP algorithm
<pre>
LCS-Length(X, Y):
    m = X.length
    n = Y.length
    let c[0...m, 0...n] be a new table  // cache
    // Initialize c (property 4.1)
    for i = 0 to n:
        c[0, i] = 0
    for i = 0 to m:
        c[i, 0] = 0
    let traceback[0...m, 0...n] be a new table
    // Construct c top -> down, left -> right
    for row = 1 to n:
        for col = 1 to m:
            if (X[col] == Y[row]):
                // Property 4.2
                c[col, row] = c[col - 1, row - 1] + 1
                traceback[col, row] = (col - 1, row - 1)
                continue
            else:
                // Property 4.3
                left = c[col - 1, row]
                top = c[col, row - 1]
                if (left > top):
                    c[col, row] = c[col - 1, row]
                    traceback[col, row] = (col - 1, row)
                else:
                    c[col, row] = c[col, row - 1]
                    traceback[col, row] = (col, row - 1)
    return (c, traceback)
</pre>

Revision as of 03:53, 4 March 2024

Problem description

A sequence Z is a subsequence of another sequence X if all the elements in Z also appear in order in X. Notably, the elements do not need to be consecutive.

In the longest-common-subsequence problem (LCS), we wish to find a maximum-length common subsequence of X and Y.

Approach: dynamic programming

Consider Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X = \langle x_1, x_2, \ldots, x_m \rangle} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y = \langle y_1, y_2, \ldots, y_n \rangle } , and let Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z = \langle z_1, z_2, \ldots, z_k \rangle} be an LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y} .

Property 1

If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_m = y_n} , then Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle z_k = x_m = y_n} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z_{k-1}} is an LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_{m-1}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y_{n-1}} .

Property 2

If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_m \neq y_n} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z_k \neq x_m} , then Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z} is an LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_{m-1}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y} .

Property 3

If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_m \neq y_n} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z_k \neq y_n} , then Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z} is an LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y_{n-1}} .

Subproblems

Let us have Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c[i,j]} be the length of the LCS for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_i} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y_j} . We have

Property 4

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c[i,j] = \begin{cases} 0 & i = 0 or j = 0 \\ c[i-1, j-1] + 1 & x_i = y_i \\ max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i \end{cases} }

The first line is trivial. The second line stems from property 1 and makes enough sense to me.

The third line stems from property 3. This property tells us that depending on the last Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z_k} , Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z} must either be the LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_{m-1}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y} or the LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y_{n-1}} . We don't know which one it is, so we simply compare them. Whichever is greater would be the LCS.


Implementation

Based on property 4, we have the following DP algorithm

LCS-Length(X, Y):
    m = X.length
    n = Y.length
    let c[0...m, 0...n] be a new table  // cache
    // Initialize c (property 4.1)
    for i = 0 to n:
        c[0, i] = 0
    for i = 0 to m:
        c[i, 0] = 0

    let traceback[0...m, 0...n] be a new table

    // Construct c top -> down, left -> right
    for row = 1 to n:
        for col = 1 to m:
            if (X[col] == Y[row]):
                // Property 4.2
                c[col, row] = c[col - 1, row - 1] + 1
                traceback[col, row] = (col - 1, row - 1)
                continue
            else:
                // Property 4.3
                left = c[col - 1, row]
                top = c[col, row - 1]
                if (left > top):
                    c[col, row] = c[col - 1, row]
                    traceback[col, row] = (col - 1, row)
                else:
                    c[col, row] = c[col, row - 1]
                    traceback[col, row] = (col, row - 1)

    return (c, traceback)