Longest Common Subsequence

From Rice Wiki

Problem description

A sequence Z is a subsequence of another sequence X if all the elements in Z also appear in order in X. Notably, the elements do not need to be consecutive.

In the longest-common-subsequence problem (LCS), we wish to find a maximum-length common subsequence of X and Y.

Approach: dynamic programming

Consider Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X = \langle x_1, x_2, \ldots, x_m \rangle} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y = \langle y_1, y_2, \ldots, y_n \rangle } , and let Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z = \langle z_1, z_2, \ldots, z_k \rangle} be an LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y} .

Property 1

If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_m = y_n} , then Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle z_k = x_m = y_n} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z_{k-1}} is an LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_{m-1}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y_{n-1}} .

Property 2

If and , then is an LCS of and .

Property 3

If and , then is an LCS of and .

Subproblems

Let us have be the length of the LCS for and . We have

Property 4

The first line is trivial. The second line stems from property 1 and makes enough sense to me.

The third line stems from property 3. This property tells us that depending on the last , Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Z} must either be the LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_{m-1}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y} or the LCS of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y_{n-1}} . We don't know which one it is, so we simply compare them. Whichever is greater would be the LCS.


Implementation

Based on property 4, we have the following DP algorithm

LCS-Length(X, Y):
    m = X.length
    n = Y.length
    let c[0...m, 0...n] be a new table  // cache
    // Initialize c (property 4.1)
    for i = 0 to n:
        c[0, i] = 0
    for i = 0 to m:
        c[i, 0] = 0

    let traceback[0...m, 0...n] be a new table

    // Construct c top -> down, left -> right
    for row = 1 to n:
        for col = 1 to m:
            if (X[col] == Y[row]):
                // Property 4.2
                c[col, row] = c[col - 1, row - 1] + 1
                traceback[col, row] = (col - 1, row - 1)
                continue
            else:
                // Property 4.3
                left = c[col - 1, row]
                top = c[col, row - 1]
                if (left > top):
                    c[col, row] = c[col - 1, row]
                    traceback[col, row] = (col - 1, row)
                else:
                    c[col, row] = c[col, row - 1]
                    traceback[col, row] = (col, row - 1)

    return (c, traceback)