Longest Common Subsequence
Problem description
A sequence Z is a subsequence of another sequence X if all the elements in Z also appear in order in X. Notably, the elements do not need to be consecutive.
In the longest-common-subsequence problem (LCS), we wish to find a maximum-length common subsequence of X and Y.
Overview
- Runtime complexity:
- Space complexity:
- Approach: Dynamic programming
Approach: dynamic programming
Consider and , and let be an LCS of and .
Property 1
If , then and is an LCS of and .
Property 2
If and , then is an LCS of and .
Property 3
If and , then is an LCS of and .
Subproblems
Let us have be the length of the LCS for and . We have
Property 4
The first line is trivial. The second line stems from property 1 and makes enough sense to me.
The third line stems from property 3. This property tells us that depending on the last , must either be the LCS of and or the LCS of and . We don't know which one it is, so we simply compare them. Whichever is greater would be the LCS.
Implementation
Based on property 4, we have the following DP algorithm
LCS-Length(X, Y): m = X.length n = Y.length let c[0...m, 0...n] be a new table // cache // Initialize c (property 4.1) for i = 0 to n: c[0, i] = 0 for i = 0 to m: c[i, 0] = 0 let traceback[0...m, 0...n] be a new table // Construct c top -> down, left -> right for row = 1 to n: for col = 1 to m: if (X[col] == Y[row]): // Property 4.2 c[col, row] = c[col - 1, row - 1] + 1 traceback[col, row] = (col - 1, row - 1) continue else: // Property 4.3 left = c[col - 1, row] top = c[col, row - 1] if (left > top): c[col, row] = c[col - 1, row] traceback[col, row] = (col - 1, row) else: c[col, row] = c[col, row - 1] traceback[col, row] = (col, row - 1) return (c, traceback)
Printing shouldn't be that hard. You just trace back and record whenever they are the same value. I think.
Optimizations
We can reduce the asymptotic space requirements since it only needs one row and column at a time (although stopping us from reconstructing the actual subsequence).
We can also remove the entire traceback array and just compare which value is used for the calculation. This halves the space requirement but doesn't change anything asymptotically.