Longest Common Subsequence: Difference between revisions
No edit summary |
No edit summary |
||
Line 11: | Line 11: | ||
<math>X</math> and <math>Y</math>. | <math>X</math> and <math>Y</math>. | ||
==== Property 1 ==== | |||
If <math>x_m = y_n</math>, then <math>z_k = x_m = y_n</math> and | If <math>x_m = y_n</math>, then <math>z_k = x_m = y_n</math> and | ||
<math>Z_{k-1}</math> is an LCS of <math>X_{m-1}</math> and | <math>Z_{k-1}</math> is an LCS of <math>X_{m-1}</math> and | ||
<math>Y_{n-1}</math>. | <math>Y_{n-1}</math>. | ||
==== Property 2 ==== | |||
If <math>x_m \neq y_n</math> and <math>Z_k \neq x_m</math>, then | If <math>x_m \neq y_n</math> and <math>Z_k \neq x_m</math>, then | ||
<math>Z</math> is an LCS of <math>X_{m-1}</math> and | <math>Z</math> is an LCS of <math>X_{m-1}</math> and | ||
<math>Y</math>. | <math>Y</math>. | ||
==== Property 3 ==== | |||
If <math>x_m \neq y_n</math> and <math>Z_k \neq y_n</math>, then | If <math>x_m \neq y_n</math> and <math>Z_k \neq y_n</math>, then | ||
<math>Z</math> is an LCS of <math>X</math> and | <math>Z</math> is an LCS of <math>X</math> and | ||
<math>Y_{n-1}</math>. | <math>Y_{n-1}</math>. | ||
=== Subproblems === | |||
Let us have <math>c[i,j]</math> be the length of the LCS for | Let us have <math>c[i,j]</math> be the length of the LCS for | ||
<math>X_i</math> and <math>Y_j</math>. We have | <math>X_i</math> and <math>Y_j</math>. We have | ||
==== Property 4 ==== | |||
<math> | <math> | ||
c[i,j] = \begin{cases} | c[i,j] = \begin{cases} | ||
0 & i = 0 | 0 & i = 0 or j = 0 \\ | ||
c[i-1, j-1] + 1 & x_i = y_i \\ | c[i-1, j-1] + 1 & x_i = y_i \\ | ||
max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i | max(c[i-1, j], c[i - 1, j]) & x_i \neq y_i | ||
\end{cases} | \end{cases} | ||
</math> | </math> | ||
The first line is trivial. The second line stems from property 1 and | |||
makes enough sense to me. | |||
The third line stems from property 3. This property tells us that | |||
depending on the last <math>Z_k</math>, <math>Z</math> must either be | |||
the LCS of <math>X_{m-1}</math> and <math>Y</math> or the LCS of | |||
<math>X</math> and <math>Y_{n-1}</math>. We don't know which one it is, | |||
so we simply compare them. Whichever is greater would be the LCS. | |||
= Implementation = | |||
Based on property 4, we have the following DP algorithm | |||
<pre> | |||
LCS-Length(X, Y): | |||
m = X.length | |||
n = Y.length | |||
let c[0...m, 0...n] be a new table // cache | |||
// Initialize c (property 4.1) | |||
for i = 0 to n: | |||
c[0, i] = 0 | |||
for i = 0 to m: | |||
c[i, 0] = 0 | |||
let traceback[0...m, 0...n] be a new table | |||
// Construct c top -> down, left -> right | |||
for row = 1 to n: | |||
for col = 1 to m: | |||
if (X[col] == Y[row]): | |||
// Property 4.2 | |||
c[col, row] = c[col - 1, row - 1] + 1 | |||
traceback[col, row] = (col - 1, row - 1) | |||
continue | |||
else: | |||
// Property 4.3 | |||
left = c[col - 1, row] | |||
top = c[col, row - 1] | |||
if (left > top): | |||
c[col, row] = c[col - 1, row] | |||
traceback[col, row] = (col - 1, row) | |||
else: | |||
c[col, row] = c[col, row - 1] | |||
traceback[col, row] = (col, row - 1) | |||
return (c, traceback) | |||
</pre> |
Revision as of 03:53, 4 March 2024
Problem description
A sequence Z is a subsequence of another sequence X if all the elements in Z also appear in order in X. Notably, the elements do not need to be consecutive.
In the longest-common-subsequence problem (LCS), we wish to find a maximum-length common subsequence of X and Y.
Approach: dynamic programming
Consider and , and let be an LCS of and .
Property 1
If , then and is an LCS of and .
Property 2
If and , then is an LCS of and .
Property 3
If and , then is an LCS of and .
Subproblems
Let us have be the length of the LCS for and . We have
Property 4
The first line is trivial. The second line stems from property 1 and makes enough sense to me.
The third line stems from property 3. This property tells us that depending on the last , must either be the LCS of and or the LCS of and . We don't know which one it is, so we simply compare them. Whichever is greater would be the LCS.
Implementation
Based on property 4, we have the following DP algorithm
LCS-Length(X, Y): m = X.length n = Y.length let c[0...m, 0...n] be a new table // cache // Initialize c (property 4.1) for i = 0 to n: c[0, i] = 0 for i = 0 to m: c[i, 0] = 0 let traceback[0...m, 0...n] be a new table // Construct c top -> down, left -> right for row = 1 to n: for col = 1 to m: if (X[col] == Y[row]): // Property 4.2 c[col, row] = c[col - 1, row - 1] + 1 traceback[col, row] = (col - 1, row - 1) continue else: // Property 4.3 left = c[col - 1, row] top = c[col, row - 1] if (left > top): c[col, row] = c[col - 1, row] traceback[col, row] = (col - 1, row) else: c[col, row] = c[col, row - 1] traceback[col, row] = (col, row - 1) return (c, traceback)