Merged
73 changes: 60 additions & 13 deletions docs/paper/reductions.typ
@@ -2190,13 +2190,60 @@ NP-completeness was established by Garey, Johnson, and Stockmeyer @gareyJohnsonS
*Example.* Consider host graph $G$ with 7 vertices: a $K_4$ clique on ${0, 1, 2, 3}$ and a triangle on ${4, 5, 6}$ connected via edge $(3, 4)$. Pattern $H = K_4$ with vertices ${a, b, c, d}$. The mapping $f(a) = 0, f(b) = 1, f(c) = 2, f(d) = 3$ preserves all 6 edges of $K_4$, confirming a subgraph isomorphism exists.
]

#problem-def("LongestCommonSubsequence")[
Given $k$ strings $s_1, dots, s_k$ over a finite alphabet $Sigma$, find a longest string $w$ that is a subsequence of every $s_i$. A string $w$ is a _subsequence_ of $s$ if $w$ can be obtained by deleting zero or more characters from $s$ without changing the order of the remaining characters.
][
The LCS problem is polynomial-time solvable for $k = 2$ strings via dynamic programming in $O(n_1 n_2)$ time (Wagner & Fischer, 1974), but NP-hard for $k gt.eq 3$ strings @maier1978. It is a foundational problem in bioinformatics (sequence alignment), version control (diff algorithms), and data compression. The problem is listed as SR10 in Garey & Johnson @garey1979.
#{
let x = load-model-example("LongestCommonSubsequence")
let strings = x.instance.strings
let witness = x.samples.at(0).config
let fmt-str(s) = "\"" + s.map(c => str(c)).join("") + "\""
let string-list = strings.map(fmt-str).join(", ")
let find-embed(target, candidate) = {
let positions = ()
let j = 0
for (i, ch) in target.enumerate() {
if j < candidate.len() and ch == candidate.at(j) {
positions.push(i)
j += 1
}
}
positions
}
let embeds = strings.map(s => find-embed(s, witness))
[
#problem-def("LongestCommonSubsequence")[
Given a finite alphabet $Sigma$, a set $R = {r_1, dots, r_m}$ of strings over $Sigma$, and a positive integer $K$, determine whether there exists a string $w in Sigma^*$ with $|w| gt.eq K$ such that every string $r_i in R$ contains $w$ as a _subsequence_: there exist indices $1 lt.eq j_1 < j_2 < dots < j_(|w|) lt.eq |r_i|$ with $r_i[j_t] = w[t]$ for all $t$.
][
A classic NP-complete string problem, listed as problem SR10 in Garey and Johnson @garey1979. #cite(<maier1978>, form: "prose") proved NP-completeness, while Garey and Johnson note polynomial-time cases for fixed $K$ or fixed $|R|$. For the special case of two strings, the classical dynamic-programming algorithm of #cite(<wagnerfischer1974>, form: "prose") runs in $O(|r_1| dot |r_2|)$ time. The decision model implemented in this repository fixes the witness length to exactly $K$; this is equivalent to the standard "$|w| gt.eq K$" formulation because any longer common subsequence has a length-$K$ prefix.

*Example.* Let $s_1 = $ `ABAC` and $s_2 = $ `BACA` over $Sigma = {A, B, C}$. The longest common subsequence has length 3, e.g., `BAC`: positions 1, 2, 3 of $s_1$ match positions 0, 1, 2 of $s_2$.
]
*Example.* Let $Sigma = {0, 1}$ and let the input set $R$ contain the strings #string-list. The witness $w = $ #fmt-str(witness) is a common subsequence of every string in $R$.

#figure({
let blue = graph-colors.at(0)
align(center, stack(dir: ttb, spacing: 0.35cm,
stack(dir: ltr, spacing: 0pt,
box(width: 1.2cm, height: 0.45cm, align(center + horizon, text(8pt, "w ="))),
..witness.enumerate().map(((i, symbol)) => {
box(width: 0.48cm, height: 0.48cm, fill: blue.transparentize(70%), stroke: 0.5pt + luma(120),
align(center + horizon, text(9pt, weight: "bold", str(symbol))))
}),
),
..strings.enumerate().map(((ri, s)) => {
let embed = embeds.at(ri)
stack(dir: ltr, spacing: 0pt,
box(width: 1.2cm, height: 0.45cm, align(center + horizon, text(8pt, "r" + str(ri + 1) + " ="))),
..s.enumerate().map(((i, symbol)) => {
let fill = if embed.contains(i) { blue.transparentize(78%) } else { white }
box(width: 0.48cm, height: 0.48cm, fill: fill, stroke: 0.5pt + luma(120),
align(center + horizon, text(9pt, weight: "bold", str(symbol))))
}),
)
}),
))
})

The highlighted positions show one left-to-right embedding of $w = $ #fmt-str(witness) in each input string, certifying the YES answer for $K = 3$.
]
]
}

#problem-def("SubsetSum")[
Given a finite set $A = {a_0, dots, a_(n-1)}$ with sizes $s(a_i) in ZZ^+$ and a target $B in ZZ^+$, determine whether there exists a subset $A' subset.eq A$ such that $sum_(a in A') s(a) = B$.
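
The `find-embed` helper in the hunk above scans each input string once and greedily takes the first position that matches the next witness symbol. A minimal Rust sketch of the same idea follows, together with the decision check from the problem definition. The function names, the `example_strings` helper, and the witness `[0, 1, 0]` used in the usage note are assumptions for illustration; the PR loads its witness from the model example, which is not shown here.

```rust
/// Greedy left-to-right embedding of `candidate` in `target`.
/// Returns the matched 0-based positions of `target`. The result has
/// `candidate.len()` entries exactly when `candidate` is a subsequence.
fn find_embed(target: &[usize], candidate: &[usize]) -> Vec<usize> {
    let mut positions = Vec::with_capacity(candidate.len());
    let mut j = 0;
    for (i, &ch) in target.iter().enumerate() {
        if j < candidate.len() && ch == candidate[j] {
            positions.push(i);
            j += 1;
        }
    }
    positions
}

/// Decision check: `witness` has length at least `bound` and embeds in every string.
fn certifies_yes(strings: &[Vec<usize>], witness: &[usize], bound: usize) -> bool {
    witness.len() >= bound
        && strings
            .iter()
            .all(|s| find_embed(s, witness).len() == witness.len())
}

/// Hypothetical helper: the three binary strings from the CLI usage example
/// ("010110;100101;001011"), written as symbol lists.
fn example_strings() -> Vec<Vec<usize>> {
    vec![
        vec![0, 1, 0, 1, 1, 0],
        vec![1, 0, 0, 1, 0, 1],
        vec![0, 0, 1, 0, 1, 1],
    ]
}
```

With the assumed witness `[0, 1, 0]`, `certifies_yes(&example_strings(), &[0, 1, 0], 3)` holds, and `find_embed` reports positions `[0, 1, 2]`, `[1, 3, 4]`, and `[0, 2, 3]` for the three strings. Greedy matching is enough here because taking the earliest match never rules out a later one.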
@@ -3585,19 +3632,19 @@ The following reductions to Integer Linear Programming are straightforward formu
]

#reduction-rule("LongestCommonSubsequence", "ILP")[
The match-pair ILP formulation @blum2021 encodes subsequence alignment as a binary optimization. For two strings $s_1$ (length $n_1$) and $s_2$ (length $n_2$), each position pair $(j_1, j_2)$ where $s_1[j_1] = s_2[j_2]$ yields a binary variable. Constraints enforce one-to-one matching and order preservation (no crossings). The objective maximizes the number of matched pairs.
A bounded-witness ILP formulation turns the decision version of LCS into a feasibility problem. Binary variables choose the symbol at each witness position and, for every input string, choose where that witness position is realized. Linear constraints enforce symbol consistency and strictly increasing source positions.
][
_Construction._ Given strings $s_1$ and $s_2$:
_Construction._ Given alphabet $Sigma$, strings $R = {r_1, dots, r_m}$, and bound $K$:

_Variables:_ Binary $m_(j_1, j_2) in {0, 1}$ for each $(j_1, j_2)$ with $s_1[j_1] = s_2[j_2]$. Interpretation: $m_(j_1, j_2) = 1$ iff position $j_1$ of $s_1$ is matched to position $j_2$ of $s_2$.
_Variables:_ Binary $x_(p, a) in {0, 1}$ for witness position $p in {1, dots, K}$ and symbol $a in Sigma$, with $x_(p, a) = 1$ iff the $p$-th witness symbol equals $a$. For every input string $r_i$, witness position $p$, and source index $j in {1, dots, |r_i|}$, binary $y_(i, p, j) = 1$ iff the $p$-th witness symbol is matched to position $j$ of $r_i$.

_Constraints:_ (1) Each position in $s_1$ matched at most once: $sum_(j_2 : (j_1, j_2) in M) m_(j_1, j_2) lt.eq 1$ for all $j_1$. (2) Each position in $s_2$ matched at most once: $sum_(j_1 : (j_1, j_2) in M) m_(j_1, j_2) lt.eq 1$ for all $j_2$. (3) No crossings: for $(j_1, j_2), (j'_1, j'_2) in M$ with $j_1 < j'_1$ and $j_2 > j'_2$: $m_(j_1, j_2) + m_(j'_1, j'_2) lt.eq 1$.
_Constraints:_ (1) Exactly one symbol per witness position: $sum_(a in Sigma) x_(p, a) = 1$ for all $p$. (2) Exactly one matched source position for each $(i, p)$: $sum_(j = 1)^(|r_i|) y_(i, p, j) = 1$. (3) Character consistency: if $r_i[j] = a$, then $y_(i, p, j) lt.eq x_(p, a)$. (4) Strictly increasing matches: for consecutive witness positions $p$ and $p + 1$, forbid $y_(i, p, j') = y_(i, p + 1, j) = 1$ whenever $j' gt.eq j$.

_Objective:_ Maximize $sum_((j_1, j_2) in M) m_(j_1, j_2)$.
_Objective:_ Use the zero objective. The target ILP is feasible iff the source LCS instance is a YES instance.

_Correctness._ ($arrow.r.double$) A common subsequence of length $ell$ defines $ell$ matched pairs that are order-preserving (no crossings) and one-to-one, yielding a feasible ILP solution with objective $ell$. ($arrow.l.double$) An ILP solution with objective $ell$ defines $ell$ matched pairs; constraints (1)--(2) ensure one-to-one matching, and constraint (3) ensures order preservation, so the matched characters form a common subsequence of length $ell$.
_Correctness._ ($arrow.r.double$) If a witness $w = w_1 dots w_K$ is a common subsequence of every string, set $x_(p, w_p) = 1$ and choose, in every $r_i$, the positions where that embedding occurs. Constraints (1)--(4) are satisfied, so the ILP is feasible. ($arrow.l.double$) Any feasible ILP solution selects exactly one symbol for each witness position and exactly one realization in each source string. Character consistency ensures the chosen positions spell the same witness string in every input string, and the ordering constraints ensure those positions are strictly increasing. Therefore the extracted witness is a common subsequence of length $K$.

_Solution extraction._ Collect pairs $(j_1, j_2)$ with $m_(j_1, j_2) = 1$, sort by $j_1$, and read the characters.
_Solution extraction._ For each witness position $p$, read the unique symbol $a$ with $x_(p, a) = 1$ and output the resulting length-$K$ string.
]

== Unit Disk Mapping
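
The forward direction of the correctness argument for the bounded-witness formulation can be checked mechanically: build the 0/1 assignment from a witness and its embeddings, then test constraints (1)-(4). The sketch below does this in Rust with 0-based indices (the paper text is 1-based). The `Assignment` type and both function names are hypothetical, and the pairwise form of constraint (4) is one reading of the text, not necessarily how the repository emits it.

```rust
/// 0/1 assignment for the bounded-witness formulation (0-based indices).
struct Assignment {
    x: Vec<Vec<bool>>,      // x[p][a]: witness position p carries symbol a
    y: Vec<Vec<Vec<bool>>>, // y[i][p][j]: position p is matched to index j of r_i
}

/// Build the assignment described in the forward direction of the proof.
/// Assumes every witness symbol is < `alphabet_size` and every embedding
/// index is within its string; otherwise indexing panics.
fn assignment_from_witness(
    alphabet_size: usize,
    strings: &[Vec<usize>],
    witness: &[usize],
    embeddings: &[Vec<usize>],
) -> Assignment {
    let k = witness.len();
    let mut x = vec![vec![false; alphabet_size]; k];
    for (p, &a) in witness.iter().enumerate() {
        x[p][a] = true;
    }
    let mut y: Vec<Vec<Vec<bool>>> = strings
        .iter()
        .map(|r| vec![vec![false; r.len()]; k])
        .collect();
    for (i, embed) in embeddings.iter().enumerate() {
        for (p, &j) in embed.iter().enumerate() {
            y[i][p][j] = true;
        }
    }
    Assignment { x, y }
}

/// Check constraints (1)-(4) for a given assignment.
fn satisfies_constraints(strings: &[Vec<usize>], asg: &Assignment) -> bool {
    fn ones(row: &[bool]) -> usize {
        row.iter().filter(|&&b| b).count()
    }
    let k = asg.x.len();
    // (1) exactly one symbol per witness position
    if asg.x.iter().any(|row| ones(row) != 1) {
        return false;
    }
    for (i, r) in strings.iter().enumerate() {
        for p in 0..k {
            // (2) exactly one matched source position for each (i, p)
            if ones(&asg.y[i][p]) != 1 {
                return false;
            }
            // (3) character consistency: y[i][p][j] <= x[p][r_i[j]]
            for (j, &a) in r.iter().enumerate() {
                if asg.y[i][p][j] && !asg.x[p][a] {
                    return false;
                }
            }
        }
        // (4) strictly increasing matches for consecutive witness positions
        for p in 0..k.saturating_sub(1) {
            for j_prev in 0..r.len() {
                for j in 0..=j_prev {
                    if asg.y[i][p][j_prev] && asg.y[i][p + 1][j] {
                        return false;
                    }
                }
            }
        }
    }
    true
}
```

For the three binary strings of the CLI example and the assumed witness `[0, 1, 0]` with embeddings `[0, 1, 2]`, `[1, 3, 4]`, `[0, 2, 3]`, the check passes. Reversing the first embedding to `[2, 1, 0]` violates constraint (4), so the check fails.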
11 changes: 11 additions & 0 deletions docs/paper/references.bib
@@ -597,6 +597,17 @@ @article{maier1978
doi = {10.1145/322063.322075}
}

@article{wagnerfischer1974,
author = {Robert A. Wagner and Michael J. Fischer},
title = {The String-to-String Correction Problem},
journal = {Journal of the ACM},
volume = {21},
number = {1},
pages = {168--173},
year = {1974},
doi = {10.1145/321796.321811}
}

@article{blum2021,
author = {Christian Blum and Maria J. Blesa and Borja Calvo},
title = {{ILP}-based reduced variable neighborhood search for the longest common subsequence problem},
8 changes: 4 additions & 4 deletions problemreductions-cli/src/cli.rs
@@ -251,7 +251,7 @@ Flags by problem type:
RuralPostman (RPP) --graph, --edge-weights, --required-edges, --bound
MultipleChoiceBranching --arcs [--weights] --partition --bound [--num-vertices]
SubgraphIsomorphism --graph (host), --pattern (pattern)
LCS --strings
LCS --strings, --bound [--alphabet-size]
FAS --arcs [--weights] [--num-vertices]
FVS --arcs [--weights] [--num-vertices]
StrongConnectivityAugmentation --arcs, --candidate-arcs, --bound [--num-vertices]
@@ -452,13 +452,13 @@ pub struct CreateArgs {
/// Required edge indices for RuralPostman (comma-separated, e.g., "0,2,4")
#[arg(long)]
pub required_edges: Option<String>,
/// Upper bound or length bound (for BoundedComponentSpanningForest, LengthBoundedDisjointPaths, MultipleChoiceBranching, OptimalLinearArrangement, RuralPostman, SCS, or StringToStringCorrection)
/// Upper bound or length bound (for BoundedComponentSpanningForest, LengthBoundedDisjointPaths, LongestCommonSubsequence, MultipleChoiceBranching, OptimalLinearArrangement, RuralPostman, ShortestCommonSupersequence, or StringToStringCorrection)
#[arg(long, allow_hyphen_values = true)]
pub bound: Option<i64>,
/// Pattern graph edge list for SubgraphIsomorphism (e.g., 0-1,1-2,2-0)
#[arg(long)]
pub pattern: Option<String>,
/// Input strings for LCS (e.g., "ABAC;BACA") or SCS (e.g., "0,1,2;1,2,0")
/// Input strings for LCS (e.g., "ABAC;BACA" or "0,1,0;1,0,1") or SCS (e.g., "0,1,2;1,2,0")
#[arg(long)]
pub strings: Option<String>,
/// Directed arcs for directed graph problems (e.g., 0>1,1>2,2>0)
@@ -497,7 +497,7 @@ pub struct CreateArgs {
/// Number of available workers for StaffScheduling
#[arg(long)]
pub num_workers: Option<u64>,
/// Alphabet size for SCS or StringToStringCorrection (optional; inferred from max symbol + 1 if omitted)
/// Alphabet size for LCS, SCS, or StringToStringCorrection (optional; inferred from the input strings if omitted)
#[arg(long)]
pub alphabet_size: Option<usize>,
/// Functional dependencies for MinimumCardinalityKey (semicolon-separated "lhs>rhs" pairs, e.g., "0,1>2;0,2>3")
114 changes: 105 additions & 9 deletions problemreductions-cli/src/commands/create.rs
@@ -332,6 +332,9 @@ fn example_for(canonical: &str, graph_type: Option<&str>) -> &'static str {
"--universe 4 --r-sets \"0,1,2,3;0,1\" --s-sets \"0,1,2,3;2,3\" --r-weights 2,5 --s-weights 3,6"
}
"SetBasis" => "--universe 4 --sets \"0,1;1,2;0,2;0,1,2\" --k 3",
"LongestCommonSubsequence" => {
"--strings \"010110;100101;001011\" --bound 3 --alphabet-size 2"
}
"MinimumCardinalityKey" => {
"--num-attributes 6 --dependencies \"0,1>2;0,2>3;1,3>4;2,4>5\" --k 2"
}
@@ -377,6 +380,10 @@ fn help_flag_hint(
) -> &'static str {
match (canonical, field_name) {
("BoundedComponentSpanningForest", "max_weight") => "integer",
("LongestCommonSubsequence", "strings") => {
"raw strings: \"ABAC;BACA\" or symbol lists: \"0,1,0;1,0,1\""
}
("ShortestCommonSupersequence", "strings") => "symbol lists: \"0,1,2;1,2,0\"",
("MultipleChoiceBranching", "partition") => "semicolon-separated groups: \"0,1;2,3\"",
_ => type_format_hint(type_name, graph_type),
}
@@ -1294,18 +1301,85 @@ pub fn create(args: &CreateArgs, out: &OutputConfig) -> Result<()> {

// LongestCommonSubsequence
"LongestCommonSubsequence" => {
let usage =
"Usage: pred create LCS --strings \"010110;100101;001011\" --bound 3 [--alphabet-size 2]";
let strings_str = args.strings.as_deref().ok_or_else(|| {
anyhow::anyhow!(
"LCS requires --strings\n\n\
Usage: pred create LCS --strings \"ABAC;BACA\""
)
anyhow::anyhow!("LongestCommonSubsequence requires --strings\n\n{usage}")
})?;
let strings: Vec<Vec<u8>> = strings_str
.split(';')
.map(|s| s.trim().as_bytes().to_vec())
.collect();
let bound_i64 = args.bound.ok_or_else(|| {
anyhow::anyhow!("LongestCommonSubsequence requires --bound\n\n{usage}")
})?;
anyhow::ensure!(
bound_i64 >= 0,
"LongestCommonSubsequence requires a nonnegative --bound, got {}",
bound_i64
);
let bound = bound_i64 as usize;

let segments: Vec<&str> = strings_str.split(';').map(str::trim).collect();
let comma_mode = segments.iter().any(|segment| segment.contains(','));

let (strings, inferred_alphabet_size): (Vec<Vec<usize>>, usize) = if comma_mode {
let strings = segments
.iter()
.map(|segment| {
if segment.is_empty() {
return Ok(Vec::new());
}
segment
.split(',')
.map(|value| {
value.trim().parse::<usize>().map_err(|e| {
anyhow::anyhow!("Invalid LCS alphabet index: {}", e)
})
})
.collect::<Result<Vec<_>>>()
})
.collect::<Result<Vec<_>>>()?;
let inferred = strings
.iter()
.flat_map(|string| string.iter())
.copied()
.max()
.map(|value| value + 1)
.unwrap_or(0);
(strings, inferred)
} else {
let mut encoding = BTreeMap::new();
let mut next_symbol = 0usize;
let strings = segments
.iter()
.map(|segment| {
segment
.as_bytes()
.iter()
.map(|byte| {
let entry = encoding.entry(*byte).or_insert_with(|| {
let current = next_symbol;
next_symbol += 1;
current
});
*entry
})
.collect::<Vec<_>>()
})
.collect::<Vec<_>>();
(strings, next_symbol)
};

let alphabet_size = args.alphabet_size.unwrap_or(inferred_alphabet_size);
anyhow::ensure!(
alphabet_size >= inferred_alphabet_size,
"--alphabet-size {} is smaller than the inferred alphabet size ({})",
alphabet_size,
inferred_alphabet_size
);
anyhow::ensure!(
alphabet_size > 0 || (bound == 0 && strings.iter().all(|string| string.is_empty())),
"LongestCommonSubsequence requires a positive alphabet. Provide --alphabet-size when all strings are empty and --bound > 0.\n\n{usage}"
);
(
ser(LongestCommonSubsequence::new(strings))?,
ser(LongestCommonSubsequence::new(alphabet_size, strings, bound))?,
resolved_variant.clone(),
)
}
@@ -3089,6 +3163,7 @@ fn create_random(
#[cfg(test)]
mod tests {
use super::create;
use super::help_flag_hint;
use super::help_flag_name;
use super::parse_bool_rows;
use super::problem_help_flag_name;
@@ -3118,6 +3193,19 @@ mod tests {
);
}

#[test]
fn test_problem_help_uses_problem_specific_lcs_strings_hint() {
assert_eq!(
help_flag_hint(
"LongestCommonSubsequence",
"strings",
"Vec<Vec<usize>>",
None,
),
"raw strings: \"ABAC;BACA\" or symbol lists: \"0,1,0;1,0,1\""
);
}

#[test]
fn test_problem_help_uses_string_to_string_correction_cli_flags() {
assert_eq!(
@@ -3134,6 +3222,14 @@ mod tests {
);
}

#[test]
fn test_problem_help_keeps_generic_vec_vec_usize_hint_for_other_models() {
assert_eq!(
help_flag_hint("SetBasis", "sets", "Vec<Vec<usize>>", None),
"semicolon-separated sets: \"0,1;1,2;0,2\""
);
}

#[test]
fn test_problem_help_uses_k_for_staff_scheduling() {
assert_eq!(
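
The two `--strings` input modes in `create.rs` can be read in isolation. The sketch below is a standalone approximation, not the code from the PR: `encode_raw` and `parse_symbol_lists` are hypothetical names, and the error type is simplified to `ParseIntError` where the original uses `anyhow`. Raw mode numbers each distinct byte by order of first occurrence across all strings. Comma mode parses explicit indices and infers the alphabet size as the maximum symbol plus one.

```rust
use std::collections::BTreeMap;
use std::num::ParseIntError;

/// Raw mode: "ABAC;BACA" -> symbols numbered by first occurrence.
/// Returns the encoded strings and the number of distinct bytes seen.
fn encode_raw(input: &str) -> (Vec<Vec<usize>>, usize) {
    let mut encoding: BTreeMap<u8, usize> = BTreeMap::new();
    let strings: Vec<Vec<usize>> = input
        .split(';')
        .map(|segment| {
            segment
                .trim()
                .bytes()
                .map(|byte| {
                    // A new byte receives the next unused index.
                    let next = encoding.len();
                    *encoding.entry(byte).or_insert(next)
                })
                .collect::<Vec<usize>>()
        })
        .collect();
    let alphabet_size = encoding.len();
    (strings, alphabet_size)
}

/// Comma mode: "0,1,0;1,0,1" -> explicit symbol indices.
/// The inferred alphabet size is max symbol + 1, or 0 if there are no symbols.
fn parse_symbol_lists(input: &str) -> Result<(Vec<Vec<usize>>, usize), ParseIntError> {
    let strings = input
        .split(';')
        .map(|segment| {
            let segment = segment.trim();
            if segment.is_empty() {
                return Ok(Vec::new());
            }
            segment
                .split(',')
                .map(|value| value.trim().parse::<usize>())
                .collect::<Result<Vec<usize>, ParseIntError>>()
        })
        .collect::<Result<Vec<Vec<usize>>, ParseIntError>>()?;
    let inferred = strings
        .iter()
        .flatten()
        .copied()
        .max()
        .map_or(0, |max| max + 1);
    Ok((strings, inferred))
}
```

`encode_raw("ABAC;BACA")` yields `[[0, 1, 0, 2], [1, 0, 2, 0]]` with alphabet size 3, which matches the expectation in `test_create_lcs_with_raw_strings_infers_alphabet`. One consequence of first-occurrence numbering is that a raw digit string such as `"10"` encodes `'1'` as symbol 0, so comma mode is the safer choice when the numeric values matter.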
40 changes: 40 additions & 0 deletions problemreductions-cli/tests/cli_tests.rs
@@ -2837,6 +2837,46 @@ fn test_create_set_basis_no_flags_uses_actual_cli_flag_names() {
);
}

#[test]
fn test_create_lcs_with_raw_strings_infers_alphabet() {
let output = pred()
.args(["create", "LCS", "--strings", "ABAC;BACA", "--bound", "2"])
.output()
.unwrap();
assert!(
output.status.success(),
"stderr: {}",
String::from_utf8_lossy(&output.stderr)
);
let stdout = String::from_utf8(output.stdout).unwrap();
let json: serde_json::Value = serde_json::from_str(&stdout).unwrap();
assert_eq!(json["type"], "LongestCommonSubsequence");
assert_eq!(json["data"]["alphabet_size"], 3);
assert_eq!(json["data"]["bound"], 2);
assert_eq!(
json["data"]["strings"],
serde_json::json!([[0, 1, 0, 2], [1, 0, 2, 0]])
);
}

#[test]
fn test_create_lcs_rejects_empty_strings_with_positive_bound_without_panicking() {
let output = pred()
.args(["create", "LCS", "--strings", "", "--bound", "1"])
.output()
.unwrap();
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(
stderr.contains("Provide --alphabet-size when all strings are empty and --bound > 0"),
"expected user-facing validation error, got: {stderr}"
);
assert!(
!stderr.contains("panicked at"),
"create command should reject invalid LCS input without panicking: {stderr}"
);
}

#[test]
fn test_create_kcoloring_missing_k() {
let output = pred()
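
The rejection path exercised by `test_create_lcs_rejects_empty_strings_with_positive_bound_without_panicking` comes down to three checks performed before the model is constructed. The sketch below restates them as a pure function. `resolve_lcs_parameters` is a hypothetical name and the `String` error type is an assumption; the PR itself uses `anyhow::ensure!` inline in the `create` match arm.

```rust
/// Validate the LCS CLI parameters and return `(alphabet_size, bound)`.
/// `inferred` is the alphabet size inferred from the parsed strings.
fn resolve_lcs_parameters(
    bound: i64,
    alphabet_override: Option<usize>,
    inferred: usize,
    strings: &[Vec<usize>],
) -> Result<(usize, usize), String> {
    // The bound arrives as i64 because --bound is shared with other problems.
    if bound < 0 {
        return Err(format!("requires a nonnegative --bound, got {bound}"));
    }
    let bound = bound as usize;

    // An explicit --alphabet-size may enlarge the alphabet but never shrink it.
    let alphabet_size = alphabet_override.unwrap_or(inferred);
    if alphabet_size < inferred {
        return Err(format!(
            "--alphabet-size {alphabet_size} is smaller than the inferred alphabet size ({inferred})"
        ));
    }

    // An empty alphabet is accepted only for the trivial instance:
    // bound 0 and every input string empty.
    let trivial = bound == 0 && strings.iter().all(|s| s.is_empty());
    if alphabet_size == 0 && !trivial {
        return Err(
            "Provide --alphabet-size when all strings are empty and --bound > 0".to_string(),
        );
    }
    Ok((alphabet_size, bound))
}
```

With `--strings "" --bound 1`, the parser produces a single empty string and an inferred alphabet size of 0, so the last check returns an error instead of reaching the constructor. Passing `Some(2)` as the override makes the same input valid.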