-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Motivation
CONSECUTIVE SETS (P166) from Garey & Johnson, A4 SR18. An NP-complete problem from the domain of storage and retrieval. Given a finite alphabet and a collection of subsets, the question is whether there exists a short string over the alphabet such that each subset's elements appear as a consecutive block in the string. This is a generalization of the consecutive ones property from matrices to a string-based formulation and arises in information retrieval and file organization.
Associated rules:
- R112: Hamiltonian Path -> Consecutive Sets (as target)
Definition
Name: ConsecutiveSets
Canonical name: CONSECUTIVE SETS
Reference: Garey & Johnson, Computers and Intractability, A4 SR18
Mathematical definition:
INSTANCE: Finite alphabet Sigma, collection C = {Sigma_1, Sigma_2, ..., Sigma_n} of subsets of Sigma, and a positive integer K.
QUESTION: Is there a string w in Sigma* with |w| <= K such that, for each i, the elements of Sigma_i occur in a consecutive block of |Sigma_i| symbols of w?
Variables
- Count:
bound_kposition variables, one per position in the string w. - Per-variable domain: Each position takes a value in {0, ..., alphabet_size} — values 0..alphabet_size-1 represent symbols, value alphabet_size represents "unused" (for strings shorter than bound_k).
dims():vec![alphabet_size + 1; bound_k]evaluate(): Interpret the configuration as a string w (strip trailing "unused" symbols). Returntrueiff for every subset Sigma_i in C, there exists a contiguous substring of w of length |Sigma_i| that contains exactly the elements of Sigma_i.
Schema (data type)
Type name: ConsecutiveSets
Variants: None
| Field | Type | Description |
|---|---|---|
alphabet_size |
usize |
Size of the alphabet Sigma (elements are {0, ..., alphabet_size - 1}) |
subsets |
Vec<Vec<usize>> |
The collection C of subsets of Sigma |
bound_k |
usize |
The positive integer K (max string length) |
Size fields (getter methods for overhead expressions and declare_variants!):
alphabet_size()— returnsalphabet_sizenum_subsets()— returnssubsets.len()(n)bound_k()— returns K
Notes:
- This is a satisfaction (decision) problem:
Metric = bool, implementingSatisfactionProblem. - When K = |Sigma| (number of distinct symbols), the problem is equivalent to testing a matrix for the consecutive ones property, which is polynomial-time solvable.
- The circular variant (blocks may wrap around from end to beginning of ww) is also NP-complete [Booth, 1975].
Complexity
- Best known exact algorithm: O(alphabet_size^bound_k * n) brute-force by trying all strings of length bound_k over the alphabet and checking if subsets form consecutive blocks. When bound_k = alphabet_size, this reduces to O(alphabet_size! * n) by considering only permutations.
- NP-completeness: NP-complete [Kou, 1977]. Transformation from HAMILTONIAN PATH.
declare_variants!complexity string:"alphabet_size^bound_k * num_subsets"- Polynomial special case: If K equals the number of distinct symbols appearing in the subsets, the problem reduces to testing a binary matrix for the consecutive ones property [Booth and Lueker, 1976], solvable in linear time.
- References:
- L. T. Kou (1977). "Polynomial complete consecutive information retrieval problems." SIAM Journal on Computing, 6(1):67-75.
- K. S. Booth (1975). "PQ Tree Algorithms." Ph.D. thesis, University of California, Berkeley.
- K. S. Booth and G. S. Lueker (1976). "Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms." J. Computer and System Sciences, 13:335-379.
Extra Remark
Full book text:
INSTANCE: Finite alphabet Sigma, collection C = {Sigma_1, Sigma_2, ..., Sigma_n} of subsets of Sigma, and a positive integer K.
QUESTION: Is there a string w in Sigma* with |w| <= K such that, for each i, the elements of Sigma_i occur in a consecutive block of |Sigma_i| symbols of W?
Reference: [Kou, 1977]. Transformation from HAMILTONIAN PATH.
Comment: The variant in which we ask only that the elements of each Sigma_i occur in a consecutive block of |Sigma_i| symbols of the string ww (i.e., we allow blocks that circulate from the end of w back to its beginning) is also NP-complete [Booth, 1975]. If K is the number of distinct symbols in the Sigma_i, then these problems are equivalent to determining whether a matrix has the consecutive ones property or the circular ones property and are solvable in polynomial time.
How to solve
- It can be solved by (existing) bruteforce -- enumerate all strings w of length <= K over Sigma and verify the consecutive block condition for each subset.
- It can be solved by reducing to integer programming -- assign position variables to symbols and linearize the consecutiveness constraints.
- Other: Reduction to consecutive ones property testing for the special case K = |Sigma|; constraint programming for general instances.
Example Instance
Instance 1 (YES instance):
alphabet_size = 6 (symbols: {0, 1, 2, 3, 4, 5})
Subsets: C = [{0, 4}, {2, 4}, {2, 5}, {1, 5}, {1, 3}]
bound_k = 6
The identity string [0, 1, 2, 3, 4, 5] fails: {0, 4} requires 0 and 4 adjacent, but they are 4 apart.
String w = [0, 4, 2, 5, 1, 3] (length 6 = bound_k):
- {0, 4}: positions 0-1 = [0, 4] -- consecutive. YES.
- {2, 4}: positions 1-2 = [4, 2] -- consecutive. YES.
- {2, 5}: positions 2-3 = [2, 5] -- consecutive. YES.
- {1, 5}: positions 3-4 = [5, 1] -- consecutive. YES.
- {1, 3}: positions 4-5 = [1, 3] -- consecutive. YES.
Answer: YES (the overlapping pair constraints force a chain: 0-4-2-5-1-3)
Instance 2 (NO instance):
alphabet_size = 6 (symbols: {0, 1, 2, 3, 4, 5})
Subsets: C = [{0, 2, 4}, {1, 3, 5}, {0, 1}, {2, 3}, {4, 5}]
bound_k = 6
For any string w of length 6 that is a permutation of {0, 1, 2, 3, 4, 5}:
- {0, 2, 4} must be consecutive: symbols 0, 2, 4 must appear in 3 adjacent positions.
- {1, 3, 5} must be consecutive: symbols 1, 3, 5 must appear in 3 adjacent positions.
- These two blocks of 3 must occupy positions [0-2] and [3-5] (in some order).
- {0, 1} requires 0 and 1 adjacent, but 0 is in one block and 1 is in the other. They can only be adjacent at the boundary (positions 2 and 3).
- Similarly {2, 3} and {4, 5} each require cross-block adjacency at the boundary.
- Only one pair can be at the boundary, so at most one of {0,1}, {2,3}, {4,5} can be satisfied.
Answer: NO
Metadata
Metadata
Assignees
Labels
Type
Projects
Status