Skip to content

[Model] ConsecutiveSets #421

@isPANN

Description

@isPANN

Motivation

CONSECUTIVE SETS (P166) from Garey & Johnson, A4 SR18. An NP-complete problem from the domain of storage and retrieval. Given a finite alphabet and a collection of subsets, the question is whether there exists a short string over the alphabet such that each subset's elements appear as a consecutive block in the string. This is a generalization of the consecutive ones property from matrices to a string-based formulation and arises in information retrieval and file organization.

Associated rules:

  • R112: Hamiltonian Path -> Consecutive Sets (as target)

Definition

Name: ConsecutiveSets
Canonical name: CONSECUTIVE SETS
Reference: Garey & Johnson, Computers and Intractability, A4 SR18

Mathematical definition:

INSTANCE: Finite alphabet Sigma, collection C = {Sigma_1, Sigma_2, ..., Sigma_n} of subsets of Sigma, and a positive integer K.
QUESTION: Is there a string w in Sigma* with |w| <= K such that, for each i, the elements of Sigma_i occur in a consecutive block of |Sigma_i| symbols of w?

Variables

  • Count: bound_k position variables, one per position in the string w.
  • Per-variable domain: Each position takes a value in {0, ..., alphabet_size} — values 0..alphabet_size-1 represent symbols, value alphabet_size represents "unused" (for strings shorter than bound_k).
  • dims(): vec![alphabet_size + 1; bound_k]
  • evaluate(): Interpret the configuration as a string w (strip trailing "unused" symbols). Return true iff for every subset Sigma_i in C, there exists a contiguous substring of w of length |Sigma_i| that contains exactly the elements of Sigma_i.

Schema (data type)

Type name: ConsecutiveSets
Variants: None

Field Type Description
alphabet_size usize Size of the alphabet Sigma (elements are {0, ..., alphabet_size - 1})
subsets Vec<Vec<usize>> The collection C of subsets of Sigma
bound_k usize The positive integer K (max string length)

Size fields (getter methods for overhead expressions and declare_variants!):

  • alphabet_size() — returns alphabet_size
  • num_subsets() — returns subsets.len() (n)
  • bound_k() — returns K

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • When K = |Sigma| (number of distinct symbols), the problem is equivalent to testing a matrix for the consecutive ones property, which is polynomial-time solvable.
  • The circular variant (blocks may wrap around from end to beginning of ww) is also NP-complete [Booth, 1975].

Complexity

  • Best known exact algorithm: O(alphabet_size^bound_k * n) brute-force by trying all strings of length bound_k over the alphabet and checking if subsets form consecutive blocks. When bound_k = alphabet_size, this reduces to O(alphabet_size! * n) by considering only permutations.
  • NP-completeness: NP-complete [Kou, 1977]. Transformation from HAMILTONIAN PATH.
  • declare_variants! complexity string: "alphabet_size^bound_k * num_subsets"
  • Polynomial special case: If K equals the number of distinct symbols appearing in the subsets, the problem reduces to testing a binary matrix for the consecutive ones property [Booth and Lueker, 1976], solvable in linear time.
  • References:
    • L. T. Kou (1977). "Polynomial complete consecutive information retrieval problems." SIAM Journal on Computing, 6(1):67-75.
    • K. S. Booth (1975). "PQ Tree Algorithms." Ph.D. thesis, University of California, Berkeley.
    • K. S. Booth and G. S. Lueker (1976). "Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms." J. Computer and System Sciences, 13:335-379.

Extra Remark

Full book text:

INSTANCE: Finite alphabet Sigma, collection C = {Sigma_1, Sigma_2, ..., Sigma_n} of subsets of Sigma, and a positive integer K.
QUESTION: Is there a string w in Sigma* with |w| <= K such that, for each i, the elements of Sigma_i occur in a consecutive block of |Sigma_i| symbols of W?
Reference: [Kou, 1977]. Transformation from HAMILTONIAN PATH.
Comment: The variant in which we ask only that the elements of each Sigma_i occur in a consecutive block of |Sigma_i| symbols of the string ww (i.e., we allow blocks that circulate from the end of w back to its beginning) is also NP-complete [Booth, 1975]. If K is the number of distinct symbols in the Sigma_i, then these problems are equivalent to determining whether a matrix has the consecutive ones property or the circular ones property and are solvable in polynomial time.

How to solve

  • It can be solved by (existing) bruteforce -- enumerate all strings w of length <= K over Sigma and verify the consecutive block condition for each subset.
  • It can be solved by reducing to integer programming -- assign position variables to symbols and linearize the consecutiveness constraints.
  • Other: Reduction to consecutive ones property testing for the special case K = |Sigma|; constraint programming for general instances.

Example Instance

Instance 1 (YES instance):
alphabet_size = 6 (symbols: {0, 1, 2, 3, 4, 5})
Subsets: C = [{0, 4}, {2, 4}, {2, 5}, {1, 5}, {1, 3}]
bound_k = 6

The identity string [0, 1, 2, 3, 4, 5] fails: {0, 4} requires 0 and 4 adjacent, but they are 4 apart.

String w = [0, 4, 2, 5, 1, 3] (length 6 = bound_k):

  • {0, 4}: positions 0-1 = [0, 4] -- consecutive. YES.
  • {2, 4}: positions 1-2 = [4, 2] -- consecutive. YES.
  • {2, 5}: positions 2-3 = [2, 5] -- consecutive. YES.
  • {1, 5}: positions 3-4 = [5, 1] -- consecutive. YES.
  • {1, 3}: positions 4-5 = [1, 3] -- consecutive. YES.
    Answer: YES (the overlapping pair constraints force a chain: 0-4-2-5-1-3)

Instance 2 (NO instance):
alphabet_size = 6 (symbols: {0, 1, 2, 3, 4, 5})
Subsets: C = [{0, 2, 4}, {1, 3, 5}, {0, 1}, {2, 3}, {4, 5}]
bound_k = 6

For any string w of length 6 that is a permutation of {0, 1, 2, 3, 4, 5}:

  • {0, 2, 4} must be consecutive: symbols 0, 2, 4 must appear in 3 adjacent positions.
  • {1, 3, 5} must be consecutive: symbols 1, 3, 5 must appear in 3 adjacent positions.
  • These two blocks of 3 must occupy positions [0-2] and [3-5] (in some order).
  • {0, 1} requires 0 and 1 adjacent, but 0 is in one block and 1 is in the other. They can only be adjacent at the boundary (positions 2 and 3).
  • Similarly {2, 3} and {4, 5} each require cross-block adjacency at the boundary.
  • Only one pair can be at the boundary, so at most one of {0,1}, {2,3}, {4,5} can be satisfied.
    Answer: NO

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions