Skip to content

[Model] MultipleCopyFileAllocation #410

@isPANN

Description

@isPANN

Motivation

MULTIPLE COPY FILE ALLOCATION from Garey & Johnson, A4 SR6. An NP-complete problem (in the strong sense) from the Storage and Retrieval category. It models the problem of deciding where to place copies of a file in a computer network to minimize the combined storage and access costs. Each node in the network has a usage frequency and a storage cost; the access cost for a node depends on its shortest-path distance to the nearest file copy. This problem is fundamental to distributed systems, content delivery networks, and facility location theory. NP-complete in the strong sense, even with uniform usage and storage costs. Proved by Van Sickle and Chandy (1977) via reduction from VERTEX COVER.

Associated rules:

  • R100: Vertex Cover -> Multiple Copy File Allocation (as target)

Prerequisite note: The source rule R100 requires MinimumVertexCover which already exists in the codebase.

Definition

Name: MultipleCopyFileAllocation
Canonical name: MULTIPLE COPY FILE ALLOCATION
Reference: Garey & Johnson, Computers and Intractability, A4 SR6, p.227

Mathematical definition:

INSTANCE: Graph G = (V, E), for each v in V a usage u(v) in Z+ and a storage cost s(v) in Z+, and a positive integer K.
QUESTION: Is there a subset V' of V such that, if for each v in V we let d(v) denote the number of edges in the shortest path in G from v to a member of V', we have
sum_{v in V'} s(v) + sum_{v in V} d(v) * u(v) <= K ?

Variables

  • Count: n = |V| binary variables, one per vertex.
  • Per-variable domain: {0, 1} -- 0 means no file copy at vertex v, 1 means a file copy is placed at vertex v.
  • Meaning: x_v = 1 if v in V' (vertex stores a file copy), x_v = 0 otherwise. For each vertex v, d(v) is the shortest-path distance in G to the nearest vertex with x_w = 1. The objective is to find an assignment such that the total cost (storage + access) is at most K.
  • dims() specification: vec![2; n] where n = |V|. Standard binary variable encoding.

Schema (data type)

Type name: MultipleCopyFileAllocation
Variants: none

Field Type Description
graph SimpleGraph Network topology G = (V, E)
usage Vec<i64> Usage frequency u(v) for each vertex v in V
storage Vec<i64> Storage cost s(v) for placing a file copy at vertex v
bound i64 Maximum total cost K (storage + weighted access distance)

Size fields (getter methods for overhead expressions and declare_variants!):

  • num_vertices() — returns |V|
  • num_edges() — returns |E|

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • The shortest-path distance d(v) is computed via BFS on the unweighted graph G.
  • If v in V' (v stores a copy), then d(v) = 0.
  • The problem is related to the Uncapacitated Facility Location Problem (UFLP) but uses graph shortest-path distances rather than arbitrary metric distances.

Complexity

  • Best known exact algorithm: Brute-force: enumerate all 2^n subsets V' of V, for each compute BFS from V' to determine all d(v), then sum costs. Time O(2^n * (n + m)). No sub-exponential exact algorithm is known for general graphs.
  • NP-completeness: NP-complete in the strong sense [Van Sickle and Chandy, 1977], even if all vertices have the same usage u and the same storage cost s.
  • Approximation: Related to the metric Uncapacitated Facility Location Problem, which has a 1.488-approximation algorithm [Li, 2013]. However, the graph-distance variant has its own structure.
  • Complexity string: "2^num_vertices"

declare_variants! guidance:

crate::declare_variants! {
    MultipleCopyFileAllocation => "2^num_vertices",
}
  • References:
    • L. van Sickle and K. M. Chandy (1977). "The complexity of computer network design problems." Tech report, University of Texas at Austin.
    • S. Li (2013). "A 1.488 approximation algorithm for the uncapacitated facility location problem." Information and Computation 222:45-58.

Extra Remark

Full book text:

INSTANCE: Graph G = (V,E), for each v in V a usage u(v) in Z+ and a storage cost s(v) in Z+, and a positive integer K.
QUESTION: Is there a subset V' of V such that, if for each v in V we let d(v) denote the number of edges in the shortest path in G from v to a member of V', we have
sum_{v in V'} s(v) + sum_{v in V} d(v)*u(v) <= K ?
Reference: [Van Sickle and Chandy, 1977]. Transformation from VERTEX COVER.
Comment: NP-complete in the strong sense, even if all v in V have the same value of u(v) and the same value of s(v).

How to solve

  • It can be solved by (existing) bruteforce -- enumerate all 2^n subsets V' of V, compute BFS distances from each vertex to V', compute total cost, check if any V' achieves cost <= K.
  • It can be solved by reducing to integer programming -- binary variables x_v for placement, auxiliary variables for distances with big-M constraints. Minimize sum s(v)*x_v + sum u(v)*d_v subject to distance constraints.
  • Other: Greedy heuristic: iteratively add the vertex to V' that gives the largest cost reduction, until no improvement is possible.

Example Instance

Instance 1 (YES, vertex cover placement is optimal):
Graph G with 6 vertices {0, 1, 2, 3, 4, 5} and 6 edges (cycle C_6):

  • Edges: {0,1}, {1,2}, {2,3}, {3,4}, {4,5}, {5,0}
  • Usage: u(v) = 10 for all v
  • Storage: s(v) = 1 for all v
  • Bound K = 33.

Placement V' = {1, 3, 5}:

  • Storage cost: 3 * 1 = 3
  • Distances: d(0) = 1 (adjacent to 1 and 5), d(1) = 0, d(2) = 1 (adjacent to 1 and 3), d(3) = 0, d(4) = 1 (adjacent to 3 and 5), d(5) = 0.
  • Access cost: (1+0+1+0+1+0) * 10 = 30.
  • Total cost: 3 + 30 = 33 <= K = 33.
  • Answer: YES

Instance 2 (YES, non-uniform costs):
Graph G with 6 vertices {0, 1, 2, 3, 4, 5} and 7 edges:

  • Edges: {0,1}, {1,2}, {2,3}, {3,4}, {4,5}, {0,5}, {0,3}
  • Usage: u = [5, 20, 3, 15, 8, 2]
  • Storage: s = [10, 5, 10, 5, 10, 10]
  • Bound K = 43.

Placement V' = {1, 3}:

  • Storage cost: s(1) + s(3) = 5 + 5 = 10
  • Distances: d(0) = 1 (adjacent to 1), d(1) = 0, d(2) = 1 (adjacent to 1 and 3), d(3) = 0, d(4) = 1 (adjacent to 3), d(5) = 2 (shortest path 5->0->1 = 2 or 5->0->3 = 2).
  • Access cost: 15 + 020 + 13 + 015 + 18 + 22 = 5+0+3+0+8+4 = 20.
  • Total cost: 10 + 20 = 30 <= K = 43.
  • Answer: YES

Instance 3 (NO, high cost forced):
Graph G with 6 vertices {0, 1, 2, 3, 4, 5} -- path P_6:

  • Edges: {0,1}, {1,2}, {2,3}, {3,4}, {4,5}
  • Usage: u(v) = 100 for all v
  • Storage: s(v) = 1 for all v
  • Bound K = 5.

Any V' with |V'| copies costs at least |V'| in storage. To keep access cost at most 5 - |V'|, we'd need sum d(v)*100 <= 5 - |V'|, which requires sum d(v) = 0, meaning V' = V (all 6 vertices). But then storage cost = 6 > 5 = K. Answer: NO.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions