-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Motivation
MULTIPLE COPY FILE ALLOCATION from Garey & Johnson, A4 SR6. An NP-complete problem (in the strong sense) from the Storage and Retrieval category. It models the problem of deciding where to place copies of a file in a computer network to minimize the combined storage and access costs. Each node in the network has a usage frequency and a storage cost; the access cost for a node depends on its shortest-path distance to the nearest file copy. This problem is fundamental to distributed systems, content delivery networks, and facility location theory. NP-complete in the strong sense, even with uniform usage and storage costs. Proved by Van Sickle and Chandy (1977) via reduction from VERTEX COVER.
Associated rules:
- R100: Vertex Cover -> Multiple Copy File Allocation (as target)
Prerequisite note: The source rule R100 requires MinimumVertexCover which already exists in the codebase.
Definition
Name: MultipleCopyFileAllocation
Canonical name: MULTIPLE COPY FILE ALLOCATION
Reference: Garey & Johnson, Computers and Intractability, A4 SR6, p.227
Mathematical definition:
INSTANCE: Graph G = (V, E), for each v in V a usage u(v) in Z+ and a storage cost s(v) in Z+, and a positive integer K.
QUESTION: Is there a subset V' of V such that, if for each v in V we let d(v) denote the number of edges in the shortest path in G from v to a member of V', we have
sum_{v in V'} s(v) + sum_{v in V} d(v) * u(v) <= K ?
Variables
- Count: n = |V| binary variables, one per vertex.
- Per-variable domain: {0, 1} -- 0 means no file copy at vertex v, 1 means a file copy is placed at vertex v.
- Meaning: x_v = 1 if v in V' (vertex stores a file copy), x_v = 0 otherwise. For each vertex v, d(v) is the shortest-path distance in G to the nearest vertex with x_w = 1. The objective is to find an assignment such that the total cost (storage + access) is at most K.
dims()specification:vec![2; n]where n = |V|. Standard binary variable encoding.
Schema (data type)
Type name: MultipleCopyFileAllocation
Variants: none
| Field | Type | Description |
|---|---|---|
graph |
SimpleGraph |
Network topology G = (V, E) |
usage |
Vec<i64> |
Usage frequency u(v) for each vertex v in V |
storage |
Vec<i64> |
Storage cost s(v) for placing a file copy at vertex v |
bound |
i64 |
Maximum total cost K (storage + weighted access distance) |
Size fields (getter methods for overhead expressions and declare_variants!):
num_vertices()— returns |V|num_edges()— returns |E|
Notes:
- This is a satisfaction (decision) problem:
Metric = bool, implementingSatisfactionProblem. - The shortest-path distance d(v) is computed via BFS on the unweighted graph G.
- If v in V' (v stores a copy), then d(v) = 0.
- The problem is related to the Uncapacitated Facility Location Problem (UFLP) but uses graph shortest-path distances rather than arbitrary metric distances.
Complexity
- Best known exact algorithm: Brute-force: enumerate all 2^n subsets V' of V, for each compute BFS from V' to determine all d(v), then sum costs. Time O(2^n * (n + m)). No sub-exponential exact algorithm is known for general graphs.
- NP-completeness: NP-complete in the strong sense [Van Sickle and Chandy, 1977], even if all vertices have the same usage u and the same storage cost s.
- Approximation: Related to the metric Uncapacitated Facility Location Problem, which has a 1.488-approximation algorithm [Li, 2013]. However, the graph-distance variant has its own structure.
- Complexity string:
"2^num_vertices"
declare_variants! guidance:
crate::declare_variants! {
MultipleCopyFileAllocation => "2^num_vertices",
}- References:
- L. van Sickle and K. M. Chandy (1977). "The complexity of computer network design problems." Tech report, University of Texas at Austin.
- S. Li (2013). "A 1.488 approximation algorithm for the uncapacitated facility location problem." Information and Computation 222:45-58.
Extra Remark
Full book text:
INSTANCE: Graph G = (V,E), for each v in V a usage u(v) in Z+ and a storage cost s(v) in Z+, and a positive integer K.
QUESTION: Is there a subset V' of V such that, if for each v in V we let d(v) denote the number of edges in the shortest path in G from v to a member of V', we have
sum_{v in V'} s(v) + sum_{v in V} d(v)*u(v) <= K ?
Reference: [Van Sickle and Chandy, 1977]. Transformation from VERTEX COVER.
Comment: NP-complete in the strong sense, even if all v in V have the same value of u(v) and the same value of s(v).
How to solve
- It can be solved by (existing) bruteforce -- enumerate all 2^n subsets V' of V, compute BFS distances from each vertex to V', compute total cost, check if any V' achieves cost <= K.
- It can be solved by reducing to integer programming -- binary variables x_v for placement, auxiliary variables for distances with big-M constraints. Minimize sum s(v)*x_v + sum u(v)*d_v subject to distance constraints.
- Other: Greedy heuristic: iteratively add the vertex to V' that gives the largest cost reduction, until no improvement is possible.
Example Instance
Instance 1 (YES, vertex cover placement is optimal):
Graph G with 6 vertices {0, 1, 2, 3, 4, 5} and 6 edges (cycle C_6):
- Edges: {0,1}, {1,2}, {2,3}, {3,4}, {4,5}, {5,0}
- Usage: u(v) = 10 for all v
- Storage: s(v) = 1 for all v
- Bound K = 33.
Placement V' = {1, 3, 5}:
- Storage cost: 3 * 1 = 3
- Distances: d(0) = 1 (adjacent to 1 and 5), d(1) = 0, d(2) = 1 (adjacent to 1 and 3), d(3) = 0, d(4) = 1 (adjacent to 3 and 5), d(5) = 0.
- Access cost: (1+0+1+0+1+0) * 10 = 30.
- Total cost: 3 + 30 = 33 <= K = 33.
- Answer: YES
Instance 2 (YES, non-uniform costs):
Graph G with 6 vertices {0, 1, 2, 3, 4, 5} and 7 edges:
- Edges: {0,1}, {1,2}, {2,3}, {3,4}, {4,5}, {0,5}, {0,3}
- Usage: u = [5, 20, 3, 15, 8, 2]
- Storage: s = [10, 5, 10, 5, 10, 10]
- Bound K = 43.
Placement V' = {1, 3}:
- Storage cost: s(1) + s(3) = 5 + 5 = 10
- Distances: d(0) = 1 (adjacent to 1), d(1) = 0, d(2) = 1 (adjacent to 1 and 3), d(3) = 0, d(4) = 1 (adjacent to 3), d(5) = 2 (shortest path 5->0->1 = 2 or 5->0->3 = 2).
- Access cost: 15 + 020 + 13 + 015 + 18 + 22 = 5+0+3+0+8+4 = 20.
- Total cost: 10 + 20 = 30 <= K = 43.
- Answer: YES
Instance 3 (NO, high cost forced):
Graph G with 6 vertices {0, 1, 2, 3, 4, 5} -- path P_6:
- Edges: {0,1}, {1,2}, {2,3}, {3,4}, {4,5}
- Usage: u(v) = 100 for all v
- Storage: s(v) = 1 for all v
- Bound K = 5.
Any V' with |V'| copies costs at least |V'| in storage. To keep access cost at most 5 - |V'|, we'd need sum d(v)*100 <= 5 - |V'|, which requires sum d(v) = 0, meaning V' = V (all 6 vertices). But then storage cost = 6 > 5 = K. Answer: NO.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status