Skip to content

[Model] BoyceCoddNormalFormViolation #447

@isPANN

Description

@isPANN

Motivation

BOYCE-CODD NORMAL FORM VIOLATION (P177) from Garey & Johnson, A4 SR29. A classical NP-complete problem in database theory. Determining whether a subset of attributes violates Boyce-Codd normal form is fundamental to relational database schema design and normalization.

Associated reduction rules:

  • R123: Hitting Set -> Boyce-Codd Normal Form Violation (this problem is the target)

Definition

Name: BoyceCoddNormalFormViolation
Canonical name: Boyce-Codd Normal Form Violation (also: BCNF Violation, BCNF Testing)
Reference: Garey & Johnson, Computers and Intractability, A4 SR29

Mathematical definition:

Background: A functional dependency X → Y means that the values of attributes X uniquely determine the values of attributes Y. The closure X+ of a set X under F is the set of all attributes determined (directly or transitively) by X. A set X is a superkey if X+ = A' (it determines everything). A relation satisfies Boyce-Codd Normal Form (BCNF) if every non-trivial functional dependency X → Y has X as a superkey — i.e., whenever X determines something, it determines everything.

INSTANCE: A set A of attribute names, a collection F of functional dependencies on A, and a subset A' ⊆ A.
QUESTION: Is there a subset X ⊆ A' and two attributes y, z ∈ A' \ X such that y ∈ X+ but z ∉ X+? (In other words, X determines y but not z, meaning X is not a superkey — a BCNF violation.)

Variables

  • Count: target_subset.len() binary variables, one per attribute in A'.
  • Per-variable domain: binary {0, 1} — whether the attribute is included in X.
  • dims(): vec![2; target_subset.len()]
  • evaluate(): Let X = {target_subset[i] : config[i] = 1}. Compute the closure X+ under F. Then check all pairs (y, z) in A' \ X: return true if there exist y, z ∈ A' \ X such that y ∈ X+ and z ∉ X+ (i.e., X determines y but not z, so X is not a superkey).

Schema (data type)

Type name: BoyceCoddNormalFormViolation
Variants: none (no graph or weight type parameter; functional dependencies are stored directly)

Field Type Description
num_attributes usize Number of attributes in A (attributes indexed 0..num_attributes)
functional_deps Vec<(Vec<usize>, Vec<usize>)> Collection F of functional dependencies; each is (lhs_attributes, rhs_attributes)
target_subset Vec<usize> The subset A' ⊆ A to test for BCNF violation

Size fields (getter methods for overhead expressions and declare_variants!):

  • num_attributes() — returns num_attributes
  • num_functional_deps() — returns functional_deps.len()
  • num_target_attributes() — returns target_subset.len()

Complexity

  • Decision complexity: NP-complete (Beeri and Bernstein, 1979; transformation from HITTING SET).
  • Best known exact algorithm: Brute-force: enumerate all 2^|A'| subsets X, compute closure X+ under F, check if X+ contains some but not all of A' \ X. Total: O(2^num_target_attributes * num_target_attributes^2 * num_functional_deps).
  • declare_variants! complexity string: "2^num_target_attributes * num_target_attributes^2 * num_functional_deps"
  • Parameterized: The problem remains NP-complete even when A' is required to satisfy third normal form (3NF). For the special case where functional dependencies form a hierarchy, BCNF testing is linear time.
  • References:
    • [Beeri and Bernstein, 1979] C. Beeri and P. A. Bernstein, "Computational Problems Related to the Design of Normal Form Relational Schemas", ACM Trans. Database Systems, 4(1), pp. 30-59, 1979.
    • [Bernstein and Beeri, 1976] P. A. Bernstein and C. Beeri, "An Algorithmic Approach to Normalization of Relational Database Schemas", University of Toronto, 1976.

Specialization

  • Related problems: The problem is closely related to key determination and superkey testing for relational schemas.
  • Restriction: When all functional dependencies have single-attribute right-hand sides and the dependency graph is hierarchical, the problem is solvable in polynomial time.
  • Generalization of: Testing whether a specific FD violates BCNF (which is polynomial) to testing whether any violation exists in a given attribute subset.

Extra Remark

Full book text:

INSTANCE: A set A of attribute names, a collection F of functional dependencies on A, and a subset A' ⊆ A.
QUESTION: Does A' violate Boyce-Codd normal form for the relational system <A,F>, i.e., is there a subset X ⊆ A' and two attribute names y,z in A' - X such that (X,{y}) in F* and (X,{z}) not in F*, where F* is the closure of F?
Reference: [Bernstein and Beeri, 1976], [Beeri and Bernstein, 1978]. Transformation from HITTING SET.
Comment: Remains NP-complete even if A' is required to satisfy "third normal form," i.e., if X ⊆ A' is a key for the system <A',F> and if two names y,z in A'-X satisfy (X,{y}) in F* and (X,{z}) not in F*, then z is a prime attribute for <A',F>.

How to solve

  • It can be solved by (existing) bruteforce. Enumerate all subsets X of A'; for each X and each pair (y, z) in A' - X, compute the attribute closure X+ under F and check if y in X+ but z not in X+.
  • It can be solved by reducing to integer programming.
  • Other: (none identified)

Example Instance

Instance 1 (YES — BCNF violation exists):
num_attributes = 6 (attributes: {0, 1, 2, 3, 4, 5})
functional_deps = [({0,1}, {2}), ({2}, {3}), ({3,4}, {5})]
target_subset = [0, 1, 2, 3, 4, 5] (all attributes)

The only candidate key for A' is {0, 1, 4}: closure {0,1,4} -> {2} -> {3} -> {5} = A.

Consider X = {2}, y = 3, z = 0:

  • Closure of {2}: {2} -> {3} via FD2, so {2}+ = {2, 3}
  • y = 3: 3 ∈ {2}+ → X determines y. YES.
  • z = 0: 0 ∉ {2}+ → X does not determine z. YES.
  • {2} is not a superkey (closure is only {2, 3}, not all of A').
  • Therefore X = {2} with y = 3, z = 0 witnesses a BCNF violation.

Answer: YES.

Instance 2 (NO — A' satisfies BCNF):
num_attributes = 4 (attributes: {0, 1, 2, 3})
functional_deps = [({0,1}, {2,3}), ({2,3}, {0,1}), ({0,2}, {1,3}), ({1,3}, {0,2})]
target_subset = [0, 1, 2, 3]

There are 4 candidate keys: {0,1}, {2,3}, {0,2}, {1,3} — a cyclic key structure.

Check all 16 subsets X ⊆ A':

  • Singletons {0}, {1}, {2}, {3}: no FD fires, so X+ = X. Trivial (determines nothing new).
  • {0,1}+: apply {0,1}->{2,3} → {0,1,2,3} = A'. Superkey.
  • {0,2}+: apply {0,2}->{1,3} → {0,1,2,3} = A'. Superkey.
  • {1,3}+: apply {1,3}->{0,2} → {0,1,2,3} = A'. Superkey.
  • {2,3}+: apply {2,3}->{0,1} → {0,1,2,3} = A'. Superkey.
  • {0,3}+: no FD fires (each FD needs a pair not contained in {0,3}). X+ = {0,3}. Trivial.
  • {1,2}+: no FD fires. X+ = {1,2}. Trivial.
  • All triples and A' itself: superkeys (contain at least one 2-element key).

Every subset either has trivial closure (X+ = X) or is a superkey (X+ = A'). No subset has a partial closure, so no BCNF violation exists.

Answer: NO, A' satisfies BCNF.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions