BioStructBenchmark is a tool for identifying the differences between two sets of structures, in particular DNA-protein complexes. It was built primarily to identify systematic differences between experimentally determined structures and computationally predicted structures generated by programs such as AlphaFold3.
- Structure Alignment: Superimpose protein-DNA complexes for comparison
- RMSD Calculation: Comprehensive RMSD decomposition (protein, DNA, interface, per-residue)
- B-factor/pLDDT Analysis: Compare experimental B-factors with AlphaFold confidence scores
- Consensus Error Mapping: Handle datasets with incidental mutations
- Principal Component Analysis: Identify systematic differences
- DNA Geometry Analysis: Analyze DNA structure using CURVES+
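As a rough illustration of what an RMSD decomposition involves, the sketch below computes one RMSD per component (protein, DNA) plus an overall value. It assumes the two structures are already superimposed and list their atoms in the same order. The data layout and the names `rmsd` and `rmsd_by_group` are hypothetical and are not taken from BioStructBenchmark's own code.

```python
import math

# Hypothetical layout: each atom is (group, (x, y, z)), where group is
# "protein" or "dna". Both lists are assumed to be already superimposed
# and to list atoms in the same order.


def rmsd(pairs):
    """Root-mean-square deviation over paired 3D coordinates."""
    if not pairs:
        raise ValueError("no atom pairs to compare")
    total = sum(math.dist(a, b) ** 2 for a, b in pairs)
    return math.sqrt(total / len(pairs))


def rmsd_by_group(reference, model):
    """Return one RMSD per group label plus an 'overall' RMSD."""
    if len(reference) != len(model):
        raise ValueError("structures must have the same number of atoms")
    grouped = {}
    for (group, ref_xyz), (_, model_xyz) in zip(reference, model):
        grouped.setdefault(group, []).append((ref_xyz, model_xyz))
    result = {group: rmsd(pairs) for group, pairs in grouped.items()}
    result["overall"] = rmsd([p for pairs in grouped.values() for p in pairs])
    return result


# Made-up coordinates: protein atoms shifted by 1 Å, DNA atoms by 3 Å.
reference = [
    ("protein", (0.0, 0.0, 0.0)),
    ("protein", (1.0, 0.0, 0.0)),
    ("dna", (0.0, 5.0, 0.0)),
    ("dna", (0.0, 6.0, 0.0)),
]
model = [
    ("protein", (0.0, 0.0, 1.0)),
    ("protein", (1.0, 0.0, 1.0)),
    ("dna", (0.0, 5.0, 3.0)),
    ("dna", (0.0, 6.0, 3.0)),
]

scores = rmsd_by_group(reference, model)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Splitting the score this way shows whether a large overall RMSD comes mostly from the protein, the DNA, or both.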
To install BioStructBenchmark:
pip install biostructbenchmark

To compare an experimental structure with a predicted one:

biostructbenchmark experimental.cif predicted.cif

This will align the structures and output RMSD metrics for the protein, DNA, and interface regions.
Compare experimental B-factors with AlphaFold pLDDT confidence scores:
biostructbenchmark experimental.cif predicted.cif --analyze-bfactor --output-dir ./results

This will generate:
- Console output with summary statistics
- CSV file with per-residue B-factor comparisons at ./results/analysis/bfactor_comparison.csv
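When several structure pairs need the same analysis, the command line above can be assembled programmatically. The sketch below only builds and prints the commands, using the `--analyze-bfactor` and `--output-dir` flags shown above. The file names, the `build_command` helper, and the per-pair output layout are hypothetical.

```python
import shlex


def build_command(experimental, predicted, output_dir):
    """Build the argument list for one B-factor analysis run."""
    return [
        "biostructbenchmark",
        experimental,
        predicted,
        "--analyze-bfactor",
        "--output-dir",
        output_dir,
    ]


# Hypothetical input pairs: (label, experimental file, predicted file).
pairs = [
    ("complex1", "complex1_exp.cif", "complex1_pred.cif"),
    ("complex2", "complex2_exp.cif", "complex2_pred.cif"),
]

commands = [
    build_command(exp, pred, f"./results/{label}") for label, exp, pred in pairs
]

for command in commands:
    # Print a shell-safe version of each command; to execute one instead,
    # pass the list to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Keeping each pair in its own output directory avoids one run overwriting another's CSV.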
Understanding the Output:
- Mean Experimental B-factor: Average thermal motion/disorder in experimental structure
- Mean Predicted Confidence (pLDDT): Average AlphaFold confidence (0-100 scale)
- Correlation: How well B-factors correlate with pLDDT scores
- RMSD: Root mean square difference between B-factors and pLDDT
- High/Low Confidence Accuracy: Accuracy in high-confidence regions (pLDDT > 70) and low-confidence regions (pLDDT ≤ 70)
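The summary statistics listed above can be approximated with textbook formulas. The sketch below computes the means, a Pearson correlation, and a root mean square difference on made-up values. It is an assumption that the tool uses these exact definitions; for instance, it may normalize B-factors before comparing them. The function names are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def rms_difference(xs, ys):
    """Root mean square of the element-wise differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Made-up per-residue values: flexible residues (high B-factor) are given
# low confidence, so the correlation is strongly negative.
bfactors = [10.0, 20.0, 30.0, 40.0]
plddt = [95.0, 85.0, 75.0, 65.0]

mean_b = mean(bfactors)
mean_p = mean(plddt)
corr = pearson(bfactors, plddt)
rms = rms_difference(bfactors, plddt)

print(f"Mean experimental B-factor: {mean_b:.1f}")
print(f"Mean predicted confidence:  {mean_p:.1f}")
print(f"Correlation:                {corr:.2f}")
print(f"RMS difference:             {rms:.2f}")
```

Because high B-factors indicate disorder and high pLDDT indicates confidence, a negative correlation is the expected sign when the two measures agree.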
CSV Output Format:
residue_id,chain_id,position,experimental_bfactor,predicted_confidence,difference,normalized_bfactor
A_1,A,1,25.3,85.2,59.9,-0.5

Notes:
- Only standard amino acid residues are analyzed; water molecules, ligands, and other heteroatoms are excluded
- For multi-model structures (e.g., NMR ensembles), only the first model is used
- Structures must have >50% non-zero B-factors to pass validation
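For downstream analysis, the CSV described under "CSV Output Format" can be read with Python's standard `csv` module. The sketch below parses an in-memory sample and splits residues at the pLDDT threshold of 70 mentioned above. The first data row is the documented example; the second row is made up for illustration, and the helper names are hypothetical.

```python
import csv
import io

# Header and first row come from the documented format; the second row
# is invented for illustration.
SAMPLE = """\
residue_id,chain_id,position,experimental_bfactor,predicted_confidence,difference,normalized_bfactor
A_1,A,1,25.3,85.2,59.9,-0.5
A_2,A,2,40.1,62.0,21.9,0.8
"""

NUMERIC_COLUMNS = (
    "experimental_bfactor",
    "predicted_confidence",
    "difference",
    "normalized_bfactor",
)


def load_rows(handle):
    """Read the comparison CSV and convert numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["position"] = int(row["position"])
        for column in NUMERIC_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


def split_by_confidence(rows, threshold=70.0):
    """Separate residues into high (> threshold) and low (<= threshold)."""
    high = [r for r in rows if r["predicted_confidence"] > threshold]
    low = [r for r in rows if r["predicted_confidence"] <= threshold]
    return high, low


# With a real file, replace io.StringIO(SAMPLE) with
# open("./results/analysis/bfactor_comparison.csv", newline="").
rows = load_rows(io.StringIO(SAMPLE))
high, low = split_by_confidence(rows)
print("high confidence:", [r["residue_id"] for r in high])
print("low confidence: ", [r["residue_id"] for r in low])
```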
Specify a custom path for the B-factor CSV:
biostructbenchmark experimental.cif predicted.cif \
--analyze-bfactor \
--bfactor-output my_analysis.csv

Save the aligned structures to a chosen directory:

biostructbenchmark experimental.cif predicted.cif \
--save-structures \
--output-dir ./aligned_structures

This project uses uv for fast, reliable package management.
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install with dev dependencies
git clone https://github.com/BioStructBenchmark/BioStructBenchmark.git
cd BioStructBenchmark
# Install dev dependencies and set up pre-commit hooks
make dev

make help # Show all available commands
make test # Run tests with coverage (80% threshold)
make lint # Run Ruff linting and format checks
make typecheck # Run mypy type checking
make format # Auto-format code with Ruff
make quality # Run docstring coverage and dead code checks
make security # Run Bandit and pip-audit security scans
make check # Run all checks (lint, typecheck, quality, test, security)

- Ruff: Fast linting and formatting (replaces Black, isort, Flake8)
- mypy: Static type checking with strict mode
- pytest-cov: Test coverage with 80% threshold
- interrogate: Docstring coverage checking
- vulture: Dead code detection
- Bandit: Security vulnerability scanning
- pip-audit: Dependency vulnerability scanning
- pre-commit: Git hooks for automated checks (includes zizmor for GitHub Actions)
All pull requests run:
- Linting and formatting checks
- Type checking
- Tests with coverage
- Security scans (Bandit + pip-audit)