snakemake-workflows/dna-seq-benchmark

Snakemake workflow: dna-seq-benchmark

A Snakemake workflow for benchmarking variant calling approaches with Genome in a Bottle (GIAB) data (and other custom benchmark datasets). The workflow uses a combination of bedtools, mosdepth, rtg-tools, pandas and datavzrd.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).

Output

The workflow writes both final deliverables and intermediate files under results/.

Primary result tables

  • results/fp-fn/callsets/<callset>.{fp|fn}.tsv: aggregated FP/FN tables per callset across coverages
  • results/fp-fn/benchmarks/<benchmark>.{fp|fn}.tsv: aggregated FP/FN tables per benchmark
  • results/precision-recall/benchmarks/<benchmark>.<snvs|indels>.<base|vaf-stratified>.tsv: aggregated precision/recall tables per benchmark (optionally stratified by variant allele frequency, VAF)
  • results/annotated/tsv/<benchmark>/: annotated shared FN tables
  • results/annotated/tsv/<benchmark>/<callset>.unique_<fp|fn>.annotated.tsv: annotated unique FP/FN tables
  • results/fp-fn/vcf/: VCFs generated from shared/unique FP/FN tables
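
The path patterns above can be expanded mechanically. The following is a minimal sketch, not part of the workflow itself: the helper name expected_tables and the example callset and benchmark names are hypothetical, and only the three fully specified patterns (per-callset FP/FN, per-benchmark FP/FN, and precision/recall) are covered.

```python
from itertools import product
from pathlib import PurePosixPath


def expected_tables(callsets, benchmarks, root="results"):
    """List the aggregated table paths implied by the documented patterns.

    Hypothetical helper for illustration; it only formats strings and
    does not check whether the files exist.
    """
    root = PurePosixPath(root)
    paths = []
    # results/fp-fn/callsets/<callset>.{fp|fn}.tsv
    for callset, kind in product(callsets, ("fp", "fn")):
        paths.append(root / "fp-fn" / "callsets" / f"{callset}.{kind}.tsv")
    # results/fp-fn/benchmarks/<benchmark>.{fp|fn}.tsv
    for benchmark, kind in product(benchmarks, ("fp", "fn")):
        paths.append(root / "fp-fn" / "benchmarks" / f"{benchmark}.{kind}.tsv")
    # results/precision-recall/benchmarks/<benchmark>.<snvs|indels>.<base|vaf-stratified>.tsv
    for benchmark, vartype, mode in product(
        benchmarks, ("snvs", "indels"), ("base", "vaf-stratified")
    ):
        paths.append(
            root / "precision-recall" / "benchmarks"
            / f"{benchmark}.{vartype}.{mode}.tsv"
        )
    return [str(p) for p in paths]


if __name__ == "__main__":
    # "my-callset" and "my-benchmark" are placeholder names.
    for path in expected_tables(["my-callset"], ["my-benchmark"]):
        print(path)
```

Whether the vaf-stratified tables are actually produced depends on the workflow configuration, so treat the list as an upper bound on what to expect.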

Intermediates and automatic cleanup

  • Raw somatic extraction tables are written to results/intermediate/fp-fn/raw/callsets/.
  • Several per-coverage and per-callset aggregation inputs are marked as Snakemake temp() outputs and are removed automatically once downstream targets are finished.
  • If you want to keep all intermediates for debugging, run Snakemake with --notemp.
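
As a rough illustration of the debugging option above, the sketch below assembles a Snakemake command line with or without --notemp. The helper name snakemake_command and the core count of 4 are assumptions made for this example; any further options your setup needs (for instance software deployment flags) are left out.

```python
import shlex


def snakemake_command(cores=4, keep_temp=False):
    """Build a Snakemake invocation as an argument list.

    Hypothetical helper: `cores` is an arbitrary example value, and
    `keep_temp=True` appends --notemp so temp() outputs are not deleted.
    """
    cmd = ["snakemake", "--cores", str(cores)]
    if keep_temp:
        cmd.append("--notemp")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(snakemake_command(keep_temp=True)))
```

Passing the resulting list to subprocess.run from the workflow directory would launch the run; printing it first keeps the example free of side effects.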

About

A Snakemake workflow for benchmarking variant calling approaches with Genome in a Bottle (GIAB), CHM (syndip), or other custom datasets.
