A Snakemake workflow for benchmarking variant calling approaches with Genome in a Bottle (GIAB) data (and other custom benchmark datasets). The workflow uses a combination of bedtools, mosdepth, rtg-tools, pandas and datavzrd.
The usage of this workflow is described in the Snakemake Workflow Catalog.
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).
The workflow writes both final deliverables and intermediate files under results/.
- results/fp-fn/callsets/<callset>.{fp|fn}.tsv: aggregated FP/FN tables per callset across coverages
- results/fp-fn/benchmarks/<benchmark>.{fp|fn}.tsv: aggregated FP/FN tables per benchmark
- results/precision-recall/benchmarks/<benchmark>.<snvs|indels>.<base|vaf-stratified>.tsv: aggregated precision/recall tables per benchmark (optionally stratified by vaf)
- results/annotated/tsv/<benchmark>/: annotated shared FN tables
- results/annotated/tsv/<benchmark>/<callset>.unique_<fp|fn>.annotated.tsv: annotated unique FP/FN tables
- results/fp-fn/vcf/: VCFs generated from shared/unique FP/FN tables
- Raw somatic extraction tables are written to results/intermediate/fp-fn/raw/callsets/.
- Several per-coverage and per-callset aggregation inputs are marked as Snakemake temp() outputs and are removed automatically once downstream targets are finished.
- If you want to keep all intermediates for debugging, run Snakemake with --notemp.
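
Since the workflow already relies on pandas, the aggregated tables under results/ can be inspected with it after a run. The following is a minimal sketch, not part of the workflow: the helper name `load_tables` is hypothetical, and it assumes only that each table is tab-separated with a header row. It makes no assumption about the column layout.

```python
from pathlib import Path

import pandas as pd


def load_tables(results_dir, pattern):
    """Read every TSV matching `pattern` below `results_dir`.

    Returns a dict mapping file name to DataFrame. Hypothetical helper for
    illustration; assumes tab-separated files with a header row.
    """
    tables = {}
    for path in sorted(Path(results_dir).glob(pattern)):
        # Column names are taken from the header row, not hard-coded here.
        tables[path.name] = pd.read_csv(path, sep="\t")
    return tables


if __name__ == "__main__":
    # Collect the per-callset FP/FN tables and report their dimensions.
    fp_fn = load_tables("results", "fp-fn/callsets/*.tsv")
    for name, table in fp_fn.items():
        print(name, table.shape)
```

The same call with a pattern such as "precision-recall/benchmarks/*.tsv" would collect the precision/recall tables instead.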