Bug fix consensus #165
Merged
Supports bcftools view -T <targets> to filter a VCF to only positions present in a targets VCF file. Used to strip germline variants from the phased somatic+germline VCF before emitting phased_somatic_vcf. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After LONGPHASE_PHASE_SOMATIC, use BCFTOOLS_VIEW -T to filter the combined phased somatic+germline VCF against the original somatic VCF positions. The emitted phased_somatic_vcf (used by SOMATIC_VEP) now contains only somatic variants. Phase tags (PS/HP) are preserved. This prevents germline variants from being double-annotated by both GERMLINE_VEP and SOMATIC_VEP. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scoped to the PHASING_HAPLOTYPING subworkflow to avoid matching any future BCFTOOLS_VIEW calls. Output is intermediate (publishDir disabled). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…PHASE_SOMATIC
LONGPHASE_PHASE_SOMATIC now produces an intermediate combined somatic+germline VCF (renamed to somatic_smallvariants_combined, publish disabled). BCFTOOLS_VIEW produces the final somatic-only phased VCF published as somatic_smallvariants to variants/phased/, matching the previous output name and location. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
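The position-based filtering described in these commits can be illustrated with a minimal Python sketch. It is not the pipeline's code and does not call bcftools; the function name `filter_to_targets` and the example records are hypothetical. It only shows the idea of keeping records from the combined phased VCF whose (CHROM, POS) also occurs in the somatic VCF, leaving the phase fields on the kept records untouched.

```python
def filter_to_targets(combined_vcf_lines, target_vcf_lines):
    """Keep header lines plus records whose (CHROM, POS) occurs in the targets."""
    targets = set()
    for line in target_vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines in the targets file
        chrom, pos = line.split("\t")[:2]
        targets.add((chrom, int(pos)))

    kept = []
    for line in combined_vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # headers are passed through unchanged
            continue
        if not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        if (chrom, int(pos)) in targets:
            kept.append(line)  # record is kept verbatim, phase tags included
    return kept


# Hypothetical combined phased VCF: one germline and one somatic record.
combined = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ttumor",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:100",
    "chr1\t250\t.\tC\tT\t40\tPASS\t.\tGT:PS\t1|0:100",
]
# Hypothetical original somatic VCF used as the targets file.
somatic = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t250\t.\tC\tT\t40\tPASS\t.",
]

phased_somatic = filter_to_targets(combined, somatic)
for record in phased_somatic:
    print(record)
```

Because the match is on position only, a germline record that shares a position with a somatic call would also be kept in this sketch.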
- clair_only: rewritten to use test_sheet_2.csv (5 samples). Adds assertions for sample4 replicate merging (rep1/rep2 QC dirs, single merged output), sample5 custom ONT model override (VCF header check), phased VCF existence/size, BAM files, QC dirs, and VEP outputs.
- deep_only: enhanced with phased VCF size assertions and loop-based pattern consistent with other tests.
- consensus: new test with both callers, germline_var_combine='consensus', somatic_var_combine='consensus'. Asserts both caller raw outputs and phased consensus VCFs exist with data.
- union: new test with both callers, germline_var_combine='all', somatic_var_combine='all'. Explicit test for the union combine path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n PRs

Test tagging:
- default.nf.test: tag "small", runs on every PR
- clair_only, deep_only, consensus, union: tag "extended", skipped on PRs

Assertion enhancements (deep_only, consensus, union now match clair_only):
- BAM .bai index checks added to all three tests
- QC directory checks (cramino_aln, nanoplot_aln, mosdepth, samtools)
- VEP output directory checks (germline, somatic, SVs for paired)
- Severus SV outputs added to deep_only

CI (nf-test.yml):
- Pass tags: "small" to get-shards and nf-test actions on pull_request events
- Releases and workflow_dispatch continue to run the full extended suite

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR appears to address filename collisions and output correctness in the small-variant consensus/phasing parts of the lrsomatic Nextflow pipeline, and expands nf-test coverage around different caller-selection / combine-mode scenarios.
Changes:
- Avoid VCF basename collisions by renaming DeepVariant/DeepSomatic outputs and by re-sorting/renaming bcftools isec consensus outputs in consensus mode.
- Add a local bcftools view module and use it in phasing to publish a somatic-only phased VCF (filtering from the phased somatic+germline VCF using somatic calls as targets).
- Add multiple new nf-test scenarios (union/all, consensus, deep-only, clair-only) and adjust CI to run only small-tagged tests on pull requests.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| subworkflows/local/small_variant_consensus.nf | Renames/sorts consensus-mode outputs to prevent downstream filename collisions. |
| modules/nf-core/deepvariant/postprocessvariants/main.nf | Changes DeepVariant output prefix to include _germline to avoid ambiguous basenames. |
| modules/local/deepsomatic/postprocessvariants/main.nf | Changes DeepSomatic output prefix to include _somatic to avoid ambiguous basenames. |
| subworkflows/local/phasing_haplotyping.nf | Adds bcftools view filtering so published phased somatic VCF is somatic-only. |
| modules/local/bcftools/view/main.nf | New local module implementing bcftools view -T filtering with index writing. |
| modules/local/bcftools/view/meta.yml | Metadata for the new bcftools view module. |
| modules/local/bcftools/view/environment.yml | Conda environment for the new bcftools view module. |
| conf/modules.config | Adjusts Longphase somatic prefix/publishing; publishes filtered somatic VCF; configures consensus-sort renaming. |
| subworkflows/local/paired/paired_smallvar_germline.nf | Normalizes params.germline_var_keep handling (list vs scalar) and initializes channels. |
| subworkflows/local/paired/paired_smallvar_somatic.nf | Normalizes params.somatic_var_keep handling (list vs scalar) and initializes channels. |
| subworkflows/local/tumor_only/tumoronly_smallvar.nf | Normalizes *_var_keep handling (list vs scalar) and initializes channels. |
| tests/default.nf.test | Adds small tag and updates expected DeepVariant/DeepSomatic filenames. |
| tests/default.nf.test.snap | Snapshot updates for renamed outputs and added BCFTOOLS_VIEW version reporting. |
| tests/consensus.nf.test | New nf-test for consensus combine mode. |
| tests/consensus.nf.test.snap | Snapshot for consensus combine mode. |
| tests/union.nf.test | New nf-test for “all/union” combine mode. |
| tests/union.nf.test.snap | Snapshot for “all/union” combine mode. |
| tests/deep_only.nf.test | New nf-test for running only DeepVariant+DeepSomatic. |
| tests/deep_only.nf.test.snap | Snapshot for deep-only mode. |
| tests/clair_only.nf.test | New nf-test for running only Clair callers (uses an extended samplesheet). |
| tests/clair_only.nf.test.snap | Snapshot for clair-only mode. |
| conf/test.config | Changes test profile genome to GRCh38. |
| nextflow.config | Changes default caller selection/priorities (global defaults). |
| .github/workflows/nf-test.yml | Runs only small-tagged nf-tests on pull requests. |
Comment on lines +260 to +264
| // MODULE: BCFTOOLS_VIEW (label: process_medium) | ||
| // Filter the phased somatic+germline VCF to somatic-only positions. | ||
| // Uses the original somatic VCF as a targets (-T) file so only positions | ||
| // called as somatic are retained. Phase tags (PS/HP) on somatic variants | ||
| // are preserved; germline records are dropped. |
Comment on lines 70 to 77
| "type": "string", | ||
| "default": "['clair']", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, |
Comment on lines +83 to +85
| "items": { | ||
| "type": "string", | ||
| "default": "['clair']", |
| @@ -121,7 +121,7 @@ process DEEPSOMATIC_POSTPROCESSVARIANTS { | |||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | |||
| error "DEEPVARIANT module does not support Conda. Please use Docker / Singularity / Podman instead." | |||
| when { | ||
| params { | ||
| outdir = "$outputDir" | ||
| input = "https://github.com/IntGenomicsLab/test-datasets/refs/heads/main/samplesheets/test_sheet_2.csv" |
| NFT_VER: ${{ env.NFT_VER }} | ||
| with: | ||
| max_shards: 7 | ||
| tags: ${{ github.event_name == 'pull_request' && 'small' || '' }} |
| profile: ${{ matrix.profile }} | ||
| shard: ${{ matrix.shard }} | ||
| total_shards: ${{ env.TOTAL_SHARDS }} | ||
| tags: ${{ github.event_name == 'pull_request' && 'small' || '' }} |
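The `tags` expression above relies on the `cond && a || b` idiom, which behaves like a ternary as long as the middle operand is truthy. Python's `and`/`or` return operands in the same short-circuit way, so the selection can be mirrored in a small sketch. The function name `select_tags` is hypothetical, and treating an empty string as "no tag filter" is an assumption about how the downstream actions interpret their input.

```python
def select_tags(event_name: str) -> str:
    # Mirrors: github.event_name == 'pull_request' && 'small' || ''
    # 'small' is truthy, so the expression acts like a ternary.
    return (event_name == "pull_request" and "small") or ""


for event in ("pull_request", "release", "workflow_dispatch"):
    print(f"{event!r} -> tags={select_tags(event)!r}")
```

The idiom would misbehave if the middle operand were falsy (for example an empty string), because the `||` branch would then always win.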
Comment on lines +16 to +22
| // Small variant calling options | ||
| germline_var_keep = ['deepvariant', 'clair'] | ||
| somatic_var_keep = ['deepsomatic', 'clair'] | ||
| germline_var_keep = ['clair'] | ||
| somatic_var_keep = ['clair'] | ||
| germline_var_combine = 'all' | ||
| somatic_var_combine = 'all' | ||
| prioritize_caller_germline = 'deepvariant' | ||
| prioritize_caller_somatic = 'deepsomatic' | ||
| prioritize_caller_germline = 'clair' | ||
| prioritize_caller_somatic = 'clair' |
Pull request overview
Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
nextflow_schema.json:85
- Same schema issue as germline_var_keep: somatic_var_keep is type: ["string"] but still has items and a misplaced default. Update the schema so the default/type definition matches what the pipeline actually accepts (string and/or array).
"somatic_var_keep": {
"type": ["string"],
"description": "List of somatic variant callers to use. Must include at least one of [deepsomatic, clair].",
"items": {
"type": "string",
"default": "clair"
}
},
Comment on lines +266 to +270
| // Each replicate is aligned separately so that minimap2 can tag its reads with a | ||
| // unique @RG (sample + type + replicate). Replicates are merged after alignment. | ||
| // ch_samplesheet is [meta_with_replicate, [bam]] -- one item per replicate per sample. | ||
| // ch_ubams defaults to ch_samplesheet; the fiber-seq block below may override it. | ||
| ch_ubams = ch_samplesheet |
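The align-then-merge flow described in the comment above can be sketched in Python. This is an illustration only, not the workflow's channel logic: `read_group_id` and `group_for_merge` are hypothetical names, the read-group string format is an assumption, and only the meta fields `id`, `type`, and `replicate` come from the PR.

```python
from collections import defaultdict


def read_group_id(meta: dict) -> str:
    # Assumed format: one unique read group per sample + type + replicate.
    return f"{meta['id']}_{meta['type']}_rep{meta['replicate']}"


def group_for_merge(aligned):
    """Group per-replicate BAMs by (sample id, type) after alignment.

    Returns (single, multiple): samples with one replicate need no merge,
    samples with several replicates would be passed to a merge step.
    """
    groups = defaultdict(list)
    for meta, bam in aligned:
        groups[(meta["id"], meta["type"])].append(bam)
    single = {key: bams[0] for key, bams in groups.items() if len(bams) == 1}
    multiple = {key: sorted(bams) for key, bams in groups.items() if len(bams) > 1}
    return single, multiple


# Hypothetical aligned replicates: sample4 has two, sample1 has one.
aligned = [
    ({"id": "sample4", "type": "tumor", "replicate": 1}, "sample4_rep1.bam"),
    ({"id": "sample4", "type": "tumor", "replicate": 2}, "sample4_rep2.bam"),
    ({"id": "sample1", "type": "tumor", "replicate": 1}, "sample1_rep1.bam"),
]

single, multiple = group_for_merge(aligned)
print([read_group_id(meta) for meta, _ in aligned])
print(single)
print(multiple)
```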
Comment on lines 70 to 77
| "germline_var_keep": { | ||
| "type": ["string", "array"], | ||
| "type": ["string"], | ||
| "description": "List of germline variant callers to use. Must include at least one of [deepvariant, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, | ||
| "minItems": 1, | ||
| "default": "['deepvariant', 'clair']" | ||
| "default": "clair" | ||
| } | ||
| }, |
| stub: | ||
| def args = task.ext.args ?: '' | ||
| prefix = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}" |
Comment on lines +275 to +279
| // Each replicate is aligned separately so that minimap2 can tag its reads with a | ||
| // unique @RG (sample + type + replicate). Replicates are merged after alignment. | ||
| // ch_samplesheet is [meta_with_replicate, [bam]] -- one item per replicate per sample. | ||
| // ch_ubams defaults to ch_samplesheet; the fiber-seq block below may override it. | ||
| ch_ubams = ch_samplesheet |
Comment on lines 321 to 328
| if (!params.skip_normalfiber){ | ||
| // Process all samples (including normals) for fiber-seq | ||
| ubams = ch_cat_ubams | ||
| ubams = ch_samplesheet | ||
| } | ||
| else { | ||
| // Skip fiber-seq processing for normal samples; set aside normals to re-join later | ||
| ch_cat_ubams | ||
| ch_samplesheet | ||
| .branch { meta, _bams -> |
Comment on lines +6 to +8
| container "${workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container | ||
| ? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/47/474a5ea8dc03366b04df884d89aeacc4f8e6d1ad92266888e7a8e7958d07cde8/data' | ||
| : 'community.wave.seqera.io/library/bcftools_htslib:0a3fa2654b52006f'}" |
Comment on lines +7 to +10
| tools: | ||
| - view: | ||
| description: VCF/BCF conversion, view, subset and filter VCF/BCF files. | ||
| homepage: http://samtools.github.io/bcftools/bcftools.html |
Comment on lines 70 to 74
| "germline_var_keep": { | ||
| "type": ["string", "array"], | ||
| "description": "List of germline variant callers to use. Must include at least one of [deepvariant, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, | ||
| "minItems": 1, | ||
| "default": "['deepvariant', 'clair']" | ||
| "type": "string", | ||
| "description": "Comma-separated list of germline variant callers to use. Valid values: deepvariant, clair. Example: 'deepvariant,clair'", | ||
| "default": "clair" | ||
| }, |
Comment on lines 75 to 79
| "somatic_var_keep": { | ||
| "type": ["string", "array"], | ||
| "description": "List of somatic variant callers to use. Must include at least one of [deepsomatic, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepsomatic", "clair"] | ||
| }, | ||
| "minItems": 1, | ||
| "default": "['deepsomatic', 'clair']" | ||
| "type": "string", | ||
| "description": "Comma-separated list of somatic variant callers to use. Valid values: deepsomatic, clair. Example: 'deepsomatic,clair'", | ||
| "default": "clair" | ||
| }, |
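Since the schema now describes the `*_var_keep` parameters as comma-separated strings while the subworkflows normalize list versus scalar input, the normalization step can be sketched as follows. This Python function is a hypothetical illustration, not the pipeline's Groovy code; the name `normalize_callers` and the error messages are assumptions, and the allowed values are taken from the schema descriptions above.

```python
def normalize_callers(value, allowed):
    """Turn a scalar, comma-separated string, or list into a validated list."""
    if isinstance(value, str):
        items = value.split(",")  # 'deepvariant,clair' -> two items
    else:
        items = list(value)       # already a list-like value
    callers = [str(item).strip() for item in items if str(item).strip()]
    if not callers:
        raise ValueError("at least one caller is required")
    unknown = [caller for caller in callers if caller not in allowed]
    if unknown:
        raise ValueError(f"unknown caller(s): {unknown}; allowed: {sorted(allowed)}")
    return callers


GERMLINE_CALLERS = {"deepvariant", "clair"}

print(normalize_callers("clair", GERMLINE_CALLERS))
print(normalize_callers("deepvariant, clair", GERMLINE_CALLERS))
print(normalize_callers(["deepvariant", "clair"], GERMLINE_CALLERS))
```

Validating against an explicit set keeps the check that the removed `enum` on `items` used to express in the schema.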
Comment on lines +25 to +29
| - input_files: | ||
| type: file | ||
| description: BAM/CRAM file | ||
| pattern: "*.{bam,cram,sam}" | ||
| ontologies: [] |
Comment on lines +25 to +28
| prefix = task.ext.prefix ?: "${meta.id}" | ||
| def file_type = input_files instanceof List ? input_files[0].getExtension() : input_files.getExtension() | ||
| def reference = fasta ? "--reference ${fasta}" : "" | ||
| """ |
Comment on lines 48 to 50
| - name: Run nf-core pipelines lint | ||
| if: ${{ github.base_ref != 'master' }} | ||
| if: ${{ github.base_ref != 'master' || github.base_ref != 'main' }} | ||
| env: |
Comment on lines 56 to 58
| - name: Run nf-core pipelines lint --release | ||
| if: ${{ github.base_ref == 'master' }} | ||
| if: ${{ github.base_ref == 'master' || github.base_ref == 'main' }} | ||
| env: |
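The two lint conditions above can be compared with a short truth-table sketch. It mirrors the boolean logic in Python rather than evaluating GitHub Actions expressions, and the function names are hypothetical. It assumes the intent is for the regular and `--release` lint steps to be mutually exclusive; under that assumption, the complement of the release condition uses `&&` (De Morgan), whereas the `!= ... || != ...` form is true for every branch name.

```python
def release_lint(base_ref: str) -> bool:
    # Mirrors: github.base_ref == 'master' || github.base_ref == 'main'
    return base_ref == "master" or base_ref == "main"


def dev_lint_as_written(base_ref: str) -> bool:
    # Mirrors: github.base_ref != 'master' || github.base_ref != 'main'
    # A ref cannot equal both names, so one side is always true.
    return base_ref != "master" or base_ref != "main"


def dev_lint_complement(base_ref: str) -> bool:
    # Assumed intent: the exact negation of release_lint.
    return base_ref != "master" and base_ref != "main"


for ref in ("master", "main", "dev"):
    print(ref, release_lint(ref), dev_lint_as_written(ref), dev_lint_complement(ref))
```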
Comment on lines +275 to +279
| // Each replicate is aligned separately so that minimap2 can tag its reads with a | ||
| // unique @RG (sample + type + replicate). Replicates are merged after alignment. | ||
| // ch_samplesheet is [meta_with_replicate, [bam]] -- one item per replicate per sample. | ||
| // ch_ubams defaults to ch_samplesheet; the fiber-seq block below may override it. | ||
| ch_ubams = ch_samplesheet |
Comment on lines +510 to +513
| SAMTOOLS_MERGE( | ||
| ch_aligned_split.multiple, | ||
| [[],[],[],[]] | ||
| ) |
Comment on lines 70 to 102
| "germline_var_keep": { | ||
| "type": "string", | ||
| "description": "Comma-separated list of germline variant callers to use. Valid values: deepvariant, clair. Example: 'deepvariant,clair'", | ||
| "default": "deepvariant,clair" | ||
| "default": "clair" | ||
| }, | ||
| "somatic_var_keep": { | ||
| "type": "string", | ||
| "description": "Comma-separated list of somatic variant callers to use. Valid values: deepsomatic, clair. Example: 'deepsomatic,clair'", | ||
| "default": "deepsomatic,clair" | ||
| "default": "clair" | ||
| }, | ||
| "germline_var_combine": { | ||
| "type": "string", | ||
| "description": "When two germline callers are used, specifies how to combine them. 'consensus' keeps only variants called by both callers; 'all' keeps all variants from both callers.", | ||
| "default": "all", | ||
| "enum": ["consensus", "all"] | ||
| }, | ||
| "somatic_var_combine": { | ||
| "type": "string", | ||
| "description": "When two somatic callers are used, specifies how to combine them. 'consensus' keeps only variants called by both callers; 'all' keeps all variants from both callers.", | ||
| "default": "all", | ||
| "enum": ["consensus", "all"] | ||
| }, | ||
| "prioritize_caller_germline": { | ||
| "type": "string", | ||
| "description": "When both germline callers are used, specifies which caller's format to use for variants called by both. Must be [deepvariant, clair].", | ||
| "default": "deepvariant", | ||
| "default": "clair", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, | ||
| "prioritize_caller_somatic": { | ||
| "type": "string", | ||
| "description": "When both somatic callers are used, specifies which caller's format to use for variants called by both. Must be [deepsomatic, clair].", | ||
| "default": "deepsomatic", | ||
| "default": "clair", | ||
| "enum": ["deepsomatic", "clair"] |
Comment on lines +39 to +42
| stub: | ||
| def args = task.ext.args ?: '' | ||
| prefix = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}" | ||
| def file_type = input_files instanceof List ? input_files[0].getExtension() : input_files.getExtension() |
Comment on lines 172 to 179
| // ch_samplesheet: [meta, [bam...]] | ||
| // meta fields: id, paired_data, type ('tumor'|'normal'), platform ('ont'|'pb'), | ||
| // sex, fiber ('y'|'n'), clair3_model, clairS_model, clairSTO_model, replicate | ||
| // sex, fiber ('y'|'n'), clair3_model, clairS_model, clairSTO_model, | ||
| // replicate, n_replicates | ||
| // paired_data: true for both items in a T/N pair (same value for tumor AND normal rows) | ||
| // n_replicates: total number of replicates for this sample+type combination | ||
| // bam: list of paths (multiple runs for same sample remain as a list until SAMTOOLS_CAT) | ||
| // |
Comment on lines 121 to 123
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error "DEEPVARIANT module does not support Conda. Please use Docker / Singularity / Podman instead." | ||
| } |
Comment on lines +7 to +15
| tools: | ||
| - view: | ||
| description: VCF/BCF conversion, view, subset and filter VCF/BCF files. | ||
| homepage: http://samtools.github.io/bcftools/bcftools.html | ||
| documentation: http://www.htslib.org/doc/bcftools.html | ||
| tool_dev_url: https://github.com/samtools/bcftools | ||
| doi: "10.1093/bioinformatics/btp352" | ||
| licence: ["MIT"] | ||
| identifier: biotools:bcftools |
| @@ -0,0 +1,122 @@ | |||
| name: samtools_merge | |||
ljwharbers approved these changes on May 28, 2026
PR checklist
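- Make sure your code lints (nf-core pipelines lint).
- Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
- Usage documentation in docs/usage.md is updated.
- Output documentation in docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).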