PANGOLIN: Pan-cancer Analysis of Gene regulatory landscape Of LIoness Networks

Overview

PANGOLIN is a comprehensive Snakemake pipeline for pan-cancer analysis of gene regulatory networks using TCGA data, with a specific focus on PD-1 pathway analysis. This pipeline reproduces all analyses and figures from our manuscript investigating gene regulatory landscapes across 33 cancer types.

🚀 Quick Start

Prerequisites

Snakemake ≥ 7.23.1
Conda for environment management
R ≥ 4.2.1 with Bioconductor packages
Python ≥ 3.8
Sufficient disk space: ~1 TB for the full workflow, ~20 GB for the precomputed workflow

Installation

# Clone the repository
git clone https://github.com/kuijjerlab/PANGOLIN.git
cd PANGOLIN

Configuration

Choose the analysis type in config.yaml:

# For complete reproduction (takes several days to weeks because single-sample networks are reconstructed for over 9,000 samples)
analysis_type: "full_workflow"

# For faster analysis and figure reproduction from precomputed data (RECOMMENDED)
analysis_type: "precomputed"
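Because everything downstream depends on this one key, a typo is best caught before any rule runs. The sketch below is a hypothetical illustration (the helper names are assumptions, not PANGOLIN's actual Snakefile code) of validating the `config` dictionary that Snakemake loads from config.yaml.

```python
# Hypothetical sketch: validate analysis_type in a Snakemake-style config dict.
# PANGOLIN's real Snakefile may branch on this value differently.
VALID_ANALYSIS_TYPES = {"full_workflow", "precomputed"}


def resolve_analysis_type(config: dict) -> str:
    """Return the configured analysis type, failing early on unknown values."""
    analysis_type = config.get("analysis_type")
    if analysis_type not in VALID_ANALYSIS_TYPES:
        raise ValueError(
            f"analysis_type must be one of {sorted(VALID_ANALYSIS_TYPES)}, "
            f"got {analysis_type!r}"
        )
    return analysis_type


def runs_network_inference(config: dict) -> bool:
    """PANDA/LIONESS network inference is only part of the full workflow."""
    return resolve_analysis_type(config) == "full_workflow"


if __name__ == "__main__":
    print(runs_network_inference({"analysis_type": "precomputed"}))
```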

Execution

# Dry run to check workflow
snakemake --use-conda --conda-frontend conda --cores 1 -np

# Execute pipeline
snakemake --use-conda --conda-frontend conda --cores 1
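When these commands are launched from a wrapper script (for example, on a cluster), it can help to build the argument list in one place. This is a hypothetical helper that uses only the flags shown above; `-n` requests a dry run and `-p` prints each job's shell command.

```python
import shlex
import subprocess
from typing import List


def snakemake_command(cores: int = 1, dry_run: bool = False) -> List[str]:
    """Assemble the Snakemake invocation used in this README."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    cmd = ["snakemake", "--use-conda", "--conda-frontend", "conda", "--cores", str(cores)]
    if dry_run:
        cmd.append("-np")  # -n: dry run, -p: print shell commands
    return cmd


def run_pipeline(cores: int = 1, dry_run: bool = True) -> int:
    """Run Snakemake from the repository root and return its exit code."""
    return subprocess.run(snakemake_command(cores, dry_run), check=False).returncode


if __name__ == "__main__":
    # Print the dry-run command instead of executing it.
    print(shlex.join(snakemake_command(cores=1, dry_run=True)))
```

Defaulting `run_pipeline` to a dry run keeps an accidental call from starting a multi-day job.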

📊 Analysis Workflow

Data Processing Pipeline

📥 Data Acquisition
- Downloads TCGA expression data for 33 cancer types using the TCGAbiolinks package
- Retrieves clinical information
- Processes batch information and produces a batch-effect figure
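Batch variables for TCGA samples are commonly derived from the sample barcode, such as the tissue source site (TSS) or the plate. A minimal sketch, assuming full aliquot barcodes of the form `TCGA-<TSS>-<participant>-<sample><vial>-<portion><analyte>-<plate>-<center>`; which fields PANGOLIN actually treats as batch variables is not stated here, and the function name is hypothetical.

```python
from typing import Dict


def parse_tcga_barcode(barcode: str) -> Dict[str, str]:
    """Split a full TCGA aliquot barcode into its named fields."""
    parts = barcode.split("-")
    if len(parts) != 7 or parts[0] != "TCGA":
        raise ValueError(f"not a full TCGA aliquot barcode: {barcode!r}")
    _, tss, participant, sample_vial, portion_analyte, plate, center = parts
    return {
        "tss": tss,                          # tissue source site
        "participant": participant,
        "sample_type": sample_vial[:2],      # 01-09 tumor, 10-19 normal, 20-29 control
        "vial": sample_vial[2:],
        "portion": portion_analyte[:2],
        "analyte": portion_analyte[2:],
        "plate": plate,                      # a common batch proxy
        "center": center,
    }


def is_tumor(barcode: str) -> bool:
    """Tumor sample types use codes 01-09."""
    return int(parse_tcga_barcode(barcode)["sample_type"]) < 10


if __name__ == "__main__":
    print(parse_tcga_barcode("TCGA-02-0001-01C-01D-0182-01"))
```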
🔧 Expression Data Normalization
- Combines multi-cancer expression matrices
- Applies qsmooth normalization using PySNAIL
- Performs batch effect detection and correction
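qsmooth generalizes quantile normalization: quantile by quantile, it blends a reference distribution shared by all samples with group-specific (here, cancer-type) distributions, weighting by how much the groups differ. The sketch below shows only the special case in which every sample is mapped onto the shared reference, i.e. classic quantile normalization. It is not PySNAIL's implementation, and ties are broken by sort order rather than averaged.

```python
import numpy as np


def quantile_normalize(expr: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x samples matrix.

    Every column receives the same set of values (the mean of the sorted
    columns), placed according to that column's own ranking of genes.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0)                 # per-sample gene ranking
    reference = np.sort(expr, axis=0).mean(axis=1)   # shared reference quantiles
    normalized = np.empty_like(expr)
    for j in range(expr.shape[1]):
        normalized[order[:, j], j] = reference
    return normalized


if __name__ == "__main__":
    x = np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]])
    print(quantile_normalize(x))
```

After normalization, all samples share an identical value distribution, which is exactly what qsmooth relaxes when cancer types genuinely differ.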
🕸️ Network Inference (Full workflow only)
- Constructs an aggregate gene regulatory network using PANDA
- Generates patient-specific networks with LIONESS
- Calculates network-based features (gene in-degrees)
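LIONESS estimates the network of sample q from an aggregate network built on all N samples, e(α), and one built with sample q left out, e(α-q): e(q) = N(e(α) - e(α-q)) + e(α-q). In the sketch below, Pearson correlation stands in for PANDA only to keep the example self-contained; `network_fn` can be any function that maps a genes x samples matrix to a network. A gene's in-degree is then taken as the sum of incoming edge weights in a TF x gene network. All function names are hypothetical.

```python
from typing import Callable

import numpy as np

NetworkFn = Callable[[np.ndarray], np.ndarray]


def correlation_network(expr: np.ndarray) -> np.ndarray:
    """Gene x gene Pearson correlation (a stand-in for PANDA in this sketch)."""
    return np.corrcoef(expr)


def lioness(expr: np.ndarray, q: int, network_fn: NetworkFn = correlation_network) -> np.ndarray:
    """Single-sample network for column q of a genes x samples matrix."""
    n_samples = expr.shape[1]
    aggregate = network_fn(expr)                        # e(alpha): all samples
    without_q = network_fn(np.delete(expr, q, axis=1))  # e(alpha - q)
    return n_samples * (aggregate - without_q) + without_q


def in_degree(tf_gene_network: np.ndarray) -> np.ndarray:
    """Sum of incoming edge weights per target gene (rows = TFs, columns = genes)."""
    return tf_gene_network.sum(axis=0)
```

A quick sanity check of the interpolation: with the per-gene sample mean as `network_fn`, LIONESS returns sample q's own values exactly.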
📈 Dimensionality Reduction & Pathway Analysis
- t-SNE analysis of expression and network features
- PORCUPINE pathway heterogeneity analysis
- Reactome pathway enrichment
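PORCUPINE asks whether the genes of a pathway vary coherently across patients. A simplified sketch of that idea: compute the fraction of variance captured by the first principal component of a pathway's in-degree matrix and compare it with random gene sets of the same size. The empirical p-value and the function names are assumptions for illustration; the PORCUPINE package applies its own statistics and filtering.

```python
from typing import Sequence, Tuple

import numpy as np


def pc1_variance_fraction(x: np.ndarray) -> float:
    """Fraction of total variance captured by PC1 of a samples x genes matrix."""
    centered = x - x.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)  # descending order
    variances = singular_values ** 2
    return float(variances[0] / variances.sum())


def pathway_heterogeneity(
    indegree: np.ndarray, pathway_idx: Sequence[int], n_random: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Compare a pathway's PC1 variance fraction with random gene sets of equal size."""
    rng = np.random.default_rng(seed)
    n_genes = indegree.shape[1]
    k = len(pathway_idx)
    observed = pc1_variance_fraction(indegree[:, list(pathway_idx)])
    null = np.array([
        pc1_variance_fraction(indegree[:, rng.choice(n_genes, size=k, replace=False)])
        for _ in range(n_random)
    ])
    # Add-one empirical p-value, so it is never exactly zero.
    p_value = (1 + np.count_nonzero(null >= observed)) / (1 + n_random)
    return observed, float(p_value)
```

Matching the random sets to the pathway's size matters because smaller gene sets tend to have a larger PC1 fraction by chance.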
🧬 PD-1 Pathway Analysis
- Extracts PD-1 pathway components and scores
- Correlates with immune infiltration (CIBERSORTx)
- Clinical association analysis (survival, molecular features)
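One way to picture the immune-infiltration step is as rank correlations between per-patient PD-1 pathway scores and CIBERSORTx cell-type fractions. A sketch using Spearman correlation, computed as the Pearson correlation of average ranks; the choice of Spearman, the input layout, and the function names are assumptions, and no multiple-testing correction is shown.

```python
from typing import Dict, Sequence

import numpy as np


def average_ranks(values: Sequence[float]) -> np.ndarray:
    """1-based ranks; tied values receive the mean of their positions."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of positions i..j
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def correlate_with_cells(
    pd1_score: Sequence[float], cell_fractions: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Correlate one PD-1 score vector with each cell type's fractions."""
    return {cell: spearman(pd1_score, frac) for cell, frac in cell_fractions.items()}
```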
🎯 Clustering Analysis
- Consensus clustering using Cola
- Cancer type-specific cluster characterization
- Survival analysis of identified clusters
- Comparison of PRAD clusters
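Consensus clustering repeats the clustering on resampled data and records how often each pair of samples ends up in the same cluster. A minimal sketch of that consensus matrix, assuming each run is given as a label vector with -1 marking samples left out of that resample; cola's own partitioning methods and stability statistics are not reproduced, and the function name is hypothetical.

```python
from typing import Sequence

import numpy as np


def consensus_matrix(runs: Sequence[Sequence[int]]) -> np.ndarray:
    """Fraction of runs, among those sampling both, in which two samples co-cluster."""
    labels = np.asarray(runs)  # runs x samples
    n = labels.shape[1]
    together = np.zeros((n, n))
    sampled_together = np.zeros((n, n))
    for run in labels:
        present = run >= 0
        both = present[:, None] & present[None, :]
        same = (run[:, None] == run[None, :]) & both
        together += same
        sampled_together += both
    # Pairs never sampled together are undefined (NaN).
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled_together > 0, together / sampled_together, np.nan)
```

Values near 0 or 1 indicate stable pairings; many intermediate values suggest the chosen number of clusters is not well supported.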

📁 Output Structure

results/
├── data_all/                              # Pan-cancer results
│   ├── gdc_data/                            # Raw TCGA data
│   ├── batch_analysis/                      # Batch effect analysis
│   ├── batch_corrected_expression/          # Batch-corrected expression data
│   ├── clinical_associations_PD1/           # PD-1 clinical associations
│   ├── cola_consensus_clustering/           # Pan-cancer consensus clustering
│   ├── combined_gdc_data/                   # Combined normalized expression
│   ├── cox_results_all/                     # Survival analysis
│   ├── porcupine/                           # PORCUPINE pathway analysis
│   ├── pysnail_normalized_individual_cancer_expression/ # Normalized data per cancer
│   ├── tsne_results/                        # t-SNE dimensionality reduction
│   └── logs/                                # Processing logs
├── data_individual_cancers/                 # Cancer-specific results
│   └── [CANCER]/                            # Individual cancer directories
│       ├── pd1_data/                          # PD-1 scores and mappings
│       ├── cox/                               # Survival analysis
│       ├── consensus_clustering/              # Cancer-specific clustering
│       ├── final_clusters/                    # Final cluster assignments
│       ├── clinical_associations/             # Clinical correlations
│       ├── indegrees_norm/                    # Network in-degree features
│       ├── porcupine/                         # Pathway analysis results
│       └── clinical/                          # Clinical data processing
├── panda_input/                           # PANDA network input files
└── figs/                                  # Publication-ready figures
    ├── MBatch_DSC.pdf                       # Batch effect summary
    ├── TSNE_*.pdf                           # t-SNE visualizations
    ├── PC_immune_correlations_cibersort.png # Immune correlations
    ├── cox_results_final_clusters_*.pdf     # Survival analysis plots
    ├── PRAD_clusters_*.pdf                  # PRAD-specific analyses
    ├── pathways_intersection_pcp.pdf        # Pathway intersection analysis
    ├── sankey_plot_indegree_expression.pdf  # Sankey diagrams
    └── summary_table_PD1.html               # Results summary table
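Because per-cancer results follow this fixed layout, downstream scripts can locate them programmatically. A hypothetical helper (not part of PANGOLIN) that builds paths from the tree above and rejects subdirectory names that do not appear in it:

```python
from pathlib import Path

RESULTS = Path("results")
# Per-cancer subdirectories as listed in the output tree.
CANCER_SUBDIRS = {
    "pd1_data", "cox", "consensus_clustering", "final_clusters",
    "clinical_associations", "indegrees_norm", "porcupine", "clinical",
}


def cancer_result_dir(cancer: str, subdir: str, results: Path = RESULTS) -> Path:
    """Path to one cancer type's results subdirectory, e.g. BRCA PD-1 scores."""
    if subdir not in CANCER_SUBDIRS:
        raise ValueError(f"unknown subdirectory {subdir!r}")
    return results / "data_individual_cancers" / cancer.upper() / subdir
```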

🎯 Key Features

Analysis Types

Full Workflow: Complete analysis from raw data (very long runtime)
- Downloads and processes all TCGA data
- Constructs patient-specific regulatory networks
- Performs all downstream analyses
Precomputed Workflow: Rapid reproduction using intermediate files (~4 hours)
- Downloads precomputed expression data from Zenodo
- Downloads precomputed network features from Zenodo
- Performs most of the downstream analyses (excluding network generation)
- Focuses on statistical analysis and figure generation

Cancer Types Analyzed

33 TCGA cancer types: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LAML, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS, UVM
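In the GDC, each abbreviation corresponds to a project ID of the form `TCGA-<code>` (for example, `TCGA-BRCA`), the form TCGAbiolinks queries use. A small sketch that keeps the list in one place and checks that it contains 33 unique entries; the constant and function names are hypothetical.

```python
from typing import List, Sequence

# The 33 TCGA cancer types analyzed, as listed above.
CANCER_TYPES = tuple(
    "ACC BLCA BRCA CESC CHOL COAD DLBC ESCA GBM HNSC KICH KIRC KIRP LAML LGG "
    "LIHC LUAD LUSC MESO OV PAAD PCPG PRAD READ SARC SKCM STAD TGCT THCA THYM "
    "UCEC UCS UVM".split()
)


def gdc_project_ids(cancers: Sequence[str] = CANCER_TYPES) -> List[str]:
    """Map TCGA study abbreviations to GDC project IDs."""
    return [f"TCGA-{code}" for code in cancers]


assert len(CANCER_TYPES) == len(set(CANCER_TYPES)) == 33
```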

Software Versions

Snakemake: 7.23.1
R: 4.2.1
Python: 3.8+
Key R packages: see package_requirements.txt. A container image is in preparation.

📊 Generated Figures

The pipeline generates all manuscript figures, written to results/figs/.

📚 Citation

If you use PANGOLIN in your research, please cite:

@article{.....,
  title={Pan-cancer analysis of patient-specific gene regulatory landscapes identifies recurrent PD-1 pathway dysregulation},
  author={Belova et al.},
  journal={Journal Name},
  year={2025},
  doi={10.xxxx/xxxxx}
}
