Skip to content

maartenk/edugain-qual-improv

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

67 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

eduGAIN Analysis Package

CI codecov Python 3.11+ GitHub release Code style: ruff License: MIT

Table of Contents

๐ŸŽฏ Overview

A comprehensive Python package for analyzing eduGAIN federation metadata quality, privacy statement coverage, security compliance, and SIRTFI certification. Built following modern Python standards with PEP 517/518/621 compliance.

Version 3.0 Breaking Changes: Privacy statement tracking now includes Identity Providers (IdPs) alongside Service Providers (SPs). This affects CSV format, filter behavior, and statistics output. See Migration Guide and CHANGELOG for details.

Key Features

  • ๐Ÿ”ฐ SIRTFI Coverage Tracking: Comprehensive SIRTFI certification tracking across all CLI outputs (summary, CSV, markdown reports)
  • ๐Ÿ” SIRTFI Compliance Tools: Two specialized CLI tools for security contact and SIRTFI certification validation
  • ๐Ÿ”’ Privacy Statement Monitoring: HTTP accessibility validation for privacy statement URLs (both SPs and IdPs) with dedicated broken links detection tool
  • ๐Ÿ“Š Universal Privacy Tracking: Privacy statements tracked for both Service Providers and Identity Providers (v3.0+)
  • ๐ŸŒ Federation Intelligence: Automatic mapping from registration authorities to friendly names via eduGAIN API
  • ๐Ÿ’พ XDG-Compliant Caching: Smart caching system with configurable expiry (metadata: 12h, federations: 30d, URLs: 7d)
  • ๐Ÿ“ˆ Multiple Output Formats: Summary statistics, detailed CSV exports, markdown reports, and visual PDF reports
  • ๐Ÿ—๏ธ Modern Architecture: Modular design with comprehensive testing (100% for CLI, 90%+ for core modules)
  • โšก Fast Tooling: Ruff for linting and formatting
  • ๐Ÿ“‹ Comprehensive Reporting: Split statistics for SPs vs IdPs with federation-level breakdowns

๐Ÿš€ Quick Start

Install the CLI

python -m pip install --upgrade pip
python -m pip install edugain-analysis

Prefer running from a clone? Install in editable mode instead:

git clone https://github.com/maartenk/edugain-qual-improv.git
cd edugain-qual-improv
python -m pip install -e .

After installation, your environment exposes the CLI entry points edugain-analyze, edugain-seccon, edugain-sirtfi, and edugain-broken-privacy. They live in the Python environmentโ€™s bin/ (or Scripts\ on Windows) like any other console scripts.

Want to run from a clone without installing?

# Shell wrappers (from repo root)
./scripts/app/edugain-analyze.sh --report
./scripts/app/edugain-seccon.sh
./scripts/app/edugain-sirtfi.sh
./scripts/app/edugain-broken-privacy.sh

# Direct module invocation
python -m edugain_analysis.cli.main

# Convenience wrapper (repo root)
python analyze.py

Run your first analysis

# Summary of privacy, security contacts, and SIRTFI coverage
edugain-analyze

# Detailed markdown report grouped by federation
edugain-analyze --report

# CSV export of entities missing privacy statements
edugain-analyze --csv missing-privacy

# Validate privacy statement URLs while producing the summary
edugain-analyze --validate

๐Ÿ“‹ Command Reference

Command Purpose Helpful options
edugain-analyze Main privacy/security/SIRTFI analysis --report, --csv <type>, --validate, --source <file-or-url>
edugain-seccon Entities with security contacts but no SIRTFI --local-file, --no-headers
edugain-sirtfi Entities with SIRTFI but no security contact --local-file, --no-headers
edugain-broken-privacy Entities (SPs and IdPs) with broken privacy links --local-file, --no-headers, --url <metadata-url>

All commands default to the live eduGAIN aggregate metadata. Supply --source path.xml or --source https://custom to work with alternative metadata files.

edugain-analyze

The primary CLI for day-to-day reporting. Generates summaries, CSV exports, and markdown reports in one pass.

# Quick snapshot with SIRTFI coverage in the summary
edugain-analyze

# Markdown report plus live privacy URL checks
edugain-analyze --report-with-validation

# Export only SPs missing both safeguards
edugain-analyze --csv missing-both --no-headers > missing.csv

edugain-seccon

Surfaces entities that publish a security contact but have not completed SIRTFI certificationโ€”ideal for prioritizing follow-up.

# Live metadata
edugain-seccon

# Offline review against a cached aggregate
edugain-seccon --local-file reports/metadata.xml

edugain-sirtfi

Flags entities that claim SIRTFI compliance yet fail to list a security contact, highlighting policy violations.

# Focus on current gaps
edugain-sirtfi

# Custom feed (e.g., private federation snapshot)
edugain-sirtfi --url https://example/federation.xml

edugain-broken-privacy

Identifies entities (both SPs and IdPs) with privacy statement URLs that fail a lightweight accessibility check.

# Default live run with 10 parallel validators
edugain-broken-privacy

# Skip headers when piping into other tooling
edugain-broken-privacy --no-headers | tee broken-urls.csv

# Filter to SPs only if needed
edugain-broken-privacy | grep ',SP,' > sp-broken-privacy.csv

๐Ÿ“ค CSV & Report Exports

edugain-analyze --csv supports the following types (all include SIRTFI columns):

  • entities โ€“ complete view of all entities (both SPs and IdPs with privacy data)
  • federations โ€“ per-federation roll-up (includes IdP privacy statistics)
  • missing-privacy โ€“ entities without privacy statements (both SPs and IdPs)
  • missing-security, missing-both โ€“ security-focused exports
  • urls โ€“ privacy statement URLs for all entities (SPs and IdPs)
  • urls-validated โ€“ includes HTTP status data (enables live validation)

Markdown reports are produced with --report (or --report-with-validation to perform live URL checks while generating the report).

CSV Columns (Entity Export)

Column Description
Federation Friendly federation name (API-backed)
EntityType SP or IdP
OrganizationName Display name from metadata
EntityID SAML entity identifier
HasPrivacyStatement Yes/No (both SPs and IdPs)
PrivacyStatementURL Declared privacy URL (both SPs and IdPs)
HasSecurityContact Yes/No
HasSIRTFI Yes/No
(with validation) URLStatusCode, FinalURL, URLAccessible, RedirectCount, ValidationError

CSV Columns (Federation Export)

Federation-level statistics include:

  • Basic counts: TotalEntities, TotalSPs, TotalIdPs
  • SP privacy: SPsWithPrivacy, SPsMissingPrivacy
  • IdP privacy (v3.0+): IdPsWithPrivacy, IdPsMissingPrivacy
  • Security contacts: EntitiesWithSecurity, SPsWithSecurity, IdPsWithSecurity (and Missing variants)
  • SIRTFI: EntitiesWithSIRTFI, SPsWithSIRTFI, IdPsWithSIRTFI (and Missing variants)
  • Combined SP metrics: SPsWithBoth, SPsWithAtLeastOne, SPsMissingBoth

๐Ÿ”— URL Validation (optional)

Enabling --validate, --report-with-validation, or --csv urls-validated triggers live HTTP checks for privacy statement URLs. Results are cached for seven days to avoid repeat lookups. When validation runs, extra columns are appended to CSV exports:

  • URLStatusCode, FinalURL, URLAccessible, RedirectCount, ValidationError

๐Ÿ“„ Privacy Page Content Quality Analysis (optional)

--validate-content goes beyond HTTP status checks and inspects the actual HTML of each privacy statement page. It detects soft-404s, thin content, and missing GDPR-related language โ€” problems that a plain HTTP 200 would never surface.

This flag implies --validate, so you do not need to pass both.

# Content quality analysis with terminal summary
edugain-analyze --validate-content

# Export per-URL content analysis to CSV
edugain-analyze --csv urls-content-analysis

Quality score

Each privacy page receives a score from 0 to 100:

Tier Score range Meaning
Excellent 90โ€“100 Page passes all quality checks
Good 70โ€“89 Minor issues (e.g. few GDPR keywords, slightly slow)
Fair 50โ€“69 Notable issues needing attention
Poor 30โ€“49 Significant problems affecting compliance
Broken 0โ€“29 Soft-404, empty page, or severe combined issues

Quality issues detected

Issue string What it means
soft-404 Page returns HTTP 200 but displays error content (e.g. "Page not found")
thin-content HTML content under 500 bytes โ€” almost certainly not a real privacy page
empty-content HTML under 100 bytes โ€” effectively blank
non-https Privacy URL uses plain HTTP instead of HTTPS
no-gdpr-keywords Page contains fewer than 2 recognised GDPR-related terms
few-gdpr-keywords Page contains only 2 keywords (3 or more expected)
slow-response Page took longer than 5 seconds to respond
very-slow-response Page took longer than 10 seconds to respond

GDPR keyword detection covers five languages: English, German, French, Spanish, and Swedish. Language is auto-detected from the page's <html lang> attribute or by keyword frequency.

--csv urls-content-analysis

Exports one row per SP privacy URL with the following columns:

Column Description
Federation Friendly federation name
EntityID SAML entity identifier
PrivacyURL Privacy statement URL
StatusCode HTTP response code
ContentQualityScore Score 0โ€“100
HTTPS True/False
ContentLength Raw HTML size in bytes
HasGDPRKeywords True if 2+ keywords found
KeywordCount Number of GDPR keywords matched
IsSoft404 True if a soft-404 was detected
DetectedLanguage ISO 639-1 code (e.g. en, de) or blank
ResponseTimeMs Fetch time in milliseconds
QualityIssues Pipe-separated list of issue strings (e.g. thin-content|non-https)

See docs/content-quality-analysis.md for a full reference including the scoring algorithm and remediation guidance.

โš™๏ธ Advanced Configuration

Runtime defaults live in src/edugain_analysis/config/settings.py. Tweak them if you need to point at alternative metadata sources or adjust validation behaviour.

  • EDUGAIN_METADATA_URL, EDUGAIN_FEDERATIONS_API: swap these when working with staging aggregates or private federation indexes.
  • Cache knobs (METADATA_CACHE_HOURS, FEDERATION_CACHE_DAYS, URL_VALIDATION_CACHE_DAYS) define how long downloads are reused; shorten them during rapid testing, extend them to reduce network traffic.
  • URL validation controls (URL_VALIDATION_TIMEOUT, URL_VALIDATION_DELAY, URL_VALIDATION_THREADS, MAX_CONTENT_SIZE) let you balance accuracy with load on remote sites.
  • REQUEST_TIMEOUT covers metadata and federation lookups; bump it for slow-on-purpose mirrors.

After adjusting the settings file, re-run the CLIโ€”changes take effect immediately because the module is imported at runtime.

๐Ÿ“ฆ Metadata & Caching

  • Default source: eduGAIN aggregate metadata (https://mds.edugain.org/edugain-v2.xml)
  • Overrides: Use --source path.xml for local files or --url (where available) for alternate feeds
  • Caching: Metadata (12h), federation mapping (30d), and URL validation results (7d) are cached in the XDG cache directory (typically ~/.cache/edugain-analysis/)

๐Ÿงฐ Advanced (optional)

  • Run from source: python analyze.py mirrors edugain-analyze
  • Use Docker: docker compose build then docker compose run --rm cli -- --help (defaults to edugain-analyze; swap in edugain-seccon, edugain-sirtfi, or edugain-broken-privacy as needed)
  • Batch everything locally: scripts/dev/local-ci.sh runs linting, tests, coverage, and Docker smoke tests
  • Tweak the helper via env vars: SKIP_COVERAGE=1 or SKIP_DOCKER=1 to skip heavier steps

๐Ÿ› ๏ธ Developer Setup

The Makefile now provides an end-user-friendly workflow for managing the virtual environment and tooling:

# Create/update .venv and install dev+test extras (Python 3.11+)
make install EXTRAS=dev,tests PYTHON=python3.11

# Drop into an activated shell (exit with 'exit' or Ctrl-D)
make shell

# Run linting or the full test suite
make lint
make test

Prefer scripts? scripts/dev/dev-env.sh remains available if you prefer driving the setup script directly:

# Fresh environment with test tooling and coverage plugins
./scripts/dev/dev-env.sh --fresh --with-tests --with-coverage

# Add parallel pytest workers
./scripts/dev/dev-env.sh --with-parallel
  • DEVENV_PYTHON overrides the interpreter search (order: python3.14, python3.13, python3.12, python3.11, python3). Example: DEVENV_PYTHON=python3.11 ./scripts/dev/dev-env.sh.
  • Manual setup: create a virtualenv and run pip install -e ".[dev]". Layer optional extras like .[tests], .[coverage], or .[parallel] as needed.
  • scripts/maintenance/clean-env.sh and scripts/maintenance/clean-artifacts.sh remove virtualenvs and cached outputs when you need a clean slate. Use make clean-artifacts or make clean-cache for the common paths, or call the script with --artifacts-only / --cache-only yourself; add --reports to prune reports/.
  • Test coverage outputs land in artifacts/coverage/ (HTML + XML). The reports/ directory is reserved for CLI exports or cached metadata snapshots and is only pruned when you run scripts/maintenance/clean-artifacts.sh --reports.
  • Script layout: scripts/app/ (CLI wrappers), scripts/dev/ (developer tooling), scripts/maintenance/ (cache & environment cleanup).

๐Ÿ“˜ Documentation

Developer Resources:

  • CLAUDE.md - Architecture, testing, and development workflows for contributors and AI assistants
  • docs/ROADMAP.md - Comprehensive feature roadmap with prioritization and timelines (12-month plan)
  • Dockerfile & docker-compose.yml - Containerized workflow for CLI and tests

Scripts & Utilities:

  • scripts/app/ - CLI wrapper scripts for running without installation
  • scripts/dev/ - Developer tooling (env bootstrap, linting, local CI, Docker helpers)
  • scripts/maintenance/ - Cache and environment cleanup utilities

Artifacts:

  • artifacts/coverage/ - Test coverage reports (HTML + XML)
  • reports/ - CLI exports and downloaded metadata snapshots

๐Ÿ—๏ธ Package Architecture

The package follows Python best practices with a modular structure:

src/edugain_analysis/
โ”œโ”€โ”€ core/                     # Core analysis logic
โ”‚   โ”œโ”€โ”€ analysis.py          # Main analysis functions
โ”‚   โ”œโ”€โ”€ metadata.py          # Metadata handling and XDG-compliant caching
โ”‚   โ””โ”€โ”€ validation.py        # URL validation with parallel processing
โ”œโ”€โ”€ formatters/              # Output formatting
โ”‚   โ””โ”€โ”€ base.py             # Text, CSV, and markdown formatters
โ”œโ”€โ”€ cli/                     # Command-line interfaces
โ”‚   โ”œโ”€โ”€ main.py             # Primary CLI (edugain-analyze)
โ”‚   โ”œโ”€โ”€ seccon.py           # Security contact CLI (edugain-seccon)
โ”‚   โ”œโ”€โ”€ sirtfi.py           # SIRTFI compliance CLI (edugain-sirtfi)
โ”‚   โ””โ”€โ”€ broken_privacy.py   # Broken privacy links CLI (edugain-broken-privacy)
โ””โ”€โ”€ config/                  # Configuration and patterns
    โ””โ”€โ”€ settings.py         # Constants and validation patterns

๐Ÿ” Privacy Statement URL Validation

The package includes a fast privacy statement URL validation system that checks link accessibility across eduGAIN federations for both SPs and IdPs. This helps identify broken privacy statement links that need attention.

How It Works

  1. URL Collection: Extracts privacy statement URLs from entity metadata (both SPs and IdPs)
  2. Parallel Checking: Tests URLs concurrently using 16 threads for fast processing
  3. HTTP Status Validation: Simple status code check:
    • 200-399: Accessible (working link) โœ…
    • 400-599: Broken (needs fixing) โŒ
  4. Real-time Progress: Shows validation progress with visual indicators
  5. Smart Caching: Results cached for 1 week to avoid re-checking unchanged URLs

Usage Examples

# Basic validation with user-friendly summary
python analyze.py --validate

# Get detailed CSV with HTTP status codes for each URL
python analyze.py --csv urls --validate

# Export entities missing privacy statements
python analyze.py --csv missing-privacy

Sample Output

The summary shows simple, actionable information:

๐Ÿ”— Privacy Statement URL Check:
  ๐Ÿ“Š Checked 2,683 privacy statement links

  โ•ญโ”€ ๐Ÿ”— LINK STATUS SUMMARY โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
  โ”‚  โœ… 2,267 links working (84.5%) โ”‚
  โ”‚  โŒ 416 links broken (15.5%)    โ”‚
  โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

CSV Export Details

When using --csv urls --validate, you get detailed technical information:

Field Description Example
StatusCode HTTP response code 200, 404, 500
FinalURL URL after redirects https://example.org/privacy
Accessible Working status Yes / No
ContentType MIME type text/html
RedirectCount Number of redirects 0, 1, 2
ValidationError Error details Connection timeout

This gives technical staff the specific information needed to fix broken links.

๐Ÿ”ฐ SIRTFI Coverage Tracking

The main analysis tools (edugain-analyze, python analyze.py) now include comprehensive SIRTFI (Security Incident Response Trust Framework for Federated Identity) certification tracking across all output formats.

What is SIRTFI?

SIRTFI is a framework that enables the coordination of incident response activities for federated identity services. Entities with SIRTFI certification have committed to specific incident response capabilities and communication practices.

Output Examples

Summary Statistics (includes SIRTFI section):

=== eduGAIN Quality Analysis: Privacy, Security & SIRTFI Coverage ===
Total entities analyzed: 10,234 (SPs: 6,145, IdPs: 4,089)

๐Ÿ”ฐ SIRTFI Certification Coverage:
  โœ… Total entities with SIRTFI: 4,623 out of 10,234 (45.2%)
  โŒ Total entities without SIRTFI: 5,611 out of 10,234 (54.8%)
    ๐Ÿ“Š SPs: 2,768 with / 3,377 without (45.0% coverage)
    ๐Ÿ“Š IdPs: 1,855 with / 2,234 without (45.4% coverage)

CSV Entity Export (includes HasSIRTFI column):

Federation,EntityType,OrganizationName,EntityID,HasPrivacyStatement,PrivacyStatementURL,HasSecurityContact,HasSIRTFI
InCommon,SP,Example University,https://sp.example.edu,Yes,https://example.edu/privacy,Yes,Yes
DFN-AAI,IdP,Test Institute,https://idp.test.de,N/A,N/A,Yes,No

Federation Statistics CSV (includes SIRTFI columns):

Federation,TotalEntities,TotalSPs,TotalIdPs,SPsWithPrivacy,SPsMissingPrivacy,EntitiesWithSecurity,EntitiesMissingSecurity,SPsWithSecurity,SPsMissingSecurity,IdPsWithSecurity,IdPsMissingSecurity,EntitiesWithSIRTFI,EntitiesMissingSIRTFI,SPsWithSIRTFI,SPsMissingSIRTFI,IdPsWithSIRTFI,IdPsMissingSIRTFI,SPsWithBoth,SPsWithAtLeastOne,SPsMissingBoth
InCommon,3450,2100,1350,1890,210,2760,690,1680,420,1080,270,1552,1898,945,1155,607,743,1575,2058,42

Markdown Reports (includes per-federation SIRTFI statistics):

## Federation Analysis

### InCommon (3,450 entities: 2,100 SPs, 1,350 IdPs)

**SIRTFI Certification:** ๐ŸŸข 1,552/3,450 (45.0%)
  โ”œโ”€ SPs: ๐ŸŸก 945/2,100 (45.0%)
  โ””โ”€ IdPs: ๐ŸŸก 607/1,350 (45.0%)

Key Points

  • Applies to Both SPs and IdPs: Unlike privacy statements (SP-only), SIRTFI applies to both entity types
  • Always Included: The HasSIRTFI column is automatically included in all entity CSV exports
  • Federation Breakdown: Per-federation statistics show SIRTFI coverage at the federation level
  • Color-Coded Status: ๐ŸŸข (โ‰ฅ80%), ๐ŸŸก (50-79%), ๐Ÿ”ด (<50%) for visual feedback

For SIRTFI compliance validation and finding violations, use the specialized tools:

  • edugain-seccon: Find entities with security contacts but without SIRTFI (potential candidates)
  • edugain-sirtfi: Find entities with SIRTFI but without security contacts (compliance violations)

๐Ÿ’พ Cache Management

All data is stored in XDG-compliant cache directories:

Cache Location by Platform:

  • macOS: ~/Library/Caches/edugain-analysis/
  • Linux: ~/.cache/edugain-analysis/
  • Windows: %LOCALAPPDATA%\edugain\edugain-analysis\Cache\

Cache Files:

  • metadata.xml - eduGAIN metadata (expires after 12 hours)
  • federations.json - Federation name mappings (expires after 30 days)
  • url_validation.json - URL validation results (expires after 7 days)

Cache Management Commands:

# View cache location
python -c "from platformdirs import user_cache_dir; print(user_cache_dir('edugain-analysis', 'edugain'))"

# Clear cache to force fresh download
rm -rf ~/Library/Caches/edugain-analysis/metadata.xml  # macOS
rm -rf ~/.cache/edugain-analysis/metadata.xml           # Linux

๐Ÿ“Š Output Examples

Summary Statistics (v3.0+)

=== eduGAIN Quality Analysis: Privacy, Security & SIRTFI Coverage ===
Total entities analyzed: 8,234 (SPs: 3,849, IdPs: 4,385)

๐Ÿ“Š Privacy Statement URL Coverage: ๐ŸŸก 3,720/8,234 (45.2%)
  โ”œโ”€ SPs: ๐ŸŸก 2,681/3,849 (69.7%)
  โ””โ”€ IdPs: ๐Ÿ”ด 1,039/4,385 (23.7%)
โŒ Missing: 4,514/8,234 (54.8%)

๐Ÿ›ก๏ธ  Security Contact Coverage: ๐ŸŸก 4,096/8,234 (49.8%)
  โ”œโ”€ SPs: ๐ŸŸข 1,205/3,849 (31.3%)
  โ””โ”€ IdPs: ๐ŸŸข 2,891/4,385 (65.9%)

๐Ÿ”ฐ SIRTFI Certification Coverage: ๐ŸŸก 3,715/8,234 (45.1%)
  โ”œโ”€ SPs: ๐ŸŸก 1,732/3,849 (45.0%)
  โ””โ”€ IdPs: ๐ŸŸก 1,983/4,385 (45.2%)

๐ŸŒ Federation Coverage: 73 federations analyzed

Note: Privacy coverage now includes both SPs and IdPs. Previously (v2.x) only SPs were tracked.

CSV Export Formats

  • entities: All entities with privacy/security status (IdPs include privacy data)
  • federations: Federation-level statistics (includes IdP privacy columns)
  • missing-privacy: Entities without privacy statements (both SPs and IdPs)
  • missing-security: Entities without security contacts
  • missing-both: SPs missing both privacy and security
  • urls: URL validation results (with --validate) for all entity types

๐Ÿšจ Version History Highlights

v3.0 (Unreleased) - IdP Privacy Tracking

Breaking Changes:

  • Privacy statement tracking extended to IdPs (previously SP-only)
  • CSV format changes: IdP rows now show Yes/No instead of N/A for privacy
  • Federation CSV adds IdPsWithPrivacy and IdPsMissingPrivacy columns
  • Filter --csv missing-privacy now returns both SPs and IdPs
  • See Migration Guide for upgrade instructions

New Features:

  • Universal privacy tracking across all entity types
  • PDF reports include IdP privacy KPI and comparative charts
  • Enhanced statistics with SP vs IdP breakdown
  • edugain-broken-privacy validates IdP privacy URLs

v2.4.3 - Reorganization & Developer Experience

  • Reorganized script structure for clarity
  • Enhanced Makefile with user-focused guidance
  • Local CI script for offline quality checks

๐Ÿณ Run via Docker

The Docker image ships with the package pre-installed inside /opt/venv and exposes the CLI entry points through a lightweight entrypoint script:

# Build once (respects DEVENV_PYTHON and INSTALL_EXTRAS build args)
docker compose build

# Run the main analyzer (defaults to edugain-analyze when no args are given)
docker compose run --rm cli -- --report --output artifacts/report.md

# Run other CLIs by overriding the command
docker compose run --rm cli edugain-seccon --summary
docker compose run --rm cli edugain-broken-privacy --validate

Pip downloads and eduGAIN metadata caches are persisted in the named volumes declared in docker-compose.yml (pip-cache, edugain-cache). Remove them with docker volume rm if you need a cold start. The project folder is still bind-mounted into /app, so anything written to reports/ or artifacts/ is immediately available on the host.

๐Ÿ“‹ Requirements

  • Python: 3.11 or later (tested on 3.11, 3.12, 3.13, 3.14)
  • Dependencies:
    • requests (โ‰ฅ2.28.0) - HTTP requests
    • platformdirs (โ‰ฅ3.0.0) - XDG-compliant directories
  • Development Dependencies (install with [dev]):
    • pytest, pytest-cov, pytest-xdist - Testing and coverage
    • ruff - Linting and formatting
    • pre-commit - Git hooks for quality assurance

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes following the coding standards
  4. Run tests and linting (pytest && scripts/dev/lint.sh)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Based on the original work from eduGAIN contacts analysis
  • Built for the eduGAIN community to improve federation metadata quality
  • Follows Python packaging standards (PEP 517/518/561/621)

๐Ÿ“ž Support

About

eduGAIN Quality Improvement

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages