SecPipeAI

AI-Augmented Anomaly Detection and Threat Mitigation Framework for Cloud-Native DevSecOps Pipelines

Overview

SecPipeAI is a reproducible machine-learning framework for network intrusion detection, designed to integrate into cloud-native DevSecOps pipelines. It benchmarks multiple classifiers on two widely used cybersecurity datasets (CICIDS2017 and UNSW-NB15) and backs each comparison with multi-seed evaluation, bootstrap confidence intervals, McNemar's test, and Cliff's delta effect sizes.

The framework is CPU-only and runs on commodity hardware (8 GB RAM), making it accessible for research and production deployment without GPU dependencies.

Architecture

The framework is organized into three logical modules: a Collector for data ingestion, validation, and leakage-free preprocessing; a Detector that runs four ML classifiers (Dummy, Logistic Regression, Random Forest, XGBoost) with multi-seed evaluation; and an Orchestrator that coordinates the pipeline via a single Makefile and produces publication-ready statistical artifacts.

Key Features

Leakage-free preprocessing: imputers, scalers, and encoders fit on training data only
Multi-seed evaluation: 5 independent seeds with bootstrap 95% confidence intervals
Statistical testing: pairwise McNemar's test, Wilcoxon signed-rank, Cliff's delta
Four baseline classifiers: Dummy, Logistic Regression, Random Forest, XGBoost
Publication-ready outputs: LaTeX tables, ROC/PR curves, confusion matrices, bar charts
Full reproducibility: Makefile pipeline, pinned dependencies, SHA-256 data checksums
CPU-only: runs on 8 GB RAM without GPU
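
The leakage-free preprocessing item above can be illustrated with a minimal sketch. It assumes a plain standardization step written with the Python standard library; the helper names `fit_scaler` and `apply_scaler` are hypothetical and are not taken from the SecPipeAI source. The point is only that statistics are learned from the training split and then reused unchanged on every other split.

```python
import statistics

def fit_scaler(train_column):
    """Learn mean and standard deviation from TRAINING values only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    # Guard against constant columns to avoid division by zero.
    return mean, (std if std > 0 else 1.0)

def apply_scaler(column, params):
    """Standardize any split with parameters learned on the training split."""
    mean, std = params
    return [(x - mean) / std for x in column]

# Illustrative toy values, not real dataset features.
train = [10.0, 12.0, 14.0, 16.0]
test = [11.0, 100.0]

params = fit_scaler(train)                 # statistics come from train only
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # test data never influences params
```

Fitting the scaler on the combined train and test data would let test-set statistics shape the features the model trains on, which is the leakage this feature is meant to rule out.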

Installation

Prerequisites

Python 3.10+
8 GB RAM minimum
~5 GB disk for raw datasets

Setup

git clone https://github.com/nnolas27/SecPipeAI.git
cd SecPipeAI
make setup
source .venv/bin/activate

Docker

docker build -t secpipeai .
docker run --rm -v "$(pwd)/data:/app/data" -v "$(pwd)/outputs:/app/outputs" secpipeai make all

Quick Start

# 1. Download datasets (follow printed instructions)
make data

# 2. Run full pipeline
make all

# 3. Generate publication artifacts
make paper_artifacts

Pipeline Commands

Command	Description
`make setup`	Create venv and install pinned dependencies
`make data`	Print download instructions and verify checksums
`make preprocess_cicids2017`	Preprocess CICIDS2017 dataset
`make preprocess_unsw_nb15`	Preprocess UNSW-NB15 dataset
`make train`	Train all models on both datasets
`make eval`	Generate metrics, confusion matrices, ROC plots
`make stats`	Pairwise McNemar tests (CSV + LaTeX)
`make seeds`	Multi-seed training runs (default: 5 seeds)
`make aggregate`	Aggregate seed results (mean, std, CI)
`make stats_advanced`	Bootstrap CI, Cliff's delta, Wilcoxon
`make paper_artifacts`	Generate all publication-ready artifacts
`make serve`	Start inference API server on port 8000
`make clean`	Remove outputs and processed data

Scope to a single dataset: make train DATASET=cicids2017

DevSecOps Integration

SecPipeAI exposes trained classifiers as a REST API for integration into CI/CD pipelines, container runtime monitors, and network probes. Run make all first to train models, then start the inference server:

make serve
# or directly:
uvicorn src.api.inference:app --host 0.0.0.0 --port 8000

API Endpoints

Endpoint	Method	Description
`/detect`	POST	Classify a single network flow feature vector
`/detect/batch`	POST	Classify a batch of flows; returns `attack_rate`
`/health`	GET	Liveness probe for Kubernetes / load balancers
`/models`	GET	List available trained models and metadata
`/docs`	GET	Interactive OpenAPI documentation

Pipeline Security Gate

The /detect/batch endpoint returns an attack_rate field (fraction of flows classified as attacks). A pipeline gate reads this field and blocks the pipeline if it exceeds a threshold:

# GitHub Actions environment variable (default: 1% attack rate threshold)
SECPIPEAI_ALERT_THRESHOLD=0.01

The .github/workflows/secpipeai-devsecops.yml workflow demonstrates the full integration: dependency integrity check, API schema validation, and pipeline gate. The gate fails with exit code 1 when attack_rate > threshold.
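
The gate logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in the repository's workflow: the function name `gate_exit_code` is hypothetical, and the response is assumed to be a parsed JSON object containing the `attack_rate` field.

```python
import os

def gate_exit_code(batch_response, threshold):
    """Return 1 when attack_rate exceeds the threshold, otherwise 0."""
    attack_rate = float(batch_response["attack_rate"])
    # Strict comparison: a rate exactly equal to the threshold passes.
    return 1 if attack_rate > threshold else 0

# Read the threshold from the environment variable named above (default 0.01).
threshold = float(os.environ.get("SECPIPEAI_ALERT_THRESHOLD", "0.01"))

# Hypothetical parsed responses from /detect/batch, for illustration only.
clean_run = {"attack_rate": 0.0}
noisy_run = {"attack_rate": 0.25}

print(gate_exit_code(clean_run, threshold))
print(gate_exit_code(noisy_run, threshold))
```

In a CI step, the returned value would typically be passed to `sys.exit` so that a non-zero code fails the job.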

Single-Flow Detection

curl -X POST http://localhost:8000/detect \
  -H 'Content-Type: application/json' \
  -d '{
    "features": [0.0, 1.0, ...],
    "dataset": "cicids2017",
    "model_name": "xgboost",
    "source_component": "my-pipeline-agent",
    "pipeline_id": "run-12345"
  }'

Response:

{
  "prediction": 0,
  "label": "BENIGN",
  "confidence": 0.003,
  "alert": false,
  "inference_latency_ms": 1.2,
  "timestamp_utc": "2026-04-19T12:00:00+00:00"
}
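
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the fields of the curl example; the helper names `build_detect_request` and `detect` are hypothetical, and the README does not state which fields are optional, so all five are sent.

```python
import json
import urllib.request

def build_detect_request(features, base_url="http://localhost:8000",
                         dataset="cicids2017", model_name="xgboost",
                         source_component="my-pipeline-agent",
                         pipeline_id="run-12345"):
    """Build a POST request for /detect with the fields from the curl example."""
    payload = {
        "features": list(features),
        "dataset": dataset,
        "model_name": model_name,
        "source_component": source_component,
        "pipeline_id": pipeline_id,
    }
    return urllib.request.Request(
        url=f"{base_url}/detect",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def detect(features, **kwargs):
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_detect_request(features, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the inference server running, `detect(feature_vector)["alert"]` would give the boolean shown in the response above; the feature vector must match the layout the chosen model was trained on.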

Docker (Inference API)

docker build -f Dockerfile.api -t secpipeai-api .
docker run --rm -p 8000:8000 -v "$(pwd)/outputs:/app/outputs" secpipeai-api

Datasets

Dataset	Source	Samples	Features
CICIDS2017	UNB CIC	2.83M	77
UNSW-NB15	UNSW	257K	190

Raw data must be placed under data/raw/ manually. Run make data for download instructions.

Note: Canonical download links for these datasets are intermittently unavailable. Kaggle and Hugging Face mirrors are reliable alternatives.

Results Summary

CICIDS2017

Model	Macro-F1 (mean +/- std)	ROC-AUC
Dummy	0.4454 +/- 0.0000	-
Logistic Regression	0.8808 +/- 0.0027	-
Random Forest	0.9967 +/- 0.0001	-
XGBoost	0.9981 +/- 0.0001	0.9999

Best model: XGBoost (Macro-F1 = 0.998, 95% CI [0.997, 0.998])

UNSW-NB15

Model	Macro-F1 (mean +/- std)	ROC-AUC
Dummy	0.4050 +/- 0.0000	-
Logistic Regression	0.8684 +/- 0.0000	-
Random Forest	0.8959 +/- 0.0004	0.9862
XGBoost	0.8919 +/- 0.0003	-

Best model: Random Forest (Macro-F1 = 0.896, 95% CI [0.892, 0.898])
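
The 95% confidence intervals reported above are bootstrap intervals. The following is a minimal sketch of a percentile bootstrap, assuming resampling over per-sample outcomes; the README does not state which resampling unit or statistic SecPipeAI uses internally, and the function name `bootstrap_ci` is hypothetical.

```python
import random
import statistics

def bootstrap_ci(values, stat, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_resamples samples drawn with replacement.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Illustrative toy data: 1 = correct prediction, 0 = error (not real results).
outcomes = [1] * 90 + [0] * 10
low, high = bootstrap_ci(outcomes, statistics.fmean)
print(f"accuracy 95% CI: [{low:.3f}, {high:.3f}]")
```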

All pairwise comparisons (best vs. baseline) are significant at p < 0.0001 (McNemar's test), with Cliff's delta = 1.0 (large effect).
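
For readers unfamiliar with the two statistics, here is a small standard-library sketch. It assumes the exact (binomial) form of McNemar's test on paired predictions and the usual pairwise definition of Cliff's delta; the README does not say which McNemar variant SecPipeAI computes, and the function names are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs."""
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in triples if a == t and p != t)  # only A correct
    c = sum(1 for t, a, p in triples if a != t and p == t)  # only B correct
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def cliffs_delta(xs, ys):
    """(#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Illustrative toy scores, not the reported results.
best_scores = [0.90, 0.91, 0.92]
baseline_scores = [0.50, 0.51, 0.52]
print(cliffs_delta(best_scores, baseline_scores))
```

A delta of 1.0 means every score in the first group exceeds every score in the second, which is what happens when per-seed variance is small relative to the gap between models. The exact McNemar sum is meant for small discordant counts; for very large test sets a chi-square approximation is the more practical choice.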

Output Structure

outputs/
├── paper/
│   ├── figures/              # ROC, PR, confusion, bar charts
│   ├── final_results_table.{csv,tex}
│   ├── final_stats_table.{csv,tex}
│   ├── key_numbers.json      # Machine-readable results
│   └── README_paper_artifacts.md
├── models/<dataset>/         # Trained models (.joblib) + metadata
├── metrics/<dataset>/        # Per-model metrics, aggregate stats
└── figures/<dataset>/        # Per-dataset visualizations

Reproducibility

All experiments are fully reproducible:

configs/experiment.yaml: hyperparameters, dataset paths, seed configuration
configs/checksums.yaml: SHA-256 checksums for raw data files
requirements.txt: fully pinned Python dependencies
random_state=42 is used throughout (overridden per seed in multi-seed runs)
Dockerfile provided for containerized reproduction
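
Checksum verification of the raw data can be sketched as follows. The layout of configs/checksums.yaml is not shown in this README, so the example assumes the expected digests have already been loaded into a plain dictionary mapping file paths to hex strings; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large datasets fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(expected):
    """Return the paths that are missing or whose digest does not match."""
    failures = []
    for path, want in expected.items():
        if not Path(path).is_file() or sha256_of(path) != want.lower():
            failures.append(path)
    return failures
```

An empty list from `verify` means every listed file is present and unmodified; any other result should stop the pipeline before preprocessing starts.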

Citation

If you use SecPipeAI in your research, please cite:

@software{singh2026secpipeai,
  author       = {Singh, Nihal},
  title        = {{SecPipeAI: AI-Augmented Anomaly Detection and Threat
                   Mitigation Framework for Cloud-Native DevSecOps Pipelines}},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18766118},
  url          = {https://doi.org/10.5281/zenodo.18766118}
}

License

This project is licensed under the Apache License 2.0. See LICENSE for details.

Contact

Nihal Singh

GitHub: nnolas27
