Open-source, embedded-ML network detection and response system protecting critical infrastructure from ransomware and DDoS attacks.
📜 Living contracts: Protobuf schema · Pipeline configs · RAG API
✅ main is tagged v0.5.2-hardened — DAY 125-127 debt closure complete. 9 debts closed across 3 days.
PRE-PRODUCTION: do not deploy in hospitals until ACRL (DEBT-PENTESTER-LOOP-001) is complete.
Active tag: v0.5.2-hardened | Active branch: main (clean)
- 6/6 components RUNNING with `resolve_config()` active
- `make test-all`: ALL TESTS COMPLETE (from a cold VM: `vagrant halt` → `make up` → `make bootstrap` → `make test-all`)
- TEST-PROVISION-1: 8/8 OK
- 9 debts closed in 3 consecutive days
- Tag v0.5.2-hardened merged to main
- Critical finding (DAY 126): `fs::is_symlink(resolved)` is useless after `weakly_canonical()`. Calling `lstat()` on the original path is the only correct defense for cryptographic material.
- Architectural finding (DAY 127): `lexically_normal()` vs `weakly_canonical()`, two tools for two distinct security cases. New primitive `resolve_config()` for dev/prod parity via symlinks.
- Council 8/8 (DAY 127): safe_path taxonomy formalized. Critical FEDER question: standalone NDR or federation? To be clarified with Andrés Caro Lindo before July.

See docs/BACKLOG.md for full details.
| Debt | Priority | Target |
|---|---|---|
| DEBT-SNYK-WEB-VERIFICATION-001 | 🟡 Medium | DAY 128 |
| DEBT-PROPERTY-TESTING-PATTERN-001 | 🟡 Medium | DAY 128 |
| DEBT-SAFE-PATH-TAXONOMY-DOC-001 | 🟡 Medium | DAY 128 |
| DEBT-PROVISION-PORTABILITY-001 | 🟢 Medium | DAY 128 |
- DEBT-PENTESTER-LOOP-001 — ACRL: Caldera → eBPF capture → XGBoost retrain → Ed25519 sign → hot-swap
- DEBT-PENTESTER-LOOP-001 completed (real ACRL data)
- ADR-036 (Formal Verification Baseline)
| Variant | Status | Description |
|---|---|---|
| aRGus-dev | ✅ Active (main) | x86-debug, full Vagrant image, build-debug. For research and day-to-day development. |
| aRGus-production | 🟡 Pending | x86-apparmor + arm64-apparmor. Optimized Debian images. For hospitals, schools, municipalities. |
| aRGus-seL4 | ⏳ Future design | Scientific appendix. seL4 kernel, libpcap (no eBPF/XDP), single-threaded sniffer. Independent branch. |
ML Defender (aRGus NDR) is documented in a peer-reviewed preprint published on arXiv cs.CR (April 2026).
ML Defender (aRGus NDR): An Open-Source Embedded ML NIDS for Botnet and Anomalous Traffic Detection in Resource-Constrained Organizations — Alonso Isidoro Román
- arXiv: arXiv:2604.04952 [cs.CR]
- DOI: https://doi.org/10.48550/arXiv.2604.04952
- Published: 3 April 2026 · Draft v16 (updated 19 April 2026) · MIT license
- Code: https://github.com/alonsoir/argus
Democratize enterprise-grade cybersecurity for hospitals, schools, and small organizations that cannot afford commercial solutions. Built to last decades with scientific honesty and methodical development.
Philosophy: Via Appia Quality — Systems built like Roman roads, designed to endure.
"A shield that learns from its own shadow."
ML Defender is a Network Detection and Response (NDR) system. Its guiding principle is network surveillance: every component operates on network traffic.
Physical and removable-media vectors are explicitly out of scope by conscious design decision. For file integrity monitoring, ML Defender is meant to run in a complementary mode alongside Wazuh.
| Metric | Value | Notes |
|---|---|---|
| F1-score (CTU-13 Neris) | 0.9985 | Stable across 4 replay runs |
| Precision | 0.9969 | |
| Recall | 1.0000 | Zero missed attacks (FN=0) |
| XGBoost Precision (CIC-IDS-2017 val) | 0.9945 | In-distribution, threshold=0.8211 |
| XGBoost Recall (CIC-IDS-2017 val) | 0.9818 | In-distribution |
| XGBoost F1 (CIC-IDS-2017 val) | 0.9881 | Val-AUCPR=0.99846 |
| XGBoost Wednesday OOD | Documented impossibility | Structural covariate shift — see §8 paper |
| Inference latency (XGBoost) | 1.986 µs/sample | Gate <2µs ✅ |
| Inference latency (RF) | 0.24–1.06 µs | Per-class, embedded C++20 |
| Throughput ceiling (virtualized) | ~33–38 Mbps | VirtualBox NIC limit, not pipeline |
| Stress test | 2,374,845 packets — 0 drops | 100 Mbps requested, loop=3 |
| RAM (full pipeline) | ~1.28 GB | Stable under load |
| Pipeline components | 6/6 RUNNING | Reproducible from make bootstrap |
| Plugin integrity | ADR-025 MERGED | Ed25519 + TOCTOU-safe dlopen |
| AppArmor | 6/6 enforce | 0 denials |
| Path traversal prevention | ADR-037 MERGED | safe_path header-only · 3 primitives + 16+ RED→GREEN tests |
| Dev/prod parity | DAY 127 MERGED | resolve_config() · legitimate symlinks within a trusted prefix |
| CI gate | TEST-PROVISION-1 8/8 | |
On DAY 122, a rigorous temporal holdout evaluation on CIC-IDS-2017 revealed a structural covariate shift: Wednesday contains exclusively application-layer DoS attacks (Hulk, GoldenEye, Slowloris) absent from all training days. No threshold can simultaneously satisfy Precision≥0.99 and Recall≥0.95 on Wednesday data. This is not an XGBoost failure — it is an empirical impossibility result caused by the dataset's day-specific attack segregation design.
This finding corroborates Sommer & Paxson (2010) and provides new quantitative evidence that static classifiers trained on academic benchmarks are structurally insufficient for production NDR.
The architectural response — the Adversarial Capture-Retrain Loop (ACRL) — is proposed in §11.18 of the paper.
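One way to check a claim of this kind is to sweep every candidate threshold and test both constraints at each one. The sketch below is illustrative only: `Scored`, `metrics_at`, and `find_feasible_threshold` are hypothetical names that do not come from the aRGus codebase, and the convention of reporting precision 1.0 when nothing is flagged is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical types for illustration; not part of the aRGus sources.
struct Scored {
    double score;    // classifier output in [0, 1]
    bool is_attack;  // ground-truth label
};

struct Metrics {
    double precision;
    double recall;
};

// Precision and recall when every sample with score >= threshold is flagged.
inline Metrics metrics_at(const std::vector<Scored>& samples, double threshold) {
    long tp = 0, fp = 0, fn = 0;
    for (const auto& s : samples) {
        const bool flagged = s.score >= threshold;
        if (flagged && s.is_attack) ++tp;
        else if (flagged && !s.is_attack) ++fp;
        else if (!flagged && s.is_attack) ++fn;
    }
    // Assumed convention: an empty denominator yields 1.0.
    const double precision = (tp + fp) > 0 ? static_cast<double>(tp) / (tp + fp) : 1.0;
    const double recall = (tp + fn) > 0 ? static_cast<double>(tp) / (tp + fn) : 1.0;
    return {precision, recall};
}

// Tries each observed score as a threshold; returns the first one that meets
// both targets, or std::nullopt if no observed score does.
inline std::optional<double> find_feasible_threshold(const std::vector<Scored>& samples,
                                                     double min_precision,
                                                     double min_recall) {
    for (const auto& candidate : samples) {
        const Metrics m = metrics_at(samples, candidate.score);
        if (m.precision >= min_precision && m.recall >= min_recall) {
            return candidate.score;
        }
    }
    return std::nullopt;
}
```

If `find_feasible_threshold(samples, 0.99, 0.95)` returns an empty optional, no decision threshold on those scores meets both targets, which is the shape of the Wednesday result described above.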
contrib/safe-path/ is a zero-dependency C++20 header-only library that prevents path traversal attacks across all production components. Three active primitives with distinct security semantics:
```cpp
// General paths: prefix verified post-canonical resolution
const auto safe = argus::safe_path::resolve(path, "/etc/ml-defender/");

// Cryptographic seed material: lstat() PRE-resolution, symlinks strictly rejected
// (fs::is_symlink(resolved) arrives too late: weakly_canonical() already resolved it)
const int fd = argus::safe_path::resolve_seed(seed_path, keys_dir_);

// Config files with legitimate symlinks: lexically_normal() verifies prefix
// BEFORE following symlinks (enables /etc/ml-defender/ → /vagrant/ dev/prod parity)
const auto cfg = argus::safe_path::resolve_config(config_path, "/etc/ml-defender/");
```

Taxonomy (Council 8/8 · DAY 127):
| Primitive | Use case | Symlinks | Verification |
|---|---|---|---|
| `resolve()` | General paths | Allowed post-check | `weakly_canonical()` post-resolution |
| `resolve_seed()` | Crypto material | ❌ Strictly rejected | `lstat()` pre-resolution |
| `resolve_config()` | Config files | ✅ Allowed in prefix | `lexically_normal()` pre-resolution |
| `resolve_model()` | ML models (future) | TBD | Ed25519 signature verify (backlog ADR-038) |
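The difference between the pre-resolution and post-resolution checks can be reproduced with the standard library alone. The following is a minimal sketch, assuming a POSIX system and C++17 or later. The helper names are hypothetical and are not the `safe_path` API, whose real implementation lives in contrib/safe-path/.

```cpp
#include <sys/stat.h>  // lstat, S_ISLNK (POSIX)

#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: true if the path itself is a symbolic link.
// lstat() inspects the final component without following it.
inline bool is_symlink_pre_resolution(const fs::path& original) {
    struct stat st {};
    if (::lstat(original.c_str(), &st) != 0) return false;  // path does not exist
    return S_ISLNK(st.st_mode);
}

// The late check: weakly_canonical() has already followed the link, so the
// resolved path names the target and the symlink is no longer visible.
inline bool is_symlink_post_resolution(const fs::path& original) {
    std::error_code ec;
    const fs::path resolved = fs::weakly_canonical(original, ec);
    if (ec) return false;
    return fs::is_symlink(resolved, ec);
}

// Lexical prefix check: collapses "." and ".." without touching the
// filesystem, then compares component by component so that
// "/etc/ml-defender-evil" does not pass as "/etc/ml-defender".
inline bool has_prefix_lexically(const fs::path& candidate, const fs::path& prefix) {
    const fs::path norm = candidate.lexically_normal();
    const fs::path base = prefix.lexically_normal();
    auto it = norm.begin();
    for (const auto& part : base) {
        if (part.empty()) continue;  // a trailing "/" yields an empty element
        if (it == norm.end() || *it != part) return false;
        ++it;
    }
    return true;
}
```

For a symlink `link` pointing at an existing target, `is_symlink_pre_resolution(link)` reports true while `is_symlink_post_resolution(link)` reports false, which is why the seed primitive has to inspect the original path. The lexical check, by contrast, never follows links, so it can accept a symlinked config tree as long as the requested path stays inside the trusted prefix.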
DAY 125-127 validated key TDH principles through empirical evidence:
```cpp
// memory_utils.hpp: header-only, independently testable
[[nodiscard]] inline double compute_memory_mb(long pages, long page_size) noexcept {
    return (static_cast<double>(pages) * static_cast<double>(page_size)) / (1024.0 * 1024.0);
}
// Note: double chosen over int64_t because LONG_MAX/4096 * 8192 overflows int64_t.
// Property test PropertyNeverNegative caught this latent bug in the int64_t version.
```

Testing hierarchy (Council 8/8 · DAY 127):
| Layer | What it verifies | When |
|---|---|---|
| Unit tests | Specific known cases (RED→GREEN) | Every security fix |
| Property tests | Mathematical invariants | Every security fix |
| Fuzzing (libFuzzer) | Parsers and external interfaces | Post-property-testing |
| Mutation testing | Test suite quality | Pre-major-release |
Permanent rules (Council 8/8):
- Every security fix must include: (1) unit test RED→GREEN, (2) property test for invariants, (3) integration test in real component.
- Every new file-handling surface must be classified with PathPolicy before implementation.
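A property test in the sense of the rules above can be sketched as follows. This is a hedged illustration and not the project's `PropertyNeverNegative` test: the helper names, the random input ranges, and the seed parameter are assumptions. `compute_memory_mb` is repeated from the excerpt so the block compiles on its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Repeated from the memory_utils.hpp excerpt.
[[nodiscard]] inline double compute_memory_mb(long pages, long page_size) noexcept {
    return (static_cast<double>(pages) * static_cast<double>(page_size)) / (1024.0 * 1024.0);
}

// Hypothetical helper: would pages * page_size exceed int64_t?
// Both arguments are assumed non-negative. The check divides instead of
// multiplying, so it never triggers signed overflow itself.
inline bool product_overflows_int64(std::int64_t pages, std::int64_t page_size) noexcept {
    if (pages == 0 || page_size == 0) return false;
    return pages > std::numeric_limits<std::int64_t>::max() / page_size;
}

// Property: for any non-negative inputs the result is finite and never negative.
// Returns the number of violations found over `iterations` random samples.
inline int count_never_negative_violations(int iterations, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<long> pages_dist(0, std::numeric_limits<long>::max());
    std::uniform_int_distribution<long> size_dist(1, 1L << 20);
    int violations = 0;
    for (int i = 0; i < iterations; ++i) {
        const double mb = compute_memory_mb(pages_dist(rng), size_dist(rng));
        if (!(std::isfinite(mb) && mb >= 0.0)) ++violations;
    }
    return violations;
}
```

A unit test pins one known case, such as 256 pages of 4096 bytes giving exactly 1 MB. The property run instead samples the whole input range, which is where an integer version of the same arithmetic would wrap and report a negative size.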
┌──────────────────────────────────────────────────────────────────┐
│ ML Defender Pipeline │
├──────────────────────────────────────────────────────────────────┤
│ Network Traffic │
│ ↓ │
│ ┌──────────────────┐ │
│ │ sniffer (C++20) │ eBPF/XDP zero-copy packet capture │
│ │ │ ShardedFlowManager (16 shards) │
│ └──────────────────┘ │
│ ↓ ZeroMQ (ChaCha20-Poly1305 encrypted) │
│ ┌──────────────────┐ │
│ │ ml-detector │ 4× Embedded RandomForest classifiers │
│ │ (C++20) │ XGBoost plugin ADR-026 ✅ Prec=0.9945 │
│ └──────────────────┘ │
│ ↓ ZeroMQ (encrypted) │
│ ┌──────────────────┐ │
│ │ etcd-server │ Component registration + seed distrib. │
│ └──────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ firewall-acl │ Autonomous blocking via ipset/iptables │
│ │ agent (C++20) │ safe_path::resolve_config() DAY 127 ✅ │
│ └──────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ rag-ingester │ FAISS + SQLite event ingestion │
│ │ (C++20) │ safe_path::resolve_config() DAY 127 ✅ │
│ └──────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ rag-security │ TinyLlama natural language interface │
│ │ (C++20+LLM) │ Local inference — no cloud exfiltration │
│ └──────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
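The diagram names a `ShardedFlowManager` with 16 shards but does not show its design. As a rough sketch of the general technique only, a flow key can be hashed to one of 16 independently locked maps so that threads working on different flows rarely contend. Everything below (`FlowKey`, `FlowKeyHash`, `ShardedFlowCounter`, the hash mixing, per-shard mutexes) is an assumption for illustration and not the sniffer's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical 5-tuple flow key.
struct FlowKey {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint8_t protocol;

    bool operator==(const FlowKey& o) const noexcept {
        return src_ip == o.src_ip && dst_ip == o.dst_ip && src_port == o.src_port &&
               dst_port == o.dst_port && protocol == o.protocol;
    }
};

// FNV-1a-inspired mixing over whole fields (not a byte-exact FNV-1a).
struct FlowKeyHash {
    std::size_t operator()(const FlowKey& k) const noexcept {
        std::uint64_t h = 0xcbf29ce484222325ULL;
        auto mix = [&h](std::uint64_t v) {
            h ^= v;
            h *= 0x100000001b3ULL;
        };
        mix(k.src_ip);
        mix(k.dst_ip);
        mix(k.src_port);
        mix(k.dst_port);
        mix(k.protocol);
        return static_cast<std::size_t>(h);
    }
};

// Per-flow packet counter split across 16 shards, each with its own lock.
class ShardedFlowCounter {
public:
    static constexpr std::size_t kShards = 16;

    static std::size_t shard_of(const FlowKey& k) noexcept {
        return FlowKeyHash{}(k) % kShards;
    }

    // Increments and returns the packet count for the flow.
    std::uint64_t record_packet(const FlowKey& k) {
        Shard& s = shards_[shard_of(k)];
        std::lock_guard<std::mutex> lock(s.mutex);
        return ++s.packets[k];
    }

    // Returns the packet count for the flow, or 0 if it was never seen.
    std::uint64_t packets_for(const FlowKey& k) {
        Shard& s = shards_[shard_of(k)];
        std::lock_guard<std::mutex> lock(s.mutex);
        const auto it = s.packets.find(k);
        return it == s.packets.end() ? 0 : it->second;
    }

private:
    struct Shard {
        std::mutex mutex;
        std::unordered_map<FlowKey, std::uint64_t, FlowKeyHash> packets;
    };
    std::array<Shard, kShards> shards_;
};
```

Because a given flow always maps to the same shard, updates to one flow are serialized while unrelated flows usually proceed on other locks.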
Critical rules:
- Always use `make <target>`. Never compile or install manually in the VM.
- The Vagrantfile and Makefile are the single source of truth.
```bash
git clone https://github.com/alonsoir/argus.git
cd argus
make up          # vagrant up: full provisioning, ~20-30 min
make bootstrap   # all 8 steps in one command
```

```bash
make up
make pipeline-stop
make pipeline-build
make sign-plugins && make sign-models
make pipeline-start && make pipeline-status
make test-all
```

```bash
make test-all
# Runs: libs + components + TEST-PROVISION-1 (8/8)
#       TEST-INVARIANT-SEED + plugin-integ-test (6/6 incl. TEST-INTEG-SIGN)
```

- resolve_config() ✅ · new safe_path primitive for configs with legitimate symlinks
- Absolute Makefile paths ✅ · no more relative paths when starting components
- Council 8/8 ✅ · safe_path taxonomy formalized + critical FEDER question
- DEBT-SAFE-PATH-SEED-SYMLINK-001 ✅ · lstat() pre-resolution, 11/11 tests
- DEBT-CONFIG-PARSER-FIXED-PREFIX-001 ✅ · fixed prefix, 4/4 + 3/3 tests
- DEBT-PRODUCTION-TESTS-REMAINING-001 ✅ — seed-client + firewall 3/3 + 3/3
- DEBT-MEMORY-UTILS-BOUNDS-001 ✅ — MAX_REALISTIC_MEMORY_MB, 5/5 tests
- Tag: v0.5.2-hardened ✅
- DEBT-GITIGNORE-TEST-SOURCES-001 ✅
- DEBT-INTEGER-OVERFLOW-TEST-001 ✅ — property test caught latent bug in own fix
- DEBT-SAFE-PATH-TEST-RELATIVE-001 ✅
- DEBT-SAFE-PATH-TEST-PRODUCTION-001 ✅ (rag-ingester)
- DEBT-CRYPTO-TRANSPORT-CTEST-001 ✅
- ADR-037 · `contrib/safe-path/` header-only C++20 · 9 RED→GREEN tests ✅
- Tag: v0.5.1-hardened ✅
| Priority | Task |
|---|---|
| 🟡 P1 | DEBT-SNYK-WEB-VERIFICATION-001: Snyk web verification of v0.5.2-hardened |
| 🟡 P1 | DEBT-PROPERTY-TESTING-PATTERN-001 — docs/testing/PROPERTY-TESTING.md |
| 🟡 P1 | DEBT-SAFE-PATH-TAXONOMY-DOC-001 — docs/SECURITY-PATH-PRIMITIVES.md |
| 🟢 P2 | DEBT-PROVISION-PORTABILITY-001 — ARGUS_SERVICE_USER |
| Priority | Task |
|---|---|
| P0 | DEBT-PENTESTER-LOOP-001 — MITRE Caldera → real adversarial flows → XGBoost retraining |
| P0 | ADR-038 — ACRL formal design |
| P0 | BACKLOG-FEDER-001: clarify scope with Andrés Caro Lindo (standalone NDR vs federation) |
| P1 | aRGus-production images (x86 + ARM64 apparmor) |
| P2 | aRGus-seL4 research branch |
Eight large language models serve as intellectual co-reviewers:
Claude (Anthropic) · Grok (xAI) · ChatGPT (OpenAI) · DeepSeek · Qwen (Alibaba) · Gemini (Google) · Kimi (Moonshot) · Mistral
Methodology: structured disagreement. Problems must be demonstrated with compilable tests or mathematics before fixes are proposed. Documented in the preprint §6.
- ✅ DAY 111: arXiv:2604.04952 PUBLISHED 🎉
- ✅ DAY 113: ADR-025 MERGED — v0.3.0-plugin-integrity 🎉
- ✅ DAY 118: PHASE 3 COMPLETED · v0.4.0 🎉
- ✅ DAY 120: make bootstrap + XGBoost F1=0.9978 🎉
- ✅ DAY 122: PHASE 4 COMPLETED · v0.5.0-preproduction 🎉
- ✅ DAY 124: ADR-037 MERGED — v0.5.1-hardened 🎉
- ✅ DAY 125: 5 debts closed · property testing validates TDH · 47 test sources recovered 🎉
- ✅ DAY 126: 4 debts closed · lstat() pre-resolution · fixed prefix · v0.5.2-hardened 🎉
- ✅ DAY 127: resolve_config() · dev/prod parity · Council 8/8 safe_path taxonomy 🎉
- 🔜 DAY 128: Snyk verification + property testing pattern + provision portability
MIT License — See LICENSE
Via Appia Quality 🏛️ — Built to last decades.