Kuber — A Distributed Knowledge Delivery System for Machine Learning at Scale

Kuber is a proof-of-concept system in which multiple independent learning entities train deep neural networks on heterogeneous (non-i.i.d.) local datasets and voluntarily share discovered knowledge through a central metadata registry and a model vault.


Repository layout

Kuber_clean_code/
├── kuber_base/        Four learning entities (DNN training + KD)
├── kuber_service/     Metadatabase REST API  (entity & knowledge registry)
├── kuber_vault/       Knowledge Vault REST API (model file storage)
└── requirements.txt

Each sub-project has its own README.md with detailed documentation.


Architecture overview

┌──────────────────────────────────────────────────────────────────┐
│                         kuber_base                               │
│                                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌──────────┐  │
│  │ entity_0   │  │ entity_1   │  │ entity_2   │  │ entity_3 │  │
│  │ (non-IID)  │  │ (non-IID)  │  │ (non-IID)  │  │(non-IID) │  │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  └────┬─────┘  │
│        │               │               │               │        │
└────────┼───────────────┼───────────────┼───────────────┼────────┘
         │  register / query knowledge   │
         ▼                               ▼
┌─────────────────────┐       ┌─────────────────────┐
│   kuber_service     │       │    kuber_vault      │
│   (port 5001)       │◄─────►│    (port 5002)      │
│                     │ index │                     │
│  SQLite metabase:   │       │  Filesystem store:  │
│  • entities table   │       │  • <kid>.pt files   │
│  • knowledge table  │       │                     │
└─────────────────────┘       └─────────────────────┘
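The register/query traffic between an entity and the two services can be sketched as plain JSON payloads. The route names and field names below are illustrative assumptions; the actual endpoints live in kuber_service/app.py and kuber_vault/app.py.

```python
import json

SERVICE_URL = "http://localhost:5001"  # kuber_service (metadata registry)
VAULT_URL = "http://localhost:5002"    # kuber_vault (model file storage)

def build_registration(entity_id, accuracy, vault_path):
    """Metadata record an entity might POST to kuber_service after sharing.

    Field names are hypothetical -- the real schema is defined by the
    service's SQLite knowledge table."""
    return {
        "entity_id": entity_id,
        "accuracy": accuracy,
        "vault_path": vault_path,  # file name under which the vault stored the model
    }

def build_query(min_accuracy):
    """Query asking kuber_service for knowledge beating a given accuracy."""
    return {"min_accuracy": min_accuracy}

# The actual HTTP calls would then be along the lines of:
#   requests.post(f"{SERVICE_URL}/knowledge", json=build_registration(0, 0.71, "k42.pt"))
#   requests.get(f"{SERVICE_URL}/knowledge", params=build_query(0.65))
print(json.dumps(build_registration(0, 0.71, "k42.pt")))
```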

Learning loop per entity (round by round)

for each round:
    1. Train locally on private non-i.i.d. data (SGD, LOCAL_EPOCHS)
    2. Evaluate on shared common test set

    if accuracy improved  →  Case 1: SHARE
        • Upload model state-dict to kuber_vault
        • Register knowledge metadata (accuracy, vault_path) at kuber_service

    if accuracy degraded  →  Case 2: RETRIEVE
        • Query kuber_service: find knowledge with higher accuracy
        • Download teacher model from kuber_vault
        • Apply logit-based knowledge distillation to local model
        • Re-evaluate

Quick start

1. Install dependencies

pip install -r requirements.txt

2. Start kuber_service (terminal 1)

cd kuber_service
python app.py
# Listening on http://localhost:5001

3. Start kuber_vault (terminal 2)

cd kuber_vault
python app.py
# Listening on http://localhost:5002

4. Run all four learning entities (terminal 3)

cd kuber_base
python run_all.py

Or run a single entity:

cd kuber_base
python run_entity.py 0   # entity_id ∈ {0, 1, 2, 3}

Configuration

All tunable parameters are in kuber_base/config.py:

Parameter         Default  Description
DIRICHLET_ALPHA   0.5      Non-i.i.d. degree (lower → more heterogeneous)
NUM_ROUNDS        20       Training rounds per entity
LOCAL_EPOCHS      5        SGD epochs per round
LEARNING_RATE     0.01     Base SGD learning rate
COMMON_TEST_SIZE  2000     Shared test samples for fair comparison
KD_TEMPERATURE    4.0      Softmax temperature for knowledge distillation
KD_ALPHA          0.5      CE weight in KD loss (1 − α = KD weight)
KD_EPOCHS         3        Fine-tuning epochs during distillation
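How KD_TEMPERATURE and KD_ALPHA combine can be shown with a pure-NumPy sketch of a logit-based distillation objective (the repo's actual implementation uses torch): α weights the plain cross-entropy against the true label, and 1 − α weights the temperature-softened KL term between teacher and student, scaled by T².

```python
import numpy as np

KD_TEMPERATURE = 4.0   # softens teacher/student distributions
KD_ALPHA = 0.5         # weight on CE; (1 - KD_ALPHA) on the KD term

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=float) / t
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, label):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    t = KD_TEMPERATURE
    p_student = softmax(student_logits)
    ce = -np.log(p_student[label] + 1e-12)
    ps_t = softmax(student_logits, t)       # temperature-softened student
    pt_t = softmax(teacher_logits, t)       # temperature-softened teacher
    kl = np.sum(pt_t * (np.log(pt_t + 1e-12) - np.log(ps_t + 1e-12)))
    return KD_ALPHA * ce + (1 - KD_ALPHA) * (t ** 2) * kl

loss = kd_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9], label=0)
print(round(float(loss), 4))
```

The T² factor is the standard correction that keeps the KD gradient magnitude comparable to the CE gradient as the temperature grows.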

Dependencies

Package      Purpose
torch        DNN training and inference
torchvision  CIFAR-10 dataset + transforms
flask        REST APIs for service and vault
requests     HTTP communication between nodes
numpy        Dirichlet partitioning
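Dirichlet partitioning, which DIRICHLET_ALPHA controls, can be sketched as follows: for each class, draw a proportion vector over entities from Dir(α) and split that class's sample indices accordingly. This is a minimal illustration, not the repo's actual partitioner.

```python
import numpy as np

def dirichlet_partition(labels, num_entities, alpha, seed=0):
    """Split sample indices into non-i.i.d. shards; lower alpha -> more skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    shards = [[] for _ in range(num_entities)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet([alpha] * num_entities)   # class share per entity
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for shard, part in zip(shards, np.split(idx, cuts)):
            shard.extend(part.tolist())
    return shards

labels = np.repeat(np.arange(10), 100)            # 10 classes x 100 samples
shards = dirichlet_partition(labels, 4, alpha=0.5)
print([len(s) for s in shards])                   # shard sizes; they sum to 1000
```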
