This repository contains the code for experiments conducted on the following datasets:
- JOB-light
- TPC-H
The models used in the experiments are:
- NeuroCard
- FACE
First, create a conda environment with Python 3.7, activate it, and install the required packages using pip:
conda create -n cep python=3.7
conda activate cep
pip install -r requirements.txt

FACE additionally needs the bundled torchquadMy package, which must be installed from source:
cd torchquadMy
pip install .
cd ..

Download the datasets into the datasets directory:
- IMDb dataset: the download script is in the scripts/ directory. After downloading, run prepend_imdb_header.py to add the header to the downloaded files.
- TPC-H dataset: run the following command to generate 10 GB of data:
bash scripts/tpch.sh

Before running unlearning tasks, you need to generate the initial models. Use the following commands:
python run.py --run job-light --model neurocard
python run.py --run job-light --model face
python run.py --run tpch --model neurocard
python run.py --run tpch --model face

Run unlearning tasks and optionally evaluate the results by adding --eval:
python run_unlearning.py --run job-light --filter imdb-A2-1-0.5 imdb-A6-1-1 --ul-method stale retrain fine-tune cep

- --run: Specifies the workload (e.g., job-light, tpch).
- --filter: Defines how unlearning deletes data (e.g., R-1-0.1, R-1-0.3); several filters can be passed at once.
- --model: Specifies the model to use (e.g., neurocard, face).
- --ul-method: Specifies the unlearning methods, including the baselines (stale, retrain, fine-tune) and our method (cep); several methods can be passed at once.
- --eval: Enables evaluation during unlearning tasks.
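To sweep several models over one workload, the flags above can be driven from a small Python wrapper. This is a minimal sketch, not part of the repository: the helper names build_unlearning_cmd and run_sweep are hypothetical, the sketch assumes it is launched from the repository root, and it always passes --model explicitly. It only uses the flags documented above; because valid --filter values depend on the workload, the caller supplies them.

```python
import shlex
import subprocess
import sys
from typing import List, Sequence

# Unlearning methods listed in this README: three baselines plus CEP.
UL_METHODS = ("stale", "retrain", "fine-tune", "cep")


def build_unlearning_cmd(
    workload: str,
    model: str,
    filters: Sequence[str],
    methods: Sequence[str] = UL_METHODS,
    evaluate: bool = False,
) -> List[str]:
    """Assemble the argument list for run_unlearning.py (hypothetical helper)."""
    unknown = [m for m in methods if m not in UL_METHODS]
    if unknown:
        raise ValueError(f"unsupported --ul-method value(s): {unknown}")
    if not filters:
        raise ValueError("at least one --filter value is required")
    cmd = [sys.executable, "run_unlearning.py",
           "--run", workload, "--model", model,
           "--filter", *filters,
           "--ul-method", *methods]
    if evaluate:
        cmd.append("--eval")  # optional evaluation during unlearning
    return cmd


def run_sweep(workload: str, filters: Sequence[str],
              models: Sequence[str] = ("neurocard", "face"),
              evaluate: bool = True) -> None:
    """Run unlearning for each model in turn; assumes the initial models exist."""
    for model in models:
        cmd = build_unlearning_cmd(workload, model, filters, evaluate=evaluate)
        # shlex.join needs Python 3.8+, so quote manually for the 3.7 environment.
        print("+ " + " ".join(shlex.quote(c) for c in cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing run
```

For example, run_sweep("job-light", ["imdb-A2-1-0.5", "imdb-A6-1-1"]) issues the command shown above once for neurocard and once for face, with --eval appended. Running the models sequentially with check=True makes the sweep stop at the first failure instead of silently continuing.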
To modify the evaluation checkpoints, update the configuration files located in config/eval/<method>/checkpoint_to_load.
Configuration files for the experiments are located in the config directory. You can modify these files to adjust parameters.
All results are saved in the cache directory. Check your configured cache_dir to locate the corresponding results and cache files; by default, this is the cache folder in the current directory. Each task writes to its own subfolder, so results from different tasks stay isolated.
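To see which task subfolders exist after a run, a short Python sketch can list them along with their most recently modified file. It assumes the default location (a cache folder in the current directory); pass your configured cache_dir if you changed it. The helper names list_task_dirs, newest_file, and summarize are hypothetical, and the sketch makes no assumption about the file names or formats inside each subfolder.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def list_task_dirs(cache_dir: str = "cache") -> List[Path]:
    """Return the per-task subfolders under cache_dir, sorted by name."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []  # nothing cached yet, or cache_dir points elsewhere
    return sorted(p for p in root.iterdir() if p.is_dir())


def newest_file(task_dir: Path) -> Optional[Path]:
    """Return the most recently modified file anywhere under task_dir, if any."""
    files = [p for p in task_dir.rglob("*") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def summarize(cache_dir: str = "cache") -> List[Tuple[str, Optional[str]]]:
    """Pair each task subfolder name with its newest file (relative path)."""
    summary = []
    for task in list_task_dirs(cache_dir):
        latest = newest_file(task)
        rel = str(latest.relative_to(task)) if latest is not None else None
        summary.append((task.name, rel))
    return summary
```

For example, `for name, latest in summarize(): print(name, latest)` prints one line per task subfolder, which is a quick way to find the results of the task you just ran.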
This repository is based on the following projects:
- NeuroCard: GitHub Repository
- FACE: GitHub Repository
- Model Sparsity Can Simplify Machine Unlearning: Paper