host memory usages in load stage were much smaller than model size, so we measure actual size in forward stage.
calculate host/device memory usage:
bash run.sh cpu 1 128
calculate FLOPs:
python hf_flops_cpu.py 1 128 model_name
| Name | Name | Last commit date | ||
|---|---|---|---|---|