diff --git a/.claude/skills/ptq/references/slurm-setup-ptq.md b/.claude/skills/ptq/references/slurm-setup-ptq.md
index 234b03c5e0..038823ba2e 100644
--- a/.claude/skills/ptq/references/slurm-setup-ptq.md
+++ b/.claude/skills/ptq/references/slurm-setup-ptq.md
@@ -28,10 +28,16 @@ If enroot import fails (e.g., permission errors on lustre), use pyxis inline pul
 
 ### Container dependency pitfalls
 
-**New models may need newer transformers** than what's in the container. Install from source inside the job script:
+**New models may need newer transformers** than what's in the container. Install from PyPI inside the job script (unset `PIP_CONSTRAINT` first if needed — see below):
 
 ```bash
-pip install git+https://github.com/huggingface/transformers.git --quiet
+pip install -U transformers
+```
+
+Only install from git if the fix you need isn't in a released version yet:
+
+```bash
+pip install git+https://github.com/huggingface/transformers.git
 ```
 
 **Prefer `PYTHONPATH`** to use the synced ModelOpt source instead of installing inside the container — this avoids risking dependency conflicts (e.g., `pip install -U nvidia-modelopt[hf]` can upgrade PyTorch and break other packages):
@@ -90,4 +96,4 @@ This catches script errors cheaply before using GPU quota on a real run.
 
 See `skills/common/slurm-setup.md` section 2 for the smoke test partition pattern.
 
-Only submit the full calibration job after the smoke test exits cleanly.
+Only submit the full calibration job after the smoke test exits cleanly.
\ No newline at end of file