For people outside Brown looking at this: I hard-coded the partition and some of the GPU tags, but I think this could work for you too with a few small modifications.
With Great Power Comes Great Responsibility.
Please do not hog cluster resources. Only use cluster resources for academic research. Please do not use more than 8 GPUs at one time. If I see you consistently using many GPUs, I will definitely ask why ;)
To see the historical breakdown of GPU usage per user, run the following command.
bash slurm_usage_stats.bash 5 3090-gcondo "" 8 # last five days, on the 3090 condo, with 8 GPUs
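If you don't have that script, a rough equivalent can be pulled straight from sacct. This is a minimal sketch, not the actual script contents; the partition name is the hard-coded one from above, and the script's own arguments may map differently.
sacct -a -X \
  --partition 3090-gcondo \
  --starttime "$(date -d '-5 days' +%F)" \
  --format=User%20,JobID,AllocTRES%40,Elapsed,State # all users' job allocations (-X = one row per job) over the last five days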
Always check current cluster utilization with bash check_util.bash and bash rank_users.bash.
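If you can't use those scripts, the same picture is available from SLURM directly; a minimal sketch against the partition hard-coded above:
sinfo -p 3090-gcondo -N -o '%.12n %.18G %.15C %.10T' # per-node GRES, CPU counts (A/I/O/T = allocated/idle/other/total), and state
squeue -p 3090-gcondo -o '%.10u %.8T %.6D %.15b %R' # who is running what, and on which nodes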
Only then should you run bash allocate.bash. This will reserve an entire node, with 8 GPUs, 128 CPU cores, and 1 TB of RAM. Once your terminal exits, the node will shut down (to avoid this, we'd need to submit the job with sbatch instead of running it interactively; we'll get around to this soon...). Run multi-GPU training on this node! Please do not attempt to allocate more than one node at a time. The node is reserved for 12 hours.
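For reference, the reservation allocate.bash makes should look roughly like the following srun call; this is a sketch based on the numbers above, not the actual script contents:
srun --partition=3090-gcondo --gres=gpu:8 --cpus-per-task=128 --mem=1000G --time=12:00:00 --pty bash # hold a full node (~1 TB RAM) interactively for 12 hours
Wrapping these same flags in an sbatch script is what would let the job outlive your terminal.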
If the node isn't immediately allocated, you can check its status in a new window with the following.
squeue -u $USER
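To avoid re-running that by hand, watch will refresh it for you:
watch -n 10 squeue -u $USER # re-check every 10 seconds until the job leaves the PD (pending) state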
Also, when on a node, I often increase the process limit with the following.
ulimit -u 8192
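Running it with no argument prints the current cap, which is handy for checking that the change took; note it only applies to the current shell.
ulimit -u # print the current max user processes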
Using more than one window with the same compute node is as easy as the following.
squeue -u $USER # read the compute node name
ssh <COMPUTE_NODE_NAME> # e.g., ssh gpu2503
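If you do this a lot, the node name can be grabbed programmatically; a small convenience one-liner, assuming you have a single running single-node job:
ssh "$(squeue -u $USER -h -o %N | head -n 1)" # -h drops the header, %N prints the allocated node list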