
Note for External People

For people looking at this external to Brown, I hard-coded the partition and some of the GPU tags, but I think this could work for you too with a few small modifications.

Note for students who work with me at Brown

With Great Power Comes Great Responsibility.

Please do not hog cluster resources. Only use cluster resources for academic research. Please do not use more than 8 GPUs at one time. If I see you consistently using many GPUs, I will definitely ask why ;)

To see the historical breakdown of GPU usage per user, run the following command.

bash slurm_usage_stats.bash 5 3090-gcondo "" 8 # last five days, on the 3090 condo, with 8 GPUs
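The script's internals aren't shown here, but its positional arguments (look-back window in days, partition, user filter, GPU count) suggest it wraps a Slurm accounting query. A hypothetical sketch of the kind of query it might build — the sreport invocation is an assumption, and the line is printed rather than run so this works off-cluster:

```shell
# Hypothetical sketch of what slurm_usage_stats.bash might wrap; only the
# argument meanings come from the comment above, the rest is an assumption.
DAYS=${1:-5}                         # first argument: look-back window in days
START=$(date -d "-${DAYS} days" +%F) # GNU date; start of the reporting window
END=$(date +%F)
# Printed, not executed, so the sketch is runnable off-cluster.
echo "sreport cluster AccountUtilizationByUser start=${START} end=${END} -t Hours --tres=gres/gpu"
```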

Always check current cluster utilization with bash check_util.bash and bash rank_users.bash.

Only then, run bash allocate.bash. This will reserve an entire node, with 8 GPUs, 128 CPU cores, and 1TB of RAM. Once your terminal exits, the node will shut down (to get around this, we'd need to submit the job instead of running it interactively; we'll get to that soon...). Run multi-GPU training on this node! Please do not create more than one node at a time. The node is reserved for 12 hours.
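Roughly, the allocation above corresponds to a single interactive srun like the following. This is a sketch of what allocate.bash presumably runs, not its actual contents; the flag values come from the resources listed above, and the partition name from the usage-stats example. The command is printed rather than executed so it can be inspected off-cluster:

```shell
# Sketch of the interactive full-node allocation described above; the exact
# contents of allocate.bash are an assumption.
GPUS=8
CPUS=128
MEM=1000G
TIME=12:00:00
# Printed, not executed.
echo "srun --partition=3090-gcondo --gres=gpu:${GPUS} --cpus-per-task=${CPUS} --mem=${MEM} --time=${TIME} --pty bash"
```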

If the node isn't immediately allocated, you can check its status in a new window with the following.

squeue -u $USER

Also, when on the node, I often increase the process limit with the following.

ulimit -u 8192
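ulimit -u shows the current cap on the number of user processes; a quick before/after check like the one below makes the effect visible. Raising the soft limit above the hard limit fails, hence the fallback:

```shell
# Inspect the soft process limit, try to raise it, and report both values.
before=$(ulimit -u)
ulimit -u 8192 2>/dev/null || true   # may fail if 8192 exceeds the hard limit
after=$(ulimit -u)
echo "processes: before=${before} after=${after}"
```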

Using more than one window with the same compute node is as easy as follows.

squeue -u $USER # read the compute node name
ssh <COMPUTE_NODE_NAME> # e.g. ssh gpu2503
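Once on the compute node, a quick sanity check confirms where you landed and which GPUs are visible. nvidia-smi -L is standard NVIDIA tooling (one line per GPU); the fallback keeps this runnable on machines without it:

```shell
# Confirm the host and list its GPUs, if any.
hostname
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L                    # one line per visible GPU
else
    echo "nvidia-smi not found (probably not a GPU node)"
fi
```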