Install PyTorch on the NYU Greene High Performance Computing Cluster

November 10, 2022

Aditya Wagh

This documentation is intentionally brief and is intended to introduce you to NYU's computing resources. For more detailed information, please refer to the HPC Documentation, which has a search button in the top-right corner of the page.

How to access compute resources for Robot Perception?

Since all of you already have access to HPC, you only need to remotely access the cluster shell (the terminal where you input commands). To do so, you must be connected to the NYU network, either via a VPN or by being physically on campus.

Use the command below to log in to the HPC cluster. It will ask you for your password; enter the password you use with your NetID. It may also ask you to confirm the host's key fingerprint; type yes if that prompt shows up. This adds the cluster to your list of known hosts.

ssh <netid>@greene.hpc.nyu.edu

SSH Login Screen

After this step, your prompt will look like this, except for the (base) part, which is specific to my shell. By default, you are in the /home/<NetID>/ (or ~) folder. log-2 is the name of the login node; there are three login nodes: log-1, log-2, and log-3.

SSH Login Screen

Now you need to hop to NYU HPC's Google Cloud bursting nodes, which are intended for coursework. To do that, use the command below. Don't run jobs on NYU Greene itself, since it is intended only for research purposes.

ssh burst

log-burst is the login node for the HPC's GCP Burst cluster. You have to do the next steps on this cluster.

burst ssh

Installing Conda on HPC

The HPC documentation recommends using Singularity to set up conda environments; however, that approach is quite complicated and not beginner-friendly. I prefer the method below, which gives us an alternative to Singularity.

First, get a compute node on the GCP Burst cluster. For now, don't worry about what this command does; I explain it in a later section. It will give you a shell on a compute node.

srun --account=rob_gy_6203-2022fa --cpus-per-task=8 --partition=interactive --mem=16GB --time=04:00:00 --pty /bin/bash
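
Once the prompt returns, you should be on a compute node rather than on log-burst. A quick sanity check, using standard Linux commands:

hostname    # should print a compute node's name, not log-burst
nproc       # should report the 8 CPUs requested above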

Step 1: Create directory for your conda installation

We don't want to create the environment in the home directory because of the 50GB quota limit on the /home/<NetID>/ folder.
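
To check how much of each quota you are currently using, NYU HPC provides a myquota utility on the Greene login nodes (run it there; it may not be available on the burst nodes):

myquota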

mkdir /scratch/<NetID>/miniconda3

Step 2: Download and install Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh -b -u -p /scratch/<NetID>/miniconda3

The -b flag runs the installer non-interactively, and -u lets it proceed even though the target directory was already created in Step 1.
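
You can verify the installation succeeded by calling the conda binary directly from its install location:

/scratch/<NetID>/miniconda3/bin/conda --version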

Step 3: Create script to activate Miniconda

Create a script env.sh in /scratch/<NetID>/ using the command below.

touch /scratch/<NetID>/env.sh

Now populate the env.sh file with the following contents. You can use vim, vi, emacs, nano, or any other favourite terminal text editor. How to use terminal editors is beyond the scope of this document; read up on them separately if needed.

#!/bin/bash

# Make the conda command available in this shell
source /scratch/<NetID>/miniconda3/etc/profile.d/conda.sh
# Put the Miniconda binaries first on the executable search path
export PATH=/scratch/<NetID>/miniconda3/bin:$PATH
export PYTHONPATH=/scratch/<NetID>/miniconda3/bin:$PYTHONPATH

Now, you can activate your conda package manager by running

source /scratch/<NetID>/env.sh

By default, new conda environments and packages will be stored in /scratch/<NetID>/miniconda3.

For ease of managing environments, initialize conda on shell startup by running the following command after completing the steps above. This will let us activate environments with conda activate.

conda init
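
On bash, conda init works by appending a "conda initialize" block to your ~/.bashrc that points at the Miniconda install, so new shells pick it up automatically. To apply it to your current shell without logging out again:

source ~/.bashrc
conda env list    # the base environment under /scratch/<NetID>/miniconda3 should be listed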

Example environment using Conda and PyTorch

In this section, I will walk through installing PyTorch v1.13.0 in a conda environment; the following section shows how to execute your code.

SSH to greene using ssh <netid>@greene.hpc.nyu.edu, then SSH to burst from greene using ssh burst. Then get a compute node with a GPU on the GCP Burst platform using the command below.

srun --account=rob_gy_6203-2022fa --cpus-per-task=8 --partition=n1s8-v100-1 --mem=16GB --gres=gpu:v100:1 --time=04:00:00 --pty /bin/bash
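
Once the shell starts, you can confirm that a V100 was actually allocated:

nvidia-smi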

Once you're on this node, you need to create a conda environment and install PyTorch. The relevant commands are below.

conda create -n test python=3.9 -y
conda activate test
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia
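
After the install completes, a quick one-liner from the shell confirms that PyTorch can see the GPU (the same checks appear in test.py later):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"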

The above steps can be seen in the screenshots below. Note that the screenshots show the login command as ssh greene, but that is an alias specific to my machine; you have to use the complete command ssh <netid>@greene.hpc.nyu.edu.
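
If you want a similar shortcut, you can add an alias to the ~/.ssh/config file on your own machine (standard OpenSSH client configuration; the name greene is just a label I chose):

Host greene
    HostName greene.hpc.nyu.edu
    User <netid>

With this in place, ssh greene expands to the full command above.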

[Screenshots sshgb-1 through sshgb-8: logging in to greene, hopping to burst, requesting a compute node, and setting up the conda environment]

How to request GPU nodes and run your code?

There are two ways to do this: interactive and non-interactive. The first command below gives you an interactive shell on a compute node via srun; the second submits the batch script test.sbatch (contents shown below) via sbatch.

srun --account=rob_gy_6203-2022fa --cpus-per-task=8 --partition=n1s8-v100-1 --gres=gpu:v100:1 --time=04:00:00 --pty /bin/bash
sbatch test.sbatch

Contents of test.sbatch

#!/bin/bash
#SBATCH --account=rob_gy_6203-2022fa    # ask for robot perception nodes
#SBATCH --partition=n1s8-v100-1         # specify the gpu partition
#SBATCH --nodes=1                       # requests 1 compute server
#SBATCH --ntasks-per-node=1             # runs 1 task on each server
#SBATCH --cpus-per-task=2               # uses 2 compute cores per task
#SBATCH --time=1:00:00                  # for one hour
#SBATCH --mem=2GB                       # memory required for job
#SBATCH --job-name=torch-test           # name of the job
#SBATCH --output=result.out             # file to which output will be written
#SBATCH --gres=gpu:v100:1               # To request specific v100 GPU

## Initialize conda
source /scratch/<NetID>/env.sh;

## activate your environment
conda deactivate; ## drop any conda environment inherited from the submitting shell
conda activate test; ## the environment created in the previous section

## run your code
python test.py;

Contents of test.py

#!/usr/bin/env python

import torch

print(torch.__file__)
print(torch.__version__)

# How many GPUs are there?
print(torch.cuda.device_count())

# Get the name of the current GPU
print(torch.cuda.get_device_name(torch.cuda.current_device()))

# Is a GPU available to PyTorch?
print(torch.cuda.is_available())

Common commands associated with non-interactive jobs
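
These are the standard Slurm commands you will use most often with batch jobs (<jobid> is the ID printed when you submit):

sbatch test.sbatch    # submit the job; prints its job ID
squeue -u $USER       # list your pending and running jobs
scancel <jobid>       # cancel a job
sacct -j <jobid>      # accounting details for a finished job
cat result.out        # inspect the output file declared in the sbatch script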

Miscellaneous

For teams that will be using Habitat-Sim, here is a nice tutorial by Irving Fang on how to set it up on HPC. Adapt it to the GCP Burst platform instructions above.