Installing PyTorch on NYU Greene High Performance Computing Cluster
November 10, 2022 · Aditya Wagh

This guide provides an introduction to NYU's High Performance Computing resources and demonstrates how to configure a PyTorch environment on the Greene cluster. For comprehensive documentation, please refer to the official NYU HPC Documentation.

Accessing the Compute Resources

Users with HPC access must connect to the cluster shell remotely. A connection to the NYU network is required, either through VPN or by being physically on-campus. To log in to the HPC cluster, execute the following command.

    ssh <netid>@greene.hpc.nyu.edu

When prompted, enter the password associated with your NetID. If a fingerprint verification prompt appears, type yes to add the cluster to your list of trusted hosts.

Upon successful authentication, the shell prompt will indicate connection to the Greene cluster. The default working directory is /home/<netid>/ (or ~). The prompt displays the login node name, such as [<netid>@log-2 ~]$. There are three available login nodes: log-1, log-2, and log-3.

Next, connect to NYU HPC's Google Cloud Platform (GCP) Burst nodes, which are designated for coursework. Note that the main Greene cluster should be reserved for research purposes only.

    ssh burst

This establishes a connection to log-burst, the login node for the GCP Burst cluster. The prompt will update to [<netid>@log-burst ~]$. All subsequent steps should be executed on this cluster.

Installing Conda on HPC

While the official HPC documentation recommends Singularity for managing conda environments, the following alternative method provides a more straightforward setup process. First, request a compute node on the GCP Burst platform. The parameters for this command are explained in detail in a later section.

Step 1: Create the Conda Installation Directory

To avoid exceeding the 50GB quota limit in the home directory, create the conda installation in the scratch space.

Step 2: Download and Install Miniconda

Step 3: Create an Activation Script

Create a script named env.sh in /scratch/<NetID>/:

Populate env.sh with the following content using a text editor such as vim, nano, or emacs:

    #!/bin/bash

To activate the conda package manager, execute:

By default, conda environments and packages are stored in /scratch/<NetID>/miniconda3.

To enable the conda activate command, initialize conda for shell integration:

Creating a PyTorch Environment

This section demonstrates the installation of PyTorch v1.13.0 within a conda environment. Connect to Greene via ssh <netid>@greene.hpc.nyu.edu, then to the Burst cluster via ssh burst. Request a GPU-enabled compute node:

Once connected to the compute node, create and configure the conda environment:

Requesting GPU Nodes and Executing Code

There are two methods for running code on the cluster: interactive and non-interactive.

Interactive Mode

Interactive mode provides a terminal shell for direct command execution. To request a Tesla V100 GPU node with 8 CPUs for a 4-hour session:

Non-Interactive Mode

Non-interactive mode submits jobs to a queue managed by the SLURM workload manager:

Example test.sbatch configuration:

    #!/bin/bash
    #SBATCH --account=rob_gy_6203-2022fa # Account allocation
    #SBATCH --partition=n1s8-v100-1 # GPU partition
    #SBATCH --nodes=1 # Number of compute nodes
    #SBATCH --ntasks-per-node=1 # Tasks per node
    #SBATCH --cpus-per-task=2 # CPU cores per task
    #SBATCH --time=1:00:00 # Maximum runtime
    #SBATCH --mem=2GB # Memory allocation
    #SBATCH --job-name=torch-test # Job identifier
    #SBATCH --output=result.out # Output file
    #SBATCH --gres=gpu:v100:1 # GPU resource request

    # Initialize conda

    # Activate environment

    # Execute script

Example test.py for verifying the PyTorch installation:

    #!/bin/env python
    # Number of available GPUs
    # Current GPU name
    # GPU availability status

Common SLURM Commands

Command                                                          Description
squeue -u <netID>                                                View submitted jobs
squeue --me                                                      View submitted jobs (alternative)
scancel <JobID>                                                  Cancel a specific job
scancel {StartJobId..EndJobId}                                   Cancel a range of jobs
squeue -u $USER | awk '{print $1}' | tail -n+2 | xargs scancel   Cancel all jobs
squeue --me --start                                              View estimated job start time

Additional Resources

For teams utilizing Habitat-Sim, refer to this tutorial by Irving Fang for installation instructions. Ensure compatibility with the GCP Burst Platform configuration described above.
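The comments in test.py name three checks: the number of available GPUs, the current GPU name, and the GPU availability status. The following is a minimal sketch of a script that performs them, assuming the standard torch.cuda calls. The function name gpu_report and the fallback for a missing PyTorch install are illustrative additions and not part of the original script.

```python
"""Report the GPU details named in the test.py comments (illustrative sketch)."""


def gpu_report():
    """Return the PyTorch version, CUDA availability, GPU count, and GPU name.

    Falls back to placeholder values when PyTorch cannot be imported, so the
    script also runs where the conda environment has not been activated.
    """
    try:
        import torch
    except ImportError:
        return {
            "torch_version": None,
            "cuda_available": False,
            "gpu_count": 0,
            "gpu_name": None,
        }

    # GPU availability status
    available = torch.cuda.is_available()
    # Number of available GPUs (0 when CUDA is unavailable)
    count = torch.cuda.device_count()
    # Current GPU name; only queried when a CUDA device is present,
    # because get_device_name raises otherwise.
    name = None
    if available:
        name = torch.cuda.get_device_name(torch.cuda.current_device())

    return {
        "torch_version": torch.__version__,
        "cuda_available": bool(available),
        "gpu_count": int(count),
        "gpu_name": name,
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run on a GPU compute node with the environment active, the script should report a nonzero GPU count and a device name. On a login node it is expected to report that CUDA is unavailable, which is a quick way to confirm that a job actually landed on a GPU partition.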