Quickstart Guide for Carya/Sabine
How to log in
The only way to connect to our clusters is by secure shell (ssh), e.g. from a Linux/UNIX system:
ssh -l your_username carya.rcdc.uh.edu
ssh -l your_username sabine.rcdc.uh.edu
Use your CougarNet ID as your_username and your CougarNet password to log in. Windows users need an SSH client installed on their machine, such as PuTTY, MobaXterm, or XShell.
VS Code is not supported and is not allowed on the cluster.
The UH VPN is now mandatory for accessing the clusters from outside the campus network. Note that Windows users should avoid using WSL2 to connect over the VPN, as WSL2 is currently incompatible with the UH VPN; use PuTTY, MobaXterm, or XShell instead.
Best Practices on Login Nodes
The login nodes are shared among all users and are not appropriate for computational work. Parallel applications, including MPI and multithreaded applications, are not permitted on the frontends or login nodes; short parallel test runs should instead be carried out as Slurm batch jobs. Alternatively, you can submit an interactive job, which opens a shell on one of the allocated compute nodes once it starts, so that you can run interactive programs there.
Cluster Storage and Directory Access
Each cluster user has access to both a personal home directory and a shared group/project directory. The home directory is limited to 10 GB of storage per user. The group/project directory, shared by all members of the user's research group, provides a minimum storage space of 250 GB per group. Users are encouraged to run their jobs in the group/project directory.
The paths to both directories are provided in the welcome message when the account is created. The home directory can be accessed using the $HOME environment variable, while the group directory is typically located at /project/<supervisor's-lastname>, where <supervisor's-lastname> should be replaced with your supervisor's last name.
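To see how close you are to the 10 GB home-directory limit, you can add up the size of your home directory with du. The function below is a minimal sketch: check_home_usage is a hypothetical name, it assumes the limit means 10 GiB, and it reports usage as du sees it, which may differ somewhat from how the storage system accounts for your quota. Running du over a large directory tree can take a while.

```shell
#!/bin/bash
# check_home_usage [DIR]: hypothetical helper comparing a directory's size
# (default: $HOME) with the 10 GB home-directory limit (assumed to mean 10 GiB).
check_home_usage() {
    local dir limit_kb used_kb pct
    dir=${1:-$HOME}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    limit_kb=$((10 * 1024 * 1024))      # 10 GiB expressed in KiB
    # du -sk prints "<size in KiB><TAB><path>"; keep only the number
    used_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    used_kb=${used_kb:-0}
    pct=$((used_kb * 100 / limit_kb))
    echo "$dir uses ${used_kb} KiB (${pct}% of the 10 GB home limit)"
    if [ "$used_kb" -ge $((limit_kb * 9 / 10)) ]; then
        echo "WARNING: above 90% of the limit; consider moving data to your /project directory"
    fi
}
```

Called with no argument, check_home_usage checks $HOME; pass a path, such as a subdirectory, to check that instead.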
Allocations
Users without project allocations cannot run jobs on Sabine or Opuntia. Users have been given a small allocation on Opuntia to continue running jobs there. For larger job allocations, please refer to the allocation request. A PI (supervisor) must submit a project proposal for Sabine/Opuntia.
Users can check the balance for their projects using the sbalance command:
sbalance balance statement project <projectname>
Users with multiple allocations, for instance those working for two different PIs (supervisors), need to specify the allocation in the batch script when submitting a job, so that the right PI's allocation is charged, e.g.:
#!/bin/bash
### Specify job parameters
#SBATCH -J test_job # name of the job
#SBATCH -t 1:00:00 # time requested
#SBATCH -N 1 -n 2 # total number of nodes and processes
## if you have multiple allocations
## you can tell SLURM which account to charge this job to
#SBATCH -A <Allocation_AWARD_ID>
or specify it when submitting an interactive job, e.g.:
srun -A <Allocation_AWARD_ID> --pty /bin/bash
Using tmux
Using tmux on the Carya/Sabine cluster lets you create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. with srun --pty) and then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. With tmux, you can detach gracefully and tmux will keep your allocation alive. Here is how to do this correctly:
- ssh to Sabine/Opuntia.
- Start tmux.
- Inside your tmux session, submit an interactive job with srun.
- Inside your job allocation (on a compute node), start your application (e.g. matlab).
- Detach from tmux by typing Ctrl+b, then d.
- Later, on the same login node, reattach by running
tmux attach
Make sure to:
- run tmux on the login node, NOT on compute nodes
- run srun inside tmux, not the reverse.
X11 Forwarding
X11 forwarding is needed to display editor windows (gvim, emacs, nedit, etc.) or other graphical programs on your desktop. To enable X11 forwarding, log in with the ssh -X or -Y option:
ssh -XY -l your_username carya.rcdc.uh.edu
ssh -XY -l your_username sabine.rcdc.uh.edu
Windows users need an X server to handle the local display in addition to the SSH program; see this intro (from Indiana University) for PuTTY users.
Transferring Data
Basic Tools
SCP (Secure CoPy): scp uses ssh for data transfer, and uses the same authentication and provides the same security as ssh. For example, copying from a local system to Carya:
scp myfile your_username@carya.rcdc.uh.edu:
scp myfile your_username@sabine.rcdc.uh.edu:
To copy a directory recursively:
scp -r my_directory your_username@carya.rcdc.uh.edu:
SFTP (Secure File Transfer Protocol): sftp is a file transfer program, similar to ftp, that performs all operations over an encrypted ssh transport. For example, to put a file from the local system onto Sabine (this also works for Carya):
sftp your_username@sabine.rcdc.uh.edu
Password:
Connected to sabine.rcdc.uh.edu
sftp> put myfile
For Windows users, WinSCP is a free graphical SCP and SFTP client.
RSYNC is a utility for efficiently transferring and synchronizing files between a computer and an external hard drive, and across networked computers, by comparing the modification times and sizes of files. Its primary advantage over scp is fast synchronization, since it copies only new or updated files. To transfer a file or a directory to Carya or Sabine:
rsync -avP file your_username@sabine.rcdc.uh.edu:path_to_destination_directory
rsync -avP directory your_username@sabine.rcdc.uh.edu:path_to_destination_directory
Data Transfer With GLOBUS
The Carya and Sabine clusters are both Globus endpoints. Users can transfer files to and from Carya or Sabine using the Globus web application, or use the Globus Connect Personal application to initiate data transfers with the clusters from their desktop. More details on using Globus at the DSI cluster are available here, and a YouTube tutorial for Globus Connect Personal is available here.
Software Environment
Text editors
Carya/Sabine have command line editors installed including emacs, nano and vim.
Modules
Environment modules are a tool for managing your Unix environment on Carya and Sabine, and are designed to simplify login scripts. The command
module add module_name
sets up the appropriate environment for that package within your current shell. The command
module avail
or its abbreviated forms
ml avail
ml av
lists the packages available on Carya or Sabine, and
module rm module_name
removes the module from your environment.
Running Jobs
The Concept
A "job" refers to a program running on the compute nodes of the Carya, Opuntia, or Sabine clusters. Jobs can be run on clusters in two different ways:
- A batch job allows you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to the cluster. The output of your program is continuously written to an output file that you can view both during and after your program runs.
- An interactive job allows you to interact with a program by typing input, using a GUI, etc. But if your connection is interrupted, the job will abort. These are best for small, short-running jobs where you need to test out a program, or where you need to use the program's GUI.
The Code
The following shows how to run an example of a parallel program (using MPI) on Carya, Opuntia, or Sabine. MPI programs are executed as one or more processes; one process is typically assigned to one physical processor core. All the processes run the exact same program, but by receiving different inputs they can be made to do different tasks. The most common way to differentiate the processes is by their rank. Together with the total number of processes, referred to as size, they form the basic method of dividing the tasks between the processes. Getting the rank of a process and the total number of processes is therefore the goal of this example. Furthermore, all MPI-related instructions must be issued between MPI_Init() and MPI_Finalize(). Regular C instructions that are to be run locally for each process, e.g. some preprocessing that is equal for all processes, can be run outside the MPI context.
Below is a simple program that, when executed, will make each process print its name and rank as well as the total number of processes.
/* Basic MPI Example - Hello World */
#include <stdio.h>   /* printf defined there */
#include "mpi.h"     /* all MPI functions defined there */

int main(int argc, char *argv[])
{
    int rank, size, length;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);
    printf("%s: hello world from process %d of %d\n", name, rank, size);
    MPI_Finalize();
    return 0;
}
- MPI_Init(&argc, &argv); Initializes the MPI environment and sets up communication between the processes, which are started by mpirun. After this call, the default communicator (collection of processes) MPI_COMM_WORLD, containing all the processes, can be used.
- MPI_Finalize(); Shuts down the MPI environment; call it once, after all other MPI calls.
- MPI_Comm_rank(MPI_COMM_WORLD, &rank); Returns the rank of the process within the communicator. The rank is used to divide tasks among the processes. The process with rank 0 might get some special task, while the rank of each process might correspond to distinct columns in a matrix, effectively partitioning the matrix between the processes.
- MPI_Comm_size(MPI_COMM_WORLD, &size); Returns the total number of processes within the communicator. This is useful, e.g., to know how many columns of a matrix each process will be assigned.
- MPI_Get_processor_name(name, &length); Is more of a curiosity than a necessity in most programs; it confirms that the MPI program is indeed running on more than one computer/node.
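Rank and size are usually combined to split work evenly, for example the columns of a matrix. The sketch below shows one common block decomposition, written in bash (the language of this guide's job scripts) so the arithmetic can be tried directly in a shell; in C the same lines become integer expressions such as count = base + (rank < rem). The helper name block_range is hypothetical, and this is only one of several ways to divide the work.

```shell
#!/bin/bash
# block_range N SIZE RANK: print "START COUNT" for the block of N columns owned
# by process RANK out of SIZE processes. The first N % SIZE ranks get one extra
# column, so block sizes differ by at most one.
block_range() {
    local n size rank base rem count start
    n=$1; size=$2; rank=$3
    base=$((n / size))
    rem=$((n % size))
    count=$((base + (rank < rem ? 1 : 0)))
    start=$((rank * base + (rank < rem ? rank : rem)))
    echo "$start $count"
}

# Example: 10 matrix columns shared by 4 processes (ranks 0 to 3)
for rank in 0 1 2 3; do
    range=$(block_range 10 4 "$rank")
    start=${range% *}
    count=${range#* }
    echo "rank $rank: columns $start to $((start + count - 1)) ($count columns)"
done
```

With 10 columns and 4 processes, ranks 0 and 1 get columns 0 to 2 and 3 to 5, while ranks 2 and 3 get columns 6 to 7 and 8 to 9, so no process holds more than one column beyond any other.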
Compile & Run
Save the code in a file named helloworld.c. Load the Intel compiler and Intel MPI module files:
ml intel-oneapi
Compile the program with the following command:
mpiicx -o helloworld helloworld.c
Create a batch job script by adding the following to a file named job.sh:
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 2 -n 10
ml intel-oneapi
mpirun ./helloworld
Submit the job to the queue.
sbatch job.sh
Submitted batch job 906
The sbatch command prints the job ID. Because this example runs quickly, the job may already have finished by the time you check its status. The job ID is used together with the job name to name the output file: the -o my_mpi_job.o%j line in job.sh expands %j to the job ID. The job name is set with the -J option in job.sh; in this example it is 'my_mpi_job'. The standard output from all processes is therefore written to a file named my_mpi_job.o906 in the directory from which the job was submitted. Here is the content from one batch execution of job.sh:
cat my_mpi_job.o906
compute-2-13.local: hello world from process 9 of 10
compute-2-12.local: hello world from process 1 of 10
compute-2-12.local: hello world from process 3 of 10
compute-2-12.local: hello world from process 5 of 10
compute-2-12.local: hello world from process 6 of 10
compute-2-12.local: hello world from process 7 of 10
compute-2-12.local: hello world from process 8 of 10
compute-2-12.local: hello world from process 0 of 10
compute-2-12.local: hello world from process 2 of 10
compute-2-12.local: hello world from process 4 of 10
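Because job.sh does not request a separate error file, SLURM writes the standard error from all processes to the same my_mpi_job.o906 file. To keep errors separate, add a line such as #SBATCH -e my_mpi_job.e%j to the script; if the processes run without faults, no errors are logged and that file stays empty.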
SLURM Script Generator
An online job script generator application is provided at https://secure.hpedsi.uh.edu/slurm. The web application is designed to assist users in creating template SLURM job scripts for different clusters at HPE-DSI. It's a great starting point for creating various batch job workflows. Users are encouraged to customize the generated scripts to further suit their needs.
Batch Jobs
Note that Carya has no special partition for GPUs, so "-p gpu" is not needed when submitting jobs.
Users can check the status of a job with the squeue command below:
squeue -j <JOB_ID>
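Instead of re-running squeue by hand, you can poll it until a job leaves the queue. The sketch below relies on the standard squeue options -h (no header), -j (job ID) and -o %T (print only the job state); wait_for_job and its default 30-second interval are hypothetical choices, not a site-provided tool. Please keep the polling interval generous, since frequent queries add load on the scheduler.

```shell
#!/bin/bash
# wait_for_job JOB_ID [SECONDS]: hypothetical helper that polls squeue until the
# job no longer appears in the queue (finished, failed or cancelled).
wait_for_job() {
    local job_id interval state
    job_id=$1
    interval=${2:-30}
    # -h: no header line; -o %T: print only the job state (PENDING, RUNNING, ...)
    # Note: a failing squeue call (e.g. a scheduler timeout) also ends the loop.
    while state=$(squeue -h -j "$job_id" -o %T 2>/dev/null) && [ -n "$state" ]; do
        echo "job $job_id: $state"
        sleep "$interval"
    done
    echo "job $job_id is no longer in the queue"
}
```

For example, wait_for_job 906 waits for the job submitted in the MPI example. Once a job has left the queue, sacct -j 906 can show its final state if job accounting is enabled on the cluster.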
Single whole node
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 28
ml intel-oneapi
mpirun ./helloworld
Multiple whole nodes
This example uses 4 nodes and 28 tasks or cores per node
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 4 --ntasks-per-node=28
module load intel-oneapi
mpirun ./helloworld
Single core job utilizing 1 GPU (if you need only a single CPU core and one GPU)
#!/bin/bash
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -t 00:01:00
#SBATCH -n 1
#SBATCH --gpus=1
ml CUDA
nvidia-smi
./helloworld
Single node job utilizing 1 GPU (if you need only one GPU but with multiple CPUs from the same node)
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 16
#SBATCH --gpus=1
ml CUDA intel-oneapi
nvidia-smi
mpirun ./helloworld
Single node utilizing 2 GPUs (if you need two GPUs, along with multiple CPUs all from one node)
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 28
#SBATCH --gpus=2
ml CUDA intel-oneapi
nvidia-smi
mpirun ./helloworld
Multiple whole nodes job with 2 GPUs per node (only on Sabine). This example uses 4 nodes and 28 tasks or cores per node:
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 4 --ntasks-per-node=28
#SBATCH --gpus-per-node=2
ml CUDA intel-oneapi
nvidia-smi
mpirun ./helloworld
Batch Array Jobs
To run tens to thousands of similar jobs, you can use SLURM's job array mechanism. This assumes the inputs and outputs are distinguished by a contiguous serial number in their file names. Inside each array task, the SLURM_ARRAY_TASK_ID variable holds that task's serial number, so it can be used to select the matching files.
Sample array job for 100 jobs
#!/bin/bash
#SBATCH -N 1 #number of nodes
#SBATCH --ntasks-per-node=1 #number of tasks per node
#SBATCH -J myn_job # job name
#SBATCH -o myjob.o%j # output and error file name (%j expands to jobID)
#SBATCH --time=4:00:00 # run time (hh:mm:ss) - 4 hours
#SBATCH --mem-per-cpu=2GB # assuming estimate for memory needed was 1-1.5 GB
#SBATCH --mail-user=johnDoe@uh.edu
#SBATCH --mail-type=end # email me when the job finishes, also includes efficiency report.
#SBATCH --array=1-100
./run_my_app input_$SLURM_ARRAY_TASK_ID.inp > output_$SLURM_ARRAY_TASK_ID.out
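If your input files are not numbered contiguously, one common pattern is to list them in a text file, one per line, and let each array task read the line whose number matches SLURM_ARRAY_TASK_ID. The helper pick_input and the file name inputs.txt are hypothetical; the sketch relies only on sed -n 'Np' printing line N of a file.

```shell
#!/bin/bash
# pick_input LIST [TASK_ID]: hypothetical helper printing line TASK_ID of LIST.
# TASK_ID defaults to $SLURM_ARRAY_TASK_ID, which SLURM sets in each array task.
pick_input() {
    local list task_id
    list=$1
    task_id=${2:-${SLURM_ARRAY_TASK_ID:-}}
    if [ -z "$task_id" ]; then
        echo "SLURM_ARRAY_TASK_ID is not set; run this inside an array job" >&2
        return 1
    fi
    sed -n "${task_id}p" "$list"    # print only line number $task_id
}

# Usage inside the array job script (inputs.txt is an illustrative name):
# input=$(pick_input inputs.txt)
# ./run_my_app "$input" > "output_${SLURM_ARRAY_TASK_ID}.out"
```

With --array=1-100, task IDs start at 1, which matches sed's line numbering. SLURM also accepts a throttle such as --array=1-100%10 to run at most 10 tasks at a time, and the %A (array job ID) and %a (task index) filename patterns, e.g. -o myjob.o%A_%a, give each task its own output file.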
Interactive Jobs
To open an interactive session on a compute node, use salloc. For example, requesting 1 hour of wall time and X11 forwarding support:
salloc -t 1:00:00 --x11=first
Same as above, but requesting 48 cores or a full node on Carya
salloc -t 1:00:00 -n 48 -N 1
Requesting 28 cores or a full node on Sabine
salloc -t 1:00:00 -n 28 -N 1
Same as above, but requesting 52 cores or a full node on Carya
salloc -t 1:00:00 -n 52 -N 1
Requesting GPUs
Requesting 24 cores and 1 GPU on Carya
salloc -t 1:00:00 -n 24 --gpus=1 -N 1
Requesting 48 cores or a full node and 2 GPUs on Carya
salloc -t 1:00:00 -n 48 --gpus=2 -N 1
Requesting 24 cores or a full node and 1 volta architecture GPU on Carya
salloc -t 1:00:00 -n 24 -N 1 --gpus=volta:1
Requesting 1 L40S (Ada architecture) GPU and 32 cores on Carya
salloc -t 1:00:00 -n 32 -N 1 --gpus=ada:1
Requesting 28 cores or a full node and 1 GPU on Sabine
salloc -t 1:00:00 -n 28 --gpus=1 -N 1
Requesting 28 cores or a full node and 2 GPUs on Sabine
salloc -t 1:00:00 -n 28 --gpus=2 -N 1
Requesting 4 nodes and 28 cores per node (on Sabine)
salloc -t 1:00:00 --ntasks-per-node=28 -N 4
Requesting 28 cores per node, 2 GPUs per node, and 4 nodes (on Sabine)
salloc -t 1:00:00 --ntasks-per-node=28 --gpus-per-node=2 -N 4
Python Jobs
Batch Job Examples
Example of python batch job utilizing 1 CPU core
#!/bin/bash
#SBATCH -J python_job
#SBATCH -o python_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 1
#SBATCH --mem=4GB
ml Miniforge
python your_python_script.py
Conda Virtual Environments
Sometimes your Python workflow requires a dedicated virtual environment for running your jobs. Keep in mind that the home directory has a size limit of 10 GB, so it is not advisable to install virtual environments in your home directory. Instead, install them in your group's project directory using the path option of conda create, i.e. conda create -p /path/to/install/virtualenv ...
Below are steps to create and run your virtual environment.
Creating a virtual environment for Python 3.10
module add Miniforge3/py3.10
export CONDA_PKGS_DIRS=/project/your_PIs_project_name/your_user_name/conda_cache_dir
conda create -p /project/your_PIs_project_name/your_user_name/your_virtual_env_name
Sometimes your Python workflow might require a Python version other than the default 3.10, for instance Python 3.9 or Python 3.11. The steps below create an environment with a different Python version:
Creating a virtual environment for Python 3.11
module add Miniforge3/py3.10
export CONDA_PKGS_DIRS=/project/your_PIs_project_name/your_user_name/conda_cache_dir
conda create -p /project/your_PIs_project_name/your_user_name/myenv_3.11 python=3.11
Using the virtual environment
module add Miniforge3/py3.10
source activate /project/your_PIs_project_name/your_user_name/your_virtual_env_name
export CONDA_PKGS_DIRS=/project/your_PIs_project_name/your_user_name/conda_cache_dir
#install python package(s) e.g. scipy and matplotlib
conda install scipy matplotlib
#test the installed package(s)
python -c "import scipy, matplotlib"
Managing Conda Cache
By default, Conda stores cached files in the user's home directory, which can quickly become full and lead to problems. To modify this behavior, you can change the cache directory by either setting the pkgs_dirs entry in the .condarc file or defining the CONDA_PKGS_DIRS environment variable. To find out the current cache directory, execute the following command:
module add Miniforge3/py3.10
conda config --show pkgs_dirs
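To make a project-directory cache permanent instead of exporting CONDA_PKGS_DIRS in every session, you can add a pkgs_dirs entry to ~/.condarc, as mentioned above. The sketch below writes that entry directly; set_conda_cache is a hypothetical helper, and it refuses to modify a .condarc that already contains a pkgs_dirs key, since appending a second key would make the YAML invalid. With Miniforge loaded, conda config --add pkgs_dirs <directory> achieves the same result.

```shell
#!/bin/bash
# set_conda_cache DIR: hypothetical helper that records DIR as conda's package
# cache in ~/.condarc, so CONDA_PKGS_DIRS need not be exported in every session.
set_conda_cache() {
    local cache_dir condarc
    cache_dir=$1
    condarc="$HOME/.condarc"
    if [ -f "$condarc" ] && grep -q '^pkgs_dirs:' "$condarc"; then
        # Appending a second pkgs_dirs key would produce invalid YAML
        echo "$condarc already has a pkgs_dirs entry; please edit it by hand" >&2
        return 1
    fi
    mkdir -p "$cache_dir" || return 1
    # Make sure an existing file ends with a newline before appending
    if [ -s "$condarc" ] && [ -n "$(tail -c 1 "$condarc")" ]; then
        echo >> "$condarc"
    fi
    printf 'pkgs_dirs:\n  - %s\n' "$cache_dir" >> "$condarc"
    echo "conda package cache set to $cache_dir"
}

# Example for project "jones" and user "jderick23" (adjust to your own path):
# set_conda_cache /project/jones/jderick23/conda_cache_dir
```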
Below is an example of these steps for a hypothetical user with the following profile:
supervisor/PIs name = Dr. Dow Jones
project name = jones
cougarnet user name/id: jderick23
desired custom conda virtual environment = scikit-learn from conda-forge channel
module add Miniforge3/py3.10
source "$(dirname "$(which python)")/../etc/profile.d/conda.sh"
export CONDA_PKGS_DIRS=/project/jones/jderick23/conda_cache_dir
conda create -p /project/jones/jderick23/my-scikit-learn -c conda-forge scikit-learn
To use the my-scikit-learn virtual environment you just created
module add Miniforge3/py3.10
source activate /project/jones/jderick23/my-scikit-learn
The example below shows how to use the "my-scikit-learn" environment inside a batch job
#!/bin/bash
#SBATCH -J python_job
#SBATCH -o python_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 1
#SBATCH --mem=4GB
module load Miniforge3/py3.10
source activate /project/jones/jderick23/my-scikit-learn
python your_python_script.py
Container Environments- Singularity/Apptainer
Apptainer (formerly called Singularity) is a flexible, user-friendly container system optimized for high-performance computing applications. Apptainer is installed locally on all compute nodes of the clusters.
How to Obtain an Apptainer Image
The apptainer pull command downloads a container image from a remote source and builds an Apptainer image from it, similar to the docker pull command. Below is an example of how to pull an Ubuntu image from Singularity Hub within a Slurm interactive job:
salloc -t 2:00:00 --mem=8gb
export APPTAINER_CACHEDIR=$TMPDIR
export APPTAINER_TMPDIR=$TMPDIR
apptainer pull my-container.sif shub://singularityhub/ubuntu:latest
In this example:
- The salloc command starts an interactive shell with sufficient memory (8 GB) and a 2-hour time limit for creating the Apptainer image.
- The temporary and cache directories are set to $TMPDIR, ensuring that these files do not clutter the project or home directories.
- The apptainer pull command downloads the Ubuntu image from Singularity Hub and saves it as a Singularity Image File (SIF) named my-container.sif in the current directory.
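The example above assumes $TMPDIR is set, which is typical inside Slurm jobs on many clusters but is an assumption here. If it were unset, the two export lines would leave the variables empty, and the cache would not go to the scratch location you intended. The sketch below falls back to /tmp and gives each user separate subdirectories; the directory names are illustrative.

```shell
#!/bin/bash
# Send Apptainer's cache and temporary files to node-local scratch.
# Assumption: the job environment sets $TMPDIR; if not, fall back to /tmp.
scratch=${TMPDIR:-/tmp}
user=${USER:-$(id -un)}
export APPTAINER_CACHEDIR="$scratch/apptainer-cache-$user"
export APPTAINER_TMPDIR="$scratch/apptainer-tmp-$user"
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
echo "Apptainer cache: $APPTAINER_CACHEDIR"
echo "Apptainer tmp:   $APPTAINER_TMPDIR"
# Then pull as before:
# apptainer pull my-container.sif shub://singularityhub/ubuntu:latest
```

If you instead let the cache build up in a persistent location, apptainer cache clean removes the cached files.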
How to Run a Batch Apptainer Job
Below is an example of a Slurm batch script that executes a Python command inside an Apptainer container:
#!/bin/bash
#SBATCH -J python_job # Job name
#SBATCH -o python_job.o%j # Output file with job ID
#SBATCH -t 01:00:00 # Time limit (1 hour)
#SBATCH -n 1 # Number of tasks
#SBATCH --mem=1GB # Memory allocation (1 GB)
apptainer exec python_build.sif python -c "print('hello world')"
Code Explanation:
- Slurm directives:
  -J: Sets the job name to python_job.
  -o: Specifies the output file, appending the job ID (%j) for uniqueness.
  -t: Allocates a 1-hour runtime limit.
  -n: Requests 1 task for the job.
  --mem: Allocates 1 GB of memory.
- Apptainer command:
  apptainer exec python_build.sif: Runs a command inside the specified container image (python_build.sif).
  python -c "print('hello world')": Executes an inline Python command inside the container to print "hello world".
Running an Apptainer Image with TensorFlow in a GPU Batch Job
By default, Apptainer makes only a few host locations, such as your home directory and the current working directory, available inside the container. If your data and images are in the project directory, you can make them available by binding that directory into the container with the --bind option. Below is an example of a Slurm batch script that runs a TensorFlow container with Apptainer on a GPU-enabled node, for a hypothetical user with the following profile:
supervisor/PIs name = Dr. Dow Jones
project name = jones
cougarnet user name/id: jderick23
#!/bin/bash
#SBATCH -J tensorflow_training # Job name
#SBATCH -o tensorflow_training.o%j # Output file with job ID
#SBATCH -t 01:00:00 # Job time limit (1 hour)
#SBATCH -N 1 # Number of nodes
#SBATCH -n 4 # Number of tasks
#SBATCH --gpus=1 # Number of GPUs
#SBATCH --mem=32GB # Memory allocation (32 GB)
apptainer exec --nv --bind /project/:/project/ \
    /project/jones/jderick23/tensorflow_2.15.sif \
    python /project/jones/jderick23/mytensorflow_train.py
Code Explanation:
- Slurm directives (#SBATCH): Define the job parameters:
  -J: Sets the job name to tensorflow_training.
  -o: Specifies the output file name, appending the job ID (%j) for distinction.
  -t: Allocates a 1-hour runtime limit.
  -N: Requests 1 node.
  -n: Allocates 4 tasks.
  --gpus: Requests 1 GPU for the job.
  --mem: Allocates 32 GB of memory.
- Apptainer command:
  apptainer exec: Runs the given command inside the container image.
  --nv: Makes the node's NVIDIA GPU devices and driver libraries available inside the container.
  --bind /project/:/project/: Mounts the /project/ directory into the container.
  /project/jones/jderick23/tensorflow_2.15.sif: Specifies the TensorFlow container image.
  python /project/jones/jderick23/mytensorflow_train.py: Executes the training script within the container.
TensorFlow Jobs
TensorFlow is available as a standalone module and within the conda Python installations. The installed versions can also take advantage of GPUs when run on a node with GPU(s).
Batch Job Example
Single core job utilizing 1 GPU (if you need only a single CPU core and one GPU)
#!/bin/bash
#SBATCH -J tensorflow_job
#SBATCH -o tensorflow_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 1
#SBATCH --gpus=1
#SBATCH --mem=32GB
module load TensorFlow
python convolutional_network.py
Single node dual core job utilizing 2 GPUs and 2 CPUs (works only on Sabine)
#!/bin/bash
#SBATCH -J tensorflow_job
#SBATCH -o tensorflow_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 2
#SBATCH --gpus=2
#SBATCH --mem=64GB
module load TensorFlow
python convolutional_network.py
Pytorch and Torchvision Jobs
PyTorch and torchvision are available as Python packages and as standalone modules. The installed versions can use both CPUs and GPUs. Note that PyTorch is a dependency of torchvision: if you need the cluster-compiled version of PyTorch and a compatible torchvision, simply load torchvision, e.g.
module add torchvision/0.15.2-foss-2022a-CUDA-11.7.0
Batch Job Examples
Single core job utilizing 1 GPU (If you need only a single CPU core and one GPU)
#!/bin/bash
#SBATCH -J torch_job
#SBATCH -o torch_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 1
#SBATCH --gpus=1
#SBATCH --mem=32GB
ml torchvision
python pytorch_script.py
Single node dual core job utilizing 2 GPUs and 2 CPUs (works only on Sabine)
#!/bin/bash
#SBATCH -J torch_job
#SBATCH -o torch_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 2
#SBATCH --gpus=2
#SBATCH --mem=64GB
module load torchvision
python pytorch_script.py
GROMACS Jobs
GROMACS is available as a module on the Sabine and Opuntia clusters. The installed versions can also take advantage of GPUs.
Batch GROMACS Jobs
Below are more examples of batch jobs requesting certain resources (the module names match the ones installed on Sabine; please adjust for Opuntia). Note that -maxh is set to 4 hours to match the requested wall time, so that GROMACS can stop gracefully, writing a checkpoint file and any other needed files, before the SLURM time limit expires (mdrun stops after 0.99 times the -maxh value).
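Rather than hard-coding -maxh, a small helper can derive it from the same time string given to #SBATCH -t, optionally subtracting a few minutes of margin for writing output. walltime_to_maxh is a hypothetical name; it only understands the HH:MM:SS and D-HH:MM:SS forms (SLURM accepts other forms too), and the margin is your choice.

```shell
#!/bin/bash
# walltime_to_maxh LIMIT [MARGIN_MINUTES]: hypothetical helper converting a
# SLURM time limit written as HH:MM:SS or D-HH:MM:SS into hours for mdrun -maxh,
# optionally subtracting a safety margin in minutes.
walltime_to_maxh() {
    awk -v limit="$1" -v margin="${2:-0}" 'BEGIN {
        days = 0
        if (index(limit, "-") > 0) {        # D-HH:MM:SS form
            split(limit, parts, "-")
            days = parts[1]
            limit = parts[2]
        }
        if (split(limit, t, ":") != 3) {
            print "expected HH:MM:SS or D-HH:MM:SS" > "/dev/stderr"
            exit 1
        }
        secs = days * 86400 + t[1] * 3600 + t[2] * 60 + t[3] - margin * 60
        printf "%.2f\n", secs / 3600
    }'
}

# Example matching "#SBATCH -t 04:00:00" with a 6-minute margin (-maxh 3.90):
# mpirun gmx_mpi mdrun -v -deffnm dhfr -maxh "$(walltime_to_maxh 04:00:00 6)"
```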
Single Whole node
#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 1 --ntasks-per-node=28
ml GROMACS
mpirun gmx_mpi mdrun -v -deffnm dhfr -maxh 4.0
Single Whole GPU node
#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 1 --ntasks-per-node=4
#SBATCH --cpus-per-task=7
#SBATCH --gpus-per-node=2
ml GROMACS
mpirun gmx_mpi mdrun -v -deffnm dhfr -maxh 4.0
Multiple Whole nodes
#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 2 --ntasks-per-node=28
ml GROMACS
mpirun gmx_mpi mdrun -v -deffnm dhfr -maxh 4.0
Multiple Whole GPU nodes
#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 2 --ntasks-per-node=4
#SBATCH --cpus-per-task=7
#SBATCH --gpus-per-node=2
ml GROMACS
mpirun gmx_mpi mdrun -v -deffnm dhfr -maxh 4.0
NAMD Jobs
NAMD is available as a module on the Sabine and Opuntia clusters. The installed versions can also take advantage of distributed memory processors using MPI.
Batch NAMD Jobs
Below are more examples for batch jobs requesting certain resources (the module names match the ones installed on Sabine - please adjust for Carya).
Single Whole node
#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 1 --ntasks-per-node=28
module add NAMD
mpirun namd2 namd.conf
Multiple Whole nodes
#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 2
#SBATCH --ntasks-per-node=28 # Asking for 2 Nodes and 56 cores on Sabine
module add NAMD
mpirun namd2 namd.conf
MATLAB Jobs
MATLAB is available as a module on the Carya, Opuntia, and Sabine clusters. The installed versions can also take advantage of distributed memory processors. Keep in mind that Matlab jobs might require users to request more memory depending on the size of the input arrays and temporary arrays etc., used in the job.
Batch MATLAB Jobs
Below are more examples of MATLAB batch jobs requesting certain resources.
Single core/processor
#!/bin/bash
#SBATCH -J job_name
#SBATCH -o job_name.o%j
#SBATCH -t 04:00:00
#SBATCH -N 1 --ntasks-per-node=1
#SBATCH --mem-per-cpu=8gb # you might need more or less than this amount
module add matlab
# assuming your MATLAB code is stored in mycompute.m, run it as shown below
matlab -nodisplay -r "mycompute; exit"
Multiple processors on a single node
#!/bin/bash
#SBATCH -J job_name
#SBATCH -o job_name.o%j
#SBATCH -t 04:00:00
#SBATCH -N 1 # Asking for 1 Node
#SBATCH --ntasks-per-node=20 # Asking for 20 cores on Opuntia
#SBATCH --mem-per-cpu=2gb #you might require more or less memory than this
# assuming your MATLAB code is stored in mycompute.m, run it as shown below
module add matlab
matlab -nodisplay -r "mycompute; exit"
R and Rstudio Jobs
The R program is available as a module on the Carya, Opuntia, and Sabine clusters. The installed versions can also take advantage of distributed memory processors. Keep in mind that R jobs might require users to request more memory depending on the size of the input arrays and temporary arrays etc., used in the job.
Sample R or Rscript
#!/bin/bash
#SBATCH -N 1 #number of nodes
#SBATCH --ntasks-per-node=1 #number of tasks per node
#SBATCH -J myn_job # job name
#SBATCH -o myjob.o%j # output and error file name (%j expands to jobID)
#SBATCH --time=0:20:00 # run time (hh:mm:ss)
#SBATCH --mem-per-cpu=4GB # assuming estimate for memory needed per cpu was 3.8 GBs
#SBATCH --mail-user=johnDoe@uh.edu
#SBATCH --mail-type=end #email me when the job finishes, also includes efficiency report.
ml R
Rscript your_R_script.R
AlphaFold Jobs
The AlphaFold program is available as a module on the Carya and Sabine clusters. The installed versions use a Python script called "run_alphafold.py" as the main driver program and support running on NVIDIA GPUs. Keep in mind that AlphaFold jobs might require users to request more memory than the default memory settings. A sample SLURM job script for AlphaFold is shown below; users are expected to provide their own sequence file in FASTA format.
Sample AlphaFold job
#!/bin/bash
#SBATCH -J myn_job
#SBATCH -o myjob.o%j
#SBATCH --time=4:00:00
#SBATCH --mem-per-cpu=4GB
#SBATCH --mail-user=johnDoe@uh.edu
#SBATCH --mail-type=end
#SBATCH --ntasks-per-node=14
#SBATCH -N 1 --gpus=1
ml AlphaFold
run_alphafold.py --max_template_date=2021-11-01 --fasta_paths myseq.fasta --output_dir result_dir
Quantum Espresso Jobs
The Quantum ESPRESSO suite of programs is available as a module on the Carya and Sabine clusters. The installed versions support running in parallel across multiple CPU cores.
Sample Quantum Espresso job
#!/bin/bash
#SBATCH -J myn_job
#SBATCH -o myjob.o%j
#SBATCH --time=4:00:00
#SBATCH --mem-per-cpu=2GB
#SBATCH --mail-user=johnDoe@uh.edu
#SBATCH --mail-type=end
#SBATCH -n 32
ml QuantumESPRESSO
# using pw.x application
mpirun pw.x < input.in > output.log
LAMMPS Jobs
LAMMPS is a classical molecular dynamics code with a focus on materials modeling. It is available as a module on the Carya and Sabine clusters. The installed versions support running in parallel across multiple CPU cores.
Sample LAMMPS job
#!/bin/bash
#SBATCH -J myn_job
#SBATCH -o myjob.o%j
#SBATCH --time=4:00:00
#SBATCH --mem-per-cpu=2GB
#SBATCH --mail-user=johnDoe@uh.edu
#SBATCH --mail-type=end
#SBATCH -n 32
ml LAMMPS
# using lmp application
mpirun lmp < in.protein >log.protein