Callan HPC Cluster

Callan is the general access HPC cluster for Trinity researchers. Its hardware characteristics are:

  • 12 compute nodes, each with 64 Intel Xeon Gold 6430 CPU cores, 256GB of RAM and a 1.8TB local scratch (/tmp) disk.
  • 5 compute nodes, each with 192 AMD EPYC 9654 CPU cores (2 sockets of 96 cores each), 1,512GB of RAM and a 3.5TB local scratch (/tmp) disk.
  • 1 head node with 64 Intel Xeon Gold 6430 CPU cores, 64GB of RAM and a 1TB local scratch (/tmp) disk.
  • A 69TB shared parallel file system available at /home across all nodes in the cluster.
  • HDR (200 Gb/s) InfiniBand high-speed interconnect.

Access

To access it you must have a Research IT account; please apply for one if you do not already have one.

To request access to the cluster please email ops@tchpc.tcd.ie.

Login

To log in, connect to callan.tchpc.tcd.ie using the usual SSH instructions. It is accessible from the College network, including the VPN. To connect from the internet, first log in to the College VPN or relay through rsync.tchpc.tcd.ie as per our instructions.
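
For example, replacing yourusername with your Research IT username:

> ssh yourusername@callan.tchpc.tcd.ie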

Details of the Callan file system

  • There is one file system, /home, which is accessible from all nodes in the cluster. The default quota is 250GB per user across all files in /home. A quick way to check your usage is shown after this list.

  • /home/users/$user - 50GB quota. Backed up for disaster recovery purposes. Accessible only to you.

  • /home/scratch/$user - Temporary scratch storage. Files here are automatically deleted 90 days after they are created. Do not use for long-term storage. Not backed up.

  • /home/projects/pi-$pi - 50GB quota. Not backed up. Accessible to all users in the $pi group. To be used for group collaborative data.
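
To see how much space your files are occupying against the quotas above, the standard du utility works in any of these directories; the paths below use the shell's $USER variable and are only illustrative:

> du -sh /home/users/$USER
> du -sh /home/scratch/$USER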

Software

Software is installed with our usual modules system. You can view the available software with module avail and load software with module load, e.g. module load gcc/13.1.0-gcc-8.5.0-k3cddbg. The modgrep utility will search the available module files from the head node, e.g. modgrep conda will display any modules with conda in their name.
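
As a quick sketch of a typical workflow (the module name below is only an example; check module avail on Callan for the versions actually installed):

> module avail                                # list all available modules
> modgrep conda                               # search for modules with "conda" in the name
> module load gcc/13.1.0-gcc-8.5.0-k3cddbg    # load a specific module
> module list                                 # show what is currently loaded
> module purge                                # unload everything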

Intel OneAPI

Suggested Intel modules to load:

module load tbb/latest compiler-rt/latest oclfpga/latest compiler/latest mpi/latest

Manual activation: source /home/support/intel/oneapi/2024.1.0/setvars.sh
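
As a minimal sketch, either option can be placed near the top of a batch script before running your own Intel-built program (exe.x below is a placeholder):

#!/bin/bash
#SBATCH -n 12
# Option 1: use the module system
module load tbb/latest compiler-rt/latest oclfpga/latest compiler/latest mpi/latest
# Option 2: manual activation (use instead of the modules above)
# source /home/support/intel/oneapi/2024.1.0/setvars.sh
./exe.x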

Running jobs

All jobs must be run via the Slurm scheduler.

Intel and AMD partitions

There are two different CPU architectures in Callan: some nodes have Intel CPUs, others have AMD CPUs (see the hardware list at the top of this page). To avoid inadvertently running jobs on the wrong architecture, the two node types are in separate partitions.

The Intel nodes are in the compute partition, which is the default. You do not need to do anything to request resources there: if you do not specify a partition, Intel nodes in the compute partition will be assigned to you.

If you wish to manually specify Intel nodes in the compute partition, use the #SBATCH -p compute directive in your batch job scripts, or for interactive jobs use salloc -p compute ....

To request AMD nodes you must specify the amd partition with the #SBATCH -p amd directive in your batch job scripts, or for interactive jobs use salloc -p amd ....
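
For example, a minimal batch script sketch requesting 24 cores and 48GB of RAM on an AMD node (exe.x is a placeholder for your own program):

#!/bin/bash
#SBATCH -p amd
#SBATCH -n 24
#SBATCH --mem=48GB
echo "Starting"
./exe.x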

ccem partition

The 5 AMD nodes, callan-n[13-17], are in both the amd partition and the ccem partition. The ccem partition has a higher priority, so jobs submitted to it will run sooner. The ccem partition is only accessible to users specified by the Head of the School of Chemistry or their delegate, because these nodes were purchased with funding through the School of Chemistry.

To access the ccem partition for interactive jobs use: salloc -n 1 -p ccem -A CCEM

Or for batch:

#SBATCH -p ccem
#SBATCH -A CCEM
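
Putting those directives together, a minimal ccem batch script sketch might look like this (the core and memory requests and exe.x are only placeholders):

#!/bin/bash
#SBATCH -p ccem
#SBATCH -A CCEM
#SBATCH -n 96
#SBATCH --mem=96GB
echo "Starting"
./exe.x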

Batch job examples

Node sharing is enabled, so jobs from other users may run on the same node; request only the cores and memory your job needs.

Create a batch submission script, e.g. run.sh, and submit it to the queue with the sbatch command, e.g.

> sbatch run.sh
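
Once submitted, standard Slurm commands can be used to monitor and manage the job, for example:

> squeue -u $USER        # show your queued and running jobs
> scancel <jobid>        # cancel a job, replacing <jobid> with the job ID shown by squeue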

Here are some example submission scripts:

  • 12 cores and 48GB of RAM from 1 node:

#!/bin/bash
#SBATCH -n 12
#SBATCH --mem=48GB
module load openmpi
echo "Starting"
./exe.x

  • 1 full Intel node (64 cores, 256GB of RAM):

#!/bin/bash
#SBATCH -n 64
#SBATCH --mem=256000
module load openmpi
echo "Starting"
./exe.x

  • 2 full Intel nodes (note that --mem requests memory per node):

#!/bin/bash
#SBATCH -n 128
#SBATCH --mem=256000
module load openmpi
echo "Starting"
./exe.x
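
Note that these scripts run exe.x directly. If exe.x is an MPI program, you would normally start it through an MPI launcher so that it uses all of the allocated cores, for example (a sketch, assuming the openmpi module provides mpirun):

module load openmpi
mpirun ./exe.x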

Interactive allocation

salloc -n 12 --mem=48GB will automatically log you into the node once it has been assigned.
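
A typical interactive session sketch (the commands run on the compute node are only examples):

> salloc -n 12 --mem=48GB     # wait for the allocation; you are then placed on the node
> module load openmpi
> ./exe.x
> exit                        # end the session and release the allocation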

Transferring data from Kelvin, Lonsdale, Parsons, rsync

If you need to transfer files that exist on the Kelvin, Lonsdale or Parsons HPC clusters, you can do so through our access host, rsync.tchpc.tcd.ie. Here is an example command that will transfer files:

> rsync -av rsync.tchpc.tcd.ie:source_directory destination_directory/

Update source_directory to the path where the data resides on rsync, and destination_directory/ to the path on Callan where the data is to be transferred.
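
For example, to pull a hypothetical results directory from your home area on the old clusters into your Callan home area (the paths are purely illustrative; prefix the host with username@ if your username differs between the systems):

> rsync -av rsync.tchpc.tcd.ie:~/results/ /home/users/$USER/results/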

See our Transferring files page for more notes on this.

Further instructions

See the HPC clusters usage documentation for further instructions.