A High Performance Computing (HPC) system is a specially designed network of computers capable of running applications that exchange data efficiently.
The VU MIF HPC supercomputer consists of the following clusters (in the Nodes column, the first number is the number of nodes currently available, the second is the total):
Title | Nodes | CPU cores per node | GPUs per node | RAM (node/GPU) | HDD | Network | Notes |
---|---|---|---|---|---|---|---|
main | 35/36 | 48 | 0 | 384 GiB | 0 | 1 Gbit/s, 2x10 Gbit/s, 4xEDR (100 Gbit/s) InfiniBand | CPU |
gpu | 3/3 | 40 | 8 | 512 GB / 32 GB | 7 TB | 2x10 Gbit/s, 4xEDR (100 Gbit/s) InfiniBand | NVIDIA DGX-1 |
power | 2/2 | 32 | 4 | 1024 GB / 32 GB | 1.8 TB | 2x10 Gbit/s, 4xEDR (100 Gbit/s) InfiniBand | IBM Power System AC922 |
In total: 40/41 nodes, 1912 CPU cores with 17 TB of RAM, and 32 GPUs with 1 TB of GPU RAM.
Below, processor = CPU = core: a single core of a processor (together with all of its hyperthreads, if hyperthreading is enabled).
The main and gpu partitions run Qlustar 12 OS, which is based on Ubuntu 20.04 LTS. The power partition runs Ubuntu 18.04 LTS.
You can list the installed OS packages with the command dpkg -l (on the hpc login node or on the power nodes).
With the singularity command you can use the ready-made container images in the directories /apps/local/hpc, /apps/local/nvidia, /apps/local/intel and /apps/local/lang, or download images from Singularity and Docker online repositories. You can also create your own Singularity containers using the MIF cloud service.
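For example, a minimal sketch of both options (the image name under /apps/local/lang is an assumption, check the directories for the actual files):

```
# run a program from one of the ready-made images (file name is an assumption)
$ singularity exec /apps/local/lang/python3.sif python3 --version

# pull an image from the Docker online repository into the current directory
$ singularity pull docker://python:3.8
```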
You can prepare your container with singularity, for example:
```
$ singularity build --sandbox /tmp/python docker://python:3.8
$ singularity exec -w /tmp/python pip install package
$ singularity build python.sif /tmp/python
$ rm -rf /tmp/python
```
To use this container, it is advised to use a separate directory instead of your home directory (so that the container's Python packages do not get mixed up with those installed in your home directory).
```
$ mkdir ~/workdir
$ singularity exec -H ~/workdir:$HOME python.sif python3 ...
```
Similarly, you can use R, Julia or other containers that do not require root privileges to install packages.
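For example, a hypothetical R container (the names r.sif and myscript.R are illustrative) can be used the same way, with its own working directory:

```
$ mkdir ~/r-workdir
$ singularity exec -H ~/r-workdir:$HOME r.sif Rscript myscript.R
```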
If you want to add OS packages to a Singularity container, you need root/superuser privileges. With fakeroot we simulate them and copy the required library libfakeroot-sysv.so into the container, for example:
```
$ singularity build --sandbox /tmp/python docker://ubuntu:18.04
$ cp /libfakeroot-sysv.so /tmp/python/
$ fakeroot -l /libfakeroot-sysv.so singularity exec -w /tmp/python apt-get update
$ fakeroot -l /libfakeroot-sysv.so singularity exec -w /tmp/python apt-get install python3.8
...
$ fakeroot -l /libfakeroot-sysv.so singularity exec -w /tmp/python apt-get clean
$ rm -rf /tmp/python/libfakeroot-sysv.so /tmp/python/var/lib/apt/lists   # you can clean up more of what you don't need
$ singularity build python.sif /tmp/python
$ rm -rf /tmp/python
```
There are ready-made scripts for running your Hadoop tasks with the Magpie toolset in the directory /apps/local/bigdata.
With JupyterHub you can run calculations with the Python command line in a web browser and use the JupyterLab environment. If you install your own JupyterLab environment in your home directory, you also need to install the additional batchspawner package, which is required to start your environment, for example:
```
$ python3.7 -m pip install --upgrade pip setuptools wheel
$ python3.7 -m pip install --ignore-installed batchspawner jupyterlab
```
Alternatively, you can use a container of your own via JupyterHub. In that container you need to install the batchspawner and jupyterlab packages, and create a script ~/.local/bin/batchspawner-singleuser with execute permissions (chmod +x ~/.local/bin/batchspawner-singleuser):
```
#!/bin/sh
exec singularity exec --nv myjupyterlab.sif batchspawner-singleuser "$@"
```
To connect to the HPC, you need to use an SSH application (ssh, PuTTY, WinSCP, MobaXterm) and Kerberos or SSH key authentication.
If Kerberos is used:
If SSH keys are used (e.g. if you need to copy big files):
Create the ~/.ssh directory in the HPC file system and put your SSH public key (in OpenSSH format) into the ~/.ssh/authorized_keys file. The first time you connect, you will not be able to run SLURM jobs for the first 5 minutes; after that, your SLURM account will be created.
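For example, one way to reach the hpc login node is through uosis.mif.vu.lt, as in the graphical-session example further below (replace username with your MIF user name):

```
$ ssh username@uosis.mif.vu.lt
$ ssh hpc
```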
The VU MIF HPC shared file system is available in the directory /scratch/lustre.
The system creates the directory /scratch/lustre/home/username for each HPC user, where username is the HPC user name.
The files in this file system are equally accessible on all compute nodes and on the hpc node.
Please use these directories only for their purpose and clean them up after calculations.
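For example, a typical workflow keeps the task data and scripts on the shared filesystem (my-calculation and my-task.sh are illustrative names; $USER expands to your HPC user name):

```
$ cd /scratch/lustre/home/$USER
$ mkdir my-calculation && cd my-calculation
$ sbatch my-task.sh
```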
Partition | Time limit | RAM per CPU core | Notes |
---|---|---|---|
main | 7d | 7000MB | CPU cluster |
gpu | 48h | 12000MB | GPU cluster |
power | 48h | 2000MB | IBM Power9 cluster |
If no time limit is specified, the default for tasks is 2 h in all partitions. The table shows the maximum time limit.
The RAM column gives the amount of RAM allocated to each reserved CPU core.
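For example, a minimal sketch of a task script that stays within these limits (my-program is a hypothetical executable); requesting 4 cores in main reserves 4 x 7000 MB = 28000 MB of RAM by default:

```
#!/bin/bash
#SBATCH -p main              # main partition: max 7d, 7000 MB per core
#SBATCH -n4                  # 4 cores => 28000 MB RAM by default
#SBATCH --time=24:00:00      # must not exceed the partition time limit
./my-program                 # hypothetical executable
```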
To use the HPC computing resources, you need to create task scripts (sh or csh).
Example:
```
#!/bin/bash
#SBATCH -p main
#SBATCH -n4
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
```
After your application for the ITOAC services has been submitted and confirmed, you need to create a user at https://hpc.mif.vu.lt/. The created user will be added to the relevant project, which has a certain amount of resources. To use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter "alloc_xxxx_project" (not applicable to VU MIF users; VU MIF users do not have to specify the --account parameter).
```
#!/bin/bash
#SBATCH --account=alloc_xxxx_project
#SBATCH -p main
#SBATCH -n4
#SBATCH --time=minutes
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
```
The script contains instructions for the job scheduler in the form of special #SBATCH comments:
-p - which partition (queue) to submit the task to (main, gpu, power).
-n4 - how many processors (CPU cores) to reserve (NOTE: if you request x cores but your program actually uses fewer, accounting will still count all x requested cores, so estimate this in advance).
The initial working directory of the task is the current directory (pwd) on the login node from which the task is submitted, unless it is changed with the -D parameter. For the initial working directory, use the HPC shared filesystem directories under /scratch/lustre, as the directory must exist on the compute node and the job output file slurm-JOBID.out is created there unless redirected with -o or -i (for these it is also advisable to use the shared filesystem).
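For example, a submission that sets the initial directory and the output file explicitly (my-calculation is an illustrative directory name; %j is replaced by the JOBID):

```
$ sbatch -D /scratch/lustre/home/$USER/my-calculation \
         -o /scratch/lustre/home/$USER/my-calculation/slurm-%j.out \
         mpi-test-job
```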
The prepared script is submitted with the sbatch command,
$ sbatch mpi-test-job
which returns the JOBID number of the submitted job.
The status of a pending or running task can be checked with the squeue command:
$ squeue -j JOBID
With the scancel command you can cancel a running task or remove it from the queue:
$ scancel JOBID
If you forgot your task's JOBID, you can check it with the squeue command:
$ squeue
Completed tasks are no longer displayed in squeue.
If the specified number of processors is not available, your task is added to the queue. It will remain in the queue until a sufficient number of processors become available or until you remove it with scancel.
The output of the running task is written to the file slurm-JOBID.out. The error output is written to the same file unless you specify otherwise. The file names can be changed with the sbatch parameters -o (output file) and -e (error file).
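For example, a sketch using the mpi-test-job script from above (%j is replaced by the JOBID):

```
$ sbatch -o result-%j.out -e errors-%j.err mpi-test-job
```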
You can read more about SLURM capabilities in the Quick Start User Guide.
Interactive tasks can be done with the srun command:
$ srun --pty $SHELL
The above command will connect you to the compute node environment assigned by SLURM and allow you to run and debug programs directly on it.
After the commands are done, disconnect from the compute node with the command
$ exit
If you want to run graphical programs, you need to connect with ssh -X to uosis.mif.vu.lt and then to hpc:
```
$ ssh -X uosis.mif.vu.lt
$ ssh -X hpc
$ srun --pty $SHELL
```
In the power cluster, interactive tasks can be started with
$ srun -p power --mpi=none --pty $SHELL
To use GPUs, you additionally need to specify --gres gpu:N, where N is the desired number of GPUs.
With nvidia-smi inside the task you can check how many GPUs were allocated.
Example of an interactive task with 1 GPU:
$ srun -p gpu --gres gpu --pty $SHELL
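A non-interactive GPU task could be submitted with a script like the following (a sketch; my-gpu-program is a hypothetical executable):

```
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres gpu:2      # request 2 GPUs
nvidia-smi                # shows the GPUs allocated to this task
./my-gpu-program          # hypothetical executable
```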
Ubuntu 18.04 LTS provides OpenMPI version 2.1.1 as an OS package. To use the newer version 4.0.1, you need to run
module load openmpi/4.0
before running MPI commands.
An example of a simple MPI program is in the directory /scratch/lustre/test/openmpi. mpicc (mpiCC, mpif77, mpif90, mpifort) is a wrapper for the C (C++, F77, F90, Fortran) compiler that automatically adds the necessary MPI include and library options to the command line.
```
$ mpicc -o foo foo.c
$ mpif77 -o foo foo.f
$ mpif90 -o foo foo.f
```
MPI programs are started with mpirun or mpiexec. You can learn more about them with the man mpirun or man mpiexec command.
A simple (SPMD) program can be started with the following mpirun command line.
$ mpirun foo
All allocated processors will be used according to the number requested. If you want to use fewer, you can pass the -np N parameter to mpirun. It is not recommended to use fewer CPUs than reserved for a longer period, as the unused CPUs remain reserved but idle.
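For example, to start only 2 MPI processes of the foo program compiled above, even if more processors were reserved:

```
$ mpirun -np 2 foo
```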
ATTENTION: It is strictly forbidden to use more CPUs than you have reserved, as this may affect the performance of other tasks.
Find more information on OpenMPI.
With the --mem=X parameter, a task can reserve more CPUs in proportion to the amount of memory it requests. For example, if you order --mem=14000 in the main queue, at least 2 CPUs will be reserved (each reserved core in main comes with 7000 MB), unless other parameters specify more. If your task uses fewer CPUs than that, resources are used inefficiently; in addition, the task may run slower because it may use memory that is not local to the executing CPU. If your tasks don't start because of AssocGrpCPUMinutesLimit or AssocGrpGRESMinutes, check whether any unused CPU/GPU resources are left from your monthly limit.
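For example, a sketch of a task script that orders memory rather than cores directly (my-program is a hypothetical executable):

```
#!/bin/bash
#SBATCH -p main
#SBATCH --mem=14000       # 14000 MB in main => at least 2 CPU cores reserved
./my-program              # should actually make use of both reserved cores
```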
The first way to see how many resources have been used:
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=0101 End=0131 User=USERNAME
Here USERNAME is your MIF user name, and Start and End are the start and end days of the current month. You can also specify them as $(date +%m01) and $(date +%m31).
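For example, for the current month (username is your MIF user name):

```
$ sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser \
    Start=$(date +%m01) End=$(date +%m31) User=username
```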
NOTE: Resource usage is given in minutes; divide the number by 60 to get hours.
The second way to see how many resources have been used:
sshare -l -A USERNAME_mif -p -o GrpTRESRaw,GrpTRESMins,TRESRunMins
Here USERNAME is your MIF user name; alternatively, specify in -A the account whose usage you want to see. The data is also displayed in minutes: