en:hpc [2022/04/15 06:29] – [Registration] grikiete en:hpc [2022/07/04 08:36] – [Batch Processing of Tasks (SLURM)] grikiete
Line 1: Line 1:
 ====== Description of the Equipment ====== ====== Description of the Equipment ======
-Distributed Computing Network (DCN) is a specially designed network of computers capable of running applications that can exchange data efficiently.+High Performance Computing (HPC) is a specially designed network of computers capable of running applications that can exchange data efficiently.
-VU MIF PST consists of a supercomputer from the clusters (the first number is the actual and available amount):+VU MIF HPC is a supercomputer consisting of the following clusters (the first number is the amount actually available):
 ^Title ^Nodes ^CPU ^GPU ^RAM        ^HDD    ^Network ^Notes| ^Title ^Nodes ^CPU ^GPU ^RAM        ^HDD    ^Network ^Notes|
Line 22: Line 22:
 With the command [[|singularity]] it is possible to make use of ready-made copies of container files in directories ''/apps/local/hpc'', ''/apps/local/nvidia'', ''/apps/local/intel'', ''/apps/local/lang'' or to download from singularity and docker online repositories. You can also create your own singularity containers using the MIF cloud service. With the command [[|singularity]] it is possible to make use of ready-made copies of container files in directories ''/apps/local/hpc'', ''/apps/local/nvidia'', ''/apps/local/intel'', ''/apps/local/lang'' or to download from singularity and docker online repositories. You can also create your own singularity containers using the MIF cloud service.
-With singularity you can prepare your container, for example:+You can prepare your container with singularity, for example:
 <code shell> <code shell>
 $ singularity build --sandbox /tmp/python docker://python:3.8 $ singularity build --sandbox /tmp/python docker://python:3.8
Line 45: Line 45:
 There are ready-made scripts to run your **hadoop** tasks using the [[|Magpie]] set in the directory ''/apps/local/bigdata''. There are ready-made scripts to run your **hadoop** tasks using the [[|Magpie]] set in the directory ''/apps/local/bigdata''.
-With [[|JupyterHub]] you can run calculations with the python command line in a web browser and use the [[|JupyterLab]] environment. If you install your own JupyterLab environment in your home directory, you need to install the additional ''batchspawner'' package - this will start your environment, example:+With [[|JupyterHub]] you can run calculations with the python command line in a web browser and use the [[|JupyterLab]] environment. If you install your own JupyterLab environment in your home directory, you need to install the additional ''batchspawner'' package - this will start your environment, for example:
 <code shell> <code shell>
Line 60: Line 60:
 ====== Registration ====== ====== Registration ======
-The DCN can only be used by registered users of the VU MIF computer network. Existing **VU MIF network users** can use DCN without **additional registration**.+  * **For VU MIF network users** - HPC can be used without additional registration if the default resources are sufficient (monthly limit - **100 CPU-h and 6 GPU-h**). Once this limit is reached, you can request more by filling in the [[|ITOAC service request form]].
-To register, fill in the [[|ITOAC service request form]] and submit it to the address given there. Once the request is approved, you receive a VU MIF computer network username. If you are a VU employee or student and gave your VU e-mail address when registering, you can set your initial password through the [[|forgotten password]] change procedure, using your [[|VU E-identity (E.tapatybė)]] credentials. Otherwise you will have to come to VU MIF, Didlaukio str. 47, Room 302/304 during working hours; you can confirm the exact time by phone 8 5219 5005 or 8 5219 5006.+  * **For users of the VU computer network** - you must fill in the [[|ITOAC service request form]] to get access to MIF HPC. After your request is confirmed, you must create your account in the [[|Waldur portal]]. For more details, read [[waldur|here]].
 +  * **For other users (non-members of the VU community)** - you must fill in the [[|ITOAC service request form]] to get access to MIF HPC. After your request is confirmed, you must come to VU MIF, Didlaukio str. 47, Room 302/304 to receive your login credentials. Please arrange the exact time by phone +370 5219 5005. With these credentials you can create an account in the [[|Waldur portal]]. For more details, read [[waldur|here]].
 +====== Connection ====== 
 +You need to use SSH applications (ssh, putty, winscp, mobaxterm) and Kerberos or SSH key authentication to connect to **HPC**. 
 +If **Kerberos** is used: 
 +  * Log in to the Linux environment in a VU MIF classroom or public terminal with your VU MIF username and password, or log in to **** with your VU MIF username and password using **ssh** or **putty**.
 +  * Check if you have a valid Kerberos key (ticket) with the **klist** command. If the key is not available or has expired, the **kinit** command must be used. 
 +  * Connect to the **hpc** node with the command **ssh hpc** (password must not be required). 
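The Kerberos steps above can be sketched as a small shell function. This is only a sketch: it assumes that your ''klist'' supports the ''-s'' option (a silent check that only sets the exit status, as in MIT Kerberos) and that the short host name ''hpc'' resolves from the machine you are logged in to. The function name ''connect_hpc'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: make sure a valid Kerberos ticket exists, then connect.
connect_hpc() {
    # klist -s prints nothing and returns non-zero if there is no valid ticket.
    if ! klist -s; then
        # Ask for the password and obtain a new ticket; stop if that fails.
        kinit || return 1
    fi
    # With a valid ticket, no password prompt is expected here.
    # Any extra arguments are passed on to ssh.
    ssh hpc "$@"
}
```

Calling ''connect_hpc'' without arguments opens an interactive session; ''connect_hpc hostname'' would run a single command on the **hpc** node.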
 +If **SSH keys** are used (e.g. if you need to copy big files): 
 +  * If you don't have SSH keys, you can find instructions on how to create them in a Windows environment **[[duk:ssh_key|here]]**.
 +  * Before you can use this method, you need to log in with Kerberos at least once. Then create a ''~/.ssh'' directory in the HPC file system and put your **ssh public key** (in OpenSSH format) into the ''~/.ssh/authorized_keys'' file.
 +  * Connect with **ssh**, **sftp**, **scp**, **putty**, **winscp** or any other software that supports the **ssh** protocol to **** with your **ssh private key**, specifying your VU MIF user name. A login password should not be required, but the password of your ssh private key may be.
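Installing the public key, as described in the list above, could look like the following sketch. The function name ''install_pubkey'' and the key file name in the usage note are hypothetical; the permissions (700 for the directory, 600 for the file) are a common conservative choice, not something this page prescribes.

```shell
#!/bin/bash
# Hypothetical helper: append an OpenSSH public key to authorized_keys
# under the given home directory, creating the .ssh directory if needed.
install_pubkey() {
    local home_dir=$1 pubkey_file=$2
    mkdir -p "$home_dir/.ssh"
    chmod 700 "$home_dir/.ssh"                  # keep the directory private
    touch "$home_dir/.ssh/authorized_keys"
    # Add the key only if exactly the same line is not already present.
    if ! grep -qxF -f "$pubkey_file" "$home_dir/.ssh/authorized_keys"; then
        cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys"
    fi
    chmod 600 "$home_dir/.ssh/authorized_keys"  # readable by the owner only
}
```

On the **hpc** node (after the first Kerberos login) this would be called as ''install_pubkey "$HOME" id_ed25519.pub'', where ''id_ed25519.pub'' stands for whatever public key file you copied over.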
 +The **first time** you connect, you **will not** be able to run **SLURM jobs** for the first **5 minutes**. After that, your SLURM account will have been created.
 +====== Lustre - Shared File System ====== 
 +The VU MIF HPC shared file system is available in the directory ''/scratch/lustre''.
 +The system creates a directory ''/scratch/lustre/home/username'' for each HPC user, where **username** is the HPC username.
 +The files in this file system are equally accessible on all compute nodes and on the **hpc** node. 
 +Please use these directories only for their purpose and clean them up after calculations. 
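One way to keep the shared file system tidy is to run each calculation in its own temporary directory and remove it afterwards. The sketch below assumes the per-user directory described above. The variable ''LUSTRE_HOME'' and the function ''run_in_scratch'' are hypothetical names, and the command you run must copy any results you want to keep before it finishes.

```shell
#!/bin/bash
# Base directory for job data; defaults to the per-user Lustre directory.
LUSTRE_HOME=${LUSTRE_HOME:-/scratch/lustre/home/$USER}

# Hypothetical helper: run a command in a fresh directory under LUSTRE_HOME
# and remove that directory afterwards, whatever the command's result.
run_in_scratch() {
    local workdir status
    workdir=$(mktemp -d "$LUSTRE_HOME/job.XXXXXX") || return 1
    ( cd "$workdir" && "$@" )    # run the command inside the new directory
    status=$?
    rm -rf "$workdir"            # clean up after the calculation
    return $status               # pass the command's exit status through
}
```

For example, ''run_in_scratch ./my-calculation.sh'' (a hypothetical script name) would run the script in a directory such as ''/scratch/lustre/home/username/job.a1B2c3'' and delete that directory when the script ends.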
 +====== HPC Partition ====== 
 +^Partition ^Time limit ^RAM    ^Notes| 
 +^main             ^7d            ^7000MB  ^CPU cluster| 
 +^gpu              ^48h           ^12000MB ^GPU cluster| 
 +^power            ^48h           ^2000MB  ^IBM Power9 cluster| 
 +If no time limit is specified, tasks in every partition are limited to **2h**. The table shows the maximum time limit that can be requested.
 +The **RAM** column gives the amount of RAM allocated to each reserved **CPU** core. 
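Because RAM is allocated per reserved core, the total memory of a task is the per-core value multiplied by the number of cores. The following sketch does this arithmetic with the values from the table; the function names are hypothetical.

```shell
#!/bin/bash
# RAM per reserved CPU core in MB, copied from the partition table.
ram_per_core_mb() {
    case $1 in
        main)  echo 7000 ;;
        gpu)   echo 12000 ;;
        power) echo 2000 ;;
        *)     echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

# Total RAM in MB for a task that reserves the given number of cores.
job_ram_mb() {
    local per_core
    per_core=$(ram_per_core_mb "$1") || return 1
    echo $(( per_core * $2 ))
}

job_ram_mb main 4    # prints 28000 (4 cores x 7000 MB)
```

So a task submitted with ''-p main -n4'' gets 28000 MB in total. If it needs more memory, the simplest lever under this scheme is to reserve more cores.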
 +====== Batch Processing of Tasks (SLURM) ====== 
 +To use the computing resources of the HPC, you need to create task scripts (sh or csh). For example:
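 +<code shell>
 +#!/bin/bash
 +# Partition (queue) to submit to and number of processes to reserve
 +#SBATCH -p main
 +#SBATCH -n4
 +# Load the MPI environment, then compile and run the test program
 +module load openmpi
 +mpicc -o mpi-test mpi-test.c
 +mpirun mpi-test
 +</code>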
 +After your application to the ITOAC services has been submitted and confirmed, you need to create a user account. The created user is added to the relevant project, which has a certain amount of resources. To use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter ''alloc_xxxx_projektas'' (this does not apply to VU MIF users, who do not have to specify the ''--account'' parameter).
 +<code shell>
 +#!/bin/bash
 +#SBATCH --account=alloc_xxxx_projektas
 +#SBATCH -p main
 +#SBATCH -n4
 +# Replace "minutes" with the time limit in minutes
 +#SBATCH --time=minutes
 +module load openmpi
 +mpicc -o mpi-test mpi-test.c
 +mpirun mpi-test
 +</code>
 +In the script, the instructions for the task scheduler are given as special comments.
 + -p main - which queue (partition) to submit the task to (main, gpu, power).
 + -n4 - how many processors to reserve (**NOTE:** if you request x cores but the program actually uses fewer, all x requested cores are still counted in the accounting, so we recommend working out your needs in advance).
 +The initial working directory of the task is the current directory (**pwd**) on the login node from which the task is submitted, unless it is changed with the -D parameter. For the initial working directory, use the directories of the HPC shared file system **/scratch/lustre**, because the directory must exist on the compute node and the task output file **slurm-JOBID.out** is created there, unless it is redirected elsewhere with the -o or -i parameters (for these it is also advisable to use the shared file system).
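The directives described above can also be written out by a small generator, which helps when you submit many similar tasks. This is a sketch under assumptions: the function name ''make_job_script'' is hypothetical, and the output file names use the ''%j'' pattern of **sbatch**, which is replaced by the job ID.

```shell
#!/bin/bash
# Hypothetical generator: print a task script for the given partition,
# core count and command. The --account line is added only when an
# account is given (VU MIF users do not need it).
make_job_script() {
    local partition=$1 cores=$2 account=$3
    shift 3
    echo '#!/bin/bash'
    echo "#SBATCH -p $partition"
    echo "#SBATCH -n$cores"
    if [ -n "$account" ]; then
        echo "#SBATCH --account=$account"
    fi
    echo '#SBATCH -o slurm-%j.out'    # standard output; %j is the job ID
    echo '#SBATCH -e slurm-%j.err'    # error output in a separate file
    printf '%s\n' "$*"                # the command to run
}

# Example: 4 cores in the main partition, no account.
make_job_script main 4 "" mpirun mpi-test > mpi-test-job
```

The resulting file ''mpi-test-job'' should be created in a directory under ''/scratch/lustre'', so that the output files end up on the shared file system.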
 +Submit the prepared script with the sbatch command
 +''$ sbatch mpi-test-job''
 +which returns the number of the submitted task, **JOBID**.
 +The status of a pending or running task can be checked with the squeue command
 +''$ squeue -j JOBID''
 +With the scancel command you can stop a running task or remove it from the queue
 +''$ scancel JOBID''
 +If you do not remember the **JOBID** of your tasks, you can look it up with the **squeue** command
 +''$ squeue''
 +**squeue** no longer shows completed tasks.
 +If the requested number of processors is not available, your task is placed in the queue. It stays there until enough processors become free or until you remove it with **scancel**.
 +The output of a running task is written to the file **slurm-JOBID.out**. Unless specified otherwise, the error output is written to the same file. The file names can be changed with the **sbatch** parameters -o (output file) and -e (error file).
 +You can read more about the capabilities of SLURM in the [[|Quick Start User Guide]].
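The submit, check and wait cycle can be scripted. The sketch below relies on two standard options: ''sbatch --parsable'', which prints only the job ID (optionally followed by '';cluster''), and ''squeue -h'', which omits the header line. The function names and the default polling interval of 30 seconds are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: submit a task script and print only its JOBID.
submit_job() {
    # --parsable prints "jobid" or "jobid;cluster"; keep the first field.
    sbatch --parsable "$1" | cut -d';' -f1
}

# Hypothetical helper: block until the task no longer appears in squeue.
wait_for_job() {
    local jobid=$1 interval=${2:-30}
    # Without the header, empty output means the task has left the queue
    # (squeue does not show completed tasks).
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

A typical use would be ''jobid=$(submit_job mpi-test-job)'', then ''wait_for_job "$jobid"'', and finally reading ''slurm-$jobid.out''.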
-With the assigned (chosen) username and the password you set, you gain the right to connect to the server ****, to the computers in the VU MIF teaching classrooms and to some of the VU MIF workstation computers.
-At [[|Waldur]] there is a self-service portal where you can create an **HPC** login yourself using your university login (via **eduGAIN** or **LITNET**). More information about this is [[waldur|here]].
