IT wiki

VU MIF STSC


en:hpc

Differences

This shows you the differences between two versions of the page.

en:hpc [2022/04/15 06:29] – [Registration] grikiete
en:hpc [2022/07/04 08:36] – [Batch Processing of Tasks (SLURM)] grikiete
Line 1: Line 1:
 ====== Description of the Equipment ======
  
-Distributed Computing Network (DCN) is a specially designed network of computers capable of running applications that can exchange data efficiently.
+High Performance Computing (HPC) is a specially designed network of computers capable of running applications that can exchange data efficiently.
  
-VU MIF PST consists of a supercomputer from the clusters (the first number is the actual and available amount):
+The VU MIF HPC supercomputer consists of the following clusters (the first number is the actual and available amount):
  
 ^Title ^Nodes ^CPU ^GPU ^RAM        ^HDD    ^Network ^Notes|
Line 22: Line 22:
 With the [[https://sylabs.io/guides/3.2/user-guide/index.html|singularity]] command you can use ready-made container files from the directories ''/apps/local/hpc'', ''/apps/local/nvidia'', ''/apps/local/intel'', ''/apps/local/lang'', or download containers from the singularity and docker online repositories. You can also create your own singularity containers using the MIF cloud service.
  
-With singularity you can prepare your container, for example:
+You can prepare your container with singularity, for example:
 <code shell>
 $ singularity build --sandbox /tmp/python docker://python:3.8
Line 45: Line 45:
 Ready-made scripts for running your **hadoop** tasks with the [[https://github.com/LLNL/magpie|Magpie]] suite are in the directory ''/apps/local/bigdata''.
  
-With [[https://hpc.mif.vu.lt/hub/|JupyterHub]] you can run calculations with the python command line in a web browser and use the [[https://jupyter.org|JupyterLab]] environment. If you install your own JupyterLab environment in your home directory, you need to install the additional ''batchspawner'' package - this will start your environment, example:
+With [[https://hpc.mif.vu.lt/hub/|JupyterHub]] you can run calculations with the python command line in a web browser and use the [[https://jupyter.org|JupyterLab]] environment. If you install your own JupyterLab environment in your home directory, you need to install the additional ''batchspawner'' package - this will start your environment, for example:
  
 <code shell>
Line 60: Line 60:
 ====== Registration ======
  
-The DCN can only be used by registered users of the VU MIF computer network. Existing **VU MIF network users** can use DCN without **additional registration**.
+  * **For VU MIF network users** - HPC can be used without additional registration if the available resources are enough (monthly limit - **100 CPU-h and 6 GPU-h**). Once this limit has been reached, you can request more by filling in the [[https://forms.office.com/Pages/ResponsePage.aspx?id=ghrFgo1UykO8-b9LfrHQEidLsh79nRJAvOP_wV9sgmdUM0ZMR1FINFg3TzVaNlhDSEhUN1A3QTlVUC4u|ITOAC service request form]].
  
-To register, fill in the [[http://mif.vu.lt/itapc#paslaug%C5%B3-u%C5%BEsakymas|ITAPC service order form]] and submit it to the address given there. Once the application is approved, you are given a VU MIF computer network username. If you are a VU employee or student and gave your VU email address when registering, you can set your initial password through the [[https://mif.vu.lt/passwd2|forgotten password]] change procedure, using your [[https://id.vu.lt|VU E.tapatybė]] credentials. Otherwise you will have to come to VU MIF, Didlaukio g. 47, room 302/304, during working hours; you can confirm the exact time by phone 8 5219 5005 or 8 5219 5006.
+  * **For users of the VU computer network** - you must fill in the [[https://forms.office.com/Pages/ResponsePage.aspx?id=ghrFgo1UykO8-b9LfrHQEidLsh79nRJAvOP_wV9sgmdUM0ZMR1FINFg3TzVaNlhDSEhUN1A3QTlVUC4u|ITOAC service request form]] to get access to MIF HPC. After your request is confirmed, you must create your account in the [[https://hpc.mif.vu.lt|Waldur portal]]. Read more details [[waldur|here]].
 + 
 +  * **For other users (non-members of the VU community)** - you must fill in the [[https://forms.office.com/Pages/ResponsePage.aspx?id=ghrFgo1UykO8-b9LfrHQEidLsh79nRJAvOP_wV9sgmdUMDE1QUo3Slo3UVYwTjM4TDMyTEdZT0tSNi4u|ITOAC service request form]] to get access to MIF HPC. After your request is confirmed, you must come to VU MIF, Didlaukio str. 47, Room 302/304, to receive your login credentials. Please arrange the exact time by phone +370 5219 5005. With these credentials you can create an account in the [[https://hpc.mif.vu.lt|Waldur portal]]. Read more details [[waldur|here]].
 + 
 +====== Connection ====== 
 + 
 +You need to use SSH applications (ssh, putty, winscp, mobaxterm) and Kerberos or SSH key authentication to connect to **HPC**. 
 + 
 +If **Kerberos** is used: 
 + 
 +  * Log in to the Linux environment in a VU MIF classroom or public terminal with your VU MIF username and password, or log in to **uosis.mif.vu.lt** with your VU MIF username and password using **ssh** or **putty**.
 +  * Check that you have a valid Kerberos key (ticket) with the **klist** command. If the key is not available or has expired, run the **kinit** command.
 +  * Connect to the **hpc** node with the command **ssh hpc** (no password should be requested).
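
The Kerberos steps above can be sketched as one shell function. This is only a sketch under assumptions: it relies on the MIT Kerberos behaviour where ''klist -s'' prints nothing and exits with a non-zero status when there is no valid ticket, and the function name ''hpc_login'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: make sure a Kerberos ticket exists, then connect.
# Assumption: `klist -s` (MIT Kerberos) is silent and exits non-zero when
# the ticket cache is missing or expired.
hpc_login() {
    if ! klist -s; then
        # No valid ticket: request a new one (kinit prompts for the password).
        kinit || return 1
    fi
    # With a valid ticket, ssh should not ask for a password here.
    ssh hpc "$@"
}
```

Run ''hpc_login'' from a VU MIF classroom machine or from **uosis.mif.vu.lt**; any extra arguments are passed on to **ssh**.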
 + 
 +If **SSH keys** are used (e.g. if you need to copy big files): 
 +  * If you don't have SSH keys, you can find instructions on how to create them in a Windows environment **[[duk:ssh_key|here]]**.
 +  * Before you can use this method, you need to log in with Kerberos at least once. Then create a ''~/.ssh'' directory in the HPC file system and put your **ssh public key** (in OpenSSH format) into the ''~/.ssh/authorized_keys'' file.
 +  * Connect to **hpc.mif.vu.lt** with **ssh**, **sftp**, **scp**, **putty**, **winscp** or any other software that supports the **ssh** protocol, using your **ssh private key** and specifying your VU MIF user name. It should not ask for a login password, but may ask for your ssh private key password.
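
The key installation step above can be sketched as follows. This is a minimal sketch, not an official procedure: the function name ''install_pubkey'' is hypothetical, and the permission values (700 for the directory, 600 for the file) are the ones OpenSSH servers commonly expect rather than something stated on this page.

```shell
#!/bin/bash
# Hypothetical helper: append a public key to ~/.ssh/authorized_keys.
# Run it on the hpc node after the first Kerberos login.
install_pubkey() {
    local pubkey_file=$1
    local ssh_dir="$HOME/.ssh"
    [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }
    # Assumption: sshd ignores keys when these permissions are too open.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # Append, so that keys already in the file are kept.
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys"
}
```

For example, after copying the public key file to the HPC file system, ''install_pubkey ~/id_ed25519.pub'' adds it (the file name is only an illustration).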
 + 
 +The **first time** you connect, you **will not** be able to run **SLURM jobs** for the first **5 minutes**. After that, your SLURM account will have been created.
 + 
 +====== Lustre - Shared File System ====== 
 + 
 +The VU MIF HPC shared file system is available in the directory ''/scratch/lustre''.
 + 
 +The system creates the directory ''/scratch/lustre/home/username'' for each HPC user, where **username** is the HPC username.
 + 
 +The files in this file system are equally accessible on all compute nodes and on the **hpc** node. 
 + 
 +Please use these directories only for their purpose and clean them up after calculations. 
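
One way to keep the shared directory tidy is to give every calculation its own working directory and delete it afterwards. The sketch below assumes the per-user layout described above; the function name ''make_work_dir'' and the ''job.XXXXXX'' naming pattern are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: create a private working directory for one calculation.
# The default base follows the per-user layout /scratch/lustre/home/username.
make_work_dir() {
    local base=${1:-/scratch/lustre/home/$USER}
    # mktemp -d creates a uniquely named directory and prints its path.
    mktemp -d "$base/job.XXXXXX"
}

# Typical use inside a script: remove the directory when the script exits.
# work_dir=$(make_work_dir) && trap 'rm -rf "$work_dir"' EXIT
```

Copy any results you want to keep out of the directory before the script exits, because the ''trap'' line removes everything in it.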
 + 
 +====== HPC Partition ====== 
 + 
 +^Partition ^Time limit ^RAM    ^Notes| 
 +^main             ^7d            ^7000MB  ^CPU cluster| 
 +^gpu              ^48h           ^12000MB ^GPU cluster| 
 +^power            ^48h           ^2000MB  ^IBM Power9 cluster| 
 + 
 +If no time limit is specified, tasks are limited to **2h** in all partitions. The table shows the maximum time limit that can be requested.
 + 
 +The **RAM** column gives the amount of RAM allocated to each reserved **CPU** core. 
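
Because RAM is allocated per reserved core, the memory available to a task can be estimated from the table. The sketch below assumes the total simply scales with the number of reserved cores; the function name ''partition_mem_mb'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: estimated total RAM (MB) for a reservation, using the
# per-core values from the partition table above.
partition_mem_mb() {
    local partition=$1 cores=$2 per_core
    case $partition in
        main)  per_core=7000 ;;
        gpu)   per_core=12000 ;;
        power) per_core=2000 ;;
        *) echo "unknown partition: $partition" >&2; return 1 ;;
    esac
    echo $(( per_core * cores ))
}
```

For example, ''partition_mem_mb main 4'' prints ''28000'', which is the estimate for a task that reserves 4 cores in the **main** partition.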
 + 
 +====== Batch Processing of Tasks (SLURM) ====== 
 + 
 +To use the computing resources of the HPC, you need to create task scripts (sh or csh).
 + 
 +Example: 
 + 
 +<code shell mpi-test-job.sh> 
 +#!/bin/bash 
 +#SBATCH -p main 
 +#SBATCH -n4 
 +module load openmpi 
 +mpicc -o mpi-test mpi-test.c 
 +mpirun mpi-test 
 +</code> 
 + 
 +After your application to the ITOAC services has been submitted and confirmed, you need to create a user at https://hpc.mif.vu.lt/. The created user will be included in the relevant project, which has a certain amount of resources. To use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter ''alloc_xxxx_project'' (this does not apply to VU MIF users, who do not have to specify the ''--account'' parameter).
 + 
 +<code shell mpi-test-job.sh> 
 +#!/bin/bash 
 +#SBATCH --account=alloc_xxxx_project
 +#SBATCH -p main 
 +#SBATCH -n4 
 +#SBATCH --time=minutes 
 +module load openmpi 
 +mpicc -o mpi-test mpi-test.c 
 +mpirun mpi-test 
 +</code> 
 + 
 + 
 +In the script, instructions for the task scheduler are given as special comments.
 + 
 + -p main - which queue (partition) to send the task to (main, gpu, power).
 + 
 + -n4 - how many processors to reserve (**NOTE:** if you request x cores but your program actually uses fewer, accounting will still charge all x requested cores, so we recommend working out your needs in advance).
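
The second example script also passes ''--time=minutes'', where ''minutes'' stands for the time limit you request. Slurm also accepts the ''days-hours:minutes:seconds'' notation, which is easier to compare with the limits in the partition table. As a minimal sketch, the hypothetical helper ''slurm_time'' converts a number of minutes into that notation:

```shell
#!/bin/bash
# Hypothetical helper: convert whole minutes to Slurm's D-HH:MM:SS notation.
slurm_time() {
    local total=$1
    # 1440 minutes per day; seconds are always 00 because input is in minutes.
    printf '%d-%02d:%02d:00\n' \
        $(( total / 1440 )) $(( total % 1440 / 60 )) $(( total % 60 ))
}
```

For example, ''slurm_time 90'' prints ''0-01:30:00'', which could be used as ''#SBATCH --time=0-01:30:00''.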
 + 
 +The initial working directory of a task is the current directory (**pwd**) on the login node from which the task is submitted, unless it is changed with the -D parameter. Use the HPC shared file system directories under **/scratch/lustre** for the initial working directory, because it must exist on the compute node and the task output file **slurm-JOBID.out** is created there, unless it is redirected elsewhere with the -o or -i parameters (the shared file system is recommended for these as well).
 + 
 +Submit the prepared script with the sbatch command
 + 
 +''$ sbatch mpi-test-job.sh'' 
 + 
 +which returns the number of the submitted task, **JOBID**.
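
If a script needs the **JOBID** for later commands, it can be extracted from what **sbatch** prints. The sketch below assumes the default message has the form ''Submitted batch job JOBID''; the function name ''job_id_from_sbatch'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch's message on standard input and print JOBID.
# Assumption: the default message is "Submitted batch job <JOBID>".
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Typical use:
# jobid=$(sbatch mpi-test-job.sh | job_id_from_sbatch)
```

The function prints nothing when the input does not match, so an empty ''jobid'' indicates that the submission message was not recognised.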
 + 
 +The status of a pending or running task can be checked with the squeue command
 + 
 +''$ squeue -j JOBID'' 
 + 
 +With the scancel command you can stop a running task or remove it from the queue
 + 
 +''$ scancel JOBID'' 
 + 
 +If you do not remember the **JOBID** of your tasks, you can look them up with the **squeue** command
 + 
 +''$ squeue'' 
 + 
 +**squeue** no longer shows completed tasks.
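
Since a finished task disappears from the **squeue** listing, a script can wait for a task by polling until it is no longer listed. This is only a sketch: it assumes that ''squeue -h -j JOBID'' prints a line while the task is pending or running and prints nothing on standard output once it has left the queue. The function name ''wait_for_job'' and the default 30-second interval are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: block until squeue no longer lists the given task.
# Assumption: `squeue -h -j JOBID` (no header) prints nothing on stdout
# once the task has finished or been cancelled.
wait_for_job() {
    local jobid=$1 interval=${2:-30}
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

Keep the polling interval reasonably long so that the scheduler is not queried more often than necessary.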
 + 
 +If the requested number of processors is not available, your task is placed in the queue. It stays there until enough processors become free or until you remove it with **scancel**.
 + 
 +The output of a running task is written to the file **slurm-JOBID.out**. Unless specified otherwise, error output is written to the same file. The file names can be changed with the **sbatch** parameters -o (output file) and -e (error file).
 + 
 +You can read more about SLURM in the [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].
  
-With the assigned (chosen) username and the password you have set, you gain the right to connect to the server **uosis.mif.vu.lt** and to computers in VU MIF classrooms and some VU MIF workplaces.
  
-At [[https://hpc.mif.vu.lt|Waldur]] there is a self-service portal where you can create an **HPC** login yourself using your university login (via **eduGAIN** or **LITNET**). More information [[waldur|here]].
  
en/hpc.txt · Last modified: 2024/02/21 12:50 by rolnas
