Differences

This shows you the differences between two versions of the page.

--- en:hpc [2022/07/04 08:33] – [Batch Processing of Tasks (SLURM)] grikiete
+++ en:hpc [2022/07/18 08:09] – [MPI programų vykdymas] grikiete
@@ Line 119: / Line 119: @@
 </code>
-After submission and confirmation of your application to the ITOAC services, you need to create a user at https://hpc.mif.vu.lt/. The created user will be included in the relevant project, which will have a certain amount of resources. In order to use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter "alloc_xxxx_project" set (not applicable for VU MIF users, VU MIF users do not have to specify the -- account parameter).
+After your application for ITOAC services has been submitted and confirmed, you need to create a user at https://hpc.mif.vu.lt/. The created user will be included in the relevant project, which has a certain amount of resources allocated. To use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter "alloc_xxxx_project" (this does not apply to VU MIF users, who do not need to specify the --account parameter).
 <code shell mpi-test-job.sh>
@@ Line 133: / Line 133: @@
-Jame kaip specialūs komentarai yra nurodymai užduočių vykdytojui.
+The script contains instructions for the job scheduler, written as special comments.
- -p short - į kokią eilę siųsti (main, gpu, power).
+ -p short - which queue to submit the job to (main, gpu, power).
- -n4 - kiek procesorių rezervuoti (**PASTABA:** nustačius naudotinų branduolių skaičių x, tačiau realiai programiškai išnaudojant mažiau, apskaitoje vis tiek bus skaičiuojami visi x "užprašyti" branduoliai, todėl rekomenduojame apsiskaičiuoti iš anksto).
+ -n4 - how many processors to reserve (**NOTE:** if you request x cores but your program actually uses fewer, the accounting will still count all x requested cores, so we recommend calculating the required number in advance).
-Užduoties pradinis einamasis katalogas yra dabartinis katalogas (**pwd**) prisijungimo mazge iš kur paleidžiama užduotis, nebent parametru -D pakeistas į kitą. Pradiniam einamajam katalogui naudokite PST bendros failų sistemos katalogus **/scratch/lustre**, nes jis turi egzistuoti skaičiavimo mazge ir ten yra kuriamas užduoties išvesties failas **slurm-JOBID.out**, nebent nukreiptas kitur parametrais -o arba -i (jiems irgi patariama naudoti bendrą failų sistemą).
+The initial working directory of the task is the current directory (**pwd**) on the login node from which the task is submitted, unless it is changed to another directory with the -D parameter. For the initial working directory, use the HPC shared filesystem directories under **/scratch/lustre**, because the directory must exist on the compute node and the job output file **slurm-JOBID.out** is created there, unless redirected with -o or -i (for these it is also advisable to use the shared filesystem).
-Suformuotą scenarijų siunčiame su komanda sbatch
+The prepared script is submitted with the //sbatch// command,
 ''$ sbatch mpi-test-job''
-kuri gražina pateiktos užduoties numerį **JOBID**.
+which returns the ID of the submitted job (**JOBID**).
-Laukiančios arba vykdomos užduoties būseną galima sužinoti su komanda squeue
+The status of a pending or running task can be checked with the //squeue// command:
 ''$ squeue -j JOBID''
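The submit-then-check steps above can be scripted. The following is a minimal sketch, assuming //sbatch// prints its usual success line of the form "Submitted batch job 12345"; the helper name ''extract_jobid'' is hypothetical and not part of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from the line that sbatch prints
# on success, e.g. "Submitted batch job 12345".
# The ID is taken to be the last word on that line.
extract_jobid() {
    local line="$1"
    printf '%s\n' "${line##* }"
}

# Usage on the cluster (commented out so the sketch runs without SLURM):
#   JOBID=$(extract_jobid "$(sbatch mpi-test-job)")
#   squeue -j "$JOBID"

# Demonstration with a sample sbatch message:
extract_jobid "Submitted batch job 12345"
```

Keeping the ID in a variable avoids copying it by hand into later //squeue// or //scancel// calls.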
-Su komanda scancel galima nutraukti užduoties vykdymą arba išimti ją iš eilės
+The //scancel// command cancels a running task or removes a pending task from the queue:
 ''$ scancel JOBID''
-Jeigu neatsimenate savo užduočių **JOBID**, tai galite pasižiūrėti su komanda **squeue**
+If you have forgotten the **JOBID** of your tasks, you can look them up with the //squeue// command:
 ''$ squeue''
-Užbaigtų užduočių **squeue** jau neberodo.
+Completed tasks are no longer displayed in **squeue**.
-Jeigu nurodytas procesorių kiekis nėra pasiekiamas, tai jūsų užduotis yra įterpiama į eilę. Joje ji bus kol atsilaisvins pakankamas kiekis procesorių arba kol jūs ją pašalinsite su **scancel**.
+If the specified number of processors is not available, your task is added to the queue. It will remain in the queue until a sufficient number of processors become available or until you remove it with **scancel**.
-Vykdomos užduoties išvestis (**output**) yra įrašoma į failą **slurm-JOBID.out**. Jei nenurodyta kitaip, tai ir klaidų (error) išvestis yra įrašoma į tą patį failą. Failų vardus galima pakeisti su komandos **sbatch** parametrais -o (nurodyti išvesties failą) ir -e (nurodyti klaidų failą).
+The **output** of the running job is written to the file **slurm-JOBID.out**. Unless specified otherwise, the error output is written to the same file. The file names can be changed with the **sbatch** parameters -o (output file) and -e (error file).
-Daugiau apie SLURM galimybes galite paskaityti [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].
+You can read more about SLURM's capabilities in the [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].
+====== Interactive Tasks (SLURM) ======
+Interactive tasks can be run with the //srun// command:
+<code>
+$ srun --pty $SHELL
+</code>
+The above command connects you to the compute node environment assigned by SLURM and lets you run and debug programs on it directly.
+When you are done, disconnect from the compute node with the command
+<code>
+$ exit
+</code>
+If you want to run graphical programs, you need to connect with **ssh -X** to **uosis.mif.vu.lt** and then to **hpc**:
+<code>
+$ ssh -X uosis.mif.vu.lt
+$ ssh -X hpc
+$ srun --pty $SHELL
+</code>
+On the **power** cluster, interactive tasks can be run with
+<code>
+$ srun -p power --mpi=none --pty $SHELL
+</code>
+====== GPU Tasks (SLURM) ======
+To use GPUs, you additionally need to specify <code>--gres gpu:N</code>, where N is the desired number of GPUs.
+Running ''nvidia-smi'' inside the task shows how many GPUs were allocated to it.
+Example of an interactive task with 1 GPU:
+<code>
+$ srun -p gpu --gres gpu --pty $SHELL
+</code>
+====== Introduction to OpenMPI ======
+Ubuntu 18.04 LTS ships the OpenMPI package in version **2.1.1**.
+To use the newer version **4.0.1**, you need to run
+<code>
+module load openmpi/4.0
+</code>
+before running MPI commands.
+===== Compiling MPI Programs =====
+An example of a simple MPI program is in the directory ''/scratch/lustre/test/openmpi''. **mpicc** (**mpiCC**, **mpif77**, **mpif90**, **mpifort**) is a wrapper around the C (C++, F77, F90, Fortran) compiler that automatically adds the necessary **MPI** include and library files to the command line.
+<code>
+$ mpicc -o foo foo.c
+$ mpif77 -o foo foo.f
+$ mpif90 -o foo foo.f
+</code>
+===== Running MPI Programs =====
+MPI programs are started with **mpirun** or **mpiexec**. You can learn more about them with the **man mpirun** or **man mpiexec** command.
+A simple (SPMD) program can be started with the following **mpirun** command line:
+<code>
+$ mpirun foo
+</code>
+This uses all of the allocated processors, that is, as many as were requested. If you want to use fewer, you can pass the ''-np count'' parameter to **mpirun**. Using fewer processors than reserved for an extended period is discouraged, because the unused CPUs sit idle. Using more than the reserved number is strictly forbidden, because it may affect the execution of other tasks.
+More about the installed **OpenMPI** can be found on the [[https://www.open-mpi.org|OpenMPI]] page.
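The ''-np'' rule above (fewer than reserved is allowed, more is forbidden) can be guarded in a job script. This is only a sketch: the helper name ''check_np'' is hypothetical, and the usage comment assumes that SLURM has exported the ''SLURM_NTASKS'' variable for a job submitted with ''-n''.

```shell
#!/bin/bash
# Hypothetical helper: check_np REQUESTED RESERVED
# Prints the process count to pass to "mpirun -np", or fails with an error
# message when the request exceeds the reservation.
check_np() {
    local requested="$1"
    local reserved="$2"
    if [ "$requested" -gt "$reserved" ]; then
        echo "error: $requested processes requested, only $reserved reserved" >&2
        return 1
    fi
    printf '%s\n' "$requested"
}

# Usage inside a job script (commented out so the sketch runs without SLURM/MPI):
#   NP=$(check_np 2 "$SLURM_NTASKS") && mpirun -np "$NP" foo

# Demonstration: 2 processes out of 4 reserved is accepted.
check_np 2 4
```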
+====== Task Efficiency ======
+  * Please use at least 50% of the requested CPUs.
+  * Using more CPUs than requested will not increase efficiency, because your task can only use as many CPUs as were requested.
+  * If you use the ''--mem=X'' parameter, the task may reserve more **CPUs** in proportion to the requested amount of memory. For example, requesting ''--mem=14000'' in the **main** queue reserves at least 2 CPUs, unless other parameters specify more. If your task uses fewer, resources are used inefficiently; in addition, the task may run slower, because it may be using memory that belongs to a processor other than the one executing it.
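The proportional reservation in the last point can be illustrated with a ceiling division. This is a rough sketch only: the helper name ''cpus_for_mem'' is hypothetical, and the per-CPU memory value of 7000 MB is an assumption chosen so that the example reproduces the 2-CPU figure quoted above; the real per-CPU limit is a cluster setting that is not stated here.

```shell
#!/bin/bash
# Hypothetical helper: cpus_for_mem MEM_MB MEM_PER_CPU_MB
# Prints the smallest number of CPUs whose combined per-CPU memory
# covers MEM_MB (ceiling division).
cpus_for_mem() {
    local mem="$1"
    local per_cpu="$2"
    echo $(( (mem + per_cpu - 1) / per_cpu ))
}

# 7000 MB per CPU is an assumed value, not a documented cluster limit.
cpus_for_mem 14000 7000
```

If the estimate is larger than the number of cores your program actually uses, consider lowering ''--mem'' or using the extra cores.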
+====== Resource Limits ======
+If your tasks do not start and the reason shown is **AssocGrpCPUMinutesLimit** or **AssocGrpGRESMinutes**, check whether any unused CPU/GPU resources remain in your (monthly) limit.
+To see how many resources have been used:
+<code>
+sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=0101 End=0131 User=USERNAME
+</code>
+where **USERNAME** is your MIF user name, and **Start** and **End** are the start and end dates of the current month. They can also be given as ''$(date +%m01)'' and ''$(date +%m31)'', which stand for the first and last days of the current month.
+Note that usage is reported in minutes; to convert it to hours, divide by 60.
+Another way to view the limits and how much of them has been used:
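The date range and the minutes-to-hours conversion described above can be sketched as follows. Since not every month has 31 days, this variant computes the actual last day of the month; it assumes GNU ''date'' (for the ''-d'' option), and the helper name ''minutes_to_hours'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a usage figure in minutes to hours,
# printed with one decimal place.
minutes_to_hours() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 60 }'
}

# First and last day of the current month in MMDD form.
# The last day is "first of this month + 1 month - 1 day" (GNU date assumed).
month_start=$(date +%m01)
month_end=$(date -d "$(date +%Y-%m-01) +1 month -1 day" +%m%d)

# These values can be passed to sreport as Start=... End=...
echo "Start=$month_start End=$month_end"

# Demonstration: 90 minutes of usage expressed in hours.
minutes_to_hours 90
```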
+<code>
+sshare -l -A USERNAME_mif -p -o GrpTRESRaw,GrpTRESMins,TRESRunMins
+</code>
+where **USERNAME** is the MIF user name. Alternatively, pass in the **-A** parameter the account whose usage you want to view. The data are given in minutes. **GrpTRESRaw** is the amount used. **GrpTRESMins** is the limit. **TRESRunMins** is the amount of resources still reserved by tasks that are currently running.