Please use these directories only for their purpose and clean them up after calculations.
  
====== HPC Partition ======
  
^Partition ^Time limit ^RAM    ^Notes|
The **RAM** column gives the amount of RAM allocated to each reserved **CPU** core.
  
====== Batch Processing of Tasks (SLURM) ======

To use the computing resources of the HPC, you need to create job scripts (sh or csh).

Example:

<code shell mpi-test-job.sh>
#!/bin/bash
#SBATCH -p main
#SBATCH -n4
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
</code>

After your application to the ITOAC services has been submitted and confirmed, you need to create a user at https://hpc.mif.vu.lt/. The created user will be added to the relevant project, which has a certain amount of resources. To use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter "alloc_xxxx_project" (not applicable for VU MIF users; VU MIF users do not have to specify the --account parameter).

<code shell mpi-test-job.sh>
#!/bin/bash
#SBATCH --account=alloc_xxxx_project
#SBATCH -p main
#SBATCH -n4
#SBATCH --time=minutes
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
</code>


The script contains instructions for the scheduler as special comments:

 -p main - which partition (queue) to submit to (main, gpu, power).

 -n4 - how many processors to reserve (**NOTE:** if you request x cores but your program actually uses fewer, accounting still counts all x "requested" cores, so we recommend calculating this in advance).

The initial working directory of the task is the current directory (**pwd**) on the login node from which the task was submitted, unless it was changed with the -D parameter. Use the HPC shared filesystem directories under **/scratch/lustre** as the initial working directory, because the directory must exist on the compute node, and the job output file **slurm-JOBID.out** is created there unless redirected with -o or -e (for these it is also advisable to use the shared filesystem).
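For example, a minimal sketch of a job script that sets the working directory and output file explicitly on the shared filesystem (the ''USERNAME/myjob'' path is only illustrative):

<code shell>
#!/bin/bash
#SBATCH -p main
#SBATCH -n4
# initial working directory on the shared filesystem
#SBATCH -D /scratch/lustre/USERNAME/myjob
# output file; %j is replaced by the JOBID
#SBATCH -o /scratch/lustre/USERNAME/myjob/out-%j.txt
module load openmpi
mpirun mpi-test
</code>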

The completed script is submitted with the //sbatch// command,

''$ sbatch mpi-test-job.sh''

which returns the number of the submitted task, **JOBID**.

The status of a pending or running task can be checked with the //squeue// command

''$ squeue -j JOBID''

With the //scancel// command you can cancel a running task or remove it from the queue

''$ scancel JOBID''

If you forgot your task's **JOBID**, you can look it up with the //squeue// command

''$ squeue''

Completed tasks are no longer displayed in **squeue**.

If the requested number of processors is not available, your task is added to the queue. It will remain there until enough processors become available or until you remove it with **scancel**.

The **output** of the running task is written to the file **slurm-JOBID.out**. Error output is written to the same file unless you specify otherwise. The file names can be changed with the **sbatch** parameters -o (output file) and -e (error file).
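These can also be given directly on the command line, for example (the ''USERNAME/myjob'' path is illustrative; ''%j'' is replaced by the JOBID):

<code shell>
$ sbatch -o /scratch/lustre/USERNAME/myjob/out-%j.txt -e /scratch/lustre/USERNAME/myjob/err-%j.txt mpi-test-job.sh
</code>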

More about SLURM's capabilities can be found in the [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].

====== Interactive Tasks (SLURM) ======

Interactive tasks can be started with the //srun// command:

<code>
$ srun --pty $SHELL
</code>

The above command will connect you to a compute node environment assigned by SLURM and allow you to run and debug programs on it directly.
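To check that you are on a compute node rather than on the login node, you can, for example, print the node name:

<code>
$ hostname
</code>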

When you are done with your commands, disconnect from the compute node with the command

<code>
$ exit
</code>

If you want to run graphical programs, you need to connect with **ssh -X** to **uosis.mif.vu.lt** and then to **hpc**:

<code>
$ ssh -X uosis.mif.vu.lt
$ ssh -X hpc
$ srun --pty $SHELL
</code>

In the **power** cluster, interactive tasks can be started with

<code>
$ srun -p power --mpi=none --pty $SHELL
</code>

====== GPU Tasks (SLURM) ======

To use a GPU you additionally need to specify ''--gres gpu:N'', where N is the desired number of GPUs.

Inside the task you can check how many GPUs were allocated with ''nvidia-smi''.

Example of an interactive task with 1 GPU:
<code>
$ srun -p gpu --gres gpu --pty $SHELL
</code>
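The same works for batch tasks; a minimal sketch of a job script requesting one GPU in the **gpu** partition (''my-gpu-program'' is only a placeholder for your own application):

<code shell>
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres gpu:1
# show the GPU(s) allocated to this task
nvidia-smi
# placeholder for your own GPU application
./my-gpu-program
</code>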

====== Introduction to OpenMPI ======

The Ubuntu 18.04 LTS package provides OpenMPI version **2.1.1**.
To use the newer version **4.0.1** you need to run
<code>
module load openmpi/4.0
</code>
before running MPI commands.
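To check which version is active after loading the module, you can, for example, run:

<code>
$ module load openmpi/4.0
$ mpirun --version
</code>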

===== Compiling MPI Programs =====

An example of a simple MPI program is in the directory ''/scratch/lustre/test/openmpi''. **mpicc** (**mpiCC**, **mpif77**, **mpif90**, **mpifort**) are wrappers around the C (C++, F77, F90, Fortran) compilers that automatically add the required **MPI** include and library files to the command line.

<code>
$ mpicc -o foo foo.c
$ mpif77 -o foo foo.f
$ mpif90 -o foo foo.f
</code>
===== Running MPI Programs =====

MPI programs are started with the **mpirun** or **mpiexec** program. More about them can be found with the command **man mpirun** or **man mpiexec**.

A simple (SPMD) program can be started with the following mpirun command line.

<code>
$ mpirun foo
</code>

This will use all allocated processors, i.e. as many as were reserved. If you want to use fewer, you can pass **mpirun** the parameter ''-np COUNT''. It is undesirable to use fewer than reserved for longer periods, because the unused CPUs stay idle. Using more than reserved is strictly forbidden, as it can affect the execution of other tasks.
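For example, to start only two processes out of the reserved allocation:

<code>
$ mpirun -np 2 foo
</code>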

More about the installed **OpenMPI** can be found on the [[https://www.open-mpi.org|OpenMPI]] page.

====== Task Efficiency ======

  * Please use at least 50% of the requested CPU amount.
  * Using more CPUs than requested will not increase efficiency, because your task can only use as many CPUs as were requested.
  * If you use the parameter ''--mem=X'', the task may reserve more **CPUs** in proportion to the requested amount of memory. For example, requesting ''--mem=14000'' in the **main** queue reserves at least 2 CPUs, unless other parameters specify more. If your task then uses fewer of them, resources are used inefficiently, and the task may also run slower, because memory belonging to a different processor than the executing one may be used (see the sketch after this list).
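A minimal sketch of matching the CPU request to the ''--mem=14000'' example above (the values are illustrative and ''my-program'' is a placeholder):

<code shell>
#!/bin/bash
#SBATCH -p main
# on the main queue this memory request implies at least 2 CPUs
#SBATCH --mem=14000
# so request (and actually use) both cores to avoid idle CPUs
#SBATCH -n 2
# placeholder for your own program
./my-program
</code>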

====== Resource Limits ======

If your tasks do not start and the reason given is **AssocGrpCPUMinutesLimit** or **AssocGrpGRESMinutes**, check whether your tasks still have unused CPU/GPU resources left from the (monthly) limit.

To see how much of the resources has been used:

<code>
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=0101 End=0131 User=USERNAME
</code>

where **USERNAME** is your MIF username, and **Start** and **End** give the start and end dates of the current month. They can also be given as ''$(date +%m01)'' and ''$(date +%m31)'', which denote the first and last days of the current month.
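For example, for the current month the command can be written as:

<code>
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=$(date +%m01) End=$(date +%m31) User=USERNAME
</code>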

Note that usage is reported in minutes; to convert to hours, divide by 60.

Another way to view the limits and their usage:

<code>
sshare -l -A USERNAME_mif -p -o GrpTRESRaw,GrpTRESMins,TRESRunMins
</code>

where **USERNAME** is your MIF username. Alternatively, give in the **-A** parameter the account whose usage you want to see. The data is presented in minutes. **GrpTRESRaw** - how much has been used. **GrpTRESMins** - what the limit is. **TRESRunMins** - remaining resources of tasks that are still running.
  
  
  