</code>
  
After submission and confirmation of your application to the ITOAC services, you need to create a user at https://hpc.mif.vu.lt/. The created user will be included in the relevant project, which will have a certain amount of resources. In order to use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter "alloc_xxxx_project" (not applicable to VU MIF users; they do not have to specify the --account parameter).
  
<code shell mpi-test-job.sh>
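#!/bin/bash
# Illustrative sketch of the job script; the program name mpi-test is an
# example placeholder.
#SBATCH -p short
#SBATCH -n4
# --account is only needed for ITOAC project allocations;
# VU MIF users omit this line
#SBATCH --account=alloc_xxxx_project
mpirun mpi-test
</code>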
  
  
It contains, as special comments, instructions for the job scheduler:

 -p short - which queue to submit to (main, gpu, power).

 -n4 - how many processors to reserve (**NOTE:** if you request x cores but your program actually uses fewer, the accounting will still count all x "requested" cores, so we recommend calculating the required number in advance).

The initial working directory of the job is the current directory (**pwd**) on the login node from which the job is submitted, unless changed to another directory with the -D parameter. For the initial working directory, use the HPC shared filesystem directories under **/scratch/lustre**, since the directory must exist on the compute node and the job output file **slurm-JOBID.out** is created there, unless redirected elsewhere with the -o or -i parameters (for these it is also advisable to use the shared filesystem).

The prepared script is submitted with the //sbatch// command,
  
''$ sbatch mpi-test-job.sh''
  
which returns the **JOBID** of the submitted job.
  
The status of a pending or running job can be checked with the //squeue// command
  
''$ squeue -j JOBID''
  
With the //scancel// command you can cancel a running job or remove it from the queue
  
''$ scancel JOBID''
  
If you do not remember the **JOBID** of your jobs, you can look them up with the //squeue// command
  
''$ squeue''
  
Completed jobs are no longer shown by **squeue**.
  
If the requested number of processors is not available, your job is placed in the queue. It remains there until enough processors become free or until you remove it with **scancel**.
  
The **output** of a running job is written to the file **slurm-JOBID.out**. Unless specified otherwise, the error output is written to the same file. The file names can be changed with the **sbatch** parameters -o (specify the output file) and -e (specify the error file).
  
You can read more about SLURM's capabilities in the [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].
  
====== Interactive Tasks (SLURM) ======

Interactive tasks can be run with the //srun// command:

<code>
$ srun --pty $SHELL
</code>

The above command will connect you to a compute node allocated by SLURM and allow you to run and debug programs directly on it.

When you are done, disconnect from the compute node with the command

<code>
$ exit
</code>

If you want to run graphical programs, you need to connect with **ssh -X** to **uosis.mif.vu.lt** and then to **hpc**:

<code>
$ ssh -X uosis.mif.vu.lt
$ ssh -X hpc
$ srun --pty $SHELL
</code>

In the **power** cluster, interactive tasks can be run with

<code>
$ srun -p power --mpi=none --pty $SHELL
</code>

====== GPU Tasks (SLURM) ======

To use GPUs you need to additionally specify <code>--gres gpu:N</code>, where N is the desired number of GPUs.

With ''nvidia-smi'' inside the task you can check how many GPUs were allocated.

Example of an interactive task with 1 GPU:
<code>
$ srun -p gpu --gres gpu --pty $SHELL
</code>
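
A batch job can request GPUs in the same way. A minimal sketch, where the script name ''gpu-test-job.sh'' is a hypothetical example:

<code shell gpu-test-job.sh>
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres gpu:1
# Show the GPUs that SLURM allocated to this job
nvidia-smi
</code>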

====== Introduction to OpenMPI ======

Ubuntu 18.04 LTS ships OpenMPI version **2.1.1** as a package.
To use the newer version **4.0.1** you need to run
<code>
module load openmpi/4.0
</code>
before running MPI commands.

===== Compiling MPI Programs =====

An example of a simple MPI program is in the directory ''/scratch/lustre/test/openmpi''. **mpicc** (**mpiCC**, **mpif77**, **mpif90**, **mpifort**) is a wrapper for the C (C++, F77, F90, Fortran) compiler that automatically adds the necessary **MPI** include and library files to the command line.

<code>
$ mpicc -o foo foo.c
$ mpif77 -o foo foo.f
$ mpif90 -o foo foo.f
</code>
===== Running MPI Programs =====

MPI programs are started with **mpirun** or **mpiexec**. You can learn more about them with the **man mpirun** or **man mpiexec** commands.

A simple (SPMD) program can be started with the following mpirun command line.

<code>
$ mpirun foo
</code>

All allocated processors will be used, according to the number ordered. If you want to use fewer, you can specify the -np N parameter to **mpirun**, as in the example below. It is not recommended to use fewer CPUs than reserved for a longer period of time, as the unused CPUs remain idle and are wasted.
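
For example, to start only two processes of the ''foo'' program from above:

<code>
$ mpirun -np 2 foo
</code>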

**ATTENTION:** It is strictly forbidden to use more CPUs than you have reserved, as this may affect the performance of other tasks.

Find more information at [[https://www.open-mpi.org|OpenMPI]].

====== Task Efficiency ======

  * Please use at least 50% of the ordered CPU quantity.
  * Using more CPUs than ordered will not improve performance, as your task will only be able to use the CPUs ordered.
  * If you use the ''--mem=X'' parameter, the task may reserve more **CPUs** in proportion to the amount of memory requested. For example: if you order ''--mem=14000'' in the **main** queue, at least 2 CPUs will be reserved, unless other parameters specify more. If your task uses fewer CPUs than this, resources are used inefficiently. In addition, the task may run slower, because it may be using memory that is not local to the executing CPU. See the sketch after this list.
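
A minimal sketch of such a submission, assuming the ''mpi-test-job.sh'' script from above:

<code>
$ sbatch -p main --mem=14000 mpi-test-job.sh
</code>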

====== The Limits of Resources ======

If your tasks don't start because of **AssocGrpCPUMinutesLimit** or **AssocGrpGRESMinutes**, you need to check whether any unused CPU/GPU resources are left from the monthly limit.

To see how many resources have been used:

<code>
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=0101 End=0131 User=USERNAME
</code>

where **USERNAME** is your MIF username, and **Start** and **End** are the start and end dates of the current month. They can also be given as ''$(date +%m01)'' and ''$(date +%m31)'', which denote the first and last days of the current month.
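
For example, for the current month, using the date substitutions described above:

<code>
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=$(date +%m01) End=$(date +%m31) User=USERNAME
</code>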

Note that usage is reported in minutes; to convert to hours, divide by 60.

Another way to view the limits and their usage:

<code>
sshare -l -A USERNAME_mif -p -o GrpTRESRaw,GrpTRESMins,TRESRunMins
</code>

where **USERNAME** is your MIF username; alternatively, specify with the **-A** parameter the account whose usage you want to view. The data is given in minutes. **GrpTRESRaw** - how much has been used. **GrpTRESMins** - what the limit is. **TRESRunMins** - resources of tasks that are still running.

====== Links ======

  * [[waldur|HPC Waldur portal description]]
  * [[http://mif.vu.lt/itapc#paslaug%C5%B3-u%C5%BEsakymas|Ordering ITAPC services]]
  * [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide (SLURM)]]
  * [[https://docs.qlustar.com/Qlustar/11.0/HPCstack/hpc-user-manual.html|HPC User Manual (Qlustar)]]
  * [[http://www.mcs.anl.gov/research/projects/mpi/|MPI standard]]
  * pagalba@mif.vu.lt - to report problems with **HPC**
  
  