en:hpc
  * **For users of the VU computer network** - you must fill in the [[https://
  * **For other users (non-members of the VU community)** - you must fill in the [[https://
+ | |||
+ | ====== Connection ====== | ||
+ | |||
+ | You need to use SSH applications (ssh, putty, winscp, mobaxterm) and Kerberos or SSH key authentication to connect to **HPC**. | ||
+ | |||
+ | If **Kerberos** is used: | ||
+ | |||
+ | * Log in to the Linux environment in a VU MIF classroom or public terminal with your VU MIF username and password or login to **uosis.mif.vu.lt** with your VU MIF username and password using **ssh** or **putty**. | ||
+ | * Check if you have a valid Kerberos key (ticket) with the **klist** command. If the key is not available or has expired, the **kinit** command must be used. | ||
+ | * Connect to the **hpc** node with the command **ssh hpc** (password must not be required). | ||
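
The steps above can be sketched as a short terminal session (run on **uosis.mif.vu.lt** or a MIF Linux terminal; output will vary):

<code>
$ klist     # check for a valid Kerberos ticket
$ kinit     # only needed if the ticket is missing or has expired
$ ssh hpc   # should log in without asking for a password
</code>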
+ | |||
+ | If **SSH keys** are used (e.g. if you need to copy big files): | ||
+ | * If you don't have SSH keys, you can find instructions on how to create them in a Windows environment **[[duk: | ||
+ | * | ||
+ | * | ||
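
With keys in place, large files can be copied over SSH. A sketch (the file name ''bigdata.tar.gz'' and the target directory are hypothetical placeholders):

<code>
# Copy a large file to your home directory on the login host,
# authenticating with your SSH key instead of Kerberos.
$ scp bigdata.tar.gz uosis.mif.vu.lt:
</code>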
+ | |||
+ | The **first time** you connect, you **will not** be able to run **SLURM jobs** for the first **5 minutes**. After that, SLURM account will be created. | ||
+ | |||
+ | ====== Lustre - Shared File System ====== | ||
+ | |||
+ | VU MIF HPC shared file system is available in the directory ''/ | ||
+ | |||
+ | The system creates directory ''/ | ||
+ | |||
+ | The files in this file system are equally accessible on all compute nodes and on the **hpc** node. | ||
+ | |||
+ | Please use these directories only for their purpose and clean them up after calculations. | ||
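
A typical workflow on the shared file system might look like this sketch (''/path/to/your/lustre/dir'', ''my-job.sh'' and ''old-results'' are hypothetical placeholders; use the actual directory described above):

<code>
$ cd /path/to/your/lustre/dir   # work in the shared directory, visible on all nodes
$ sbatch my-job.sh              # the job reads and writes its files here
$ rm -r old-results/            # clean up once the calculations are finished
</code>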
+ | |||
+ | ====== HPC Partition ====== | ||
+ | |||
+ | ^Partition ^Time limit ^RAM | ||
+ | ^main | ||
+ | ^gpu ^48h | ||
+ | ^power | ||
+ | |||
+ | The time limit for tasks is **2h** in all partitions if it has not been specified. The table shows the maximum time limit. | ||
+ | |||
+ | The **RAM** column gives the amount of RAM allocated to each reserved **CPU** core. | ||
+ | |||
+ | ====== Batch Processing of Tasks (SLURM) ====== | ||
+ | |||
+ | To use computing resources of the HPC, you need to create task scenarios (sh or csh). | ||
+ | |||
+ | Example: | ||
+ | |||
+ | <code shell mpi-test-job.sh> | ||
+ | # | ||
+ | #SBATCH -p main | ||
+ | #SBATCH -n4 | ||
+ | module load openmpi | ||
+ | mpicc -o mpi-test mpi-test.c | ||
+ | mpirun mpi-test | ||
+ | </ | ||
+ | |||
+ | After submission and confirmation of your application to the ITOAC services, you need to create a user at https:// | ||
+ | |||
+ | <code shell mpi-test-job.sh> | ||
+ | # | ||
+ | #SBATCH --account=alloc_xxxx_projektas | ||
+ | #SBATCH -p main | ||
+ | #SBATCH -n4 | ||
+ | #SBATCH --time=minutes | ||
+ | module load openmpi | ||
+ | mpicc -o mpi-test mpi-test.c | ||
+ | mpirun mpi-test | ||
+ | </ | ||
+ | |||
+ | |||
+ | It contains instructions for the task performer as special comments. | ||
+ | |||
+ | -p short - which queue to send to (main, gpu, power). | ||
+ | |||
+ | -n4 - how many processors to reserve (**NOTE:** if you set the number of cores to be used to x, but actually use fewer cores programmatically, | ||
+ | |||
+ | The initial running directory of the task is the current directory (**pwd**) on the login node from where the task is run, unless it was changed to another directory by the -D parameter. For the initial running directory, use the HPC shared filesystem directories **/ | ||
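
For example, the initial directory can be set at submission time with -D (the path here is a placeholder for your shared-filesystem directory):

<code>
$ sbatch -D /path/to/your/lustre/dir mpi-test-job.sh
</code>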
+ | |||
+ | The generated script is sent with the command // | ||
+ | |||
+ | '' | ||
+ | |||
+ | which returns the number of the submitted job **JOBID**. | ||
+ | |||
+ | The status of a pending or ongoing task can be checked with the command // | ||
+ | |||
+ | '' | ||
+ | |||
+ | With the //scancel// command it is possible to cancel the running of a task or to remove it from the queue | ||
+ | |||
+ | '' | ||
+ | |||
+ | If you forgot your tasks **JOBID**, you can check them with the command // | ||
+ | |||
+ | '' | ||
+ | |||
+ | Completed tasks are no longer displayed in **squeue**. | ||
+ | |||
+ | If the specified number of processors is not available, your task is added to the queue. It will remain in the queue until a sufficient number of processors become available or until you remove it with **scancel**. | ||
+ | |||
+ | The **output** of the running job is recorded in the file **slurm-JOBID.out**. The error output is written to the same file unless you specified somewhere else. The file names can be changed with the **sbatch** command parameters -o (specify the output file) and -e (specify the error file). | ||
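
For example, SLURM expands ''%j'' to the **JOBID** in these file names (the names themselves are illustrative):

<code>
#SBATCH -o result-%j.out
#SBATCH -e result-%j.err
</code>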
+ | |||
+ | More about SLURM opportunities you can read [[https:// | ||
+ | |||
+ | ====== Interactive Tasks (SLURM) ====== | ||
+ | |||
+ | Interactive tasks can be done with the //srun// command: | ||
+ | |||
+ | < | ||
+ | $ srun --pty $SHELL | ||
+ | </ | ||
+ | |||
+ | The above command will connect you to the compute node environment assigned to SLURM and allow you to directly run and debug programs on it. | ||
+ | |||
+ | After the commands are done disconnect from the compute node with the command | ||
+ | |||
+ | < | ||
+ | $ exit | ||
+ | </ | ||
+ | |||
+ | If you want to run graphical programs, you need to connect to **ssh -X** to **uosis.mif.vu.lt** and **hpc**: | ||
+ | |||
+ | < | ||
+ | $ ssh -X uosis.mif.vu.lt | ||
+ | $ ssh -X hpc | ||
+ | $ srun --pty $SHELL | ||
+ | </ | ||
+ | |||
+ | In **power** cluster interactive tasks can be performed with | ||
+ | |||
+ | < | ||
+ | $ srun -p power --mpi=none --pty $SHELL | ||
+ | </ | ||
+ | |||
+ | ====== GPU užduotys (SLURM) ====== | ||
+ | |||
+ | Norint pasinaudoti GPU, reikia papildomai nurodyti < | ||
+ | |||
+ | Su '' | ||
+ | |||
+ | Pavyzdys interaktyvios užduoties su 1 GPU: | ||
+ | < | ||
+ | $ srun -p gpu --gres gpu --pty $SHELL | ||
+ | </ | ||
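
The same option works in batch scripts. A minimal sketch (the script name and the program ''my-gpu-program'' are hypothetical):

<code shell gpu-job.sh>
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres gpu
./my-gpu-program
</code>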
+ | |||
+ | ====== Įvadas į OpenMPI ====== | ||
+ | |||
+ | Ubuntu 18.04 LTS yra **2.1.1** versijos OpenMPI paketai. | ||
+ | Norint pasinaudoti naujesne **4.0.1** versija reikia naudoti | ||
+ | < | ||
+ | module load openmpi/ | ||
+ | </ | ||
+ | prieš vykdant MPI komandas. | ||
+ | |||
+ | ===== MPI programų kompiliavimas ===== | ||
+ | |||
+ | Paprastos MPI programos pavyzdys yra kataloge ''/ | ||
+ | |||
+ | < | ||
+ | $ mpicc -o foo foo.c | ||
+ | $ mpif77 -o foo foo.f | ||
+ | $ mpif90 -o foo foo.f | ||
+ | </ | ||
===== Running MPI Programs =====

MPI programs are started with the **mpirun** or **mpiexec** program. You can learn more about them with the command **man mpirun** or **man mpiexec**.

A simple (SPMD) program can be started with the following mpirun command line.

<code>
$ mpirun foo
</code>

This will use all processors allocated to the job.
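
If you need fewer processes than the number of allocated processors, the count can be limited with ''-np'' (a sketch, using ''foo'' from above):

<code>
$ mpirun -np 2 foo
</code>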
+ | |||
+ | Daugiau apie instaliuotą **OpenMPI** yra [[https:// | ||
+ | |||
+ | ====== Užduočių efektyvumas ====== | ||
+ | |||
+ | * Prašome išnaudoti ne mažiau 50% užsakyto CPU kiekio. | ||
+ | * Naudoti daugiau CPU, nei užsakyta, nepadidins efektyvumo, nes jūsų užduotis galės naudoti tik tiek CPU, kiek buvo užsakyta. | ||
+ | * Jeigu naudosite parametrą '' | ||
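
One way to see how much CPU time a finished job actually used is //sacct// (a sketch; **JOBID** is a placeholder). Comparing **TotalCPU** against **Elapsed** × **AllocCPUS** gives a rough efficiency figure:

<code>
$ sacct -j JOBID --format=JobID,AllocCPUS,Elapsed,TotalCPU,State
</code>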
+ | |||
+ | ====== Resursų limitai ====== | ||
+ | |||
+ | Jeigu jūsų užduotys nestartuoja su priežastimi **AssocGrpCPUMinutesLimit** arba **AssocGrpGRESMinutes**, | ||
+ | tai pasitikrinkite ar užduotims dar liko neišnaudotų CPU/GPU resursų iš (mėnesio) limito. | ||
+ | |||
+ | Peržiūrėti kiek išnaudota resursų | ||
+ | |||
+ | < | ||
+ | sreport -T cpu, | ||
+ | </ | ||
+ | |||
+ | kur **USERNAME** jūsų MIF naudotojo vardas, o **Start** ir **End** nurodo einamojo mėnesio pradžios ir pabaigos datas. Jas galima nurodyti ir kaip '' | ||
+ | |||
+ | Atkreipkite dėmesį, kad naudojimas pateikiamas minutėmis, o į valandas konvertuoti reikia dalinant iš 60. | ||
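
The conversion is plain integer division. A shell sketch (5400 is a made-up sample value):

<code shell>
# Convert reported minutes to hours by dividing by 60.
minutes=5400
hours=$((minutes / 60))
echo "$hours hours"   # prints "90 hours"
</code>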
+ | |||
+ | Kitas būdas pažiūrėti limitus ir jų išnaudojimą | ||
+ | |||
+ | < | ||
+ | sshare -l -A USERNAME_mif -p -o GrpTRESRaw, | ||
+ | </ | ||
+ | |||
+ | kur **USERNAME** MIF naudotojo vardas. Arba parametre **-A** nurodyti tą sąskaitą (account), kurio naudojimą norima pažiūrėti. Duomenys pateikiami minutėmis. **GrpTRESRaw** - kiek išnaudota. **GrpTRESMins** - koks yra limitas. **TRESRunMins** - likę resursai dar vis vykdomų užduočių. | ||
+ | |||
en/hpc.txt · Last modified: 2024/02/21 12:50 by rolnas