====== Software ======
  
The **main** and **gpu** partitions run the [[https://docs.qlustar.com/Qlustar/11.0/HPCstack/hpc-user-manual.html|Qlustar 12]] OS, which is based on Ubuntu 20.04 LTS. The **power** partition runs Ubuntu 18.04 LTS.
  
You can check the list of installed OS packages with the command ''dpkg -l'' (on the login node **hpc** or on the **power** nodes).

===== Singularity =====
  
With the [[https://sylabs.io/guides/3.2/user-guide/index.html|singularity]] command you can use the ready-made container files in the directories ''/apps/local/hpc'', ''/apps/local/nvidia'', ''/apps/local/intel'' and ''/apps/local/lang'', or download containers from Singularity and Docker online repositories. You can also create your own Singularity containers using the MIF cloud service.
$ rm -rf /tmp/python
</code>

To use this container, it is advisable to work in a separate directory rather than your home directory, so that its Python packages do not get mixed up with those already installed in your home directory.
<code shell>
$ mkdir ~/workdir
$ singularity exec -H ~/workdir:$HOME python.sif python3 ...
</code>

Similarly, you can use R, Julia or other containers that do not require root privileges to install packages.
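
For example, a minimal sketch of pulling such a container yourself from Docker Hub (the ''r-base'' image name is only an illustration, not something preinstalled on the cluster):
<code shell>
$ singularity pull ~/r-base.sif docker://r-base    # download the image and convert it to a .sif file
$ singularity exec ~/r-base.sif R --version        # run a command from inside the container
</code>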
  

===== Hadoop =====
  
There are ready-made scripts to run your **Hadoop** jobs using the [[https://github.com/LLNL/magpie|Magpie]] suite in the directory ''/apps/local/bigdata''.
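
As a sketch (assuming you adapt one of the templates and submit it yourself; ''SCRIPT_NAME'' is a placeholder, not an actual file name):
<code shell>
$ cp -r /apps/local/bigdata ~/bigdata    # take an editable copy of the Magpie scripts
$ sbatch ~/bigdata/SCRIPT_NAME           # submit the adapted script to SLURM
</code>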

===== JupyterHub =====
  
With [[https://hpc.mif.vu.lt/hub/|JupyterHub]] you can run calculations with the Python command line in a web browser and use the [[https://jupyter.org|JupyterLab]] environment. If you install your own JupyterLab environment in your home directory, you also need to install the additional ''batchspawner'' package - it is what starts your environment.
====== Registration ======
  
  * **For VU MIF network users** - HPC can be used without additional registration if the available resources are sufficient (monthly limit - **500 CPU-h and 60 GPU-h**). Once this limit has been reached, you can request more by filling in the [[https://forms.office.com/Pages/ResponsePage.aspx?id=ghrFgo1UykO8-b9LfrHQEidLsh79nRJAvOP_wV9sgmdUM0ZMR1FINFg3TzVaNlhDSEhUN1A3QTlVUC4u|ITOAC service request form]].
  
  * **For users of the VU computer network** - you must fill in the [[https://forms.office.com/Pages/ResponsePage.aspx?id=ghrFgo1UykO8-b9LfrHQEidLsh79nRJAvOP_wV9sgmdUM0ZMR1FINFg3TzVaNlhDSEhUN1A3QTlVUC4u|ITOAC service request form]] to get access to MIF HPC. After your request has been confirmed, you must create your account in the [[https://hpc.mif.vu.lt|Waldur portal]]. Read more details [[waldur|here]].
  
If **SSH keys** are used (e.g. if you need to copy big files):
  * If you don't have SSH keys, you can find instructions on how to create them in a Windows environment **[[en:duk:ssh_key|here]]**.
  * Before you can use this method, you need to log in with Kerberos at least once. Then create a ''~/.ssh'' directory in the HPC file system and put your **ssh public key** (in OpenSSH format) into the ''~/.ssh/authorized_keys'' file (see the sketch after this list).
  * Connect with **ssh**, **sftp**, **scp**, **putty**, **winscp** or any other software supporting the **ssh** protocol to **hpc.mif.vu.lt** with your **ssh private key**, specifying your VU MIF user name. It should not require a login password, but it may ask for your ssh private key passphrase.
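
A minimal sketch of this setup from an OpenSSH client (''username'' and the key file name are placeholders for your own):
<code shell>
$ ssh-keygen -t ed25519                                          # create a key pair on your computer, if you do not have one
$ ssh-copy-id -i ~/.ssh/id_ed25519.pub username@hpc.mif.vu.lt    # appends the public key to ~/.ssh/authorized_keys on the HPC
$ sftp username@hpc.mif.vu.lt                                    # file transfers now authenticate with the key
</code>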
  
The **first time** you connect, you **will not** be able to run **SLURM jobs** for the first **5 minutes**. After that, your SLURM account will be created.
  
====== Lustre - Shared File System ======
  
Please use these directories only for their purpose and clean them up after calculations.

====== HPC Partitions ======

^ Partition ^ Time limit ^ RAM ^ Notes ^
| main  | 7d  | 7000 MB  | CPU cluster |
| gpu   | 48h | 12000 MB | GPU cluster |
| power | 48h | 2000 MB  | IBM Power9 cluster |

If no time limit is specified for a job, the default is **2h** in all partitions. The table shows the maximum time limit.

The **RAM** column gives the amount of RAM allocated to each reserved **CPU** core.

====== Batch Processing of Tasks (SLURM) ======

To use the computing resources of the HPC, you need to create job scripts (sh or csh).

Example:

<code shell mpi-test-job.sh>
#!/bin/bash
#SBATCH -p main
#SBATCH -n4
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
</code>

After your application for the ITOAC services has been submitted and confirmed, you need to create a user at https://hpc.mif.vu.lt/. The created user will be included in the relevant project, which will have a certain amount of resources. In order to use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter ''alloc_xxxx_projektas'' (not applicable for VU MIF users; VU MIF users do not have to specify the ''--account'' parameter).

<code shell mpi-test-job.sh>
#!/bin/bash
#SBATCH --account=alloc_xxxx_projektas
#SBATCH -p main
#SBATCH -n4
#SBATCH --time=minutes
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
</code>


The script contains instructions for the job scheduler as special comments.

 -p main - which partition (queue) to send the job to (main, gpu, power).

 -n4 - how many processors to reserve (**NOTE:** if you set the number of cores to be used to x, but actually use fewer cores programmatically, the accounting will still count all x "requested" cores, so we recommend calculating this in advance).

The initial working directory of the job is the current directory (**pwd**) on the login node from which the job is submitted, unless it was changed to another directory with the -D parameter. For the initial working directory, use the HPC shared filesystem directories under **/scratch/lustre**, as the directory must exist on the compute node and the job output file **slurm-JOBID.out** is created there, unless redirected with -o or -e (for these it is advisable to use the shared filesystem as well). A sketch of these options is shown below.
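
For illustration, a sketch of how the -D, -o and -e options might be added to a job script (''YOUR_DIR'' is a placeholder for your own directory on the shared filesystem):
<code shell>
#SBATCH -D /scratch/lustre/YOUR_DIR               # initial working directory, must exist on the compute node
#SBATCH -o /scratch/lustre/YOUR_DIR/out-%j.txt    # standard output (%j expands to the JOBID)
#SBATCH -e /scratch/lustre/YOUR_DIR/err-%j.txt    # standard error
</code>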

The prepared script is submitted with the //sbatch// command,

''$ sbatch mpi-test-job.sh''

which returns the number of the submitted job, **JOBID**.

The status of a pending or running task can be checked with the //squeue// command

''$ squeue -j JOBID''

With the //scancel// command you can cancel a running task or remove it from the queue

''$ scancel JOBID''

If you forgot the **JOBID** of your tasks, you can check them with the //squeue// command

''$ squeue''

Completed tasks are no longer displayed in **squeue**.

If the specified number of processors is not available, your task is added to the queue. It will remain in the queue until a sufficient number of processors becomes available or until you remove it with **scancel**.

The **output** of the running job is written to the file **slurm-JOBID.out**. The error output is written to the same file unless you specify otherwise. The file names can be changed with the **sbatch** parameters -o (output file) and -e (error file).

You can read more about SLURM's capabilities in the [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].

====== Interactive Tasks (SLURM) ======

Interactive tasks can be run with the //srun// command:

<code>
$ srun --pty $SHELL
</code>

The above command will connect you to a compute node assigned by SLURM and allow you to run and debug programs on it directly.

When your commands are done, disconnect from the compute node with the command

<code>
$ exit
</code>

If you want to run graphical programs, you need to connect with **ssh -X** to **uosis.mif.vu.lt** and then to **hpc**:

<code>
$ ssh -X uosis.mif.vu.lt
$ ssh -X hpc
$ srun --pty $SHELL
</code>

In the **power** cluster, interactive tasks can be started with

<code>
$ srun -p power --mpi=none --pty $SHELL
</code>

====== GPU Tasks (SLURM) ======

To use a GPU you additionally need to specify ''--gres gpu:N'', where N is the desired number of GPUs.

With ''nvidia-smi'' inside the task you can check how many GPUs were allocated.

Example of an interactive task with 1 GPU:
<code>
$ srun -p gpu --gres gpu --pty $SHELL
</code>
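
A batch job variant, as a sketch (''my-gpu-program'' is a placeholder for your own executable):
<code shell>
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
nvidia-smi          # shows the GPU that was allocated to the job
./my-gpu-program    # placeholder for your GPU program
</code>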

====== Introduction to OpenMPI ======

Ubuntu 18.04 LTS ships OpenMPI version **2.1.1** as a package.
To use the newer version **4.0.1** you need to run
<code>
module load openmpi/4.0
</code>
before running MPI commands.
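
To check which OpenMPI versions are available as modules (a generic environment-modules command):
<code>
$ module avail openmpi
</code>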

===== Compiling MPI Programs =====

An example of a simple MPI program is in the directory ''/scratch/lustre/test/openmpi''. **mpicc** (**mpiCC**, **mpif77**, **mpif90**, **mpifort**) is a wrapper for the C (C++, F77, F90, Fortran) compilers that automatically adds the necessary **MPI** include and library options to the command line.

<code>
$ mpicc -o foo foo.c
$ mpif77 -o foo foo.f
$ mpif90 -o foo foo.f
</code>
===== Running MPI Programs =====

MPI programs are started with **mpirun** or **mpiexec**. You can learn more about them with the **man mpirun** or **man mpiexec** command.

A simple (SPMD) program can be started with the following mpirun command line.

<code>
$ mpirun foo
</code>

All allocated processors will be used, according to the number you ordered. If you want to use fewer, you can specify the -np parameter of **mpirun**, as in the sketch below. It is not recommended to reserve more CPUs than you use for a longer period of time, as the unused CPUs sit idle while remaining reserved.
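
For example, to run only 2 processes out of the allocated set:

<code>
$ mpirun -np 2 foo
</code>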

**ATTENTION** It is strictly forbidden to use more CPUs than you have reserved, as this may affect the performance of other tasks.

Find more information at [[https://www.open-mpi.org|OpenMPI]].

====== Task Efficiency ======

  * Please use at least 50% of the ordered CPU quantity.
  * Using more CPUs than ordered will not improve performance, as your task will only be able to use the CPUs that were ordered.
  * If you use the ''--mem=X'' parameter, the task may reserve additional **CPUs** in proportion to the amount of memory requested (see the sketch after this list). For example, if you order ''--mem=14000'' in the **main** queue, at least 2 CPUs will be reserved, unless other parameters specify more. If your task uses fewer than this, it is an ineffective use of resources. In addition, it may run slower, because it may be given memory that is not local to the executing CPU.
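
A sketch of the arithmetic from the last point, based on the 7000 MB per core limit of the **main** partition:

<code shell>
#SBATCH -p main
#SBATCH --mem=14000    # 14000 MB / 7000 MB per core => at least 2 CPU cores are reserved
</code>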

====== Resource Limits ======

If your tasks don't start because of **AssocGrpCPUMinutesLimit** or **AssocGrpGRESMinutes**, you need to check whether there are any unused CPU/GPU resources left from your monthly limit.

//The first way to see how many resources have been used://

<code>
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=0101 End=0131 User=USERNAME
</code>

Here **USERNAME** is your MIF user name, and **Start** and **End** are the start and end days of the current month. You can also specify them as ''$(date +%m01)'' and ''$(date +%m31)''.

**NOTE** The usage of resources is given in minutes; divide the number by 60 to get hours.
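
For example, a sketch of the same report for the current month using those date substitutions (run on the cluster, where ''$USER'' is your MIF user name):

<code>
sreport -T cpu,mem,gres/gpu cluster AccountUtilizationByUser Start=$(date +%m01) End=$(date +%m31) User=$USER
</code>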

//The second way to see how many resources have been used://

<code>
sshare -l -A USERNAME_mif -p -o GrpTRESRaw,GrpTRESMins,TRESRunMins
</code>

Here **USERNAME** is your MIF user name; alternatively, specify the account whose usage you want to see with **-A**. The data is also displayed in minutes:
  * **GrpTRESRaw** - how much has been used.
  * **GrpTRESMins** - what the limit is.
  * **TRESRunMins** - the remaining resources (in minutes) of tasks that are still running.

====== Links ======

  * [[waldur|HPC Waldur portal description]]
  * [[https://mif.vu.lt/lt3/en/about/structure/it-research-center#ordering|ITOAC service ordering]]
  * [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide (SLURM)]]
  * [[https://docs.qlustar.com/Qlustar/11.0/HPCstack/hpc-user-manual.html|HPC User Manual (Qlustar)]]
  * [[http://www.mcs.anl.gov/research/projects/mpi/|MPI standard]]
  * [[pagalba@mif.vu.lt]] - for reporting **HPC** problems.
  
  