If **Kerberos** is used:
  
  * Log in to the Linux environment in a VU MIF classroom or public terminal with your VU MIF username and password, or log in to **uosis.mif.vu.lt** with your VU MIF username and password using **ssh** or **putty**.
  * Check if you have a valid Kerberos key (ticket) with the **klist** command. If the key is not available or has expired, the **kinit** command must be used.
  * Connect to the **hpc** node with the command **ssh hpc** (no password should be required), as in the example session below.
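A typical session might look like this (a sketch; **username** is a placeholder for your VU MIF user name):

<code shell>
ssh username@uosis.mif.vu.lt   # log in to the login host
klist                          # check for a valid Kerberos ticket
kinit                          # obtain a new ticket if it is missing or expired
ssh hpc                        # connect to the hpc node (no password should be asked)
</code>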
  
If **SSH keys** are used (e.g. if you need to copy big files):
  * Connect with **ssh**, **sftp**, **scp**, **putty**, **winscp** or any other software supporting the **ssh** protocol to **hpc.mif.vu.lt** with your **ssh private key**, specifying your VU MIF user name. It should not ask for a login password, but may ask for your ssh private key passphrase.
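For example, a large file can be copied to your HPC home directory with **scp** (a sketch; the key path, user name and file name are placeholders):

<code shell>
# copy data.tar.gz to your home directory on hpc.mif.vu.lt using an ssh private key
scp -i ~/.ssh/id_rsa data.tar.gz username@hpc.mif.vu.lt:
</code>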
  
The **first time** you connect, you **will not** be able to run **SLURM jobs** for the first **5 minutes**. After that, your SLURM account will be created automatically and **resource limits** assigned.

====== Lustre - Shared File System ======

The VU MIF HPC shared file system is available in the directory ''/scratch/lustre''.

The system creates a directory ''/scratch/lustre/home/username'' for each HPC user, where **username** is the HPC user name.

The files in this file system are equally accessible on all compute nodes and on the **hpc** node.

Please use these directories only for their intended purpose and clean them up after your calculations.
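For example, each calculation can get its own working directory under your Lustre home directory (a sketch; **username** and the directory name are placeholders):

<code shell>
cd /scratch/lustre/home/username   # your personal directory on the shared file system
mkdir my-calculation               # keep each calculation in its own directory
cd my-calculation
# ... submit and run your jobs from here ...
cd ..
rm -r my-calculation               # clean up once the results have been copied away
</code>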

====== HPC Partitions ======

^ Partition ^ Time limit ^ RAM per core ^ Notes ^
| main      | 7d         | 7000MB       | CPU cluster |
| gpu       | 48h        | 12000MB      | GPU cluster |
| power     | 48h        | 2000MB       | IBM Power9 cluster |

If no time limit is specified, the limit is **2h** in all partitions. The table shows the maximum time limit.

The **RAM per core** column gives the amount of RAM allocated to each reserved **CPU** core.
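For example, a job script for the SLURM scheduler (described in the next section) that needs more than the default **2h** can request a longer limit explicitly. A minimal sketch using standard SLURM options; the program name is a placeholder:

<code shell>
#!/bin/bash
#SBATCH -p main            # partition from the table above (7d maximum)
#SBATCH -n 2               # 2 CPU cores, i.e. 2 x 7000MB of RAM
#SBATCH --time=1-00:00:00  # request 1 day instead of the 2h default
./my-program               # placeholder for the actual workload
</code>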

====== Batch Processing of Tasks (SLURM) ======

To use the computing resources of the HPC, you need to create job scripts (sh or csh).

Example:

<code shell mpi-test-job.sh>
#!/bin/bash
# partition (queue) to submit the job to
#SBATCH -p main
# number of CPU cores to reserve
#SBATCH -n 4
module load openmpi            # load the OpenMPI environment module
mpicc -o mpi-test mpi-test.c   # compile the MPI example program
mpirun mpi-test                # run it on the reserved cores
</code>

After your application to the ITOAC services has been submitted and confirmed, you need to create a user at https://hpc.mif.vu.lt/. The created user will be added to the relevant project, which has a certain amount of resources. To use the project resources for calculations, you need to provide your allocation number. Below is an example with the allocation parameter "alloc_xxxx_project" (not applicable to VU MIF users; VU MIF users do not have to specify the --account parameter).

<code shell mpi-test-job.sh>
#!/bin/bash
# project allocation to charge (not needed for VU MIF users)
#SBATCH --account=alloc_xxxx_project
#SBATCH -p main
#SBATCH -n 4
# time limit in minutes (e.g. --time=60)
#SBATCH --time=minutes
module load openmpi
mpicc -o mpi-test mpi-test.c
mpirun mpi-test
</code>

The script contains instructions for the SLURM scheduler as special comments (**#SBATCH** directives):

 -p main - which partition (queue) to submit the job to (main, gpu or power).

 -n 4 - how many CPU cores to reserve (**NOTE:** if you request x cores but your program actually uses fewer of them, accounting still counts all x requested cores, so we recommend estimating the required number in advance).

The initial working directory of the job is the current directory (**pwd**) on the login node from which the job is submitted, unless it is changed with the -D parameter. Use the HPC shared file system directories under **/scratch/lustre** as the initial working directory, because the directory must exist on the compute node and the job output file **slurm-JOBID.out** is created there, unless redirected with -o or -i (for these files it is advisable to use the shared file system as well).

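A minimal sketch of pointing the initial working directory at the shared file system with -D (the directory and user name are placeholders, and the directory must already exist):

<code shell>
#!/bin/bash
#SBATCH -p main
#SBATCH -n 1
#SBATCH -D /scratch/lustre/home/username/my-calculation   # initial working directory
hostname   # placeholder command; slurm-JOBID.out appears in the -D directory
</code>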
The finished script is submitted with the //sbatch// command,

''$ sbatch mpi-test-job.sh''

which returns the **JOBID** number of the submitted job.

The status of a pending or running job can be checked with the //squeue// command

''$ squeue -j JOBID''

With the //scancel// command you can cancel a running job or remove it from the queue

''$ scancel JOBID''

If you forget your job's **JOBID**, you can look it up with the //squeue// command

''$ squeue''

Completed jobs are no longer displayed in **squeue**.

If the specified number of processors is not available, your job is added to the queue. It will remain in the queue until a sufficient number of processors becomes available or until you remove it with **scancel**.
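Putting the commands together, a typical session might look like this (a sketch; the **JOBID** 123456 is a made-up example):

<code shell>
sbatch mpi-test-job.sh   # submit the script; prints the assigned JOBID
squeue -j 123456         # check whether the job is pending or running
squeue                   # list all of your pending and running jobs
scancel 123456           # cancel the job or remove it from the queue
</code>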

The **output** of a running job is written to the file **slurm-JOBID.out**. Error output is written to the same file unless you specify otherwise. The file names can be changed with the **sbatch** parameters -o (output file) and -e (error file).
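For example (a sketch; **%j** is SLURM's placeholder for the JOBID in file names, and the paths are placeholders):

<code shell>
#SBATCH -o /scratch/lustre/home/username/my-job.%j.out   # standard output
#SBATCH -e /scratch/lustre/home/username/my-job.%j.err   # standard error
</code>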

You can read more about SLURM features in the [[https://slurm.schedmd.com/quickstart.html|Quick Start User Guide]].
  