Viper Cluster@BIOPOLIS



Hardware Configuration


Viper Cluster
                    Head Nodes                    Compute Nodes
Hostname            viper.bii.a-star.edu.sg       viper[1-32]
                    vipera.bii.a-star.edu.sg      (128 compute processors)
System              Dell PowerEdge 2950           Dell PowerEdge 1950
Processors          2 x 3GHz Intel Xeon 5160 "Woodcrest" dual-core CPUs (4 cores per node)
Memory Per Node     4GB DDR2 ECC SDRAM            8GB DDR2 ECC SDRAM
Networks            Gigabit Ethernet              Gigabit Ethernet
                                                  Quadrics QsNetII
Disk Space          1.5TB for home directories
                    1.4TB global /scratch
                    35GB /tmp per node

Accounts

Application:
All A*STAR scientific staff are entitled to an account on the Viper Cluster.
The account application form is available from the A*STAR Computational Resource Centre (A*CRC).
Please let us know if you are applying as an external collaborator with an A*STAR RI.

Students and Interns:
Please visit the A*CRC website mentioned above to download the application form if you need an account on the Viper Cluster.

Expired Accounts:
Your account will normally expire within one week after you leave your RI. (For non-A*STAR users, it will expire within one week of the completion of your project.) Your files will then be archived for a year before they are erased. If you need an extension of your account (e.g. to complete your project), please send the request through your supervisor along with a justification.

Quotas

We are monitoring the situation and will be imposing a disk quota in due course.

Rules and Regulations

In general, the IT Usage Policy shall apply. Additional rules and regulations governing the use of compute clusters can be found here.

Temporary Space

If you need temporary work space for your jobs, each node has about 35GB of local /tmp. There is also a global /scratch of about 1.4TB. No quota is imposed on these areas, but files residing in them are not backed up and may be removed if they have not been accessed for a certain period of time.
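
A job that uses this temporary space might stage its work as in the sketch below. It is illustrative only: the helper name make_workdir and the per-user, per-job directory layout are assumptions, not a site convention. LSB_JOBID is the job number that LSF sets inside a running job; outside LSF the sketch falls back to the shell's process ID.

```shell
#!/bin/bash
# Sketch: create a private working directory under a temporary area
# (e.g. the global /scratch or the node-local /tmp).
# make_workdir and the directory naming scheme are hypothetical.

make_workdir() {
	# $1 = base directory, e.g. /scratch or /tmp
	local base="$1"
	local user="${USER:-$(id -un)}"
	local jobid="${LSB_JOBID:-$$}"
	local dir="${base}/${user}/job_${jobid}"
	mkdir -p "$dir" || return 1
	echo "$dir"
}

# Example use inside a job script (myprogram is a placeholder):
#   WORKDIR=$(make_workdir /scratch) || exit 1
#   trap 'rm -rf "$WORKDIR"' EXIT    # clean up when the job ends
#   cd "$WORKDIR" && ./myprogram
```

Removing the directory when the job ends keeps the shared areas tidy, and since these areas are not backed up, any results worth keeping should be copied back to the home directory first.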

Software

All nodes run CentOS 4.4 with a Quadrics-modified 2.6 SMP kernel. Besides the standard CentOS packages, the following software is currently installed on the cluster.

Software / Remarks

CHARMM
CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a program for macromolecular simulations, including energy minimization, molecular dynamics and Monte Carlo simulations.

Our CHARMM is currently at version 32b2. The types of CHARMM binaries available are:
c32b2-s - Extra Large model, single CPU
c32b2-xl - Extra Large model, MPI version

The CHARMM licensing agreement restricts usage strictly to research groups that have their own license. Research groups that intend to use CHARMM in their work should refer to the official Harvard University CHARMM website for instructions on obtaining a research license. Users who intend to use our CHARMM binaries are required to produce either the source code or the license agreement with Harvard University.

CHARMM is located in /usr/local/charmm. The path to the CHARMM executables is /usr/local/charmm/exec/gnu and the documentation can be found here.

More links on CHARMM development and support can be found here.

Intel C/C++/Fortran Compilers & MKL
Besides the standard GCC toolchain that comes with the OS, the Intel C/C++/Fortran Compilers and Intel Math Kernel Library (MKL) are available on the Viper head node.

The Intel compiler suite comprises the C, C++ and Fortran compilers and a debugger. The software version currently installed is 9.1. PDF versions of the documentation can be found in /opt/intel/compiler/doc/ on the head node.

The license is a single-user node-locked license, i.e., only 1 person may use the suite on the head node at any one time.

LSF
Job management suite from Platform Computing, used to manage serial and parallel jobs on the cluster. Based on past usage patterns, the following queues are enabled:
normal - for "normal" jobs. This is also the default queue. Each job in this queue has a RUNTIME limit of 24 hours.
short - for very short, urgent jobs. Jobs in this queue run with the highest priority, and each job has a RUNTIME limit of 60 minutes.
long - for jobs that require longer CPU time. Jobs in this queue run in the background with the lowest priority, and each job has a RUNTIME limit of 120 hours.
(Type "bqueues" for more information.) Fairshare scheduling has been implemented to ensure that resources are shared as equally as possible among users.
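
The run-time limits above can be turned into a small helper that suggests a queue from an estimated run time. This is only a sketch: pick_queue is a hypothetical function, not part of LSF, and it considers the time limits alone. The short queue is intended for urgent jobs, so a brief but non-urgent job may still belong in normal.

```shell
#!/bin/bash
# Sketch: suggest a queue from an estimated run time in minutes, using the
# limits listed above (short 60 min, normal 24 h, long 120 h).
# pick_queue is a hypothetical helper, not an LSF command.

pick_queue() {
	local minutes="$1"
	if [ "$minutes" -le 60 ]; then
		echo short
	elif [ "$minutes" -le 1440 ]; then    # 24 hours
		echo normal
	elif [ "$minutes" -le 7200 ]; then    # 120 hours
		echo long
	else
		echo "no queue allows ${minutes} minutes" >&2
		return 1
	fi
}

# Example use on the head node (myjob is a placeholder):
#   bsub -q "$(pick_queue 90)" ./myjob
```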

A few basic LSF commands are given here:

bsub [-q <QUEUE_NAME>] [-n <NUM_CPUS>] <COMMAND_LINE> [<COMMAND_OPTIONS>...]
Submits the job "<COMMAND_LINE> [<COMMAND_OPTIONS>...]" to the <QUEUE_NAME> queue, requesting <NUM_CPUS> processors. The default queue is normal and the default number of processors is 1.

bjobs
To see the status of your jobs.

bkill <JOB_ID>
To kill job number <JOB_ID>. To kill all your jobs use "bkill 0". Note that you can only kill your own jobs.
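
The three commands are often used together: submit a job, note its job number, then query or kill that job. The sketch below extracts the job number from the confirmation line that bsub prints on submission. parse_job_id is a hypothetical helper, and myjob is a placeholder for your own program.

```shell
#!/bin/bash
# Sketch: pull the numeric job ID out of the line bsub prints on submission,
# e.g. "Job <1234> is submitted to queue <normal>."
# parse_job_id is a hypothetical helper, not an LSF command.

parse_job_id() {
	sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example use on the head node (commented out because it needs LSF):
#   jobid=$(bsub -q normal ./myjob | parse_job_id)
#   bjobs "$jobid"     # status of this job only
#   bkill "$jobid"     # cancel it if necessary
```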

A Brief Introduction to A*CRC Resources presentation (dated 26 Sep 2007) can be downloaded here.

MPI
The Quadrics-enabled version of MPI, MPI-QsNet (now at version MPI.1.24-49), employs low-level Quadrics communication routines instead of a TCP stack. As a result, the latency of each message is much lower due to reduced communication overhead. This version of MPI can be found in /usr/lib/mpi. MPI-QsNet users should note the following:
  1. The command to launch an MPI job is prun (see RMS below) rather than mpirun. prun is a simple job scheduler for parallel programs and is part of the RMS cluster management suite from Quadrics.
  2. The head nodes do not have a Quadrics adapter, so you CANNOT run MPI programs on the head nodes themselves. Instead, use prun (see below) together with LSF on the head nodes to schedule your jobs on the compute nodes.

Users who are running MPI-QsNet jobs may use this as a template for their job scripts:

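	#!/bin/bash
	# The working directory and executable name below are placeholders.
	cd /home/myhome/myworkdir || exit 1
	# LSB_HOSTS is set by LSF and lists one host name per allocated processor.
	for i in $LSB_HOSTS
	do
		# Strip the domain part, keeping only the short host name.
		j=$(echo "$i" | cut -f 1 -d .)
		echo "1 ${j}:/home/myhome/myworkdir/ mympijob"
	done > myprocfile
	chmod +x myprocfile
	prun -f myprocfile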
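
To see what the loop in the template writes into myprocfile, the same logic can be tried outside LSF with a made-up host list. This is a sketch under stated assumptions: make_procfile and the host names viper1.example and viper2.example are hypothetical, and the host list stands in for the value LSF would place in LSB_HOSTS (one entry per allocated processor, so a host may repeat).

```shell
#!/bin/bash
# Sketch: the procfile-building loop from the template, wrapped in a function
# so it can be run without LSF. make_procfile and the sample host names
# are hypothetical.

make_procfile() {
	# $1 = space-separated host list (as in $LSB_HOSTS)
	# $2 = working directory, $3 = MPI executable name
	local hosts="$1" workdir="$2" prog="$3" h
	for h in $hosts; do
		# keep only the short host name (text before the first dot)
		echo "1 ${h%%.*}:${workdir}/ ${prog}"
	done
}

make_procfile "viper1.example viper1.example viper2.example" \
	/home/myhome/myworkdir mympijob
```

With the sample list, this prints one line per processor slot:

	1 viper1:/home/myhome/myworkdir/ mympijob
	1 viper1:/home/myhome/myworkdir/ mympijob
	1 viper2:/home/myhome/myworkdir/ mympijob
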
Standard MPICH-P4 is also available, for those who prefer it, in /usr/local/mpich.

RMS
A suite of administrative tools for managing a Quadrics cluster. The following commands are of particular interest to the end user:
  • prun - this command schedules and runs parallel jobs on the cluster. Please refer to the prun man pages for more information.
  • rinfo - displays resource usage and availability information for parallel jobs.

    The RMS reference manual can be found here.

    The RMS user guide can be found here.

    The SHMEM programming manual can be found here.

Other Software
To be installed in due course.

     
     
     
This compute resource is managed by the A*STAR Computational Resource Centre (A*CRC) HPC Team @ BIOPOLIS.
Contact the A*CRC HPC Team at cluster@acrc.a-star.edu.sg