Below is a rough procedure for using the cluster. I'll refine this page later. Please note that the procedure has been tested and works. If your code won't run, the most likely cause is bad hardware; please report it to me and I'll try to fish out the bad node.
1). Log in to the master node using ssh -l $USER pleiades.ucsc.edu, where $USER is your username;
2). The first time you log in to pleiades, it will generate an ssh key for you. When it asks you to enter a passphrase, just press the Enter key for no passphrase. This ssh key is used only between the nodes of this cluster. Do NOT use it anywhere else!
3). We use module to manage the software environment. Once you log in, run module load hpc/mvapich-topspin to load the MPI environment, and module load intel/ifort to load the Intel Fortran compiler (or module load intel/icc to load the Intel C/C++ compiler). You can put those 2 commands in your shell initialization file if you always use those modules.
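For instance, assuming your login shell is bash and its initialization file is ~/.bashrc (an assumption; other shells read other files), lines like the following could go at the end of that file. The guard around the two commands is an addition of this sketch, so that the file stays harmless on machines where the module command does not exist:

```shell
# Load the cluster modules only where the `module` command is available.
if type module >/dev/null 2>&1; then
    module load hpc/mvapich-topspin   # MPI environment
    module load intel/ifort           # Intel Fortran compiler
fi
```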
4). Use mpif90.i or mpif77.i to compile your Fortran 90 or 77 code. mpif90.i and mpif77.i are just wrappers around the Intel compiler ifort, but they save you the trouble of linking the MPI libraries. You can use all the usual ifort options with mpif90.i or mpif77.i (-O2 -ftz -ipo -IPF_fltacc -IPF_fma may be the best generic options).
For example, you can compile ring_f90.f90 using:
mpif90.i -o ring_f90.x ring_f90.f90
You don't need to specify the MPI library and include files; mpif90.i handles that for you under the hood.
5). Now assume you've successfully compiled your code and have a binary to run. DO NOT run your code from your home directory. Type cd work first, where work is a symbolic link to the fast storage.
To run ring_f90.x or your binary, do,
either). mpirun_ssh -np 8 -machinefile Machines ./ring_f90.x
The file Machines lists the nodes to run on, one hostname per line.
Since each node has 4 cores, you have to list each node 4 times. For a cluster with 800 cores this is very tedious, but please test it to make sure your code runs successfully on 8 cores.
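As a rough sketch, such a file can be generated instead of typed by hand. The node names node001 and node002 below are hypothetical placeholders, not real Pleiades hostnames; substitute the nodes you actually intend to use. The loop writes each name 4 times, once per core:

```shell
#!/bin/bash
# Hypothetical node names; replace them with real hostnames.
NODES="node001 node002"
CORES_PER_NODE=4   # each node has 4 cores

# Start from an empty file, then list every node once per core.
: > Machines
for node in $NODES; do
    for i in $(seq 1 "$CORES_PER_NODE"); do
        echo "$node" >> Machines
    done
done

# The line count is the largest value you can pass to -np.
wc -l < Machines
```

With two nodes the file has 8 lines, which matches the -np 8 used in the test run.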
or). This is the preferred method: use the Lava job scheduler (a free version of LSF, quite similar to PBS):
First copy mpich-mpirun_ssh to your work directory
cp `which mpich-mpirun_ssh` /home/$USER/work
Then run your MPI job in your work directory
bsub -n 8 -o ring.out ./mpich-mpirun_ssh -np 8 ./ring_f90.x
The script mpich-mpirun_ssh is a wrapper around mpirun_ssh; it works with bsub to generate the necessary machinefile. With this method you only need to specify the number of processors (8 in this case) to run your code; the job scheduler does everything else for you.
Or you can put everything into a script (check the example myscript) and run
bsub < myscript
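The example script itself is not reproduced here, so the following is only a guess at what a minimal myscript could contain. It assumes that Lava accepts LSF-style #BSUB directives embedded in the script. The job name ring is made up; the other values are taken from the bsub command shown earlier. The block writes the file with a here-document so you can inspect it before submitting:

```shell
#!/bin/bash
# Write a hypothetical job script; adjust names and counts to your job.
cat > myscript <<'EOF'
#!/bin/bash
# Job name (made up for this example)
#BSUB -J ring
# Number of processors
#BSUB -n 8
# File that receives the job's standard output
#BSUB -o ring.out
./mpich-mpirun_ssh -np 8 ./ring_f90.x
EOF

# Show what would be submitted.
cat myscript
```

Run this in your work directory, where mpich-mpirun_ssh and ring_f90.x live, then submit the script as shown above.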
6). Serial jobs. Running serial jobs on Pleiades is not encouraged. If you really have to do this, below is how you should submit your serial jobs.
To submit a single serial job:
bsub -o my.out -i my.in ./my.x
where my.x is the name of your executable, my.in is the standard input, and my.out is the standard output.
Most likely you'd like to run the same executable with many different inputs; in that case run,
bsub < myserial
The bash script myserial looks like:
#!/bin/bash
for i in `seq 1 10`; do
    bsub -o out.$i -i input.$i ./my.x
done
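A slightly more defensive variant of the same loop, offered only as a sketch: it skips any index whose input file is missing instead of submitting a job that cannot read its input. The function name submit_all and the SUBMIT variable are inventions of this example, not part of Lava; SUBMIT lets you preview the commands without submitting anything.

```shell
#!/bin/bash
# SUBMIT is a made-up knob for this sketch: leave it unset to really
# submit with bsub, or set SUBMIT="echo bsub" to only print the commands.
submit_all() {
    local submit=${SUBMIT:-bsub}
    local i
    for i in $(seq 1 10); do
        if [ -f "input.$i" ]; then
            # $submit is left unquoted on purpose so "echo bsub" splits
            # into a command plus its first argument.
            $submit -o "out.$i" -i "input.$i" ./my.x
        else
            echo "skipping job $i: input.$i not found" >&2
        fi
    done
}

submit_all
```

To preview, save the block under a name of your choice and start it with SUBMIT="echo bsub" set in the environment; each line it prints is a bsub command that a real run would execute.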