Welcome to the Q Beowulf Cluster
System Description
-
One Master Node (q.ucolick.org):
Dual AMD Opteron 246 processor, 4GB RAM,
2TB software RAID5, 250GB SCSI hard drive, Gigabit Ethernet
-
Fifteen Compute Nodes (q01-q15):
Dual AMD Opteron 246 processor, 2GB RAM,
250GB SCSI hard drive, Gigabit Ethernet
-
One Storage Node (qdat):
Dual AMD Opteron 246 processor, 4GB RAM,
4.8TB software RAID5, Gigabit Ethernet
Instructions
-
Storage System (qdat):
qdat is on both the cluster's local network and the internet.
You can log in to qdat with ssh. If you are inside the Lick
network, you can simply run "ssh qdat" to log in.
Your home directory on qdat, /home/YourUserName, is
mounted on both the master node and all the compute nodes at
/qdat/YourUserName. Therefore you do not have to use
ssh or scp to access qdat if you are on q. If you need to
transfer a large amount of data between your desktop (not q) and
qdat, you should connect
to qdat directly, not via q:/qdat. Lick has a firewall.
If you are on a computer outside the Lick
network that is also behind a firewall, you may want to use ssh
tunneling to get through the firewalls. For example, first run
"ssh -L 9022:qdat.ucolick.org:22 -N ssh.ucolick.org"
and then run "ssh -p 9022 localhost" to log in to
qdat or "scp -P 9022 a_local_file
localhost:destination_on_qdat" to copy a file to qdat.
Note that scp takes a capital -P,
whereas ssh takes a lowercase -p. If your user names differ
on these computers, use user@hostname instead of
the hostname alone.
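The tunnel steps above can be wrapped in a small shell helper. This is only a sketch: the names QDAT_PORT, DRY_RUN, run, qdat_tunnel and qdat_put are made up for illustration, and the -f option (which sends ssh to the background once the tunnel is up) is an addition to the command given above.

```sh
#!/bin/sh
# Sketch only: wraps the tunnel commands from the text in two functions.
# QDAT_PORT, DRY_RUN and the function names are illustrative assumptions.
QDAT_PORT=${QDAT_PORT:-9022}

# With DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Open the tunnel: forward the local port to qdat's ssh port via
# ssh.ucolick.org. -N runs no remote command, -f backgrounds ssh.
qdat_tunnel() {
    run ssh -f -N -L "$QDAT_PORT:qdat.ucolick.org:22" ssh.ucolick.org
}

# Copy a local file to qdat through the tunnel (scp uses a capital -P).
qdat_put() {
    run scp -P "$QDAT_PORT" "$1" "localhost:$2"
}
```

After sourcing this file, "qdat_tunnel" followed by "qdat_put a_local_file destination_on_qdat" would perform the copy. Setting DRY_RUN=1 first shows the commands without running them. The backgrounded ssh process has to be killed by hand when the tunnel is no longer needed.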
-
Node Access:
Log in to the compute nodes (q01-q15) with ssh.
-
File System:
NO FILES ARE BACKED UP.
Your home directory, /home/YourUserName,
is on the RAID disk and is mounted
on every compute node using NFS. When you login to a node,
you are put into your home directory by default. You also have a
local directory, /localdata/YourUserName,
on each compute node.
-
Compiler:
The PGI compiler is installed at /raid/pgi. You can
compile your code on the master node and run it on the compute
nodes.
-
Single Processor Job:
Please remember that each compute node has
two processors. Therefore you can run two jobs on one node.
Do NOT use your home directory for
jobs with intensive data I/O!
Run these jobs in /localdata/YourUserName.
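One way to follow this advice is to have a small wrapper create a job directory under /localdata and run the program from there. The sketch below assumes your login name matches YourUserName; SCRATCH_ROOT, run_in_scratch and the job name are hypothetical names chosen for illustration.

```sh
#!/bin/sh
# Sketch: run an I/O-heavy serial job in node-local scratch space.
# SCRATCH_ROOT and run_in_scratch are illustrative assumptions.
SCRATCH_ROOT=${SCRATCH_ROOT:-/localdata}

# Usage: run_in_scratch JOBNAME COMMAND [ARGS...]
run_in_scratch() {
    job=$1
    shift
    workdir="$SCRATCH_ROOT/$(id -un)/$job"
    mkdir -p "$workdir" || return 1
    # Use a subshell so the caller's working directory is unchanged.
    ( cd "$workdir" && "$@" )
}
```

On a compute node, something like "run_in_scratch run1 $HOME/YourExecutable &" could be issued twice with different job names to occupy both processors. Results left in /localdata stay on that node until you copy them elsewhere.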
-
Parallel Job:
MPICH is installed at /raid/mpich. The currently
installed version is 1.2.7. Here is an example of how to compile
a parallel code: "pgf90 -fast -o YourExecutable -r8
-i4 -I/raid/mpich/include YourFortranFile
-L/raid/mpich/lib -lmpich -lfmpich".
You can run a parallel job by
issuing the "mpirun" command on the master node. Here is an
example: "mpirun -np 8 -nolocal
-machinefile YourMachines YourExecutable".
This will use 8 processors (4 nodes) for the job. Do not
forget the option "-nolocal". The file YourMachines
is a list of nodes to run on, one node name per line.
Since each node has two
processors, you should list each node like this:
q0x:2.
Often you will want to put your job into the background, and perhaps
even log off q. In that case you can use nohup.
For example, you can put a line like this
"mpirun -np 8 -nolocal -machinefile YourMachines
YourExecutable > mpi.out &" into an executable file
run.sh and start your job by issuing
"nohup ./run.sh &". This will allow the job to continue
running after you log off q.
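The machine file and run.sh described above can also be generated rather than typed by hand. The following is a minimal sketch, assuming a contiguous range of nodes with two processors each. The function names make_machinefile and write_run_script are hypothetical.

```sh
#!/bin/sh
# Sketch: build a machine file for nodes q<first>..q<last> and a run.sh
# that launches the job on all of their processors.
# Function names are illustrative assumptions.

# Print one "qNN:2" line per node. Pass plain integers (8, not 08).
make_machinefile() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'q%02d:2\n' "$i"
        i=$((i + 1))
    done
}

# Write an executable run.sh in the current directory. -np is set to
# two processors per non-empty line of the machine file.
write_run_script() {
    machinefile=$1
    executable=$2
    nodes=$(grep -c . "$machinefile")
    np=$((nodes * 2))
    cat > run.sh <<EOF
#!/bin/sh
mpirun -np $np -nolocal -machinefile $machinefile $executable > mpi.out &
EOF
    chmod +x run.sh
}
```

For example, "make_machinefile 1 4 > YourMachines" followed by "write_run_script YourMachines YourExecutable" produces a run.sh containing the 8-processor mpirun line shown above, which can then be started with "nohup ./run.sh &".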
-
Cluster Load:
Try "top" or "nodetop".
If you are inside the Lick firewall, you can watch
the cluster load by visiting
http://q.ucolick.org/ganglia.
-
Other Software: The following
software is also installed at /raid/: HDF5, HDF4 and
IDL.
Weiqun Zhang
(zhang_AT_ucolick_DOT_org)