Linux Cluster Info


Configuration


Access


Known problems


Compilers

GCC

Intel

Portland

PathScale

License has expired!

MPICH

Compile commands

  C C++ F77 F90
GNU mpicc mpiCC mpif77 (broken)
Intel mpicc-icc mpiCC-icc mpif90-ifort
Portland mpicc-pgcc mpiCC-pgCC mpif77-pgf77 mpif90-pgf90
PathScale mpicc-pathcc n/a mpif90-pathf90
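
As a small illustration of the table above, the helper below maps a compiler family to its C wrapper and prints a compile line. It is only a sketch: the function name mpi_c_wrapper, the lowercase family keys, the source file hello.c and the -O2 flag are assumptions made for the example.

```shell
#!/bin/sh
# Map a compiler family to the MPICH C wrapper listed in the table.
mpi_c_wrapper() {
    case "$1" in
        gnu)       echo mpicc ;;
        intel)     echo mpicc-icc ;;
        portland)  echo mpicc-pgcc ;;
        pathscale) echo mpicc-pathcc ;;
        *) echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Print (not run) a compile command for a hypothetical source file hello.c.
echo "$(mpi_c_wrapper intel) -O2 -o hello hello.c"
```

The printed line can be pasted into a shell on the cluster; an unknown family name returns a non-zero status instead of guessing.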

Run command

mpirun -np <np> -machinefile <machine-file> <program>

Parallel jobs should be submitted via the batch system.



Batch system

Batch jobs are managed by Torque (version 1.2.0p6), a free version of PBS and successor of OpenPBS.

Queues and their (tentative) limits

queue     jobs  jobs/user  nodes  CPU-time/job  Wall-time/job  access
single    2                       36h                          free
para      5     1          16                   1h             free
gerhold                    6                    7d             gerhold
bqcd                       12                                  aivoigt, ilgenfri
heimel                     8                                   heimel
abinit                     12                                  fgrosse
tania     6                       200h                         romanczuk
chroma                     16                                  grieger
fluid     1                4                                   tillack
sten      10                      42d                          sten
knittel   10                      14d                          aknittel
grochol   10                      7d                           grochol
tmqcd                      8                    96h            marcuspe
diplomfg  4                       14d                          gudny, megow
manohar   4                       20d                          manohar
ddhmc                      16                   48h            leder
bartek    6                       200h                         bartek
overlap                    8                    8h             obaer, sschaef

Node info

      pbsnodes -a             # show status of all nodes
      pbsnodes -a nodeNN      # show status of specified node
      pbsnodes -l             # list inactive nodes
      pbsnodelist             # list status of all nodes (one per line)

Queue info

      qstat -Q                      # show all queues
      qstat -Q <queue>              # show status of specified queue
      qstat -f -Q <queue>           # show full info for specified queue
      qstat -q                      # show all queues (alternative format)
      qstat -q <queue>              # show status of specified queue (alt.)

Job submission and monitoring

      qsub <jobscript>              # submit to default queue
      qsub -q <queue> <jobscript>   # submit to specified queue
      qsub -l nodes=4:ppn=2 <jobscript>   # request 4x2 processors
      qsub -l nodes=nodeNN  <jobscript>   # run on specified node

      qsub -l cput=HH:MM:SS <jobscript>       # limit on CPU time (serial job)
      qsub -l walltime=HH:MM:SS <jobscript>   # limit on wallclock time (parallel job)

      qdel <job_no>                 # delete job (with <job_no> from qstat)

      qstat -a                      # show all jobs
      qstat -a <queue>              # show all jobs in specified queue        
      qstat -f <job_no>             # show full info for specified job
      qstat -n                      # show all jobs and the nodes they occupy
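
The submission options listed above can be combined on one command line. The helper below only prints such a command so it can be inspected before use; the function name qsub_cmd and the script name job.sh are made up for this sketch, while the queue and resource values are taken from the examples above.

```shell
#!/bin/sh
# usage: qsub_cmd <queue> <nodes> <ppn> <walltime> <jobscript>
# Prints a qsub command line; it does not submit anything.
qsub_cmd() {
    printf 'qsub -q %s -l nodes=%s:ppn=%s -l walltime=%s %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# 4 nodes with 2 processors each in the para queue, one hour of wallclock time
qsub_cmd para 4 2 01:00:00 job.sh
```

Run the printed command on the cluster to submit the job; qsub then reports the job number that qstat and qdel expect.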

Jobscript

The jobscript is a shell script. Above the shell commands, it may contain qsub options in lines starting with "#PBS".
Sample script for a serial job
Sample script for a parallel job
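
A minimal sketch of what a parallel jobscript might look like, written to a file so it can be reviewed before submission. It assumes the para queue from the table above, a hypothetical MPI binary ./my_program and the file name para_job.sh; PBS_O_WORKDIR and PBS_NODEFILE are environment variables that Torque sets for each job.

```shell
#!/bin/sh
# Write a sample parallel jobscript; submit it later with: qsub para_job.sh
cat > para_job.sh <<'EOF'
#!/bin/sh
#PBS -N mpi_sketch
#PBS -q para
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
#PBS -j oe

# Start in the directory from which the job was submitted.
cd "$PBS_O_WORKDIR" || exit 1

# The node file has one line per allocated processor (4 x 2 = 8 here).
NP=$(wc -l < "$PBS_NODEFILE")

# ./my_program is a placeholder for your own MPI executable.
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_program
EOF

# Check the generated script for shell syntax errors without running it.
sh -n para_job.sh && echo "para_job.sh written"
```

For a serial job, the nodes and walltime lines would typically be replaced by a cput limit and the mpirun line by a plain call of the program.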

Note that the batch system does not perform an interactive login; it only starts the job script with a remote shell command. The job therefore does not "see" your full interactive environment, and you may have to extend PATH explicitly or specify commands with their full pathname. To see what a job actually gets, run a job containing only the command "set": it writes the environment (as seen by the job script) to the output file.
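
The following sketch shows one way to handle this inside a jobscript: extend PATH only if needed, then dump the environment for inspection. The helper name prepend_path and the directory $HOME/bin are assumptions for illustration.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_path "$HOME/bin"           # hypothetical location of your own tools
export PATH

echo "PATH seen by the job: $PATH"
set                                # write all shell variables to the job output
```

Calling prepend_path twice with the same directory leaves PATH unchanged, so the lines are safe to keep in every jobscript.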



Temporary disk space

Temporary files may be stored on the large global disk (/data/cluster) or in the small local directories (/scratch, /tmp).
The large global disk should be considered scratch space: please clean up after use. Files not accessed for one year will be deleted automatically.
In the small local directories (/scratch, /tmp), files not accessed for 30 days will be deleted automatically.
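
One way to make the cleanup automatic is to give each job its own working directory and remove it when the script exits. The sketch below assumes /tmp as the default base directory; SCRATCH_BASE and WORKDIR are names chosen for this example, not part of the cluster setup.

```shell
#!/bin/sh
# Base directory for temporary data; override with e.g. SCRATCH_BASE=/scratch
SCRATCH_BASE=${SCRATCH_BASE:-/tmp}

# Create a private working directory with a unique name.
WORKDIR=$(mktemp -d "$SCRATCH_BASE/job.XXXXXX") || exit 1

# Remove the directory and its contents.
cleanup() {
    rm -rf "$WORKDIR"
}
trap cleanup EXIT                  # run cleanup when the script exits

# Placeholder for the real work of the job.
echo "intermediate data" > "$WORKDIR/part1.tmp"
ls -l "$WORKDIR"
```

Results that should survive the job have to be copied out of WORKDIR before the script ends.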



Transition to 64 bits

The main problem in porting programs to a 64-bit system is with integers in C: in the 32-bit world, long and int are usually the same 32-bit type, and long is often used without care.

In a 64-bit environment, however, int remains 32 bits while long grows to 64 bits. The wider long causes trouble with library functions and with constants that were written without the L suffix.

The general advice is to convert declarations of long to int if 32 bits are sufficient.
Exception: object sizes (the result of sizeof) have type size_t, which is as wide as long on 64-bit Linux, and functions like malloc() expect an argument of that type.
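
To check which case applies on a given machine, the width of long can be queried from the shell. This is a small sketch using getconf; the variable name bits is arbitrary, and LONG_BIT is assumed to be available, as it is on common Linux systems.

```shell
#!/bin/sh
# Ask the system for the number of bits in a C long.
bits=$(getconf LONG_BIT)
echo "long is $bits bits on this system"
```

A value of 64 means that code relying on long and int having the same size needs the review described above.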

See also the following article.

Fortran90 programs don't seem to pose problems.

B Bunk
last modified: Wed Apr 7 12:17:49 CEST 2010