Large File FAQ

Powerful supercomputers can produce massive amounts of output in a very short time. Recently, some users of our Lustre File System (LFS) have generated millions of files occupying tens of terabytes in just a few days. Output on this scale stresses the file system, affects other users, and makes system management much harder.

We developed this FAQ to help users devise strategies for handling the output of their supercomputer simulations. The following tips will help you manage the number and size of your files:

Check whether you really need to write output files at every time step of the simulation. For image output (to generate movies), writing every 10th time step often suffices.

If you write more than one type of file, say VTK files, PPM files, and CAD files, create a separate directory for each type and write each type to its own subdirectory. This directory structure makes it easier to compress and archive the data.
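
As a minimal sketch of such a layout, the commands below create one subdirectory per output type under a single run directory. The names run_001, vtk, ppm, and cad are hypothetical placeholders, not names required by the system; the commands are plain command lines that should behave the same in csh and sh.

```shell
# Hypothetical layout: one subdirectory per output type under one run directory.
mkdir -p run_001/vtk run_001/ppm run_001/cad

# Count the files written so far for one output type.
find run_001/vtk -type f | wc -l
```

With this layout, each output type can later be archived or removed as a unit, without pattern-matching file names in a mixed directory.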

Tar up and archive your files. Bundling files into a tar archive can, by itself, free up a significant amount of space: each individual file is typically allocated slightly more space than it needs, and that overhead adds up across millions of files. The tar command also has a compression switch (-z for gzip), which is advantageous for text and other uncompressed file formats. Once your files are tarred, be sure to actually archive the tar file to mira.nrl.navy.mil.
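
The tar step can be sketched as follows. The directory demo_out and the file step_0000.vtk are hypothetical placeholders created only so the example is self-contained. The sketch adds one precaution that is an assumption of this example rather than a site requirement: it lists the archive before deleting the originals, so the files are removed only if the archive can be read back. The transfer to the archive host is not shown, since the transfer method depends on your site setup.

```shell
# Placeholder data so the example is self-contained (hypothetical names).
mkdir -p demo_out/vtk
echo "sample" > demo_out/vtk/step_0000.vtk

# Create a gzip-compressed archive of the vtk directory, list its contents
# to confirm it is readable, and only then delete the original files.
tar -czf demo_out/vtk.tgz -C demo_out vtk && tar -tzf demo_out/vtk.tgz > demo_out/vtk.list && rm -rf demo_out/vtk
```

Because the three commands are chained with &&, a failure while creating or listing the archive leaves the original directory untouched.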

Finally, check whether a compressed format exists for your output. For example, if you are creating image files, use an image format with built-in compression, such as JPEG. For one of our users, switching to JPEG reduced the size of their PPM files by a factor of 75.

Below, we have included a sample job script which you may consult when creating your own job scripts:

#!/bin/csh
#PBS -V
#PBS -l select=1:ncpus=8:mpiprocs=8,walltime=01:00,place=scatter:excl
#PBS -N out_ug
#PBS -o out.log
#PBS -e out.err
#PBS -A STAFF
#PBS -m abe


# set the WRK and TMP directories
# (PBS exports the submission directory as PBS_O_WORKDIR)
set WRK = $PBS_O_WORKDIR
set TMP = $WRK/$PBS_JOBID

# make the 'submit' directory the current directory ...
cd $WRK

# set some parameters ...
limit stacksize unlimited
limit coredumpsize 0

# set some parameters for debugging MPI jobs ...
#setenv MPI_STATS                       1
#setenv MPI_CHECK_ARGS                  1
#setenv MPI_DISPLAY_SETTINGS            1
#setenv MPI_REQUEST_DEBUG               1

# set some parameters for optimizing MPI jobs ...
#setenv MPI_BUFS_PER_HOST 1024
#setenv MPI_BUFS_PER_PROC 1024
#setenv MPI_BUFFER_MAX 1000000
#setenv MPI_IB_SINGLE_COPY_BUFFER_MAX 1000000
#setenv MPI_IB_SINGLE_COPY_BUFFER_MAX 0
#setenv MPI_DEFAULT_SINGLE_COPY_BUFFER_MAX   0

# set some parameters for InfiniBand ...
# (setenv, not set: MPI reads environment variables, not csh shell variables)
setenv MPI_DSM_DISTRIBUTE 1
setenv MPI_USE_IB 1
setenv MPI_MEMMAP_OFF 1

# confine output files to subdirectories of the TMP directory ...
mkdir -p $TMP/{dbl,flt,ppm,vtk}

# copy executable and any input files to the TMP directory ...
cp $WRK/{input,bin}/* $TMP

# make the TMP directory your current directory ...
cd $TMP

# set the striping parameters of the output directories for Lustre
# (comment out for non-Lustre scratch directories)
# -S is the stripe size; older Lustre releases spell this option -s
lfs setstripe -S 4M -i -1 -c -1 dbl flt ppm vtk

# run the executable ...
time mpiexec ./xadveds > xadveds.out

# tar up the output file directories and delete the original files ...
# (no -v, so the job log does not list every archived file)
tar -czf dbl.tgz dbl && rm -rf dbl
tar -czf flt.tgz flt && rm -rf flt
tar -czf ppm.tgz ppm && rm -rf ppm
tar -czf vtk.tgz vtk && rm -rf vtk

# cleaning up ...
foreach i ($WRK/{input,bin}/*)
        rm -f $TMP/$i:t
end

exit