ATLAS and AMD
ATLAS (Automatically Tuned Linear Algebra Software) provides C and Fortran77
interfaces to the Basic Linear Algebra Subprograms (BLAS), a widely used,
performance-critical library of linear algebra kernels. It also provides
optimized versions of some routines from the higher level LAPACK (Linear
Algebra PACKage) API. Further details on ATLAS can be found at its homepage:
http://math-atlas.sourceforge.net/
ATLAS has long provided a BLAS implementation for the AMD Athlon processor.
The theoretical peak for x87 floating point code on an AMD Athlon is two
floating point operations per cycle, that is, twice the clock rate. Initially,
ATLAS' portable code generator achieved roughly 60% of theoretical peak for
matrix multiply, the most important BLAS kernel. Later, Julian Ruhe, an open
source programmer, contributed a hand-tuned assembly kernel that employs many
techniques not available in any compiler, raising the achievable performance
to more than 77% of peak (out of cache).
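To make the "percent of peak" figures concrete, here is a minimal Python sketch of the arithmetic. It is illustrative only: the function names, the 1000 MHz clock, and the timing are hypothetical and are not ATLAS measurements. It assumes the conventional count of 2*m*n*k floating point operations for a real matrix multiply, and the two operations per cycle stated above for x87 code.

```python
def gemm_flops(m, n, k):
    """Conventional flop count for a real (m x k) times (k x n) multiply:
    one multiply and one add per inner-product term."""
    return 2 * m * n * k


def peak_mflops(clock_mhz, flops_per_cycle=2):
    """Theoretical peak in MFLOPS; 2 flops/cycle is the x87 figure from the text."""
    return clock_mhz * flops_per_cycle


def fraction_of_peak(m, n, k, seconds, clock_mhz, flops_per_cycle=2):
    """Achieved MFLOPS for one timed multiply, divided by theoretical peak."""
    achieved_mflops = gemm_flops(m, n, k) / seconds / 1e6
    return achieved_mflops / peak_mflops(clock_mhz, flops_per_cycle)


if __name__ == "__main__":
    # Hypothetical example: a 1000 x 1000 multiply timed at 1.3 s on a 1000 MHz part.
    frac = fraction_of_peak(1000, 1000, 1000, seconds=1.3, clock_mhz=1000)
    print(f"{100 * frac:.1f}% of peak")
```

On such a hypothetical 1000 MHz processor the peak would be 2000 MFLOPS, so 60% and 77% of peak correspond to roughly 1200 and 1540 MFLOPS respectively.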
The AMD Opteron and AMD Athlon 64 processors are ideal for demanding applications,
such as high performance computing. AMD is working with the founder of the ATLAS
project, R. Clint Whaley, to produce a highly efficient BLAS library for the
AMD Opteron and AMD Athlon 64 processors. Initial AMD64 tuning and functionality
are included in the ATLAS 3.5.0 library release, available at:
http://sourceforge.net/project/showfiles.php?group_id=23725
The AMD Opteron and AMD Athlon 64 processors have a peak floating point performance
of twice the clock speed for double precision, and four times the clock speed
for single precision. ATLAS presently achieves roughly 85% of this peak for
real matrix multiply, and 83-84% for complex matrix multiply, on an AMD Athlon 64.
Reaching this percentage of peak required hand-tuned kernels. These open
source kernels use AMD64 assembly to exploit the expanded SSE resources
available on these new chips.
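As a rough guide to what these percentages mean in absolute terms, the sketch below converts a clock speed and an efficiency into a sustained rate. The names and the 2000 MHz clock are hypothetical, chosen only for illustration. The double precision peak of two operations per cycle comes from the text; the single precision value of four per cycle is the commonly cited figure for these processors and is treated as an assumption here.

```python
# Peak floating point operations per clock cycle, by precision.
# "double" is the figure given in the text; "single" is an assumption
# (the commonly cited value for these processors).
PEAK_FLOPS_PER_CYCLE = {"double": 2, "single": 4}


def peak_mflops(clock_mhz, precision):
    """Theoretical peak in MFLOPS for the given precision."""
    return clock_mhz * PEAK_FLOPS_PER_CYCLE[precision]


def sustained_mflops(clock_mhz, precision, efficiency):
    """Sustained MFLOPS when running at `efficiency` (0.0 to 1.0) of peak."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return peak_mflops(clock_mhz, precision) * efficiency


if __name__ == "__main__":
    clock_mhz = 2000  # hypothetical clock speed, not a specific product
    for precision in ("double", "single"):
        rate = sustained_mflops(clock_mhz, precision, 0.85)
        print(f"{precision}: about {rate:.0f} MFLOPS at 85% of peak")
```

The 85% efficiency used in the example is the real matrix multiply figure quoted above; substituting 0.83 or 0.84 gives the corresponding estimate for complex matrix multiply.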