Timing for interval matrix multiplication
Dear all,
here are some timings concerning mid-rad representation and alternatives
for interval matrix-matrix multiplication. Christian Keil, who is currently
at Waseda university in Tokyo kindly produced the following results.
The setting is as follows. Given two interval matrices A,B in infsup-
representation, calculate the matrix product A*B. All implementations
are performed in C on a 2 GHz Intel Core2 Duo processor, and Atlas
BLAS was used.
As you may know, Profil is a widely used library for interval scalar,
vector and matrix computations developed by my (former) collaborators
Olaf Knueppel, Dirk Husung and a new version by Christian Keil. The
first breakthrough by Profil was not to look at matrix-matrix
multiplication as a sequence of scalar operations with many switches
of the rounding mode. Profil is fast and widely used in the community.
We give timings for different dimensions for the following five
alternatives:
1 INTLAB: synonym for the mid-rad approach. First, infsup-representation
is converted into midrad, then multiplication by well-known formulas, then
the result is transformed back into infsup.
+ uses BLAS 3, thus blocked code, very fast
+ very few switches of rounding mode
- overestimation for very wide intervals
2 Profil dev: The current Profil version under development using
[-inf,sup]
representation, otherwise essentially the "old" Profil
+ very few switches of rounding mode
- speed depends on distribution of zero/nonzero intervals, see below
3 Profil U/D: the current version of Profil with rounding Upwards/Downward
- speed depends on distribution of zero/nonzero intervals
4/5 Outer product version using rank-1 updates
4 min/max performed inline
+ fast, independent of distribution of zero intervals
5 using BLAS 2 for outer product
- cache misses
The computing time of the two Profil versions 2 and 3 depends on the
distribution of intervals in A and B containing and not containing zero,
whereas INTLAB and the outer product variants 4 and 5 do not.
In the following table for dimensions 200/500/1000 for A and B the
computing time for INTLAB is normed to 1. For Profil we tested four
possibilites:
a All intervals in A and B positive
b All intervals in A and B intervals properly containing zero
c Evenly mixed positive/negative intervals in A and B
d Mixed positive/zero/negative intervals in A and B with prob. 1/3
Here are the results:
200 500 1000
--------------------------------------------------------------------
INTLAB 1 1 1
Profil dev 2.0/3.3/5.3/6.7 3.2/4.5/7.1/9.1 3.6/5.1/8.0/10.2
Profil U/D 41/44/43/45 55/59/59/61 62/66/66/68
outer inline 3.4 6.1 8.0
outer BLAS 2 16.4 73.8 85.6
As you see the midrad approach outperforms other approaches.
In particular using BLAS 3 there is no interpretation overhead which
is important for INTLAB. Moreover, the BLAS3 approach in INTLAB allows
easy parallelization.
The number of case distinctions for all positive (case a) and mixed
positive/negative (case c) intervals is the same, however, the latter
is much slower for Profil. This seems due to branch prediction.
The inline outer product may be some compromise for wide intervals,
not so good but also not too bad, and independent of the distribution
of zero/nonzero input intervals.
Best wishes
Siegfried M. Rump
--
=====================================================
Prof. Dr. Siegfried M. Rump
Institute for Reliable Computing
Hamburg University of Technology
Schwarzenbergstr. 95
21071 Hamburg
Germany
phone +49 40 42878 3027
fax +49 40 42878 2489
http://www.ti3.tu-harburg.de
and
Visiting Professor at Waseda University
Faculty of Science and Engineering
Shinjuku Lambdax Bldg. 902
2-4-12 Okubo, Shinjuku-ku
Tokyo 169-0072
Japan
phone/fax in Japan +81 3 5286 3414