Hello,
I am writing a paper about the SSE2-based interval arithmetic that is part of
my library for exact real number computations, and while doing so I realized that SSE3 could have been quite helpful had it been designed slightly differently.
My exact question is: why did Intel prefer to include an addsub instruction instead of a multiplication with one of the arguments negated, i.e. something like
mulpnpd xmm1,xmm2
giving xmm1.1 * xmm2.1, (-xmm1.0) * xmm2.0
With this, the addsubpd instruction would not be needed to compute complex multiplications and divisions.
What I believe to be more important, however, is the behavior of Intel's sample SSE3 code for complex multiplication when the rounding mode is set to something other than round-to-nearest. More specifically, the SSE3 complex multiplication code does not compute upper bounds for the product when rounding toward +inf, nor lower bounds when rounding toward -inf, because the product that is subsequently subtracted gets rounded in the wrong direction.
This would not be the case if a mulpn instruction were available instead of addsub, because the negation would happen before the product is rounded, so the product would be rounded in the correct direction. A mulpn would also be very useful for single or double precision interval arithmetic using the SIMD registers.
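
To make the rounding issue concrete, here is a minimal sketch in C. It is not Intel's sample code: it uses only SSE2 intrinsics, emulates addsubpd by flipping the sign of the low lane after the multiply, and emulates the proposed mulpnpd by flipping it before the multiply. The names cplx, cmul_round_up and negate_first are made up for this illustration, and I am assuming an x86-64 compiler that provides emmintrin.h.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SET_ROUNDING_MODE */

/* Hypothetical helper type for this sketch. */
typedef struct { double re, im; } cplx;

/* Multiply (a+bi)(c+di) with MXCSR rounding set toward +inf.
   negate_first == 0: addsub-style, the product b*d is rounded up and then
                      subtracted (sign flipped after rounding).
   negate_first != 0: mulpn-style, b is negated before the multiply, so
                      (-b)*d itself is rounded up and then added. */
static cplx cmul_round_up(cplx x, cplx y, int negate_first)
{
    /* volatile keeps the compiler from folding or moving the arithmetic
       across the rounding-mode changes. */
    volatile double a = x.re, b = x.im, c = y.re, d = y.im;
    volatile double re, im;
    const __m128d sign_low = _mm_set_sd(-0.0);   /* sign bit in low lane only */

    unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(_MM_ROUND_UP);
    assert(_MM_GET_ROUNDING_MODE() == _MM_ROUND_UP);

    __m128d ab = _mm_set_pd(b, a);               /* low = a, high = b */
    __m128d ba = _mm_set_pd(a, b);               /* low = b, high = a */
    __m128d p  = _mm_mul_pd(ab, _mm_set1_pd(c)); /* [a*c, b*c] */
    __m128d dd = _mm_set1_pd(d);
    __m128d q;

    if (negate_first) {
        /* emulated mulpnpd: [(-b)*d, a*d], each lane rounded up */
        q = _mm_mul_pd(_mm_xor_pd(ba, sign_low), dd);
    } else {
        /* [b*d, a*d] rounded up, then low lane negated: the add below
           then behaves like addsubpd */
        q = _mm_xor_pd(_mm_mul_pd(ba, dd), sign_low);
    }

    __m128d r = _mm_add_pd(p, q);                /* [re, im] */
    re = _mm_cvtsd_f64(r);
    im = _mm_cvtsd_f64(_mm_unpackhi_pd(r, r));

    _MM_SET_ROUNDING_MODE(saved);

    cplx out = { re, im };
    return out;
}
```

For example, with x = 3 + (1 + 2^-30)i and y = 1 + (1 + 2^-30)i the exact real part is 2 - 2^-29 - 2^-60. Calling cmul_round_up(x, y, 0) gives a real part of 2 - 2^-29 - 2^-52, which lies below the exact value and so is not an upper bound, while cmul_round_up(x, y, 1) gives 2 - 2^-29, which is one. The imaginary parts agree in both variants.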
Does anyone know why Intel preferred addsub to this?