A gather instruction would help automatic vectorization of algorithms such as interpolation, raytracing and physics processing.
Surely it would help more than some of the recently added instructions (movddup comes to mind as a completely useless thing that can otherwise be done with a shuffle).
What irritates me the most in your answer, however, is the part where you say "Many of us don't care" so arrogantly, as if you were the voice of God. Guess what? I don't care if any of you (whoever you might be) care or not! I love optimized code and I just adore useful instructions, not stupid and redundant ones.
And surely it would be more convenient to write:
mov esi, dword ptr [pix] ; base
gmovps xmm0, xmmword ptr [ip] ; ip
Instead of:
mov esi, dword ptr [pix] ; base
mov eax, dword ptr [ip] ; offset
movd xmm0, [esi + eax]
mov edx, dword ptr [ip + 4]
movd xmm1, [esi + edx]
unpcklps xmm0, xmm1
mov eax, dword ptr [ip + 8]
movd xmm2, [esi + eax]
mov edx, dword ptr [ip + 12]
movd xmm3, [esi + edx]
unpcklps xmm2, xmm3
movlhps xmm0, xmm2
?
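To make it clear what I am asking for, here is a rough C sketch of the semantics such a gather would have. The name gather4_ps is made up for illustration (it is not a real instruction or library call), and I am assuming the four values at [ip] are byte offsets from the base, as in the assembly above:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulates a 4-element gather of 32-bit floats.
   `base` plays the role of esi, and `offsets` holds four byte offsets,
   like the four dwords read from [ip] in the assembly version. */
static void gather4_ps(float out[4], const void *base, const uint32_t offsets[4])
{
    const unsigned char *p = (const unsigned char *)base;
    for (int i = 0; i < 4; ++i) {
        /* memcpy sidesteps alignment and aliasing issues for arbitrary offsets */
        memcpy(&out[i], p + offsets[i], sizeof(float));
    }
}
```

A compiler has to turn this loop into something like the twelve-instruction sequence above, whereas a hardware gather would do it with a single load.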
Regards,
Igor Levicki
http://www.levicki.net/