I tested different combinations of -ipo and -ipo-separate, with and without -ipo-jobs8. The same message appears every time:
An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size.
The message is repeated many times. Is there a way to turn it off? Peak memory usage during the IPO step is about 1.5 GB. The message does not appear when the same code is compiled with the Intel 9.0 compiler on a 32-bit system. The 10.1 compiler that shows the message is running on an EM64T system.
I also observed that at most 2 processes run during the IPO step when I use the -ipo-jobs4 option on a 4-core Linux machine, even though I have more than 8 jobs to process. Is there a dependency issue that keeps the number of IPO processes from going beyond 2? With only 2 processes, the IPO step can be at most about twice as fast, which my tests confirm.
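To make the "no more than two times faster" estimate concrete, here is a rough sketch of the arithmetic. It assumes every IPO job takes the same time and that the only limit is how many processes run at once; the function name `ideal_speedup` is made up for illustration and is not part of any Intel tool.

```python
import math


def ideal_speedup(num_jobs: int, max_concurrent: int) -> float:
    """Best-case speedup over a serial run, assuming equal-length jobs.

    With max_concurrent processes, the jobs finish in
    ceil(num_jobs / max_concurrent) rounds, versus num_jobs rounds serially.
    """
    if num_jobs <= 0 or max_concurrent <= 0:
        raise ValueError("num_jobs and max_concurrent must be positive")
    rounds = math.ceil(num_jobs / max_concurrent)
    return num_jobs / rounds


if __name__ == "__main__":
    # Hypothetical example: 8 equal IPO jobs, varying process limits.
    for workers in (1, 2, 4, 8):
        print(f"{workers} process(es): {ideal_speedup(8, workers):.1f}x")
```

Under these assumptions, 8 jobs capped at 2 processes give at most a 2x speedup, while 4 processes would allow up to 4x. Real jobs differ in length, so the actual gain would likely be lower.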
I like the new parallel IPO option, but it would be better if it took fuller advantage of multicore machines (4, 8, or more cores). It would also be nice if this feature were available when building large static libraries with xild.