-
canepag
-
-
-
Joined on 10-18-2007
-
-
Posts 26
-
-
|
I readed in a wrong way your instruction so I optimized and parallelized the recursive algorithm.
Then I read again the instruction and I tested the iterative version of your algorithm: it is 10 times slower than the recursive (on 100000 points), 16 times (on 10000), 12 times (on 1000) and 15 times (on 50 points).
Now a serious question:
may you give me a good reason to try to optimize such a bad algorithm when I was able to run the paralelized recursive algorithm with a very good scaling (50% of the original recursive algorithm with two processors but with a scalable algorithm)?
Respectfully
Gaetano
|
|
| |
|
|
Re: Big "Ethical" problem
In my opinion, before you start to parallelize an algorithm, you should try to eliminate any inefficiencies there. For example, the given algorithm en-queue jobs with any size, which is not a good idea. If you only choose large enough jobs to push in the queue, and do the tiny bits right away, then the performance would be much better. Moreover, if you use std::queue, it would also be better than allocating small pieces of memory every time. I can share your feeling that the given code is not in a good shape. It would be nice if we can use a neat code base, so that we can really focus on the parallelism rather than tuning the original code.
|
|
| |
-
canepag
-
-
-
Joined on 10-18-2007
-
-
Posts 26
-
-
|
Re: Big "Ethical" problem
It is more that not in a good shape: it takes at least 10 time more to do the sam job. you can optimize as much as you want but you will never get to the effeciency of the other code. It will be simple to just apply the parallel while to the code but is a real nonsense.
Gaetano
|
|
| |
|
|
Re: Big "Ethical" problem
I'm not sure if 10X slower is accurate. In my testing environment, the performance of iterative version is only <10% slower than the recursive one, with only <20 lines of change. With tail recursion optimization, it could be even closer. The code base is not that bad, except for the worst case of sorted input.
|
|
| |
-
canepag
-
-
-
Joined on 10-18-2007
-
-
Posts 26
-
-
|
Re: Big "Ethical" problem
Try to use at least 40 points, use a certain amount of random vectors to average and, please, dont measure the time used to read the file (use tick_count). You will be amazed!
The most of the time is spent reading !
Gaetano
|
|
| |
|
|
Re: Big "Ethical" problem
Actually, the numbers I mentioned is done in a pretty reasonable environment. The input file contains 1M to 100M entries of data with uniform distribution over [-2^31, 2^31-1] range, and I use tick_count to measure read/sort/write sections individually, and only the sorting time is considered because only <10% is used for sorting. The performance can be close to std::sort within 30% after some tuning. I guess it cannot be within 10% because std::sort uses merge sort and insertion sort for shorter arrays. On a 4-core machine, parallalizing the iterative version has around 3.3x gain, which is almost 3 times faster than a single-threaded std::sort.
|
|
| |
-
canepag
-
-
-
Joined on 10-18-2007
-
-
Posts 26
-
-
|
Re: Big "Ethical" problem
Have you tried
parallel_sort
on your data? How it compares with your algorithm using only two processors?
With a 100000 element vector (averaged on 1000 realizations) my optimized serial recursive algorithm employ the same time on a 2 processor machine (using insertion sort). My parellel serial is approximately 60% better but it cannot use insertion sort (sincronization reduce the performance)
Gaetano
|
|
| |
-
canepag
-
-
-
Joined on 10-18-2007
-
-
Posts 26
-
-
|
Re: Big "Ethical" problem
Hi wychen,
after removing most of the work limiting the eunqueing to (r-p)>=1000 I got finally a time that is comparable with the recursive algorithm: actually that is just due to the fact that most of the work is done by the recursive algorithm!
I will try to parallelize this algorithm, just for fun!
Gaetano
|
|
| |
|
|
Re: Big "Ethical" problem
Not putting jobs with (r-p)<1k in the queue is in the right dirrection. Good for you. After parallelizing this version, performance comparable to parallel_sort can be expected.
|
|
| |