ddrogahn:Totally not enough time in two weekends, I expected there to be more pipelining in the contest stages. I dont suppose we could hope for late entry with penalty, in the future?
I was too busy at work, and figuring out tasking, and searching for access to a dual core for testing (HT didnt do much here) to check the calendar. (Wasted several hours because MS C++2005 cannot use separate templated member functions...)
Next month, watch out...
I too felt time constrained and I think that is just the nature of the beast. If Intel had given us three months it still wouldn't have been enough time, cause there is always that "one more thing ...". I had not used C++ in years and was "syntactically challenged" ... unlike you I'm not likely to be any better at it next month :-(
Anyone have a nice C# port of tbb :-D
ddrogahn:
I had hopes for parallelizing Partition... Anyone do anything with that?
I think that might be pretty tough. I would think that the costs of locking the elements so you could do a safe compare/swap would be higher than the gain you could reasonable expect to be returned. Did you have ideas about ways to overcome this?
ddrogahn:
Shouldn't give away all the secrets (sort will almost certainly be useful later):
* Iterative: breadth first (stack) better than depth first (queue) [locality of reference]; tail "recursion"
* concurrent_queue is slow
* recursive threshold helps iterative
* for presorted: Median of 3 partition good (but possibly bad in general); Random pivot good
* pivot balancing, for when there are large amounts of data that match pivot ["discrete" below, 100 of each value randomly distributed]
I doubt that any of us will have many secrets ... I'd hope that Intel will make all the code (or at least the winning code) available to all. After all when we hit submit that code became Intel's property and I would expect them to use it to provide both tbb examples and to improve tbb as well as to promote their multi-core chips.
I wanted to reduce the number of malloc's needed for the queue/while, but didn't have the time. From what I can tell I think that a random pivot is good for diminishing worst case behavior, but I chose a fixed pivot because I figured the time I spent generating a random number could be better spent sorting (and hoping I didn't hit a worst-case case). I also used thresholds (to move from parallel to serial and to move from recursive to in-line). I didn't notice concurrent_queue being slow, but then I don't think I compared it to anything else. I didn't notice that in the example code they gave us, there was a queue class that claimed to be thread safe, but clearly was not (sneaky sneaky Intel).
ddrogahn:
Didnt get around to testing tasked-single threaded, or parallel_sort.
A few numbers:
Thousands of Int's sorted per sec [bigger is better, I/O excluded]
...
Sorry I don't remember my numbers, probably wouldn't mean much since my machine is 5+ years old :-). You did a nice job of laying out your results.