This is not about being the winner, or the best. It is only a question of fair judging.
The problem specified 2^32 as the limit, so sorting 250 000 integers is trivial. That is less than 0.006% of the specified range. Everybody will agree that most of our algorithms were tuned for a far larger quantity of integers.
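The percentage above can be checked with a few lines of Python. This is only a sketch; the variable names are made up for illustration and are not taken from the contest material.

```python
# Share of the 2^32 range covered by the 250 000-integer test.
RANGE_SIZE = 2 ** 32    # limit stated in the problem
TEST_SIZE = 250_000     # size of the test discussed above

share_percent = TEST_SIZE / RANGE_SIZE * 100
print(f"{share_percent:.5f}%")   # a little under 0.006%
```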
Our sort of 50 000 000 random signed longs takes 1.12 sec on a quad-core 3 GHz machine.
On the same computer, the winner's solution took 7.68 sec for the same 50 000 000 elements. That is about 6.8 times slower.
It was said that those who optimized the read/write would have an advantage. Over 50 000 000 elements, we read, sort and write in 13.7 sec, while the winner took 45.1 sec for the same operations. That is about 3.3 times slower overall.
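To make clear what is being compared, here is a minimal sketch of timing the three phases separately. It is written in Python for brevity and is not either contestant's code. The phase names `read_data`, `sort` and `write_data` follow the labels used in the results; the one-integer-per-line file layout and the helpers `time_phases` and `demo` are assumptions made for illustration.

```python
import os
import random
import tempfile
import time


def read_data(path):
    """Parse one integer per line (assumed file layout)."""
    with open(path) as f:
        return [int(line) for line in f]


def write_data(path, values):
    """Write one integer per line, with no line break after the last one."""
    with open(path, "w") as f:
        f.write("\n".join(map(str, values)))


def time_phases(in_path, out_path):
    """Return wall-clock seconds for each phase and for the whole run."""
    t0 = time.perf_counter()
    values = read_data(in_path)
    t1 = time.perf_counter()
    values.sort()
    t2 = time.perf_counter()
    write_data(out_path, values)
    t3 = time.perf_counter()
    return {
        "read_data": t1 - t0,
        "sort": t2 - t1,
        "write_data": t3 - t2,
        "overall": t3 - t0,
    }


def demo(n=100_000):
    """Generate n random signed 32-bit integers, then time read, sort, write."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "in.txt")
        out_path = os.path.join(tmp, "out.txt")
        write_data(in_path, [random.randint(-2**31, 2**31 - 1) for _ in range(n)])
        timings = time_phases(in_path, out_path)
        result = read_data(out_path)
    return timings, result


if __name__ == "__main__":
    timings, _ = demo()
    for phase, seconds in timings.items():
        print(f"{phase} {seconds:.6f}")
```

Reporting the phases separately shows whether a difference comes from the sort itself or from the file handling.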
Also, running Intel VTune Thread Analyzer on Wei-Yin Chen's application crashes it, and the software does not write the result. These are Intel tools; they should be used for scoring the applications, since we used them to develop ours.
Here are the conclusive results using the Intel-specified test files.
random 100 000 :
Jean-Philippe Doiron's Version:
read_data 0.0025199
sort 0.0031155
write_data 0.00761727
overall 0.0135201
Wei-Yin Chen's version:
read_data 0.0274919
sort 0.0024196788
write_data 0.0352129824
overall 0.0663712130
JPD's version overall runtime is 4.91x faster
random 250 000 :
Jean-Philippe Doiron's Version:
read_data 0.00798503
sort 0.00794066
write_data 0.0218019
overall 0.0380246
Wei-Yin Chen's version:
read_data 0.0711102
sort 0.0061736824
write_data 0.0917116927
overall 0.1703533901
JPD's version overall runtime is 4.48x faster
sorted 250 000 with 0-125000 reversed :
Jean-Philippe Doiron's Version:
read_data 0.00713012
sort 0.000928429
write_data 0.0221195
overall 0.0304531
Wei-Yin Chen's version:
read_data 0.0675642
sort 0.0015339248
write_data 0.0914670419
overall 0.1618304710
JPD's version overall runtime is 5.31x faster
random 1 000 000 :
Jean-Philippe Doiron's Version:
read_data 0.0419785
sort 0.0304754
write_data 0.11227
overall 0.185196
Wei-Yin Chen's version:
read_data 0.293949
sort 0.0267700296
write_data 0.3752933219
overall 0.6975016388
JPD's version overall runtime is 3.77x faster
random 50 000 000 :
Jean-Philippe Doiron's Version:
read_data 2.65815
sort 1.12388
write_data 9.93338
overall 13.7214
Wei-Yin Chen's version:
read_data 15.7649
sort 7.6782576421
write_data 21.6691115932
overall 45.1196687726
JPD's version overall runtime is 3.29x faster (6.83192x faster sort!!!)
sorted 50 000 000 with 0-25000000 reversed :
Jean-Philippe Doiron's Version:
read_data 1.90845
sort 0.367165
write_data 9.62512
overall 11.9175
Wei-Yin Chen's version:
read_data 14.7956
sort 7.0593976408
write_data 21.8377143905
overall 43.7023949770
JPD's version overall runtime is 3.67x faster (19.2267x faster sort!!!)
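The "x faster" figures are simply the ratio of the two runtimes. A short Python sketch reproduces them for the two 50 000 000 runs; the numbers are copied from the listings above, and the dictionary layout and the `speedup` helper are only illustrative.

```python
# (overall, sort) times in seconds for the two 50 000 000 tests.
results = {
    "random 50 000 000": {
        "JPD": (13.7214, 1.12388),
        "WYC": (45.1196687726, 7.6782576421),
    },
    "sorted 50 000 000": {
        "JPD": (11.9175, 0.367165),
        "WYC": (43.7023949770, 7.0593976408),
    },
}


def speedup(slow, fast):
    """How many times faster the `fast` runtime is than the `slow` one."""
    return slow / fast


for name, runs in results.items():
    overall = speedup(runs["WYC"][0], runs["JPD"][0])
    sort = speedup(runs["WYC"][1], runs["JPD"][1])
    print(f"{name}: overall {overall:.2f}x, sort {sort:.2f}x")
```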
Intel Calculation for Execution & Time Score:
Jean-Philippe Doiron's version:
0.0135 + 0.0380 + 0.0305 = 0.0820
(300 - 0.0820) / 3 = 299.9180 / 3 = 99.973
Penalties: Incorrect output: no. All output verified (no CRLF on last line).
No use of threads: no. Optimal use of threads.
Incorrect use of files: no. Optimal read/write functions.
Amount of code changed: none. MS VC solution ready to build with no changes.
Wei-Yin Chen's version:
0.0664 + 0.1704 + 0.1618 = 0.3986
(300 - 0.3986) / 3 = 299.6014 / 3 = 99.867
How is it possible to receive a ZERO score? This is a little bit confusing.
A real test that should have been run, at the very least, since the problem describes a quantity of up to ~4 000 000 000 integers:
[adding the 50 000 000 sorted test]
Jean-Philippe Doiron's version:
0.0135 + 0.0380 + 0.0305 + 11.9175 = 11.9995
(400 - 11.9995) / 4 = 388.0005 / 4 = 97.000
Wei-Yin Chen's version:
0.06637 + 0.17035 + 0.16183 + 43.70239 = 44.1009
(400 - 44.1009) / 4 = 355.8991 / 4 = 88.97
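The score arithmetic can be reproduced from the unrounded overall times. The Python sketch below assumes the scoring works the way the calculations above imply: 100 points per test, minus the overall time in seconds, averaged over the tests. That reading is an assumption, not an official Intel formula, and the `score` function and list names are illustrative.

```python
def score(overall_times):
    """Average of (100 - overall seconds) over all tests.

    Assumption: mirrors the arithmetic in this post, not an official formula.
    """
    n = len(overall_times)
    return (100 * n - sum(overall_times)) / n


# Overall times (seconds) for the three scored tests, from the listings above.
JPD = [0.0135201, 0.0380246, 0.0304531]
WYC = [0.0663712130, 0.1703533901, 0.1618304710]

# Overall times for the sorted 50 000 000 test.
JPD_SORTED_50M = 11.9175
WYC_SORTED_50M = 43.7023949770

for name, times, extra in (("JPD", JPD, JPD_SORTED_50M),
                           ("WYC", WYC, WYC_SORTED_50M)):
    print(f"{name}: 3 tests {score(times):.3f}, "
          f"4 tests {score(times + [extra]):.3f}")
```

Working from the unrounded times avoids the small drift that comes from rounding each term before adding.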
Correction from first post: I removed: read/write rewritten using multiple threads for conversion,