-
dgeld
-
-
-
Joined on 12-18-2004
-
-
Posts 99
-
-
|
Hi all,
if you want to test your code with a non empty grid, I have made a gosper.txt file (attached).
It is a Gosper Gun (http://en.wikipedia.org/wiki/Gun_(CA))
It will fill your grid with gliders
Best regards.
|
|
| |
|
|
Thanks for uploading this. It saved me having to do it
|
|
| |
-
bugman
-
-
-
Joined on 10-14-2007
-
-
Posts 92
-
-
|
Thanks a lot!
It would be intersting to compare the performance of different implementations.
My program produced the output for this file in 33.95 seconds on Core2 Quad@ 2.88 GHz.
BR,
Peter
|
|
| |
-
dgeld
-
-
-
Joined on 12-18-2004
-
-
Posts 99
-
-
|
Hello Peter,
on my (almost serial) version of GameOfLife, I got 22.32 seconds
MacPro, DualCore Intel Xeon, 2.66Ghz, 4MB L2 cache
Intel C++ compiler 10.1
profile guided optimization
I only optimized the cache friendliness of the structures used, the memory allocation and I got a 7 times speedup.
I have tried many parallel version splitting the list spatially and dealing with overlapping. But the cost of these structures was not counterbalanced by the "parallel" gain.
Good luck for the following problems.
|
|
| |
-
denghui0815
-
-
-
Joined on 12-17-2007
-
-
Posts 41
-
-
|
My program produced the output for this file in 25.9 seconds on Core@ 1.6 GHz.
dell640m t2050 1G
GENERATION #100000
-----------------------**--------------- -----------------------**--------------- ----------*----*----------**------**---- --------*-*----*----------***-----**---- **----**-------*----------**------------ **----**-----------**--**--------------- ------**--------**--*--**--------------- --------*-*-----****-------------------- ----------*-------*--------------------- ----------------------------------------
|
|
| |
-
bugman
-
-
-
Joined on 10-14-2007
-
-
Posts 92
-
-
|
Hmm,
my solution seems to be a good example of negative scalability :-(
1 thread: 33.6 seconds
2 threads: 20.6 seconds
3 threads: 27.8 seconds
4 threads: 32.6 seconds
I am wondering what would happen on 8-core judges' machine :-(
|
|
| |
-
jbstjohn
-
-
-
Joined on 10-12-2007
-
-
Posts 70
-
-
|
Oh, now I'm a bit depressed. I finally got around to timing things (using "time" of all the crazy ideas!) and my serial version took 33s, and the parallel one 39s (with two cores). I hope with more cores, and a different problem set it is actually worth it.
And denghui, I kept the output form of the sample program, and printed out a vertical strip, rather than a horizontal one.
25.9s seems very quick to me! Congratulations.
|
|
| |
-
jbstjohn
-
-
-
Joined on 10-12-2007
-
-
Posts 70
-
-
|
You're not alone! I'm wondering if I have a bug, as I just tried mine out, and get
Threads : Seconds
1 34.9 2 38.8 3 39.2 4 38.2 8 37.8 16 37.2
The later values actually make sense, as I get better load balancing with more threads, since I chop up the world more evenly. What seems strange is that the two thread version is so much worse than the single thread (especially given that I don't use any locks or atomics, which could have contention.)
I didn't use affinity -- I wonder if that would play an important role. I also re-create the tasks, rather than recycling them, but that *should* be minor.
(I have a core 2 6600 that is only running at 1600 MHz for some reason, not 2.40G)
|
|
| |
-
bugman
-
-
-
Joined on 10-14-2007
-
-
Posts 92
-
-
|
It seems to me that the problem exhibited by striped implementations is caused by very small number of active cells. This increases the overhead of synchronization and task management.The problem becomes almost serial in this case. But if the grid is uniformly filles with 1E7 active cells, the scalability of my implementation seems to be much better:
1 thread - 143.7 s
2 threads - 95.79 s
3-threads 86.1017 s
4 threads - 85.8872 s
|
|
| |
-
jbstjohn
-
-
-
Joined on 10-12-2007
-
-
Posts 70
-
-
|
Well, you left open the key question: How many cores does your machine have? (Yours at least shows a very good speed-up with a second core.)
|
|
| |
-
bugman
-
-
-
Joined on 10-14-2007
-
-
Posts 92
-
-
|
It was Core2Quad @2.88 GHz
|
|
| |