nzea
June 30, 2008 5:20 PM PDT
TBB tasks return_list
I'm writing a benchmark that uses TBB tasks spawned mostly with "allocate_additional_child_of(...)", because I don't know how many tasks to spawn until run time. Currently I call set_ref_count(1), then in a loop perform as many allocates and spawns as necessary, and then call wait_for_all. This seems to run fine for small numbers of tasks, but with larger values (thousands) I start getting random assertion failures, the most common of which is "another thread emptied the return_list."

Because I get different assertion failures for the same command-line arguments, I assumed it was a memory corruption issue and tried tracing it down with both Microsoft's debug heap and OS X's guardmalloc, neither of which found anything (on the contrary, the benchmark runs without problems under the malloc debug tools, possibly because the slowdown hides any race conditions).

Is there any information on what this return_list is? The assertion fails when I try to spawn another task using allocate_additional_child_of(*this), but I'm not sure what it means. Other assertions that pop up tend to have to do with "small_task_count corrupted" and "attempt to spawn task that is not in allocated state", even though I only spawn tasks immediately after allocating them. Perhaps the way I'm spawning tasks isn't correct? Hopefully understanding this return_list will help me track down the issue.
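The pattern described in the question (set_ref_count(1), then a loop of allocate_additional_child_of and spawn, then wait_for_all) works because allocate_additional_child_of atomically increments the parent's reference count, so the number of children does not have to be known up front. The counting protocol can be sketched outside TBB with std::thread and std::atomic. ChildCounter and run_children are hypothetical names, and unlike TBB's wait_for_all, which executes other tasks while it waits, this model simply yields.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical model of the set_ref_count(1) / allocate_additional_child_of /
// wait_for_all pattern, using plain threads instead of TBB tasks.
class ChildCounter {
public:
    // Like set_ref_count(1): one reference is reserved for the waiting parent.
    ChildCounter() : ref_count_(1) {}

    // Like allocate_additional_child_of: bump the count before the child runs.
    void add_child() { ref_count_.fetch_add(1, std::memory_order_relaxed); }

    // Called by each child when it finishes.
    void child_done() { ref_count_.fetch_sub(1, std::memory_order_release); }

    // Returns once every added child has called child_done().
    void wait_for_all() {
        while (ref_count_.load(std::memory_order_acquire) != 1)
            std::this_thread::yield();
    }

private:
    std::atomic<int> ref_count_;
};

// Launches n children, where n is only known at run time, and returns how
// many of them had completed when wait_for_all() returned.
int run_children(int n) {
    ChildCounter root;
    std::atomic<int> completed{0};
    std::vector<std::thread> children;
    for (int i = 0; i < n; ++i) {
        root.add_child();  // count first, then launch
        children.emplace_back([&root, &completed] {
            completed.fetch_add(1, std::memory_order_relaxed);
            root.child_done();
        });
    }
    root.wait_for_all();
    int result = completed.load(std::memory_order_relaxed);
    for (auto& t : children) t.join();
    return result;
}
```

Incrementing before launching mirrors TBB doing the increment at allocation time, before spawn, so the waiter can never observe a count of 1 while a child is still pending.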
Andrey Marochko
July 1, 2008 2:25 AM PDT
#1
I have put together a simple test for this kind of spawning. It runs from 100 to 1,000,000 tasks with different workloads (from 10 to 10K flops). It works fine on a dual-processor, 8-core machine. Could you check whether it works for you, and whether it correctly represents what you tried to do in your app?


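#include <cstdio>
#include "tbb/task.h"
#include "tbb/task_scheduler_init.h"

static volatile double g_vol = 0;
double g_tasks_started = 0; // declaration missing from the original snippet; not synchronized

class leaf_task : public tbb::task
{
    tbb::task* execute () {
        ++g_tasks_started;
        for ( size_t i = 0; i < 10000; ++i )
            g_vol += i/g_vol;
        return NULL;
    }
}; // class leaf_task

class simple_root_task : public tbb::task
{
    const size_t m_task_count;
    const size_t m_task_load; // stored, but leaf_task above always runs a fixed 10000-iteration loop

    tbb::task* execute () {
        set_ref_count(1); // one extra reference for the wait_for_all below
        for ( size_t i = 0; i < m_task_count; ++i ) {
            // allocate_additional_child_of also increments this task's ref count
            spawn( *new( allocate_additional_child_of(*this) ) leaf_task );
        }
        wait_for_all();
        return NULL;
    }
public:
    simple_root_task ( size_t tasks, size_t taskLoad )
        : m_task_count(tasks)
        , m_task_load(taskLoad)
    {}
}; // class simple_root_task

void Test ()
{
    tbb::task_scheduler_init init; // TBB releases of this era need an explicitly initialized scheduler
    for ( size_t i=100; i<=1000000; i*=10 ) {
        for ( size_t l=10; l<=1000000000/i; l*=10 ) {
            tbb::task &r = *new( tbb::task::allocate_root() ) simple_root_task(i, l);
            try {
                tbb::task::spawn_root_and_wait(r);
            } catch ( ... ) {
                printf ("TestX: exception caught\n");
            }
            printf ("Done: num tasks %lu, task load %lu\n", (unsigned long)i, (unsigned long)l);
        }
    }
} // Test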


nzea
July 1, 2008 2:52 PM PDT
#2 Reply to #1

Your code ran fine for me. After digging around a bit more, I realize that what I'm doing is a bit more complicated than I described. I'm trying to virtualize the task creation, and I think it's this indirection that may be causing the problems. I've modified your code to better illustrate what I'm doing. This modified code crashes on me when more than 1000 tasks are running at once (although not necessarily with the return_list assertion).

 

static volatile double g_vol = 0;

double g_tasks_started=0;

class leaf_task : public tbb::task
{
    tbb::task* execute () {
        ++g_tasks_started;
        for ( size_t i = 0; i < 10000; ++i )
            g_vol += i/g_vol;
        return NULL;
    }
public:
    tbb::task* creator(tbb::task *parent){
        return new(parent->allocate_additional_child_of(*parent)) leaf_task;
    }
}; // class leaf_task

class task_launcher : public tbb::task {
    tbb::task* execute(){
        leaf_task t;
        spawn(*(t.creator(parent())));
        return NULL;
    }
}; // class task_launcher

class simple_root_task : public tbb::task
{
    const size_t m_task_count;
    const size_t m_task_load;

    void newTask(){
        spawn(*new(allocate_additional_child_of(*this)) task_launcher);
    }

    tbb::task* execute () {
        set_ref_count(1);
        for ( size_t i = 0; i < m_task_count; ++i ) {
            //spawn( *new( allocate_additional_child_of(*this) ) leaf_task );
            newTask();
        }
        wait_for_all();
        return NULL;
    }
public:
    simple_root_task ( size_t tasks, size_t taskLoad )
        : m_task_count(tasks)
        , m_task_load(taskLoad)
    {}
}; // class simple_root_task

void Test ()
{
    for ( size_t i=100; i<=1000000; i*=10 ) {
        for ( size_t l=10; l<=1000000000/i; l*=10 ) {
            tbb::task &r = *new( tbb::task::allocate_root() ) simple_root_task(i, l);
            try {
                tbb::task::spawn_root_and_wait(r);
            } catch ( ... ) {
                printf ("TestX: exception caught\n");
            }
            printf ("Done: num tasks %lu, task load %lu\n", (unsigned long)i, (unsigned long)l);
        }
    }
} // Test

Andrey Marochko
July 2, 2008 12:59 AM PDT
#3 Reply to #2
All right, now I see where the problem is. Functions like allocate_additional_child_of are tricky beasts, as they implicitly use another argument: "this". It is easy to overlook the fact that "this" plays a critical role here: it establishes thread ownership of the new task, and it must be owned by the current thread. In your example you use parent as the owner, but the parent can be (and, as the corrupted memory testifies, is) owned by (running on) another thread at the moment some of its descendants are allocated.
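The thread never spells out what the return_list is. As a rough mental model only (a hypothetical simplification, not TBB's source code), picture each thread owning a pool of small task blocks: a private free list that only the owning thread touches, without any locking, plus a lock-free "return list" onto which other threads push blocks they free. The owner drains the return list when its free list runs dry. If a thread allocates through a task owned by another thread, as in parent->allocate_additional_child_of(*parent), two threads end up manipulating the same unsynchronized free list, which is consistent with the "another thread emptied the return_list" assertion from the question. OwnedPool, Block, allocate and deallocate below are invented names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Simplified, hypothetical model of a per-thread small-object pool.
// NOT TBB's implementation; names are illustrative only.
struct Block {
    Block* next = nullptr;
};

class OwnedPool {
public:
    explicit OwnedPool(std::thread::id owner) : owner_(owner) {}

    ~OwnedPool() {
        // Reclaim anything still sitting on either list.
        release(free_list_);
        release(return_list_.exchange(nullptr));
    }

    // Owner-only: free_list_ is a plain pointer with no synchronization,
    // so a second thread calling this concurrently would corrupt it.
    Block* allocate() {
        assert(std::this_thread::get_id() == owner_);
        if (free_list_ == nullptr) {
            // Take every block other threads have returned, in one atomic swap.
            free_list_ = return_list_.exchange(nullptr, std::memory_order_acquire);
        }
        if (Block* b = free_list_) {
            free_list_ = b->next;
            b->next = nullptr;
            return b;
        }
        return new Block;
    }

    // Any thread may free. The owner recycles into its private list;
    // other threads push onto the lock-free return list.
    void deallocate(Block* b) {
        if (std::this_thread::get_id() == owner_) {
            b->next = free_list_;
            free_list_ = b;
            return;
        }
        Block* head = return_list_.load(std::memory_order_relaxed);
        do {
            b->next = head;
        } while (!return_list_.compare_exchange_weak(
                     head, b, std::memory_order_release, std::memory_order_relaxed));
    }

private:
    static void release(Block* b) {
        while (b) {
            Block* next = b->next;
            delete b;
            b = next;
        }
    }

    std::thread::id owner_;
    Block* free_list_ = nullptr;                // touched only by owner_
    std::atomic<Block*> return_list_{nullptr};  // pushed to by other threads
};
```

Because the owner takes the whole return list with a single exchange rather than popping entries one at a time, the model avoids the ABA problem; the price is that correctness depends entirely on only one thread ever calling allocate().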

Here is the fixed piece of code:


class leaf_task : public tbb::task
{
    tbb::task* execute () {
        ++g_tasks_started;
        for ( size_t i = 0; i < 10000; ++i )
            g_vol += i/g_vol;
        return NULL;
    }
public:
    static tbb::task* create ( tbb::task *owner, tbb::task *parent ) {
        return new( owner->allocate_additional_child_of(*parent) ) leaf_task;
    }
}; // class leaf_task

class task_launcher : public tbb::task {
    tbb::task* execute(){
        spawn( *leaf_task::create(this, parent()) );
        return NULL;
    }
}; // class task_launcher


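The important changes (shown in bold in the original post) are the new owner parameter of create(), which is the task used to call allocate_additional_child_of, and the call in task_launcher::execute(), which passes this as the owner and parent() as the parent.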

Note also that I turned the instance method "creator" into a static one. This lets you avoid creating fake local objects just to call the method, and makes the code cleaner. My heart nearly stopped when I saw that leaf_task object allocated on the stack :-) (task objects must always be created with TBB's overloaded new and one of the allocate_* methods, never on the stack).

Raf Schietekat
July 2, 2008 1:08 AM PDT
#4 Reply to #2
To Andrey Marochko: "volatile" doesn't make accesses atomic in C++, as Arch Robison will tell you, so you have a race on g_vol; the self-assignment (g_vol += i/g_vol) is another reason to have a mutex.

To "nzea": Don't call allocate_additional_child_of() on "parent", because "this" may have been stolen and then there are ownership issues, see reference below.

To TBB: I see "8.3.2.4 new( this.task::allocate_additional_child_of( parent ))" in reference manual revision 1.9 (may not be most recent, but I have to run now), which should be either "this->" or maybe something like "t.".

(Hey, we had a race on the thread and Andrey won... First and third paragraphs are still relevant, though.)

Andrey Marochko
July 2, 2008 1:20 AM PDT
#5 Reply to #4
To Raf: I know the limits and issues of volatile pretty well. In the test above I do not care about the correctness of the accumulated value. "volatile" here is just a simple way to prevent the optimizer from eliminating the loop in release mode (I checked whether the sample works with different loads in the leaf tasks). Still, from a methodological standpoint you are right, of course: spreading bad habits is not good :-), so tbb::atomic would be a better choice.
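Along the lines suggested above, the shared task counter can be made race-free with an atomic type. This sketch uses C++11 std::atomic as a stand-in for tbb::atomic so that it needs nothing beyond the standard library; g_tasks_started_atomic and count_started are hypothetical names, and the worker threads only simulate leaf tasks incrementing the counter. It addresses the counter only and leaves aside the use of volatile on g_vol to keep the optimizer from removing the load loop.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Stand-in for g_tasks_started: std::atomic (C++11) in place of tbb::atomic.
// fetch_add is an indivisible read-modify-write, so no increments are lost,
// unlike ++ on a plain or volatile double.
static std::atomic<long> g_tasks_started_atomic{0};

// Hypothetical harness: `threads` workers each simulate starting
// `per_thread` leaf tasks; returns the final counter value.
long count_started(int threads, int per_thread) {
    g_tasks_started_atomic.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i)
                g_tasks_started_atomic.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return g_tasks_started_atomic.load();
}
```

Relaxed ordering is enough here because the counter is only read after join(), which already synchronizes with every worker.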
nzea
July 2, 2008 3:17 PM PDT
#6 Reply to #5
That did the trick, thanks for clearing that up! I was confused about what the left side of the allocation represented, but this makes sense now.
Arch Robison (Intel)
July 2, 2008 5:00 PM PDT
#7 Reply to #6

This discussion inspired me to upgrade my implementation of task_group on http://softwareblogs.intel.com/2008/07/02/implementing-task_group-interface-in-tbb to allow multiple threads to create tasks within the same task_group.  I used task::self() for the "owner".  It depends upon an undocumented implementation feature that when there is no running task on a master thread, nonetheless task::self() returns a reference to a dummy task owned by that thread.

 

