Do I understand correctly that your tbb::parallel_reduce loop follows the omp parallel region (as opposed to being nested inside it)? It's not clear from the pseudo-code above.
If you are using the Intel Compiler and its OpenMP implementation, be aware that OpenMP worker threads spin for some time after the end of a parallel region before going to sleep. This spin time is controlled by the KMP_BLOCKTIME environment variable. Since you run a TBB parallel loop right after the region, the spinning OpenMP threads compete with the TBB worker threads for the same cores, so I recommend setting KMP_BLOCKTIME to 0, which makes the OpenMP worker threads go to sleep immediately after the region. See the compiler documentation for more details about KMP_BLOCKTIME. Besides the environment variable, you can control the setting with the kmp_set_blocktime() and kmp_get_blocktime() calls; set it to 0 before entering the omp parallel region. Again, these functions and the environment variable are specific to the Intel Compiler.
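To make the ordering concrete, here is a minimal sketch of the pattern, not your actual code. The function name scale_then_sum, the data, and the loop bodies are made up for illustration. It assumes the omp parallel region is followed (not nested) by the tbb::parallel_reduce call. Because kmp_set_blocktime() is an Intel extension, the call is guarded so that it is only compiled when an Intel compiler with OpenMP enabled is detected, and the reduction falls back to a serial sum where the TBB headers are not found.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// kmp_set_blocktime() is an Intel OpenMP runtime extension, so reference it
// only when an Intel compiler is building with OpenMP enabled.
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#  include <omp.h>
#  define HAVE_KMP_BLOCKTIME 1
#else
#  define HAVE_KMP_BLOCKTIME 0
#endif

// Use TBB when its headers are found; otherwise fall back to a serial sum
// so that the sketch still builds (assumption made for illustration only).
#if defined(__has_include)
#  if __has_include(<tbb/parallel_reduce.h>)
#    include <tbb/blocked_range.h>
#    include <tbb/parallel_reduce.h>
#    define HAVE_TBB 1
#  endif
#endif
#ifndef HAVE_TBB
#  define HAVE_TBB 0
#endif

// Hypothetical helper: scales the data in an OpenMP parallel region, then
// sums it with a TBB reduction that runs right after the region.
double scale_then_sum(std::vector<double>& data, double factor) {
#if HAVE_KMP_BLOCKTIME
    // Set before entering the region: OpenMP workers then go to sleep as
    // soon as the region ends instead of spinning.
    kmp_set_blocktime(0);
#endif

    const long n = static_cast<long>(data.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        data[i] *= factor;
    }

#if HAVE_TBB
    // The TBB loop follows the OpenMP region; it is not nested inside it.
    return tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, data.size()), 0.0,
        [&](const tbb::blocked_range<std::size_t>& r, double acc) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) acc += data[i];
            return acc;
        },
        std::plus<double>());
#else
    return std::accumulate(data.begin(), data.end(), 0.0);
#endif
}
```

If you would rather not change the code, running the program with KMP_BLOCKTIME=0 set in the environment should have the same effect on the OpenMP workers.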
Alexey Kukanov
TBB developer