I'm looking at the "CPU Usage Histogram" for my code and I see fairly significant overhead in the function kmp_launch_thread. Is there any way to reduce this time? I have tried varying KMP_AFFINITY, with only minor improvements, and I'm at a loss. I realize I'm not giving much information, so just let me know what you need (system info, code snippets, etc.).
Thanks in advance!