VTune Amplifier XE 2011 fails writing sampling result on linux based embedded environment

November 9, 2012, 3:13 pm

Latest and popular articles on Intel Technologies

I have a licensed copy of VTune Amplifier XE 2011. I would like to install the command-line (CLI) part of VTune on the target embedded environment, which runs Oracle Enterprise Linux (64-bit, x86). To test the installation, I issue the following command: "amplxe-cl -collect hotspots -r /tmp/vtune.du du /opt". I can see that the du command is launched and completes correctly. However, amplxe-cl then hangs while writing the sampling results. I am attaching the strace dump and the result files for your analysis. I would appreciate your help.

Thanks in advance,

-Arup Biswas

Attachments: strace.txt (136.5 KB), vtune.tar (14.5 KB)

Capturing IO Activity using Performance Counters

November 11, 2012, 1:48 am

Hello,

I am running some heavy I/O operations (such as generating or reading huge files) on an Intel Westmere W3670 and would like to capture this behavior. Are there any specific counters available for this purpose?

Also, what do IO_TRANSACTIONS and MEM_UNCORE_RETIRED:UNCACHEABLE mean?

Thanks,
Jaspal

Very Slow getc() in Concurrency Analysis

November 15, 2012, 1:14 am

Recently I was using VTune to profile some parallel benchmarks. I found that some programs took much longer to execute under VTune than when run directly. After some tests, I found that the library function "getc" is very slow under VTune, so loading big input files takes a very long time. To make this scenario easy to reproduce, I compiled a simple md5sum program (ftp://quatramaran.ens.fr/pub/madore/misc/md5sum.c), which calls getc, and ran a concurrency analysis on it in VTune.

$ gcc -o md5sum -g -O2 md5sum.c
$ time ./md5sum webdocs_250k.dat
f44f49ac4fb609005ba3bd2fb511df54 webdocs_250k.dat
real 0m2.997s
user 0m2.959s
sys 0m0.036s

$ amplxe-cl -collect concurrency -knob enable-user-tasks=true -- ./md5sum webdocs_250k.dat
f44f49ac4fb609005ba3bd2fb511df54 webdocs_250k.dat
Using result path `/home/zhang/parsec-3.0/pkgs/apps/freqmine/run/r005cc'
Executing actions 34 % Resolving information for `libc-2.3.4.so'
Warning: Cannot locate symbols for file `/opt/intel/vtune_amplifier_xe_2013/lib64/pinruntime/glibc/libc-2.3.4.so'.
Executing actions 36 % Resolving information for `libc-2.12.so'
Warning: Cannot locate symbols for file `/lib64/libc-2.12.so'.
Executing actions 36 % Resolving information for `libtpsstool.so'
Warning: Cannot locate symbols for file `/opt/intel/vtune_amplifier_xe_2013/lib64/libtpsstool.so'.
Executing actions 50 % Generating a report
Summary
-------

Average Concurrency: 1.000
Elapsed Time: 135.801
CPU Time: 135.798
Wait Time: 0.007
CPU Usage: 1.000
Executing actions 100 % done

As you can see, when I run md5sum directly it takes only about 3 s, but under VTune (2013 Update 2) it takes about 136 s! The analysis result shown in the VTune GUI (see the attached picture) clearly indicates that the "getc" calls take 127 s. I wonder whether this is a known VTune issue, whether I need different command-line arguments to get the expected getc speed, or whether it is related to my system configuration (kernel, glibc, etc.).
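
A minimal C sketch for narrowing this down follows. It is written for illustration and is not taken from the thread. It reads the same file three ways:

- with getc();
- with getc_unlocked() inside a single flockfile()/funlockfile() pair;
- with block-sized fread() calls.

In glibc, getc() locks the FILE stream on every call. One unconfirmed hypothesis is that the concurrency collector, which tracks synchronization, adds cost to each of those per-byte lock operations. If that assumption holds, timing the three functions under amplxe-cl should show the getc() variant slowing down far more than the other two. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Per-byte reads with getc(): the stream is locked and unlocked on every call. */
long count_with_getc(FILE *f) {
    long n = 0;
    while (getc(f) != EOF) ++n;
    return n;
}

/* Same loop, but the stream is locked once for the whole read and
   getc_unlocked() skips the per-call locking (POSIX stream-locking API). */
long count_with_getc_unlocked(FILE *f) {
    long n = 0;
    flockfile(f);
    while (getc_unlocked(f) != EOF) ++n;
    funlockfile(f);
    return n;
}

/* Block reads with fread(): one library call per 64 KiB instead of per byte. */
long count_with_fread(FILE *f) {
    static char buf[1 << 16];
    long n = 0;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0) n += (long)got;
    return n;
}
```

For example, timing count_with_getc() against count_with_fread() on webdocs_250k.dat, once directly and once under -collect concurrency, would indicate whether the slowdown follows the per-call locking.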

Here is my system information:

$ cat /etc/issue
CentOS release 6.3 (Final)
Kernel \r on an \m

$ uname -a
Linux yang 2.6.32-279.9.1.el6.x86_64 #1 SMP Tue Sep 25 21:43:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ amplxe-cl --version
Intel(R) VTune(TM) Amplifier XE 2013 Update 2 (build 253325) Command Line Tool
Copyright (C) 2009-2012 Intel Corporation. All rights reserved.

Any suggestions are appreciated. Thanks!

Attachments: vtune.png (120.2 KB), md5sum.c (7 KB)

Does VTune 2013 support Ubuntu 12.04(64bit)?

November 16, 2012, 6:46 am

There is no problem during the installation process.

But when I use the tool to analyze the program, it says "Failed to finalize the result: the result you are opening is empty……"

The same program was analyzed successfully with VTune 2011 (on a Fedora 13 system).

Error: kernel NOT compiled with frame pointers

November 22, 2012, 11:04 pm

OS: SUSE SLES 11 SP2 x86_64
Hardware: E5-2600 DP system
Intel(R) VTune(TM) Amplifier XE 2013 Update 2

It prints the following error message. What is its side effect?

[   24.498884] **********************************************************************************************************
[   24.498890] Error: kernel NOT compiled with frame pointers -- NO KERNEL-SPACE TIMER CALL TRACES WILL BE GENERATED!
[   24.498893] **********************************************************************************************************

VTune 2013 install fail

November 23, 2012, 2:07 am

I am installing VTune from vtune_amplifier_xe_2013_update2.tar.gz on a Jefferson Pass server. First, the installer said that Fedora 14 is not a supported OS. After I skipped that check, it halted as shown below:

Step no: 5 of 6 | Installation
--------------------------------------------------------------------------------
Each component will be installed individually. If you cancel the installation,
components that have been completely installed will remain on your system. This
installation may take several minutes, depending on your system and the options
you selected.
--------------------------------------------------------------------------------
Installing Amplifier XE Command line interface component... done
--------------------------------------------------------------------------------
Installing Amplifier XE Sampling driver kit component...
WARNING: Failed to build driver.
Suggestion: after the installation completes, see
'/opt/intel/vtune_amplifier_xe_2013/sepdk/src/README.txt' for information on how
to build and load the driver into the kernel.
--------------------------------------------------------------------------------
Installing Amplifier XE Power driver kit component...

And the README.txt file is empty.

How to find bottlenecks caused by WaitNamedPipe()?

November 29, 2012, 1:34 am

I am using Amplifier XE 2013 (with Update 2), but I have trouble finding bottlenecks that are caused by excessive wait times due to calls to WaitNamedPipe().

I have tried the analysis types "Hotspots" and "Locks and Waits", but neither works. Both always display 0 seconds for WaitNamedPipe(), even though that function accounted for more than 50 seconds in some of my programs.

What needs to be configured in Amplifier XE in order to properly analyze programs that use WaitNamedPipe()? Or is this a bug in the current Amplifier XE version?

suspend does not work on ubuntu 12.04 laptop

December 2, 2012, 10:38 am

During the installation of Intel VTune Amplifier XE 2013 (build 253325) on Ubuntu 12.04.1 LTS, I get the following warning:

Installing Amplifier XE Sampling driver kit component...
WARNING: NMI watchdog timer is enabled.
Suggestion: turn off the nmi_watchdog timer before running sampling.

When I turn off nmi_watchdog and reinstall, everything looks good, but Ubuntu 12.04 turns nmi_watchdog back on after each restart. After installing VTune Amplifier XE 2013 (build 253325), Ubuntu 12.04.1 LTS (a supported OS) can no longer suspend or hibernate on my laptop (ThinkPad T410). After uninstalling VTune Amplifier XE 2013, everything works again. Is there a workaround for this problem?
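
On Linux, the watchdog state that the installer warns about is exposed in the file /proc/sys/kernel/nmi_watchdog, where a non-zero value means enabled. The C sketch below reads that file so that a script or test harness can check the watchdog before starting a sampling run. It is an illustration and is not part of VTune. The function name and the -1 "unknown" convention are assumptions of this example.

```c
#include <stdio.h>

#define NMI_WATCHDOG_PATH "/proc/sys/kernel/nmi_watchdog"

/* Returns 1 if the watchdog file reports enabled (non-zero), 0 if disabled,
   and -1 if the file is missing or does not contain an integer. */
int nmi_watchdog_state(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    int value = -1;
    if (fscanf(f, "%d", &value) != 1) value = -1;
    fclose(f);
    if (value < 0) return -1;
    return value != 0;
}
```

To make the setting survive a reboot, the usual options are the sysctl setting kernel.nmi_watchdog = 0 in /etc/sysctl.conf or the nmi_watchdog=0 kernel boot parameter. Whether either one also affects the suspend problem described above is not established here.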

VTune Amplifier XE 2013 fails writing sampling result on linux based embedded environment

December 4, 2012, 2:58 pm

Every time I attempt to post on your forum, I get the following error:

The requested URL was rejected. Please consult with your administrator.

Your support ID is: 8287664565553476927

Does this ring a bell? I would appreciate any help. I double-checked with our IT admin; it is not our firewall that is rejecting this post.

Please help,

-Arup Biswas

VTune Amplifier XE 2013 Update 2 (build 253325) freezes on exit Win 7

December 11, 2012, 7:24 am

On Win7 64-bit, VTune Amplifier XE 2013 locks up on exit and needs to be terminated nearly every time I use it on a medium-sized problem. Is this a known issue?

And while I am here, I would like to say that the "new" user interface of the start page, with a few "clickable" links in the middle of a blank page, is a huge step backward.

Installation on Windows crashes

December 12, 2012, 5:53 am

Hello,

I can't install Intel VTune Amplifier XE anymore.

I tried to install the Intel VTune Amplifier XE 2013 Update 2 (alongside an already existing 2011 version), but it ended with the following error message:

Module C:\Program Files (x86)\Intel\VTune Amplifier XE 2013\bin64\amplxe_tpssmrte_clrprof_1.0.dll failed to register. HRESULT -2147024703. Contact your support personnel.
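
As a worked check of the conversion in the next paragraph, the sketch below reinterprets the signed decimal HRESULT as 32 bits and splits it the way the Windows HRESULT_FACILITY and HRESULT_CODE macros do. It is an illustration and is not part of the installer. The fields are:

- the top bit is the severity;
- bits 16 to 28 are the facility;
- the low 16 bits are the code.

For -2147024703 this gives 0x800700C1, facility 7 (FACILITY_WIN32) and code 193 (0xC1). Code 193 is the Win32 error ERROR_BAD_EXE_FORMAT, "%1 is not a valid Win32 application". The helper names are hypothetical.

```c
#include <stdint.h>

/* Reinterpret the signed value the installer printed as the raw 32-bit HRESULT. */
uint32_t hresult_bits(int32_t hr) { return (uint32_t)hr; }

/* Bit 31: 1 means failure. */
unsigned hresult_severity(int32_t hr) { return (hresult_bits(hr) >> 31) & 0x1u; }

/* Bits 16..28, as in HRESULT_FACILITY; 7 is FACILITY_WIN32. */
unsigned hresult_facility(int32_t hr) { return (hresult_bits(hr) >> 16) & 0x1FFFu; }

/* Low 16 bits, as in HRESULT_CODE; for FACILITY_WIN32 this is the Win32 error code. */
unsigned hresult_code(int32_t hr) { return hresult_bits(hr) & 0xFFFFu; }
```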

-2147024703 is 0x800700C1, which means "not a valid application". I'm using Windows 7 64-bit. Because of this error message I chose 'Cancel', and the setup rolled everything back. Since then I can't reinstall 2013 or 2011. The setup always crashes when I choose to install the graphical user interface. (The command-line interface and the event-based sampling driver can be installed; they show the above error message, but it can be ignored.) The setup program shows 'Pre-requisite Issues Summary' (which is empty) and then crashes:

Unhandled exception at 0x774d15de (ntdll.dll) in Setup.exe: 0xC0150010: The activation context being deactivated is not active for the current thread of execution.

Callstack:
    ntdll.dll!_ZwRaiseException@12() + 0x12 bytes
    ntdll.dll!_ZwRaiseException@12() + 0x12 bytes
    Setup.exe!00626ca3()
    [Frames below may be incorrect and/or missing, no symbols loaded for Setup.exe]
    Setup.exe!0062735f()
    Setup.exe!0062735f()
    user32.dll!_UserCallWinProcCheckWow@32() + 0x10e bytes
    user32.dll!_DispatchMessageWorker@8() + 0xed bytes
    user32.dll!_DispatchMessageW@4() + 0xf bytes
    Setup.exe!00534329()
    Setup.exe!00535cd9()
    Setup.exe!0048e7a3()
    ntdll.dll!_RtlDosSearchPath_Ustr@36() + 0xe635 bytes
    ntdll.dll!_RtlDosSearchPath_Ustr@36() + 0xe635 bytes

The same now happens with the setup program for the 2011 version. I can't get it to reintegrate into VS2010 again.

What can I do to fix this?

Greetings,
Alexander Motzkau

vtune amplifier XE 2013 reported wrong elapsed time

December 12, 2012, 10:25 am

Hi, I am using VTune Amplifier XE 2013 to profile my program. The runtime reported by my program is 214047.34 seconds.

The time reported by Amplifier is as follows:

Elapsed Time: 85076.861

CPU Time: 84286.360

CPU Usage: 1.000

Executing actions 100 % done

I checked the files created at the beginning and at the end of the program run. Their creation times are about 2.5 days apart, which is close to the 214047.34 s (about 2.48 days) runtime reported by my program. The VTune-reported time is only about 1 day.

I don't have any multithreading in my program. I was running a hotspots analysis (-collect hotspots) on Linux (SLES 10). Why does VTune report the wrong elapsed time?
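
Elapsed Time is reported in seconds, so 85076.861 s is about 0.98 days, while the program's own 214047.34 s is about 2.48 days. One way to cross-check the program's figure, assuming the program can be modified, is to record both wall-clock time (CLOCK_MONOTONIC) and process CPU time (CLOCK_PROCESS_CPUTIME_ID) at start and end with POSIX clock_gettime(). The C sketch below does this. The helper names are hypothetical. The sketch does not explain the discrepancy; it only makes the program's own measurement explicit.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the given clock in seconds, or -1.0 on error.
   CLOCK_MONOTONIC measures wall time; CLOCK_PROCESS_CPUTIME_ID measures
   CPU time consumed by this process. */
double clock_seconds(clockid_t id) {
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0) return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Convert seconds to days, e.g. for comparing against file timestamps. */
double seconds_to_days(double s) { return s / 86400.0; }
```

Printing both differences at exit, next to VTune's Elapsed Time and CPU Time, would show whether the program's 214047.34 s is wall time or CPU time.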

New full screen mode in VTune Amplifier XE 2013 Update 3 for Windows*

December 14, 2012, 2:05 pm

Sometimes there is just not enough screen space to view a result with lots of data or metrics. To help with this, we recently added a "Full Screen" mode to the product. To enable it, press F11 or select "Full Screen" from the View menu in the standalone graphical interface. This maximizes the current result display to the whole screen, giving a larger viewing area, similar to the Microsoft* Internet Explorer* functionality. Besides reducing window decoration, this feature also temporarily hides the Project Navigator, if it is present.

To return to normal mode, press Escape or press F11 again.

Let us know what you think about this new capability and any suggestions you might have for making the data easier to view.

Manualresetevents

December 17, 2012, 11:53 am

Hi,

We are converting a stochastic simulation Fortran program to OpenMP, since the outputs of the program can be summed. In the simplest mode, we have just made the main loop a parallel region with firstprivate. No matter how many threads we launch, the wall time consumed is roughly the time for a single thread times the number of threads. The problem seems to be _kmp_launch_monitor, which has 200 ms waits on ManualResetEvents. Eliminating atomic and critical sections has little effect on the outcome, and so does using OMP DO.

Reading a bit on ManualResetEvents has not helped. Where should we be looking for the cause of the ManualResetEvents? Can we make the wait time shorter? Make them go away?

I gather that the launch monitor will always be there in an Intel OpenMP solution? Otherwise the code is working as desired.
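
The structure described here, with independent replicas whose outputs are summed, can be sketched in C with OpenMP as below. This is an illustration only. simulate_replica() is a hypothetical stand-in for one stochastic run, and it keeps its own random-number state so that no state is shared between threads. Shared state inside the replica, for example a single random-number generator behind a lock, is one possible cause of the serialization described above. That is an assumption, not something the post establishes. The per-replica results are combined with an OpenMP reduction instead of atomic or critical sections.

```c
#include <stdint.h>

/* Per-replica xorshift64 generator; the state lives on each replica's stack. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical replica: counts random points inside the unit quarter circle.
   Seeded from the replica index, so the result does not depend on which
   thread runs it. */
static long simulate_replica(int replica, long samples) {
    uint64_t state = 0x9E3779B97F4A7C15ULL ^ (uint64_t)(replica + 1); /* non-zero seed */
    long hits = 0;
    for (long i = 0; i < samples; ++i) {
        double x = (double)(xorshift64(&state) >> 11) / 9007199254740992.0; /* [0,1) */
        double y = (double)(xorshift64(&state) >> 11) / 9007199254740992.0;
        if (x * x + y * y < 1.0) ++hits;
    }
    return hits;
}

/* Run independent replicas in parallel and sum their outputs. */
long run_replicas(int replicas, long samples) {
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int r = 0; r < replicas; ++r)
        total += simulate_replica(r, samples);
    return total;
}
```

If the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same total. That makes it easy to compare wall times for different thread counts, for example via OMP_NUM_THREADS.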

Thanks for any suggestions.

What is the secret of Locks and Waits?

January 3, 2013, 7:33 pm

Hi,

We're trying to move a stochastic Fortran program to OpenMP with XE 2013 on Win 7, using Visual Studio. Basically, we want to run many copies of a program after the initial read-in of tables, while sharing a couple of the large (basically read-only) tables between the threads. In the simplest configuration, two large DO loops, with subroutines and modules, are completely enclosed in a single parallel region, firstprivate, except for a couple of shared arrays. In this version, after entering the parallel region, the threads never leave it. We've tried more complicated uses of OpenMP but, in all cases, we get only modest improvements. That is, a number of threads are running, but there is very little improvement in total work accomplished per unit of wall time compared to only one thread.

Finally, we're getting a substantial improvement (e.g., four threads' worth of work in only twice the wall time of one thread). HOWEVER, it only occurs if the code is run under VTune Locks and Waits (x64, in either debug or release mode). If, immediately after running in this mode, we run the same program without VTune (Ctrl+F5), it reverts to very little gain (i.e., the same amount of work in the same amount of wall time, no matter how many threads are running).

Here's the question: What is VTune Locks and Waits doing that is speeding things up?

We judge success by the quantity and quality of the output, so nothing is being skipped in the execution. Also, Locks and Waits is not completely consistent; sometimes it, too, runs slowly. Whichever mode (fast or slow) it starts in, it stays in that mode indefinitely. This has been observed on both i7 and E5 machines. There may be an issue with the order of running with and without Locks and Waits and compiling, but we have not been able to pin down any consistent behavior in that regard.

We're hoping that whatever Locks and Waits has discovered, we can use to achieve the speed ups we need to move this to a MIC.

thanks,

Bruce

Problem with VTune APIs resume/pause and -mavx option

January 8, 2013, 7:16 am

Hi there,
I'm trying to use the VTune resume/pause APIs in my code, following the example here:

http://software.intel.com/en-us/articles/how-to-call-resume-and-pause-ap...
(case 1)

I wrote the same code as in case 1 and it works, but if I add "-mavx" to this build command:

icpc -g test.cpp -I/opt/intel/vtune_amplifier_xe_2011/include /opt/intel/vtune_amplifier_xe_2011/lib64/libittnotify.a -lpthread -o test

VTune doesn't collect anything (I think it doesn't resume). I tried -msse3 and similar options, and the problem is still there. I tried changing the code, but with no luck. My conclusion is that something goes wrong when -mavx is combined with the VTune APIs (note that VTune does collect data if I remove the API calls).

Any solution? I know that for the moment I can remove -mavx, but this is part of a bigger code base, so the option must be there when running on Sandy Bridge.
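
For reference, the general shape of the pause/resume pattern is sketched below in C. It assumes the ittnotify.h header and libittnotify.a shipped with VTune, as in the command above. It also assumes that the collection is started paused, for example with amplxe-cl's -start-paused option. When the header is not found, the hypothetical profiling_pause()/profiling_resume() wrappers compile to no-ops, so the sketch still builds. It does not explain the -mavx behavior. It only provides a small test case that can be built with and without -mavx for comparison.

```c
#include <stdio.h>

/* Use VTune's ITT header when it is on the include path; otherwise
   the wrappers below do nothing. */
#if defined(__has_include)
#  if __has_include(<ittnotify.h>)
#    include <ittnotify.h>
#    define HAVE_ITTNOTIFY 1
#  endif
#endif

static void profiling_pause(void) {
#ifdef HAVE_ITTNOTIFY
    __itt_pause();
#endif
}

static void profiling_resume(void) {
#ifdef HAVE_ITTNOTIFY
    __itt_resume();
#endif
}

/* Hypothetical region of interest: the only part meant to be sampled. */
static double region_of_interest(int n) {
    double s = 0.0;
    for (int i = 1; i <= n; ++i) s += 1.0 / i;
    return s;
}

/* Collect only around the region of interest. */
double run_with_collection_window(int n) {
    profiling_resume();                 /* start collecting */
    double r = region_of_interest(n);
    profiling_pause();                  /* stop collecting */
    return r;
}
```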

I'm running on Linux64 and the compiler version is:

Best regards,

Alfio

PREFETCHNTA causes L1D eviction (L1D.REPLACEMENT)

January 10, 2013, 9:02 am

Hello, it seems I have some kind of misunderstanding. I expected PREFETCHNTA to prefetch data into the second-level cache without evicting anything from L1D. But in VTune I can clearly see that, in a function that contains only prefetchnta instructions (a microbenchmark), many L1D.REPLACEMENT events are attributed to every non-temporal prefetch instruction. So the prefetched data actually reaches the L1D cache, right?

What is wrong in my understanding, or what did I miss? My intention is to process a block of data in which every piece is needed only once, which is why it would be better to avoid bringing it into L1D and to use non-temporal operations.

Any recommendations for Sandy Bridge and newer Intel platforms?

BTW, is a non-temporal load into an AVX register available on SB (something like MOVNTDQA)?

Thanks in advance.

The AORM (Intel 64 and IA-32 Architectures Optimization Reference Manual) says

" The non-temporal instruction is: PREFETCHNTA— Fetch the data into the second-level cache, minimizing cache pollution."

and

L1D.REPLACEMENT - Replacements in the 1st level data cache.
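
To experiment with this, the small C sketch below (not from the thread) reads a buffer once while issuing PREFETCHNTA through the _mm_prefetch intrinsic with the _MM_HINT_NTA hint. The prefetch distance and the function name are hypothetical tuning choices. Which cache levels an NTA-prefetched line actually occupies is microarchitecture-specific. The L1D.REPLACEMENT counts reported above suggest that on this processor the lines do land in L1D, so the sketch is only a starting point for measuring that in VTune.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical distance, in bytes, between the element being summed and
   the line being prefetched. */
#define PREFETCH_DISTANCE 512

/* Sum a buffer that is touched only once, issuing one non-temporal
   prefetch per 64-byte cache line, PREFETCH_DISTANCE bytes ahead. */
long long sum_once_nta(const int *data, size_t n) {
    const size_t line = 64 / sizeof(int);
    const size_t ahead = PREFETCH_DISTANCE / sizeof(int);
    long long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i % line == 0 && i + ahead < n)
            _mm_prefetch((const char *)&data[i + ahead], _MM_HINT_NTA);
        sum += data[i];
    }
    return sum;
}
```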

How to reduce the data output in Lightweight Hotspots Analysis

January 10, 2013, 2:12 pm

I understand it's necessary to save all the event data in order to do the analysis. However, saving it all to one big .vtss file is the dumbest thing I've ever seen. I apologize for the words I'm using. I even have a 16 GB file. The tool crawls and crashes when it loads and resolves it. Any suggestions for resolving this issue?

I sent a crash report 2 days ago and never heard any feedback. If the tech support is so bad, why bother? Sorry about the attitude.

User Tasks show all red with no names

January 10, 2013, 2:37 pm

I recently started using the __itt_task_begin / end APIs to mark code segments. In the Tasks and Frames section of a capture the tasks show up in what looks like correct nesting, but they are all red and don't have the names. I've double checked and it looks like valid string handles are going into the task_begin functions. Any idea what's going on here? Thanks.
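
For comparison, the commonly documented usage of the task API is sketched below in C. The domain and the string handle are created once, here in a hypothetical tasks_init(). Each __itt_task_begin() passes __itt_null for the task and parent IDs and is matched by __itt_task_end() on the same domain and thread. The domain name, the task name and the wrapper functions are hypothetical, and the wrappers compile to no-ops when ittnotify.h is not available. The sketch does not identify the cause of the red, unnamed tasks; it is only a minimal case to compare against.

```c
/* Use VTune's ITT header when it is on the include path; otherwise
   the task markers below are compiled out. */
#if defined(__has_include)
#  if __has_include(<ittnotify.h>)
#    include <ittnotify.h>
#    define HAVE_ITTNOTIFY 1
#  endif
#endif

#ifdef HAVE_ITTNOTIFY
static __itt_domain *g_domain;           /* created once, reused for all tasks */
static __itt_string_handle *g_task_name; /* created once, reused for every begin */
#endif

static int work_counter;                 /* stand-in for real work */

static void tasks_init(void) {
#ifdef HAVE_ITTNOTIFY
    g_domain = __itt_domain_create("Example.Domain");      /* hypothetical name */
    g_task_name = __itt_string_handle_create("LoadChunk"); /* hypothetical name */
#endif
}

/* One task instance: begin and end on the same thread and domain. */
static void load_chunk(void) {
#ifdef HAVE_ITTNOTIFY
    __itt_task_begin(g_domain, __itt_null, __itt_null, g_task_name);
#endif
    ++work_counter;
#ifdef HAVE_ITTNOTIFY
    __itt_task_end(g_domain);
#endif
}
```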

Skip the system specific Dll's from profiling

January 11, 2013, 2:14 am

I was a user of the older VTune Performance Analyzer. In the older version, I could skip instrumentation of selected DLLs. However, I could not locate this option in the new VTune Amplifier. Can anyone point me to a similar setting in the new VTune Amplifier?
