Channel: Intel® Software - Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

New experimental feature: Caller / Callee view!


Intel® VTune™ Amplifier XE 2013 Update 5 (released Feb. 26, 2013) includes a new experimental feature that can be enabled with an environment variable. 

An experimental feature is a beta quality feature that may or may not appear in a future production release.  We need feedback from users who try it on real code to tell us if we should keep it, change it or drop it.  Note: data collected with the experimental feature enabled is not guaranteed to be backward compatible with future releases. Please give this new feature a try, and tell us what you think!

The Caller/Callee View, when enabled, lets you explore the functions that call the selected function (callers) and the functions called by the selected function (callees). It is displayed as a separate tab in all viewpoints that include call stack data (e.g., Hotspots and Lightweight-Hotspots, General Exploration, etc. when the "Collect stacks" option has been checked).

Below is an example display in which you can see that:

  • The left window is a flat profile view with functions and attributed self and total time metrics.
  • The top right window shows callers of a selected function in the flat profile as a bottom-up tree with total metrics.
  • The bottom right window shows callees of the selected function as a top-down tree.
  • Right-clicking on any function displays a context menu that allows you to filter the data, as in other views, and to perform other actions appropriate for the view.  Filtering is done on a Total Time basis, so you get all sub-trees that include the selected function at any level.

To enable this new, experimental feature, set the AMPLXE_EXPERIMENTAL environment variable to “caller-callee” and launch the product.  On Windows*, open a command prompt from the Start menu and execute the following commands:

C:\> set AMPLXE_EXPERIMENTAL=caller-callee
C:\> amplxe-gui

Or, on Linux, source the variables and execute the commands, e.g.:

$ source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh
$ export AMPLXE_EXPERIMENTAL=caller-callee
$ amplxe-gui &

Please try it and post your questions, comments, and suggestions in this thread.


Amplifier XE2013 Error 0x40000002 Insufficient memory Failed to finalize result


Hello experts,

I am evaluating Parallel Studio XE 2013 update 2 to see if this software could be useful for our software development needs. On most runs it crashes with message "Failed to finalize the result - Error 0x40000002 (Insufficient memory)". Here is what I see in the output window.

The database has been cleared, elapsed time is 0.436 seconds.
Raw data has been loaded to the database, elapsed time is 21.130 seconds.
Data transformations have been finished, elapsed time is 2.387 seconds.
Precomputing frequently used data has been finished, elapsed time is 0.011 seconds.
Finalizing the result took 24.252 seconds.

Sometimes, but not all the time, when I use the "Re-resolve" option it recovers and displays all the results ok. I am pretty sure that my machine has enough available memory. However it could be that I am missing a setting that allows the Amplifier to use more memory? Any help with this will be appreciated. Please let me know if you need more information.

Thank you,

SK

Sandy Bridge Memory analysis problems


Trying to use the analysis type "Memory access - Sandy Bridge and Ivy Bridge" in VTune 2013 on an application. App runs fine, and results are collected and processed but the application opens the viewpoint "Task Time". Task Time appears to be the wrong viewpoint for this analysis and thus it shows basically nothing useful. It does not allow me to switch the viewpoint to anything else either.

Any help here?

Setup:

Windows 7 x64, VTune Amplifier 2013 XE Update 4 (build 270817). CPU: 3610QM (quad-core ivy bridge).

Vtune Amplifier XE for Multicores, how it works?


I'm using Intel VTune Amplifier XE 2013 to profile a parallel program running on a multicore CPU; specifically, it is written in OpenCL and executed on a Xeon Phi. I would like to know exactly how to interpret the results reported by VTune, i.e.:

  1. Is the reported performance counter value collected for a single thread or for the whole core? (Assuming there are many cores in a CPU and many threads can execute concurrently on a core, as is the case for Xeon Phi.)
  2. How does VTune sample on a multicore CPU? Does it sample on a single core and report that, or sample on many cores and take the average?

New experimental feature: OpenCL* performance analysis on Intel® HD Graphics


If you use the recent Intel® SDK for OpenCL*  Applications you might know that Intel® VTune™ Amplifier XE 2013 Update 5 (released Feb. 26, 2013) includes a new experimental feature of OpenCL* performance analysis on Intel® HD Graphics. 

Graphics Processing Unit analysis is an area of active research and the VTune team is very interested in your feedback and suggestions.

Please try it and post your questions, comments, and suggestions in this thread.

Attachment: gpuoclview1.png (268.42 KB)

Hardware event construction


Sorry to ask a stupid question. We have a 4P Intel Xeon E5-4620 CPU (Sandy Bridge), and we want to use VTune to monitor some hardware events in the package.  Our motherboard is a Dell R720. Where can we get the list of events that can be sampled with VTune Update 5? I have seen a list in the last section of http://software.intel.com/sites/products/documentation/doclib/stdxe/2013...

But a lot of the events that are claimed to be supported on Xeon processors cannot be sampled. For example, when I run

amplxe-cl -collect-with runsa -knob event-config="SNOOP_RESPONSE.HIT" ls -alrt

I get

amplxe: Error: Invalid Event SNOOP_RESPONSE.HIT discarded.

However, I succeeded once with the "OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.ANY_RESPONSE_0" event. But I have tried several others (OFFCORE_RESPONSE_0.DATA_IFETCH.LLC_MISS_LOCAL_DRAM, MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS, etc.) and got the same "Invalid Event" response.

The purpose of my test is to use "OFFCORE_RESPONSE_0" with a snoop filter to monitor the snoop traffic between CPUs. The method for sampling the OFFCORE_RESPONSE_0 event is described in section 18.8.5 of http://download.intel.com/products/processor/manual/325462.pdf

But how can this method be used in VTune?

I also made sure that there was only one instance of VTune running on the system. So where can we get the list of valid events, or how can we construct the right event to monitor?
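
One way to narrow down which event names a given installation accepts is to probe them one at a time with the same command line as above and look for the "Invalid Event" message. The following is only a minimal sketch under those assumptions: the function name probe_events and the AMPLXE_CL override variable are hypothetical helpers, not part of VTune, and the check relies solely on the error text quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of VTune): try each event name with the
# runsa collector and report whether the tool printed "Invalid Event".
# AMPLXE_CL is an assumed override for the binary name, so the loop can
# be pointed at a wrapper or a stub.
probe_events() {
    for ev in "$@"; do
        # Same invocation as in the post, with a trivial workload;
        # capture both stdout and stderr.
        out=$("${AMPLXE_CL:-amplxe-cl}" -collect-with runsa \
              -knob event-config="$ev" ls 2>&1)
        case "$out" in
            *"Invalid Event"*) echo "$ev: rejected" ;;
            *)                 echo "$ev: accepted" ;;
        esac
    done
}

# Example call (event names taken from the post):
# probe_events SNOOP_RESPONSE.HIT MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS
```

An "accepted" line only means the error text did not appear; it does not show that the event produced meaningful samples. Each probe also creates its own result directory.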

Thanks a lot!

Locks and Waits analysis AV on SetEvent


Hi,

When using the Locks and Waits analysis, the tpsstool hooks the SetEvent API. During startup of our application not too much is happening in parallel and everything works. At some stage the application starts receiving and processing a lot of data in parallel (8 threads) and then the hook indirectly causes an access violation and the unhandled exception handler is called:

MSVCR80!_CxxFrameHandler3+0x180
MSVCR80!_CxxExceptionFilter+0x2db
MSVCR80!_CxxExceptionFilter+0xafb
MSVCR80!_CxxExceptionFilter+0xd3b
MSVCR80!_CxxFrameHandler3+0x77
ntdll!RtlDecodePointer+0xbd
ntdll!RtlUnwindEx+0xbbf
ntdll!KiUserExceptionDispatcher+0x2e
ntdll!RtlAcquireSRWLockExclusive+0x13
ntdll!RtlDeregisterWaitEx+0x62
KERNELBASE!UnregisterWaitEx+0x1b
tpsstool!tpss_tp___itt_model_disable_push_post_cbk_impl+0x139daa
tpsstool!tpss_tp___itt_model_disable_push_post_cbk_impl+0x13a7f3
<Our code calls SetEvent()>

I tried both Update 4 and 5 with the same result, and it is completely reproducible. Hotspots analysis works fine. I would have expected ERROR_INVALID_HANDLE if there were a problem with the lifetime of the event object. Although I have never seen any evidence of a problem in our handling of mutexes, there probably is one. Before spending another N days on this, I want to know whether anybody recognizes the call stack above. If so, what was the problem/fix in your case?

Running Windows Server 2008 R2 SP1, x64 and Xeon E5-4640 processor.

Patrick

Defining code regions to profile


Is there any way to define code regions (functions, modules, etc.) or a time frame to profile, and to ignore everything else? For instance, if I have an application which sets up a large collection of objects (which takes a lot of CPU resources), then runs a relatively fast function A and exits, can I make VTune ignore everything except A? I know that it is possible to zoom in using the timeline, but I'd rather send notifications from the application. Something like:

PrepareObjects();

StartCollectingData();

A();

StopCollectingData();

Or it would be nice to somehow select functions/classes of interest.

Is it possible in VTune?
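
The start/stop pattern in the pseudocode above can be approximated from the command line by launching the collection paused and then resuming and pausing it around the region of interest. This is a rough sketch, assuming the installed amplxe-cl supports the -start-paused option and the -command pause/resume actions (check amplxe-cl -help for your version). The result directory name r_region, the helper function names, and the AMPLXE_CL override are all hypothetical.

```shell
#!/bin/sh
# Hypothetical names: RESULT_DIR, AMPLXE_CL and the helper functions below.
RESULT_DIR=${RESULT_DIR:-r_region}

# Launch the target in the background with data collection paused.
start_paused() {
    "${AMPLXE_CL:-amplxe-cl}" -collect hotspots -start-paused \
        -r "$RESULT_DIR" -- "$@" &
}

# Resume or pause the running collection, e.g. from a second shell.
resume_collection() {
    "${AMPLXE_CL:-amplxe-cl}" -r "$RESULT_DIR" -command resume
}
pause_collection() {
    "${AMPLXE_CL:-amplxe-cl}" -r "$RESULT_DIR" -command pause
}

# Example sequence (./myapp is a placeholder target):
# start_paused ./myapp
# resume_collection   # just before A() runs
# pause_collection    # right after A() returns
```

Driving this from outside the process is coarse-grained. From inside the application, the ITT API calls __itt_resume() and __itt_pause(), declared in ittnotify.h, are intended for the same purpose and correspond more directly to the StartCollectingData()/StopCollectingData() calls sketched in the question.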


Feedback on Update 5 of Intel VTune Amplifier XE 2013


I recently applied Update 5 of Intel VTune Amplifier XE 2013 on Windows 7 Professional (64-bit). At the end of the update, when I pressed the Finish button, Internet Explorer (IE) did not display the Getting Started web page. IE was stuck in a Not Responding state, and after 1 minute of waiting I terminated it.

That was the only issue I experienced; the previous Update 4 worked well. Another thing: I think the message

Install detected conflicts on the system

An older version of  Intel VTune Amplifier XE 2013 is already installed.
The currently installed Intel VTune Amplifier XE 2013 Update 4 will be automatically removed prior to installing Intel VTune Amplifier XE 2013 Update 5.

needs to be changed to the following (without the word "conflicts", which is very confusing):

Install detected an older version of  Intel VTune Amplifier XE 2013 is already installed.
The currently installed Intel VTune Amplifier XE 2013 Update 4 will be automatically removed prior to installing Intel VTune Amplifier XE 2013 Update 5.

Thanks.

 

OpenCL GPU analysis working partially


First of all, many thanks to the VTune team for implementing OpenCL GPU profiling in the tool.

I was trying out the tool on a command-line OpenCL application running on the HD 4000. I followed the documentation and was able to enable the GPU profiling support in VTune. I profiled my application and  some metrics such as average execution time of the kernel, EU array busy and stalled work fine. However, some other metrics, such as memory bandwidth, still report 0.0. I have the latest HD 4000 driver installed with OpenCL 1.2 support. My application is not using DirectX, only OpenCL.

Any ideas?

Multiple installations of Parallel Studio XE


I am using Parallel Studio 2013 under the non-commercial license.

I installed Parallel Studio 2013 update 3 just the other day and ran into a problem with Vtune crashing.  I would like to reinstall Parallel Studio 2013 update 1 to see if I can track down the cause of this SEGV under Vtune update5.

I am not installing in the default location.  I will be switching between installations during testing using 'modules'.  This is how the admins at work do it so I know it's possible.

The release notes PDF provided with update 1 states on page 4:

"You do not need to uninstall previous versions or updates before installing a newer version –
the new version will coexist with the older versions."

Yet I get the following warning during installation

Step no: 1 of 7 | Previous versions detected
--------------------------------------------------------------------------------
Installation program has detected other version of the product installed which
must be uninstalled to continue installation.
--------------------------------------------------------------------------------
1. Continue with installation [default]

These products will be uninstalled automatically if newer version is selected
for installation:
    Intel(R) Parallel Studio XE 2013 Update 3 for Linux* common files
    Intel(R) VTune(TM) Amplifier XE 2013 Update 5
    Intel(R) Inspector XE 2013 Update 5
    Intel(R) Advisor XE 2013 Update 2

Whom should I believe?

Can you point me to documentation on how to maintain multiple, segregated installations of Parallel Studio?

Dan

Why does thread tid 0x0 belong to autochk.exe in VTune?


Hello

I am collecting data for the whole system. As far as I can see, 22% of the execution (picture below) happens in a thread with tid 0x0. I then looked up which process owns thread tid 0x0, and it turns out to be autochk.exe. This seems strange to me, as almost all I/O drivers running in the system were executed in the context of thread 0x0. I would rather expect thread 0 to belong to the idle process (process 0). At least that would explain why so many DPCs and interrupts are handled in this thread.

My questions are:

1. What does the thread with tid 0x0 do in Windows? Does it have a dedicated purpose?

2. Is it correct that thread tid 0x0 belongs to autochk.exe?

3. Why is so much driver execution attributed to thread tid 0x0?

Attachment: capture.jpg (16.03 KB)

Vtune gui crashes when I "File->Open->Results", ubuntu linux 12.10


This issue concerns "Intel VTune Amplifier XE 2013" Update 5, build 274450.

Our target system is running CentOS 6.  It doesn't have graphics capabilities (doesn't use them). I have installed the cli-only version on the target box and use amplxe-cl to run our app and gather information.  Once the program being analysed exits, an "r*" directory is generated. I copy that directory down onto another box for analysis.

The viewing machine, which has the GUI, is running Ubuntu 12.10 under KDE.  I copy the "r*" directory from the target into my working directory and run amplxe-gui.  I direct it to File->Open->Result and point it to the "r*" directory.  The GUI then segfaults and presents me with the information contained in the attached file.

I have access to a Windows 7 machine, so I installed "Intel VTune Amplifier XE 2013", copied the "r*" directory from my target, and loaded it into the GUI.  No crash.

To complete this task, I need this to work in a Linux VM.  I'm still in the evaluation period for the VTune product...  Any idea why it's crashing?  Is there a workaround?  I don't need the VM version to do any analysis... I just need it to present the information recorded on the target.

-d

Attachment: results.txt (64.38 KB)

How to resolve "checksum mismatch" and related symbol lookup fails


I've been able to run vtune several times against our target application.   But I haven't figured out how to get the symbols to go into the collected data.  For example, when I shut down the program, I see the following warnings:

amplxe: Using result path `/shared/vtune/scripts/r001hs'
amplxe: Executing actions 34 % Resolving information for `libtpsstool.so'      
amplxe: Warning: Cannot locate symbols for file `/shared/vtune/opt/intel/vtune_amplifier_xe_2013/lib64/libtpsstool.so'.
amplxe: Executing actions 34 % Resolving information for `libdl-2.5.so'        
amplxe: Warning: Cannot locate symbols for file `/lib64/libdl-2.5.so'.
amplxe: Executing actions 35 % Resolving information for `libfips.so'          
amplxe: Warning: Cannot locate symbols for file `/usr/lib64/libfips.so'.
amplxe: Executing actions 35 % Resolving information for `libc-2.5.so'         
amplxe: Warning: Cannot locate symbols for file `/lib64/libc-2.5.so'.
amplxe: Executing actions 36 % Resolving information for `libpthread-2.5.so'   
amplxe: Warning: Cannot locate symbols for file `/lib64/libpthread-2.5.so'.
amplxe: Executing actions 36 % Resolving information for `txx64.debug'         
amplxe: Warning: Cannot match file `/usr/bin/txx64.debug': checksum mismatch.
amplxe: Warning: Cannot locate symbols for file `/usr/bin/txx64.debug'.
amplxe: Executing actions 37 % Resolving information for `ld-2.5.so'           
amplxe: Warning: Cannot locate symbols for file `/lib64/ld-2.5.so'.
amplxe: Executing actions 50 % Generating a report 

The application I'm vetting is "txx64.debug" which includes debug symbols.  I'm curious what the checksum mismatch is from.  And I'm curious why it says "Cannot locate symbols for file ...txx64.debug" when I can use "strings txx64.debug" and see the symbols.

Many thanks

-d

Error: PMU resources are not available.


Intel VTune Amplifier XE 2013

Windows 8 64 bits. VS 2010 & VS 2012. IvyBridge, i7, GT2.

When I try to run the "Bandwidth - Sandy Bridge / Ivy Bridge" analysis, I get this error:

Error: PMU resources are not available. Hardware Event-based Sampling is not supported on this operating system.

Then I checked the driver status from an administrator command line, of course. Here is what I got:

C:\Program Files (x86)\Intel\VTune Amplifier XE 2013\bin32>amplxe-sepreg -c
Checking platform...
Platform is genuine Intel: OK
Platform has SSE2: OK
Platform architecture: INTEL64
User has admin rights: OK
Drivers will be installed to C:\Windows\System32\Drivers\
Checking sepdrv3_10 driver path...OK
Checking sepdrv3_10 service...
Driver status: the sepdrv3_10 service is running
Checking sepdal driver path...OK
Checking sepdal service...
Driver status: the sepdal service is running

C:\Program Files (x86)\Intel\VTune Amplifier XE 2013\bin32>amplxe-sepreg -i -v

Stopping service sepdrv3_10...OK
Copying file C:\Program Files (x86)\Intel\VTune Amplifier XE 2013\bin64\sepdrv\win7\sepdrv3_10.sys to C:\Windows\System32\Drivers\sepdrv3_10.sys...OK
Installing service sepdrv3_10...OK
Warning: service sepdrv3_10 already exists
Starting service sepdrv3_10...OK
Writing startup key to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\sepdrv3_10...OK
Stopping service sepdal...OK
Copying file C:\Program Files (x86)\Intel\VTune Amplifier XE 2013\bin64\sepdrv\win7\sepdal.sys to C:\Windows\System32\Drivers\sepdal.sys...OK
Installing service sepdal...OK
Warning: service sepdal already exists
Starting service sepdal...OK
Writing startup key to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\sepdal...OK
VTSS++ driver found
Deleting system32/drivers/vtss.sys file...OK
Forming source path for vtss.sys...OK
Forming destination path for vtss.sys...OK
Copying file C:\Program Files (x86)\Intel\VTune Amplifier XE 2013\bin32\.\..\bin64\sepdrv\vtss.sys to C:\Windows\system32\drivers\vtss.sys...OK
Installing and starting VTSS++ driver...FAILED


"Failed to finalize the result"


I am using XE 2011. Since last week, I always get the following message while running Hotspots analysis: "Failed to finalize the result. The result you are opening is empty. This may be caused by an error during data collection. Try to re-run the analysis."

I am using Windows 7, 64-bit, starting as admin. When I press "Start Paused", it either ignores it and starts immediately, or produces the above error message.

snb-memory-access: Cannot run data transformation `Compute CPU Usage'


Greetings, my apologies if this has been answered elsewhere - I have not been able to find a solution on this forum yet.

I'm unable to use amplxe-cl to collect memory access information. Here is a sample invocation with output:

$ amplxe-cl -collect snb-memory-access ./fft 999999
amplxe: Using result path `[removed]/r000memacc'
amplxe: Executing actions 16 % Processing profile metrics and debug information
amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute CPU Usage'.
amplxe: Executing actions 50 % Generating a report
Result Info
-----------
Parameter                 r000memacc
------------------------  --------------------------------------------------------------------------------------------------------------------------
Application Command Line  ./fft "999999"
CPU Name                  3rd generation Intel(R) Core(TM) Processor family
Computer Name             [removed]
Environment Variables    
Frequency                 3400000000
Logical CPU Count         8
MPI Process Rank         
Operating System          3.5.0-26-generic DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"
Result Size               987748
User Name                

Summary
-------
Elapsed Time:  0.000
amplxe: Executing actions 100 % done

However, collecting hotspots works correctly and I am able to get a working report.

Unable to use "Cycles and uOps" analysis


I am unable to run the "Cycles and uOps" analysis under Sandy Bridge / Ivy Bridge / Haswell Analysis. It says: "This analysis type is only defined for the processors based on the Intel microarchitecture code name Sandy Bridge / Ivy Bridge."

I am logged in as an administrator and launch VTune with "Run as Administrator". System details are:

Product version: Update 5 (build 274450)

OS: Windows server 2008 R2 standard Service Pack 1

Processor: Intel i5-3570 @ 3.4 GHz - Turbo Boost disabled

Memory: 8GB

Can anyone help me on this?

Collection failed. The data cannot be displayed. Error: Binary load failed.


According to this error reporting this is definitely a pre-C++-Exception-Handling application (which means pre-1996).

And I'm copying this error message by hand since copy&paste is also not yet invented!

Analysis is:

General Exploration - Knights Corner Platform.

On the console it says (again -- somebody deep in the library using printf since we don't know about C++-Exception-Handling):

[peterf@tmp68 SAMSUNG]$ osi_DLL_Error: libscif.so.0: cannot open shared object file: No such file or directory
Unable to open shared library: libabstract_mic_host.so
SEP_Set_Up_Abstract_Callback_table returned error: 37

[Outside any known module] in VTune Amplifier


I generated a report on a Linux system for C++ code by attaching VTune Amplifier to the running process, and I am now viewing it on Windows. In the top-down tree, it shows that [Outside any known module] has taken 30% of CPU time, while for the remaining 70% it shows function names.

I want to know all the possible locations, tasks, activities, etc. where this 30% of the time could be spent but is not mentioned in the report.

 

Any kind of probable ideas may be helpful.

Thanks
