Channel: Intel® Software - Intel® VTune™ Profiler (Intel® VTune™ Amplifier)
Viewing all 1574 articles

Could not profile on Android device

Hi, I have tried three phones, and none of them let me profile my app.

Version: Intel VTune Amplifier 2016 for Systems (Windows)

Phone: Nexus 5, without root

The Intel VTune Amplifier app can be found on the phone, but the host shows:

Amplifier cannot detect Android device configuration.

Phones: Samsung S6 and M9, both without root

The host only shows "Amplifier cannot detect Android device configuration".

I'm not sure whether this is because VTune does not support their CPUs.

Thanks


Getting "Error : Cannot connect to data source provider"

Hi

I am using the free trial version vtune_amplifier_xe_2016.3.0.463186 on Ubuntu 14.04. It works fine on one of my machines. On another machine, the analysis runs, but when I stop it (or when the configured time elapses) I get the error "Cannot connect to data source provider", and no analysis results are produced. Opening the results by specifying their path gives the same error.

Can someone give me a clue about what the issue could be? I checked all the kernel flags mentioned, and everything looks fine. I followed exactly the same steps on my other machine, where it works. I am testing with a very simple C program for debugging.

Thanks

Anzal

Profiling output data locally on Windows

Hello,

I have access to VTune on a supercomputer running Linux, and on my local Windows machine. I would like to avoid the lag of running the VTune GUI on the supercomputer over a remote connection. Instead, I would like to transfer the VTune output data from the supercomputer to my Windows home computer, and then point my local copy of VTune to the data.

Is there a way to do this? I can't find any option for pointing VTune to existing data on Windows (on Linux I used the amplxe-gui command).

Thanks!

Thread Topic: How-To

VTune client / license

Hi,

When opening profiling results in VTune Amplifier XE, I get the following error:

"License check failed. Cannot find valid license. Data cannot be displayed. Error 0x4000001f (No valid license) -- Your support services License expired. Buy a new license to use this version of the software."

However, my IT department insists that we have bought a license and that the "client setup is wrong", and asks me:

"can you please just post a question to Intel – regarding how to setup Intel vTune 2016 to use the license server on st-vlic01: 28518@st-vlic01.st.statoil.no"

So... can someone help?

Hang after starting to profile an Android phone

I have tried to profile an app on a Nexus 5.

I can start profiling with both "Attach to Process" and "Launch Android Package" using Basic Hotspots.

(1) Launch Android Package

The app opens, but after I press Stop, there is no response. (The cmd window stays black throughout.)

When I then try to switch projects, the message "Please wait until the collection is cancelled" appears and stays there (seemingly forever, unless I disconnect my phone).

(2) Attach to Process

Similar result: the collection starts, but there is no response after I press Stop in the GUI, and the cmd window stays black throughout.

Thanks

Thread Topic: Question

Problem evaluating GPU hotspots

Hey guys,

I'm trying to analyze GPU hotspots on my Chromebook (Intel Core i3-5005U processor, Intel HD Graphics 5500, i915 graphics driver) using VTune_Amplifier_XE_2016.3.0.463186.
I keep getting this error when I run "amplxe-cl -collect gpu-hotspots -- <my app>":
amplxe: Error: Your version of the Intel Graphics Driver is obsolete and needs to be updated before collection. 

Here is the output of "amplxe-runss --context-value-list":

targetOS: Linux
OS: Linux
isPtraceScopeLimited: false
isTSXAvailable: false
isHTEnabled: true
isSGXAvailable: false
LinuxRelease: 3.14.0
Hypervisor: None
isPtraceAvailable: true
areGpuHardwareMetricsAvailable: UnsupportedInterfaceVersion
ETW: NA
isEtwDxSupported: no
isEtwCLRSupported: no
isPowerAnalysisAvailable: PlatformError
isPowerKernelStacksAvailable: false
isFtraceAvailable: yes
isMdfEtwAvailable: false
isCSwitchAvailable: yes
isGpuBusynessAvailable: yes
isGpuBusynessDetailsAvailable: yes
isGpuWaitAvailable: yes
isFunctionTracingAvailable: yes
isIowaitTracingAvailable: yes
isVSyncAvailable: yes
isSEPDriverAvailable: false
platformType: 97
CPU_NAME: 5th generation Intel(R) Core(TM) Processor family
PMU: broadwell
referenceFrequency: 2000000000
isVTSSPPDriverAvailable: false
isNMIWatchDogTimerRunning: true
LinuxPerfCredentials: Kernel
LinuxPerfCapabilities: breakpoint:raw;cpu:raw,format,events;software:raw;tracepoint:raw;uncore_imc:raw,format,events
LinuxPerfStackCapabilities: fp,dwarf
isTPSSAvailable: true
isPytraceAvailable: true
isGENDebugInfoAvailable: false

Any idea how to fix this? Do I need an older version of VTune to get this to work? If so, how can I figure out which driver version VTune needs to perform GPU analysis?

I'd be thankful if you could help me to get around this issue.

Thread Topic: Question

An assertion failure on VTune Amplifier Update 4

Hi there, 

Please help. VTune Amplifier XE Update 4 (Windows 10 64-bit, VS2015 Enterprise) crashes when finalizing results. I have tried uninstalling and reinstalling, but it doesn't help. The same error occurs in both the VS2015 integration and the GUI.

Here is the first part of the crash log:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

[Process]
Process Name: amplxe-eil-bridge.exe
Virtual Memory Size: 775632K
Working Set Size: 223784K

[Assertion]
CrashedPID: 7232
CrashedTID: 10488
Expression: Can't properly initialize leveldb cache.
File: C:\bb\INNLphep2w6r\b\b\tmpnxqvft\vcs\dbinterface1\src\sqlite\timelinedb\timeline_fill_helper_impl.cpp
Line: 42
Product: Intel(R) VTune(TM) Amplifier XE 2016; 470476
ReportPath: C:\temp\amplxe-log-wdc\2016-07-28-Thu-17-06-31-453.amplxe-eil-bridge.exe\

[Process]
Process Name: amplxe-gui.exe
Virtual Memory Size: 677376K
Working Set Size: 226276K

[Assertion]
CrashedPID: 9164
CrashedTID: 7672
Expression: Can't properly initialize leveldb cache.
File: C:\bb\INNLphep2w6r\b\b\tmpnxqvft\vcs\dbinterface1\src\sqlite\timelinedb\timeline_fill_helper_impl.cpp
Line: 42
Product: Intel(R) VTune(TM) Amplifier XE 2016; 470476
ReportPath: C:\temp\amplxe-log-wdc\2016-07-28-Thu-17-15-37-942.amplxe-gui.exe\

[Products]
Package ID: N/A
Package Contents: Intel(R) VTune(TM) Amplifier XE 2016 Update 4
Build Number: 470476

Thread Topic: Help Me

Pause/Resume API in Amplifier XE 2016

I am trying to repeat the steps described in the article using the newer 2016 version of VTune Amplifier XE. However, the test run does not behave correctly.

The build command is the following:

icpc -g -mmic test_itt.cpp $AMPLIFIER_XE_INC -L$AMPLIFIER_XE_BASE/bin64/k1om -littnotify -lpthread -o test_mic

I run the application natively on a coprocessor, launching it from the host:

amplxe-cl -collect advanced-hotspots --target-system=mic-native:mic0 --search-dir=. -start-paused /home/test_mic

and obtain the following (snippet):

amplxe: Warning: Pause command is not supported for managed code profiling. Runtime overhead is still possible. Data size limit may be exceeded.
amplxe: Collection paused.
Sampling session is already stopped
The sampling collection paused.
amplxe: Collection resumed.
The sampling collection resumed.
amplxe: Collection stopped.

This is clearly wrong behaviour, since I expect two resumes and two pauses. Moreover, the messages are not clear to me at all. Why are there some without the "amplxe:" prefix that apparently only repeat information?

Thread Topic: Question

How to detect that user task analysis is active?

We would like to add user task analysis to a project. In our case, collection is best achieved using a callback registration mechanism, which has some overhead but has the advantage that the overhead could potentially be zero when no user task analysis has been requested.

To achieve this, we check the domain->flag to see if analysis is active, and register the callbacks only in that case. Unfortunately, it seems that the flag is non-zero as soon as VTune is attached (regardless of whether user task analysis has been requested).
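The gating pattern described above can be sketched as follows. This is a minimal illustration, not VTune code: DomainStandIn is a hypothetical stand-in that models only the `flags` member of ittnotify's `__itt_domain` (zero while the domain is disabled), and TaskCallbackRegistry is a hypothetical application-side registry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for ittnotify's __itt_domain. The real struct has a
// `volatile int flags` member that is zero while the domain is disabled;
// this sketch models only that member.
struct DomainStandIn {
    volatile int flags = 0;
};

// Hypothetical application-side registry (not part of any VTune API) that
// holds the task-reporting callbacks.
class TaskCallbackRegistry {
public:
    using Callback = std::function<void(const char* task_name)>;

    // Registers `cb` only when the domain reports itself as enabled, so a run
    // without a collector attached pays nothing per task.
    bool register_if_enabled(const DomainStandIn& domain, Callback cb) {
        assert(cb && "callback must be callable");
        if (domain.flags == 0) {
            return false;  // collection inactive: skip registration
        }
        callbacks_.push_back(std::move(cb));
        return true;
    }

    std::size_t size() const { return callbacks_.size(); }

private:
    std::vector<Callback> callbacks_;
};
```

As the post observes, the flag becomes non-zero as soon as VTune attaches, so a gate like this only tells the application that some collector is present, not which analysis type was requested.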

Is there a way to check from within the application if user task analysis has been requested?

Thanks!

Stephan

Thread Topic: Question

Impact of h/w counter bugs on general exploration results

Hi,

I've been testing out the top-down model that the General Exploration analysis offers on Haswell-EP (E5-2670). I am especially interested in the L3 Bound breakdown. But then I noticed that many of the counters it uses appear to be buggy. Errata HSM26 and HSM30 (http://www.intel.com/content/dam/www/public/us/en/documents/specificatio...) are worrisome. HSM30 appears to apply only to SMT mode, but HSM26 states that counters may undercount by as much as 40%.

I didn't notice any warnings within VTune about such issues. How should I interpret metrics such as "Contested Accesses", "Data Sharing", "L3 Latency", etc. that may be impacted by the various errata?

The main errata I found that apply to top-down analysis:

HSM26: Certain Local Memory Read / Load Retired PerfMon Events May Undercount (undercounts of up to 40% have been observed; this seems to be the biggest issue)

HSM30: Performance Monitor Counters May Produce Incorrect Results (seems to be less serious than the above, and can be worked around by disabling SMT in the BIOS)

HSM31: Performance Monitor UOPS_EXECUTED Event May Undercount (not too concerned about this one, as it seems rare)

How much should we worry about each one? Running in ST mode also improves multiplexing reliability, so that seems to be a smart move. But I am not sure what to make of HSM26.
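As a back-of-the-envelope aid for HSM26, the sketch below bounds the true event count from an observed one. It assumes the event only ever undercounts and that the undercount never exceeds the 40% quoted above; under that assumption, a metric with the affected event in its numerator could be understated by up to a factor of 1/0.6, about 1.67. CountRange and true_count_range are hypothetical helpers.

```cpp
#include <cassert>

// Range of plausible true counts for one event.
struct CountRange {
    double low;
    double high;
};

// If an event may undercount by up to `max_undercount` (0.40 for the 40%
// figure quoted for HSM26), then observed >= (1 - max_undercount) * true,
// so the true count lies in [observed, observed / (1 - max_undercount)].
// Assumes the event never overcounts.
CountRange true_count_range(double observed, double max_undercount) {
    assert(observed >= 0.0);
    assert(max_undercount >= 0.0 && max_undercount < 1.0);
    return {observed, observed / (1.0 - max_undercount)};
}
```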

Best regards,

Gisle

Thread Topic: Question

Understanding Advanced Hotspots CPI

Hi,

I'm starting to learn how to analyze OpenMP projects. I've found that Advanced Hotspots (AH) in VTune gives me CPI, and I'm trying to better understand what is displayed.

For argument's sake, if I've set the number of threads to 4 on a 4-core machine and the displayed CPI is 1, is this 1 for each core or for the entire machine? That is, if the machine advanced X cycles, has each core executed X instructions, or X/4?
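To make the per-core vs. whole-machine question concrete, here is a sketch that aggregates CPI as a ratio of summed counts. It assumes, rather than quotes from VTune documentation, that the displayed CPI Rate is total Clockticks (unhalted cycles) divided by total Instructions Retired over everything in the row; the per-thread numbers in ThreadCounts are made up. Under that assumption, four busy threads with an aggregate CPI of 1 means each core retired about one instruction per unhalted cycle, i.e. roughly X instructions per core over X cycles, not X/4.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-thread counts as a sampling profiler might attribute them
// (hypothetical values, for illustration only).
struct ThreadCounts {
    std::uint64_t unhalted_cycles;       // e.g. CPU_CLK_UNHALTED.THREAD
    std::uint64_t instructions_retired;  // e.g. INST_RETIRED.ANY
};

// Aggregate CPI: total cycles over total instructions across all threads.
// Each thread's cycles are counted on its own logical core, so the result is
// an average per-core CPI rather than a wall-clock figure for the machine.
double aggregate_cpi(const std::vector<ThreadCounts>& threads) {
    std::uint64_t cycles = 0;
    std::uint64_t instructions = 0;
    for (const auto& t : threads) {
        cycles += t.unhalted_cycles;
        instructions += t.instructions_retired;
    }
    assert(instructions > 0);
    return static_cast<double>(cycles) / static_cast<double>(instructions);
}
```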

Thanks,

Arik

Thread Topic: Question

Howto - Advanced Hotspots on KNL

I have a KNL system with Parallel Studio XE Cluster Edition 2016.0.3 installed. C++ and IVF projects build (and so does Intel MPI).

I'm having an issue with VTune Amplifier when running Advanced Hotspots.

As distributed, the VTune drivers were not installed during the Parallel Studio installation (due to the KNL CPU). However, following the instructions for building the drivers, both the driver build and the driver installation succeed.

insmod-sep -q

shows

pax driver is loaded and owned by group "vtune" with file permissions "666"
socperf2_0 driver is loaded ...
sep4_0 driver is loaded...
vtsspp driver is loaded...

Running groups shows that my user account is a member of vtune.

Any hints on how to get hardware sampling working would be appreciated.

Jim Dempsey

Thread Topic: How-To

Installing VTune in a restricted environment

Hi all,

Recently, I have been tasked with installing VTune on a couple of Linux development servers. There are several problems though:

  1. The servers have no or limited Internet connectivity. They can connect to a license server though, running on a different machine.
  2. There is no way to obtain root or sudo access (only a regular user account is available).
  3. All software packages MUST be installed via RPM (or yum).

VTune uses an install script, which cannot be used because of reason 3. Furthermore, I understand that VTune contains a kernel module which somehow needs to be introduced into the system and needs to match the running kernel. I am sure other people also use this software in similarly restricted environments. Is there any guidance on the following?

  1. Can the installation be performed using only RPMs?
  2. Can the kernel module be precompiled for a given OS and packaged into RPM as well?
  3. Following a kernel update, will the module continue to work or will it need to be reinstalled?

Many thanks in advance,

Ondrej

Thread Topic: Question

VTune not working in Ubuntu 16.04

Hello,

Today I received an Intel Parallel Studio XE Cluster Edition license for Linux and tested VTune on Ubuntu 16.04. I had never used it before, but it was clear that the software was missing some functionality.

After some troubleshooting, and thanks to this guide: https://software.intel.com/en-us/sep_driver, I found that the problem was the vtsspp driver, which failed to compile during installation.

After modifying the kernel-version #if condition in both vtsspp/module.c (line 755) and vtsspp/collector.c (line 1811), the software compiled and installed successfully, bringing VTune's functionality back.
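Kernel-version guards like the one mentioned above compare LINUX_VERSION_CODE against values built with the KERNEL_VERSION macro. The sketch below mirrors the classic encoding from <linux/version.h> so the comparison can be checked in user space. The 4.4.0 threshold and the needs_new_api helper are hypothetical, since the post does not show the actual conditions at those lines; Ubuntu 16.04 shipped with a 4.4 kernel.

```cpp
#include <cstdint>

// Mirrors the classic <linux/version.h> definition:
//   #define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
// Newer kernels also clamp the sublevel `c` to 255; that does not affect
// the values used here.
constexpr std::uint32_t kernel_version(std::uint32_t major, std::uint32_t minor,
                                       std::uint32_t sublevel) {
    return (major << 16) + (minor << 8) + sublevel;
}

// Hypothetical guard of the kind an out-of-tree driver uses to choose between
// old and new kernel APIs; the real vtsspp conditions are not shown here.
constexpr bool needs_new_api(std::uint32_t linux_version_code) {
    return linux_version_code >= kernel_version(4, 4, 0);
}
```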

Regards,

Federico Tula Rovaletti

VT Amplxe 2016 update 4 not working on Windows 10 Anniversary Edition

I have two Skylake systems running Windows 10 Anniversary Edition, and neither of them can get vtss.sys working. The event log reports the failure: "The vtss service failed to start due to the following error:%%4294967290". [This magic number is -6 when read as a signed 32-bit value. Disassembling vtss.sys, I can see this value used in a number of places in the code. The Linux source shows that -6 is VTSS_ERR_NOTFOUND (which is not referenced in the Linux source).]
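Reading %%4294967290 as -6 is ordinary two's-complement arithmetic: 4294967290 = 2^32 - 6. A small sketch of the conversion follows; as_signed_status is a hypothetical helper, and it only decodes the number (the mapping of -6 to VTSS_ERR_NOTFOUND comes from the post, not from this code).

```cpp
#include <cstdint>

// Event Viewer prints the driver's status as an unsigned 32-bit number.
// Reinterpreting it as two's complement recovers the signed return code.
// Done in 64-bit arithmetic so the conversion is well defined under every
// C++ standard.
constexpr std::int64_t as_signed_status(std::uint32_t logged) {
    return logged > 0x7FFFFFFFu
               ? static_cast<std::int64_t>(logged) - (std::int64_t{1} << 32)
               : static_cast<std::int64_t>(logged);
}
```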

I have followed the guidance on the forums, and Hyper-V is off (unchecked in Programs and Features and disabled via BCDEdit). Various tools indicate that my BIOS has VT enabled. I am left with the default diagnosis that something is using the PMU resources (Windows 10 Anniversary Edition, or something else such as Visual Studio 2015 Update 3?). Running amplxe-sepreg -u followed by amplxe-sepreg -i -v shows "Installing and starting VTSS++ driver...FAILED".

My goal is to be able to use EBS.

Thread Topic: Bug Report

Improve multi-threading of VTune itself

When performing VTune (16.0.3) runs of a few minutes (e.g. 3), the time to read in and finalize the sample data is excruciatingly long. Looking at the resource display of the System Monitor (Linux on KNL), it appears that very few threads, perhaps only one, are involved in preparing the data for the analysis display.

Can this be made more multi-threaded?

Jim Dempsey

kernel:BUG: soft lockup

Using VTune 16.0.3, on KNL, Basic Hotspots.

Experiencing: kernel:BUG: soft lockup - CPU#...

Any recommendations?

Jim Dempsey

Difficulty profiling Linux daemon

I feel like this has probably been asked before but I was unable to find any post that addresses my questions.

I am attempting to profile a system daemon that I wrote on a Linux system (Ubuntu 14.04.5; 3.19 kernel) during its normal execution. My daemon starts running as root but then de-escalates its privileges to a system account (i.e. a daemon or service account with no password or login ability).

When I attempt to run collection using amplxe-cl as root and attach to my daemon process, I get this error message:

amplxe: Error: Data collection is interrupted because credentials of the target process and VTune Amplifier do not match. Please start both target process and VTune Amplifier with the credentials of the same user and try collecting data again.
amplxe: Collection failed.
amplxe: Internal Error

I can understand this in other cases, but if I'm running as root, why can't I profile any process I want?

Next, I tried running amplxe-cl through sudo as the same account the daemon runs under (let's say the username and group are "bob"), like this:

sudo -u bob -g bob amplxe-cl -collect hotspots -r <some dir> --target-pid <some pid>

This appears to work, at least in that it doesn't fail and return an error immediately, but when I attempt to stop collection by hitting CTRL-C, nothing happens and the process hangs. I then attempted to issue the "stop" from another terminal and got this error:

amplxe: Error: Cannot handle the given command due to an internal error.

amplxe: Internal Error

Is there any way to profile my daemon without making code changes or system configuration changes?

I'm running VTune 2016 Update 4, if that matters.

How to profile hybrid OpenMP/MPI code

Hi All

I have a piece of code that uses both OpenMP and MPI, and I wish to profile it in different configurations, e.g.

One Haswell node with 20 cores, in the following configurations:

1. 20 MPI tasks and no OpenMP parallelization (or 1 OpenMP thread)

2. 10 MPI tasks and 2 OpenMP threads per task

3. 4 MPI tasks and 5 OpenMP threads per task

4. 2 MPI tasks and 10 OpenMP threads per task

I am running completely independent tasks (linear solvers) with different data sets, so there is NO communication among MPI tasks. The reason I have MPI tasks is that in the future I would like to have a more fine-grained parallel task that can also use cores from nodes on the InfiniBand network.

I had expected that, for the matrix I am using in the linear solver, I would see more than a 10x improvement between variants 1 and 4. Concretely: in variant 1, say all 20 tasks finish in 130 seconds (the time of the slowest task). Variant 4 finishes in 13 seconds, but to complete all the work I must run variant 4 ten times. This also results in 130 seconds, so there is no gain from parallelizing with OpenMP.
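The arithmetic in the paragraph above can be written out as a tiny model. It assumes, as the post does, that every solve in a round takes as long as the slowest one and that rounds run back to back; makespan_seconds is a hypothetical helper.

```cpp
#include <cassert>

// Wall-clock time to finish `jobs` independent solves when `concurrent` of
// them run side by side and each round takes `seconds_per_round`.
double makespan_seconds(int jobs, int concurrent, double seconds_per_round) {
    assert(jobs > 0 && concurrent > 0);
    int rounds = (jobs + concurrent - 1) / concurrent;  // ceiling division
    return rounds * seconds_per_round;
}
```

Variant 1 is one round of 20 solves at 130 s; variant 4 is 10 rounds of 2 solves at 13 s each, also 130 s. A 10x speedup on 10 threads is linear scaling, which exactly offsets running 10 times fewer solves at once, so variant 4 could only finish the whole workload sooner with a super-linear per-solve speedup.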

This is what I wish to understand with a tool or a set of tools. My cluster administrator advised me to use VTune for OpenMP analysis and ITAC for MPI analysis.

I am wondering whether there is an integrated way of looking at the possible issues with my test. Kindly advise.

with kind regards and thanks in advance for reading my message

Rohit

P.S. Please note that to get these numbers I used the information provided in the following articles:

https://software.intel.com/en-us/node/528819

https://software.intel.com/en-us/node/522691#AFFINITY_TYPES

In my code, I use the options listed there to map processes to cores.

Thread Topic: Question

Profiling a single-threaded process running on a single CPU with General Exploration

I am trying to profile a process on Linux running on a single CPU of a Broadwell system (model name: Intel(R) Xeon(R) CPU D-1540 @ 2.00GHz), and I get a CPI rate of 2.828 with the default VTune configuration. With more samples (a smaller sample interval), it rises to 3.148. While I understand the one or two delay functions in the "Bottom-up" view that affect the CPI rate, what I do not understand is "vmlinux" showing a CPI rate of 1.588. According to the system configuration, the CPU I am running on is dedicated to the process, and any Linux kernel activity should be performed on a different CPU. Does the CPI of 1.588 indicate that this is not happening? Any help is greatly appreciated. Any other suggestions or comments based on the results and the VTune configuration pasted below?

VTune configuration for General Exploration:

1. Attach to a process via an ssh session.

2. Automatically stop after 60 seconds.

3. Analyse child processes.

4. Duration estimate: under 1 minute

5. Collection data: 0

6. Slow frames: 40, Fast frames: 100 (default values)

7. CPU mask: 11

RESULTS:

With defaults:

Elapsed Time:    60.059s
    Clockticks:    151,600,000
    Instructions Retired:    53,600,000
    CPI Rate:    2.828
    MUX Reliability:    0.948
    Front-End Bound:    0.191
        Front-End Latency:    0.106
            ICache Misses:    0.026
            ITLB Overhead:    0.009
            Branch Resteers:    0.047
            DSB Switches:    0.000
            Length Changing Prefixes:    0.000
            MS Switches:    0.106
        Front-End Bandwidth:    0.086
            Front-End Bandwidth DSB:    0.026
            Front-End Bandwidth MITE:    0.237
            Front-End Bandwidth LSD:    0.000
    Bad Speculation:    0.046
    Back-End Bound:    0.584
        Memory Bound:    0.217
            L1 Bound:    0.237
            L2 Bound:    0.000
            L3 Bound:    0.000
            DRAM Bound:    0.211
            Store Bound:    0.000
        Core Bound:    0.367
            Divider:    0.000
            Port Utilization:    0.923
                Cycles of 0 Ports Utilized:    0.633
                Cycles of 1 Port Utilized:    0.290
                Cycles of 2 Ports Utilized:    0.053
                Cycles of 3+ Ports Utilized:    0.079
    Retiring:    0.178
        General Retirement:    0.113
        Microcode Sequencer:    0.065
        Assists:    0.000
    Total Thread Count:    5
    Paused Time:    0s

With the sampling interval set by event-config=CPU_CLK_UNHALTED.THREAD:sa=200000,INST_RETIRED.ANY:sa=200000, as suggested in another recent post:

Elapsed Time:    60.001s
    Clockticks:    1,454,600,000
    Instructions Retired:    462,000,000
    CPI Rate:    3.148
    MUX Reliability:    0.984
    Front-End Bound:    0.067
        Front-End Latency:    0.063
            ICache Misses:    0.019
            ITLB Overhead:    0.003
            Branch Resteers:    0.025
            DSB Switches:    0.000
            Length Changing Prefixes:    0.000
            MS Switches:    0.121
        Front-End Bandwidth:    0.004
            Front-End Bandwidth DSB:    0.000
            Front-End Bandwidth MITE:    0.179
            Front-End Bandwidth LSD:    0.000
    Bad Speculation:    0.009
    Back-End Bound:    0.769
        Memory Bound:    0.365
            L1 Bound:    0.294
            L2 Bound:    0.000
            L3 Bound:    0.234
            DRAM Bound:    0.000
            Store Bound:    0.000
        Core Bound:    0.404
            Divider:    0.000
            Port Utilization:    0.660
                Cycles of 0 Ports Utilized:    0.415
                Cycles of 1 Port Utilized:    0.242
                Cycles of 2 Ports Utilized:    0.110
                Cycles of 3+ Ports Utilized:    0.049
    Retiring:    0.155
        General Retirement:    0.056
        Microcode Sequencer:    0.099
        Assists:    0.000
    Total Thread Count:    5
    Paused Time:    0s
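The CPI values quoted above follow directly from the two totals in each summary. The sketch below recomputes them and estimates how many samples stand behind each total, assuming the totals are sample counts multiplied by the sample-after (sa) value; both totals in the second run are exact multiples of 200,000, which is consistent with that assumption. cpi_rate and approx_samples are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// CPI Rate = Clockticks / Instructions Retired, using the totals reported in
// the General Exploration summary.
double cpi_rate(std::uint64_t clockticks, std::uint64_t instructions_retired) {
    assert(instructions_retired > 0);
    return static_cast<double>(clockticks) /
           static_cast<double>(instructions_retired);
}

// Approximate number of samples behind an event total when a fixed
// sample-after value (sa=...) was used: total / sample_after.
std::uint64_t approx_samples(std::uint64_t event_total, std::uint64_t sample_after) {
    assert(sample_after > 0);
    return event_total / sample_after;
}
```

For the second run this gives a CPI of about 3.148 from roughly 7,273 clocktick samples and 2,310 instruction samples.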

Thread Topic: How-To