Kernel driver build fails on Fedora 30. Asm macros not substituted.

March 26, 2020, 5:58 am

Latest and popular articles on Intel Technologies

Hi,

I previously posted about this same problem on Fedora 31, an unsupported OS. I tried today on Fedora 30 (a supported OS) and ran into the exact same problem.

Here's the previous post with full details:
https://software.intel.com/en-us/forums/vtune/topic/850745

TCE Level: Level 1

TCE Open Date: Wednesday, March 25, 2020 - 22:58

↧

How to use Intel® VTune™ Profiler in a kubernetes Environment

March 25, 2020, 2:53 pm

Hi All,

I have a Kubernetes environment and want to use Intel® VTune™ Profiler to find bottlenecks, inefficient code, long execution times, and any other useful information about the containers (the containers run Python applications).

I need some documentation or a wiki on how to get started with setting up Intel® VTune™ Profiler in a Kubernetes environment (either a standalone or a container-based installation of VTune) and how to run it.

Thanks

Krishna Venkata

↧

Analyse python file from embedded python

March 31, 2020, 5:57 am

Hi, I would like to get Python-level analysis for Python files executed through the Python C API, as in this sample code adapted from https://docs.python.org/3/extending/embedding.html :

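#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
	printf("before\n");
	wchar_t *program = Py_DecodeLocale(argv[0], NULL);
	if (program == NULL) {
		fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
		exit(1);
	}
	Py_SetProgramName(program);
	Py_Initialize();
	/* Open the script in binary mode and check that it exists. */
	FILE *f = fopen("f.py", "rb");
	if (f == NULL) {
		fprintf(stderr, "Fatal error: cannot open f.py\n");
		exit(1);
	}
	/* PyRun_SimpleFile returns 0 on success and -1 if an exception was raised. */
	int r = PyRun_SimpleFile(f, "f.py");
	fclose(f);
	if (r != 0) {
		fprintf(stderr, "f.py raised an exception\n");
	}
	if (Py_FinalizeEx() < 0) {
		exit(120);
	}
	PyMem_RawFree(program);
	return 0;
}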

When I use Hotspots analysis with user-mode sampling and managed mode set to any of (auto, native, mixed), I only get the C/C++ functions:

python37_d.dll ! PyEval_EvalFrameDefault - ceval.c
python37_d.dll ! PyEval_EvalCodeWithName + 0xaf3 - ceval.c:3930
python37_d.dll ! PyEval_EvalCodeEx + 0x95 - ceval.c:3959
python37_d.dll ! PyEval_EvalCode + 0x2d - ceval.c:524
python37_d.dll ! run_mod + 0x69 - pythonrun.c:1035
python37_d.dll ! PyRun_FileExFlags + 0x111 - pythonrun.c:988
python37_d.dll ! PyRun_SimpleFileExFlags + 0x4df - pythonrun.c:429
embed1.exe ! [embed1.exe] + 0x11a2f - [unknown source file]

Is it possible to get Python file/stack analysis, and if so, how? Thank you.

I use VTune 2020-60519 with Visual Studio 2017 on Windows.

↧

VTune counting cache hit/miss wrong?

March 30, 2020, 3:52 pm

Hi!

I am using VTune to measure cache hits and misses (loads) at the different cache levels. I assumed L2_MISS = L3_HIT + L3_MISS (and similarly for L1 and L2), but the output below does not seem to satisfy this.

Config: Intel Core i3-5005U + Windows 10

CPU
Name:   Intel(R) Core(TM) Processor code named Broadwell
Frequency:   2.0 GHz
Logical CPU Count:   4

Elapsed Time:   60.004s
CPU Time:   25.576s
CPI Rate:   1.641
Total Thread Count:   4
Paused Time:   0s

Hardware Events
Hardware Event Type   Hardware Event Count   Hardware Event Sample Count   Events Per Sample
BACLEARS.ANY   223,106,693   97   100003
BR_MISP_RETIRED.ALL_BRANCHES_PS   64,401,449   7   400009
CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE   1,497,344,919   651   100003
CPU_CLK_UNHALTED.REF_TSC   51,034,000,000   25,517   2000000
CPU_CLK_UNHALTED.REF_XCLK   2,645,079,350   1,150   100003
CPU_CLK_UNHALTED.THREAD   51,314,000,000   25,657   2000000
CPU_CLK_UNHALTED.THREAD_P   47,242,070,863   1,027   2000003
CYCLE_ACTIVITY.STALLS_L1D_MISS   13,616,020,424   296   2000003
CYCLE_ACTIVITY.STALLS_L2_MISS   10,350,015,525   225   2000003
CYCLE_ACTIVITY.STALLS_MEM_ANY   20,332,030,498   442   2000003
CYCLE_ACTIVITY.STALLS_TOTAL   29,992,044,988   652   2000003
INST_RETIRED.ANY   31,262,000,000   15,631   2000000
INST_RETIRED.PREC_DIST   30,130,045,195   655   2000003
INST_RETIRED.X87   0   0   2000003
INT_MISC.RECOVERY_CYCLES   276,000,414   6   2000003
ITLB_MISSES.STLB_HIT   50,601,518   22   100003
ITLB_MISSES.WALK_COMPLETED   85,102,553   37   100003
ITLB_MISSES.WALK_DURATION   2,884,286,526   1,254   100003
L1D.REPLACEMENT   1,518,002,277   33   2000003
L1D_PEND_MISS.FB_FULL   46,000,069   1   2000003
L1D_PEND_MISS.PENDING   33,810,050,715   735   2000003
L2_RQSTS.RFO_HIT   55,200,828   12   200003
LD_BLOCKS.NO_SR   0   0   100003
LD_BLOCKS.STORE_FORWARD   39,101,173   17   100003
LD_BLOCKS_PARTIAL.ADDRESS_ALIAS   71,302,139   31   100003
LSD.CYCLES_4_UOPS   138,000,207   3   2000003
LSD.CYCLES_ACTIVE   92,000,138   2   2000003
LSD.UOPS   506,000,759   11   2000003
MACHINE_CLEARS.COUNT   2,300,069   1   100003
MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS   27,154,927   59   20011
MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS   10,585,819   23   20011
MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS   5,523,036   12   20011
MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS   565,816,974   246   100003
MEM_LOAD_UOPS_RETIRED.L1_HIT_PS   6,716,010,074   146   2000003
MEM_LOAD_UOPS_RETIRED.L1_MISS_PS   761,322,839   331   100003
MEM_LOAD_UOPS_RETIRED.L2_HIT_PS   434,713,041   189   100003
MEM_LOAD_UOPS_RETIRED.L2_MISS_PS   332,489,587   289   50021
MEM_LOAD_UOPS_RETIRED.L3_HIT_PS   287,620,750   250   50021
MEM_LOAD_UOPS_RETIRED.L3_MISS   9,200,644   4   100007
MEM_LOAD_UOPS_RETIRED.L3_MISS_PS   6,900,483   3   100007
MEM_UOPS_RETIRED.ALL_STORES_PS   5,888,008,832   128   2000003
MEM_UOPS_RETIRED.LOCK_LOADS_PS   262,218,354   114   100007
MEM_UOPS_RETIRED.SPLIT_LOADS_PS   4,600,138   2   100003
MEM_UOPS_RETIRED.SPLIT_STORES_PS   0   0   100003
MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS   108,103,243   47   100003
MEM_UOPS_RETIRED.STLB_MISS_STORES_PS   2,300,069   1   100003
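
The assumed identities can be checked directly against the `_PS` rows of the table. The sketch below only does the arithmetic; the dictionary keys are shortened names chosen here for illustration. Keep in mind that these counts are estimates extrapolated from a small number of samples (for example, only 3 samples for L3_MISS_PS), so exact equality should not be expected.

```python
# Load-uop counts copied from the MEM_LOAD_UOPS_RETIRED.*_PS rows above.
# The short keys are illustrative names, not VTune identifiers.
counts = {
    "L1_MISS": 761_322_839,
    "L2_HIT": 434_713_041,
    "L2_MISS": 332_489_587,
    "L3_HIT": 287_620_750,
    "L3_MISS": 6_900_483,
}


def check(parent_miss, child_hit, child_miss):
    """Return (hit + miss at the next level, relative gap versus the parent miss count)."""
    total = counts[child_hit] + counts[child_miss]
    gap = (total - counts[parent_miss]) / counts[parent_miss]
    return total, gap


l2_total, l2_gap = check("L1_MISS", "L2_HIT", "L2_MISS")
l3_total, l3_gap = check("L2_MISS", "L3_HIT", "L3_MISS")

print(f"L2_HIT + L2_MISS = {l2_total:,} vs L1_MISS = {counts['L1_MISS']:,} ({l2_gap:+.1%})")
print(f"L3_HIT + L3_MISS = {l3_total:,} vs L2_MISS = {counts['L2_MISS']:,} ({l3_gap:+.1%})")
```

With these numbers the L1/L2 relation holds to within about 1%, while the L2/L3 relation is off by roughly 11%.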

Any help regarding this would be appreciated.

Thanks!

↧

VTune backend crashes

April 1, 2020, 12:18 am

VTune Profiler 2020 crashes while collecting the analysis.

Code compiled with VS2017 - 15.8.9

Do you have a later version I could try?

Problem signature:
Problem Event Name:   BEX64
Application Name:   vtune-backend.exe
Application Version:   0.0.0.0
Application Timestamp:   5ddce9e2
Fault Module Name:   amplxe_msdia140.dll
Fault Module Version:   14.10.25017.0
Fault Module Timestamp:   58a64084
Exception Offset:   000000000009e819
Exception Code:   c0000417
Exception Data:   0000000000000000
OS Version:   6.1.7601.2.1.0.18.10
Locale ID:   1033
Additional Information 1:   2356
Additional Information 2:   2356dca811460826fcbf797f7d9cab81
Additional Information 3:   de98
Additional Information 4:   de98d53fa8c257268087b149a2719e54

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt

↧

Issue with vtune drivers being group permissioned

April 1, 2020, 7:02 am

Hi,

To make VTune easier to share among a few people, I changed the group ownership of the VTune drivers (/dev/{pax,socperf,sep5}) from vtune to another group (mygroup). This was done by running insmod-sep -g mygroup when loading the driver.

On a RHEL 7.6 dev host, if `mygroup` is not the user's primary group, vtune-gui and vtune are unable to run properly and query the HW sampling drivers.

This does not seem to be an issue on RHEL 6.x.

So far, straces have been inconclusive as to why this fails when the driver devices are group-permissioned (mode 660, ownership root:mygroup) and the user's group membership is (primary_group, mygroup).

If I make /dev/{pax,socperf,sep5} root:primary_group, it works. Making them root:mygroup fails, on RHEL 7.x only.
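
One thing worth ruling out in a setup like this is whether the running process actually carries `mygroup` as a supplementary group, since a process's group list is fixed when the session starts. The following is a minimal diagnostic sketch, not part of VTune: the helper names are hypothetical, and the device paths are simply the ones named above.

```python
import os
import stat


def group_grants_rw(st_mode, st_gid, process_gids):
    """True if the group bits allow read+write and the process belongs to that group."""
    has_group_rw = bool(st_mode & stat.S_IRGRP) and bool(st_mode & stat.S_IWGRP)
    return has_group_rw and st_gid in process_gids


def report(path):
    """Print whether the calling process can reach `path` through group permissions."""
    st = os.stat(path)
    # os.getgroups() lists the supplementary groups of *this process*; the
    # effective gid is added because it is not guaranteed to be in that list.
    gids = set(os.getgroups()) | {os.getegid()}
    ok = group_grants_rw(st.st_mode, st.st_gid, gids)
    print(f"{path}: mode={stat.filemode(st.st_mode)} gid={st.st_gid} "
          f"group access={'yes' if ok else 'no'}")


if __name__ == "__main__":
    # Device names as given in the post; adjust to what exists on the host.
    for dev in ("/dev/pax", "/dev/socperf", "/dev/sep5"):
        if os.path.exists(dev):
            report(dev)
```

Running this from the same shell that launches vtune shows whether the kernel-level permission check alone would already deny access, independent of anything VTune does.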

Any idea why? Could it have something to do with sssd settings? Why is VTune not able to work in such a scenario? Any help would be appreciated.

Thanks,

↧

Recursive Filter In by Selection

April 2, 2020, 1:01 am

Hi,

When profiling my code with VTune, I often need to find out when a given function gets executed (start elapsed time -> end elapsed time). To do so, I usually do the following:

1) I go to the top-down tree view.
2) I select a range in the elapsed time graph at the bottom and choose "Filter In by Selection".
3) I observe the new tree view and check whether the CPU time for that function is close to 100%. If it is not, I need to move my range around: go back to step 2 and iterate.

This is a rather tedious process. In the top-down tree, there is an option "Filter In by Selection", which almost does what I want. It shows the function in the elapsed time graph, but it does not show its callees.

Basically, what I am looking for is an option in the top-down tree that would be "Filter In by Selection (Recursive)". Is there anything like that in VTune?

Thank you for your help,

Joachim

↧

./insmod-sep: line 261: socwatch_exists: command not found

April 5, 2020, 5:23 am

Hi,
I have installed VTune 2020 on RHEL 7.6, and while checking the sep driver load status I get the following error:

[root@node1 ]# source /home/user/I2020u0/parallel_studio_xe_2020.0.088/psxevars.sh intel64
Intel(R) Parallel Studio XE 2020 for Linux*
Copyright (C) 2009-2019 Intel Corporation. All rights reserved.
[root@node1 ]# cd  /home/user/I2020u0/vtune_profiler_2020.0.0.605129/sepdk/src/
[root@node1 ]# ./insmod-sep -q
pax driver is loaded and owned by group "vtune" with file permissions "660".
socperf3 driver is loaded and owned by group "vtune" with file permissions "660".
sep5 driver is loaded and owned by group "vtune" with file permissions "660".
socwatch driver is not correctly loaded.
./insmod-sep: line 261: socwatch_exists: command not found
vtsspp driver is loaded and owned by group "vtune" with file permissions "660".
./insmod-sep: line 268: [: too many arguments
[root@node1 ]#

I also have Intel 2019 u5 installed, and the same command (insmod-sep) from that version worked fine on this system.

Please let me know if more information is required from my end to fix this.

↧

Assertion 'Cannot write magic record to trace' failed.

April 6, 2020, 10:43 pm

Hi,
I am trying VTune 2020u0 on RHEL 7.6 on an Intel 8280.
To test the setup I used APS; it ran fine and generated the results without any issue.

Then I tried the hpc-performance analysis using the amplxe-cl command:

time mpirun -np $SLURM_NPROCS -ppn $SLURM_NTASKS_PER_NODE amplxe-cl -collect hpc-performance -data-limit 0 -result-dir result_hpcperf -- ${INSTALL_ROOT}/wrf.exe

The run finished, but it seems that the data gathering command ran into an issue:

WRF: SUCCESS COMPLETE wrf
vcs/collectunits1/tmu/src/tmu.c:437 write_trace: Assertion 'Cannot write magic record to trace' failed.

Abort trap signal
Image              PC                Routine            Line        Source
wrf.exe            00000000030C8DDB  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002AAAACFB25D0  Unknown               Unknown  Unknown
libc-2.17.so       00002AAAAD8E7207  gsignal               Unknown  Unknown
libc-2.17.so       00002AAAAD8E88F8  abort                 Unknown  Unknown
libittnotify_coll  00002AAAAACE2D88  Unknown               Unknown  Unknown
libittnotify_coll  00002AAAAACE33AB  Unknown               Unknown  Unknown
libittnotify_coll  00002AAAAACE34F7  Unknown               Unknown  Unknown
libittnotify_coll  00002AAAAACE4C9B  Unknown               Unknown  Unknown
libittnotify_coll  00002AAAAACD7B81  Unknown               Unknown  Unknown
libittnotify_coll  00002AAAAACD79A3  Unknown               Unknown  Unknown
libittnotify_coll  00002AAAAACD77F6  Unknown               Unknown  Unknown
ld-2.17.so         00002AAAAAABAFCA  Unknown               Unknown  Unknown
libc-2.17.so       00002AAAAD8EAB69  Unknown               Unknown  Unknown
libc-2.17.so       00002AAAAD8EABB7  Unknown               Unknown  Unknown
libc-2.17.so       00002AAAAD8D33DC  __libc_start_main     Unknown  Unknown
wrf.exe            0000000000415169  Unknown               Unknown  Unknown

Around 11 hours have elapsed and I still see the amplxe-cl process running (top command):

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 42453 root      20   0  626072  53584  32180 S   6.2  0.0   0:29.60 amplxe-cl
111683 root      20   0  164668   2616   1556 R   6.2  0.0   0:00.01 top
     1 root      20   0   56060   8328   2620 S   0.0  0.0   1:04.08 systemd
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.12 kthreadd
     3 root      20   0       0      0      0 S   0.0  0.0   0:00.06 ksoftirqd/0

I can also see that a 17 MB directory was created (result_hpcperf.node1).
I am not sure whether I will face issues with other collection/analysis types.
I also ran the amplxe self-checker script, and its log seems to indicate that the setup is fine.

Please let me know if I can provide more information to help fix this issue.

↧

VTune Profiler 2020 : Profiling Remote Target Inside Docker With Host and Remote system as Windows

April 7, 2020, 6:04 am

Objective: To profile a remotely running process using the Hardware Event-Based Sampling Hotspots analysis type inside a Docker container with a Windows image (windowsserver:ltsc2019) from a Windows host.

Version of vtune tool used : Vtune 2020 update1

Version of Docker Engine : 19.03.8 (Docker Desktop)

Host OS : Version 1909 Windows Pro

I have done the following successfully :

1) Passwordless SSH connection to the remote system (Docker) using an empty-password config (not key-based)

2) Installed the VTune standalone profiler with command-line support on the remote target system. This includes installation of the driver.

Problem: I get the following error:

C:\Users\hariv\.ssh>vtune --target-system=ssh:User03@localhost[:2222] -collect hotspots -knob collection-type:hw-events -- /matrix/matrix.exe
vtune: Using target: ssh:User03@localhost[:2222]
vtune: Error: Please, check that the command '/tmp/vtune_profiler_2020.1.0.607630/bin32/amplxe-runss -V' is run successfully on the target.
vtune: Error: VTune cannot detect remote machine configuration.
vtune: Error: Please, check that the command '/tmp/vtune_profiler_2020.1.0.607630/bin32/amplxe-runss -V' is run successfully on the target.
vtune: Error: VTune cannot detect remote machine configuration.

amplxe-runss works fine when I run it on the target system, but it is located in a different directory, inside /bin64.

Attached files : 1) Included the screenshot of the error

2) Dockerfile used to build the windows image which includes SSH configuration and VTune installation.

3) The script to install the SSH in the remote target system

I am unable to change the default directory even with the -target-install-dir flag, and I have not found any other way to make remote profiling work here. Any help is appreciated. Thank you.

Attachment	Size
Download Capture.PNG	19.35 KB
Download Dockerfile.rar	426 bytes

↧

Mac VTune downloads lead to a 404 page

April 9, 2020, 3:13 am

Hi,

Is there a known problem with Mac downloads at the moment? I've been trying to download the Mac interface for VTune, and all of the versions I've tried lead to a 404 page rather than a download. I was able to download the Linux version successfully.

Thanks

↧

Estimating elapsed time for a vtune analysis (knob sampling-interval)

April 13, 2020, 11:04 pm

Hi,

I ran an HPC Performance analysis (VTune 2020u0) on an Intel 8280 (RHEL 7.6) with default settings:

time mpirun -np $SLURM_NPROCS -ppn $SLURM_NTASKS_PER_NODE  $OPTS  amplxe-cl -collect hpc-performance -data-limit 0 -result-dir result_hpcperf -- ${APP_INSTALL_ROOT}/appname.exe

the analysis part

vtune: Executing actions  0 %
........
vtune: Executing actions 100 % done

took around 45 minutes, and the "result_hpcperf.nodeXX" directory held around 20 GB of data.

Q1: If my Linux kernel version is 3.10.0-957.el7.x86_64, what is the default sampling interval?

Q2: If I halve the sampling interval for an analysis, roughly how much elapsed time and output data should I expect for the VTune analysis and report generation?

- I was expecting that if the sampling interval is halved (default 1 ms -> 0.5 ms), the analysis and result generation would take around 90 minutes and produce around 40-50 GB of data. Please let me know if my assumptions are incorrect.
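
The expectation in Q2 corresponds to a naive linear model in which both the finalization time and the data volume scale with the number of samples, that is, inversely with the sampling interval. The sketch below only encodes that assumption (the function name is made up for illustration); it is not a statement about how VTune actually scales.

```python
def scale_linear(baseline_minutes, baseline_gb, old_interval_ms, new_interval_ms):
    """Naive estimate: time and data grow in proportion to the sample count."""
    factor = old_interval_ms / new_interval_ms  # halving the interval doubles the samples
    return baseline_minutes * factor, baseline_gb * factor


# Baseline from the default run: ~45 minutes of finalization, ~20 GB of data.
minutes, gigabytes = scale_linear(45, 20, old_interval_ms=1.0, new_interval_ms=0.5)
print(f"estimated finalization: {minutes:.0f} min, estimated data: {gigabytes:.0f} GB")
```

Under this model the 0.5 ms run would need about 90 minutes and 40 GB, which is the figure the question starts from.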

Q3: Also, if I halve the sampling interval for an analysis, how much accuracy in the output metrics can I expect (in general, based on your observations with this tool)?

As per this article (CPU sampling interval, ms field), I assumed the default sampling interval is 1 ms, and I reran the HPC Performance analysis with sampling-interval set to 0.5 ms:

time mpirun -np $SLURM_NPROCS -ppn $SLURM_NTASKS_PER_NODE  $OPTS  amplxe-cl -collect hpc-performance -data-limit 0 -result-dir result_hpcperf -knob sampling-interval=0.5  -- ${APP_INSTALL_ROOT}/appname.exe

The last statement to appear in stdout was:

vtune: Executing actions  0 %

Around 11 hours have elapsed since then, and around 150 GB of data has been generated in the results directory.

Within the results directory (find . -printf "%T+\t%p\n" | sort) I saw that the last file was changed around 11 hours ago, and that file has the following contents:

[user@headnode01 hpcperf_char_00003]$ cat result_hpcperf.node3/config/log.cfg
<?xml version='1.0' encoding='UTF-8'?>

<bag xmlns:int="http://www.w3.org/2001/XMLSchema#int" xmlns:long="http://www.w3.org/2001/XMLSchema#long">
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953480"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953542"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953687"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953748"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803954281"/>
 <message_entry_t int:status="1" cap="Data collection completed with warnings" msg="Please see warning messages for details. " long:timeStamp="1586809230671">
  <message msg="Analyzing data in the node-wide mode. The hostname (node61) will be added to the result path/name." int:severity="1"/>
  <message msg="Peak bandwidth measurement started." int:severity="1"/>
  <message msg="Peak bandwidth measurement finished." int:severity="1"/>
  <message msg="To enable hardware event-base sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes." int:severity="2"/>
  <message msg="Collection started." int:severity="1"/>
  <message msg="Collection stopped." int:severity="1"/>
 </message_entry_t>
</bag>
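
The `long:timeStamp` values in this file can be turned into readable times to see how long the collection phase lasted. The sketch below assumes the values are milliseconds since the Unix epoch (an assumption, but one that decodes to April 13, 2020, matching the date of the run); the embedded sample is trimmed to the first and last entries of the file above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace bound to the "long:" prefix in log.cfg.
LONG_NS = "http://www.w3.org/2001/XMLSchema#long"

SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<bag xmlns:int="http://www.w3.org/2001/XMLSchema#int" xmlns:long="http://www.w3.org/2001/XMLSchema#long">
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953480"/>
 <message_entry_t int:status="1" cap="Data collection completed with warnings" msg="Please see warning messages for details. " long:timeStamp="1586809230671"/>
</bag>"""


def entry_times(xml_bytes):
    """Return (UTC datetime, caption) for every message_entry_t element."""
    root = ET.fromstring(xml_bytes)
    result = []
    for entry in root.iter("message_entry_t"):
        # ASSUMPTION: timeStamp is milliseconds since the Unix epoch.
        millis = int(entry.get(f"{{{LONG_NS}}}timeStamp"))
        when = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        result.append((when, entry.get("cap")))
    return result


times = entry_times(SAMPLE)
for when, caption in times:
    print(when.isoformat(timespec="seconds"), caption)

span = (times[-1][0] - times[0][0]).total_seconds()
print(f"first to last entry: {span / 60:.0f} minutes")
```

Under that assumption, the last entry was written about 88 minutes after the first one.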

Also, on the compute node (node3) I checked the running processes with top:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
127588 root      20   0 4128520  82480   3308 R 100.0  0.0 563:13.50 sep
    10 root      20   0       0      0      0 S   6.2  0.0   0:22.52 rcu_sched
     1 root      20   0   56068   8276   2620 S   0.0  0.0   0:26.51 systemd

Here too, it seems that the sep command (driver) has been running for ~9 hours with almost no memory utilization. I am not sure whether the application and the sep driver are running fine. Is there a way to confirm this (via system logs or sep driver logs)?

It would be very helpful to get an estimate of how long this analysis will take to finish in my scenario.

- I am asking because I will adjust the "walltime" of my VTune jobs on the cluster accordingly.

Please let me know if I can provide more information to help answer my queries.

↧

Packed non-vectorized FP operations

April 14, 2020, 3:54 am

I am using VTune 2020u0 on an Intel 8280 platform. I ran an HPC Performance Characterization analysis and was looking at the Vectorization section, which shows:

Vectorization:	77.7% of Packed FP Operations
    Instruction Mix:	
    SP FLOPs:	15.4%
    Packed:	79.8%
    128-bit:	0.0%
    256-bit:	0.1%
    512-bit:	79.8%
    Scalar:	20.2%
    DP FLOPs:	0.4%
    x87 FLOPs:	0.0%
    Non-FP:	84.2%
    FP Arith/Mem Rd Instr. Ratio:	0.462
    FP Arith/Mem Wr Instr. Ratio:	1.369

From the report, it seems the code issued both packed and non-packed instructions and that, of all the packed FP instructions issued during execution, only 77.7% were vectorized, which (AFAIK) means these instructions used AVX/AVX2/AVX-512 registers.
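
Another reading of the headline figure is numerically consistent with the instruction mix: that 77.7% is the share of all FP operations that are packed, rather than a share of packed operations that were vectorized. The arithmetic below is only a sketch of that reading. It assumes the indented Packed/Scalar rows belong to SP FLOPs and that the small DP portion is entirely scalar, which the report excerpt does not show.

```python
# Values copied from the Instruction Mix section of the report (percent).
sp_share = 15.4    # SP FLOPs as a share of all instructions
dp_share = 0.4     # DP FLOPs
x87_share = 0.0    # x87 FLOPs
sp_packed = 79.8   # packed share within SP FLOPs

# ASSUMPTION: the DP breakdown is not shown in the report; treat DP as all scalar.
dp_packed = 0.0

fp_total = sp_share + dp_share + x87_share
packed = sp_share * sp_packed / 100 + dp_share * dp_packed / 100
vectorization = 100 * packed / fp_total

print(f"packed share of all FP operations: {vectorization:.1f}%")
```

The result is about 77.8%, within rounding of the reported 77.7%. Under this reading, the remaining 22.3% would simply be scalar FP operations.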

Could you please explain, or refer me to an article that explains, why the remaining 22.3% of packed instructions were not vectorized, and how these instructions execute (using scalar registers?)?

For example, _mm256_add_ps is a packed instruction, so could you help me understand how the add operation could be non-vectorized in the following context:

float f[8] = {1.0f, 2.0f, 1.2f, 2.1f, 5.2f, 5.3f, 10.1f, 11.0f};
/* f is not guaranteed to be 32-byte aligned, so use the unaligned load. */
__m256 v = _mm256_loadu_ps(&f[0]);
v = _mm256_add_ps(v, v);

↧

collecting gpu-hotspots crashes my application

April 14, 2020, 1:57 pm

Hi,

My application crashes when I try to collect gpu-hotspots, whereas collecting hotspots works just fine.

The VTune self-check script ran fine. I am attaching the log file.

Attachment	Size
Download log.txt	62.29 KB

↧

Can't view the source code for functions

April 16, 2020, 10:47 am

Hello!

I am trying to profile a C++ application with OpenMP using Intel VTune Profiler. Following the Intel tutorial (https://software.intel.com/en-us/download/tutorial-finding-hotspots-c-sa...), I ran Hotspots analysis and found the most time-consuming functions. In the tutorial, the functions' sources are shown in the Call Stack pane (see the "guide" picture). In my case, however, the Call Stack pane shows [unknown source file] for almost all functions, so I can't find their source (see the "reality" picture).

What should I do to find the source of these functions?
Many thanks! :)

P.S. I compiled this application (q.exe) using NetBeans with MinGW. I tried running Hotspots analysis in both Debug and Release modes, but without success.

Attachment	Size
Download guide.png	135.12 KB
Download reality.png	171.42 KB

↧

Unable to load .pdb files after profiling using VS2019

April 17, 2020, 9:00 am

Hey,

After I profile my C++ application, VTune just sits trying to load the .pdb for one of my libraries and never moves past it. The .pdb it gets stuck on is different each time I run. This application can be profiled without issue when compiled with VS2015, using the same VTune version. I've tried compiling with both /Zi and /Z7, with the same results.

Behavior is identical when using the standalone VTune, or using the VS2019 integrated one. Anything I can do to help diagnose this?

Thanks

Operating system and version

Windows 10

Tool version:

VTune 2020, Update 1. 607630

Compiler version:

MSVC 2019. 16.5.4

GNU Compiler Collection (GCC)* or Microsoft Visual Studio* version (if applicable)

MSVC 2019. 16.5.4, 14.25.28610

↧

Problems importing data gathered by command line from AWS host

April 21, 2020, 11:40 am

Hi,

I've set up VTune Profiler on an AWS Linux installation where we are running processes in Docker. I don't have direct root login access on the host, so I need to run VTune with sudo directly on the host (root is necessary to allow VTune to delve into the Java process running in Docker).

After the run, I copied the results directory to a local Linux VM which does have GUI access. I then followed the guidance on this page:

https://software.intel.com/en-us/vtune-help-importing-results-to-gui

When I try to import without ticking 'Import via a link instead of result copy', I get the following failure message:

"cannot import the result because the current project already has a result with the name"

That is the complete message.

I then tried importing with 'Import via a link...'. This time I just get a spinning 'progress' image, and it never returns.

Can anyone advise on what is going wrong here?

Many thanks

↧

Silent CLI only install still does GUI checks

April 22, 2020, 9:12 am

Hi,

My current project involves profiling against AWS instances that I spin up specifically for profiling. The AWS instances get erased at the end of the session, so I'm installing VTune Profiler quite frequently (on Linux).

To save a little time each time I spin up an instance, I'd hoped to use the silent install, so I ran through my 'normal' install using:

install.sh -d vtune_install.conf

In this installation, I customised it to remove the GUI components. The components recorded in `vtune_install.conf` were:

COMPONENTS=;intel-vtune-profiler-2020-cli-common__noarch;intel-vtune-profiler-2020-common__noarch;intel-vtune-profiler-2020-cli__x86_64;intel-vtune-profiler-2020-cli-32bit__i486;intel-vtune-profiler-2020-collector-32linux__i486;intel-vtune-profiler-2020-collector-64linux__x86_64;intel-vtune-profiler-2020-doc__noarch;intel-vtune-profiler-2020-sep__noarch;intel-vtune-profiler-2020-target__noarch;intel-vtune-profiler-2020-vpp-server__x86_64;intel-vtune-profiler-2020-common-pset

When I run

install.sh --silent vtune_install.sh

I get an error:

Missing critical prerequisite
-- ALSA library is not found. 'Graphical user interface' compenent(s) cannot be installed
...

This is followed by a bit more info about ALSA, then a complaint about X11 not being present, and then a suggestion to deselect the 'Graphical user interface' compenent(s). As I mentioned, I had already done this in the manual install that generated the config for the silent install, and the selected components listed above from the config don't seem to include any GUI components.
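
A quick way to double-check what the recorded config actually selects is to split the `COMPONENTS` value on `;` and scan the names. This is only a sketch: it assumes GUI packages would have "gui" somewhere in their name, which is a guess about the naming scheme rather than something taken from the installer documentation.

```python
# COMPONENTS line as recorded in vtune_install.conf (copied from above).
CONF_LINE = (
    "COMPONENTS=;intel-vtune-profiler-2020-cli-common__noarch;"
    "intel-vtune-profiler-2020-common__noarch;"
    "intel-vtune-profiler-2020-cli__x86_64;"
    "intel-vtune-profiler-2020-cli-32bit__i486;"
    "intel-vtune-profiler-2020-collector-32linux__i486;"
    "intel-vtune-profiler-2020-collector-64linux__x86_64;"
    "intel-vtune-profiler-2020-doc__noarch;"
    "intel-vtune-profiler-2020-sep__noarch;"
    "intel-vtune-profiler-2020-target__noarch;"
    "intel-vtune-profiler-2020-vpp-server__x86_64;"
    "intel-vtune-profiler-2020-common-pset"
)


def parse_components(line):
    """Split a KEY=;a;b;c line into its non-empty component names."""
    _, _, value = line.partition("=")
    return [name for name in value.split(";") if name]


components = parse_components(CONF_LINE)
# ASSUMPTION: a GUI package would carry "gui" in its name.
gui_like = [name for name in components if "gui" in name.lower()]

print(f"{len(components)} components selected")
print("GUI-looking components:", gui_like or "none")
```

For the line above this lists 11 components and none that look GUI-related, which matches the observation that the prerequisite check seems to run regardless of the selection.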

Is it actually possible to do a silent install without the GUI and if so, how can this be done?

Many thanks,

Dominic

↧

No 32bit target for remote Linux

April 22, 2020, 10:01 am

Hi, I'm trying to run VTune remotely on a 32-bit Linux target, but there's no vtune_profiler_target_x86.tgz in the target directory (only an x86_64 one).

Is this unsupported now?

↧

System Profiling of an AWS host with app running in Docker - Outside any known module

April 23, 2020, 9:58 am

Hi,

The system I'm currently profiling is a Linux AWS instance running a Scala app (so on the JVM) in a Docker container. When I profile using a command line like:

vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -finalization-mode=full

The majority of the recorded time is listed as 'Outside any known module', and this covers the code that I'm mainly interested in profiling. I had initially suspected that this was a result of the code being built without debug symbols. However, I built a new version with both the Java and Scala compilers set to record full debug symbols, and this didn't improve things.

Can you advise on what I'm missing that could enable me to get full stack info via VTune? I could remove the application from the docker container if that would help, but this is slightly more complicated than it sounds in a prod-like environment and so I'd prefer to avoid it at this stage unless I know it's got a good chance of success.

Many thanks!

Dominic

↧