My dear reader, how are you? Peace be upon you (As-salamu alaykum).
Perseverance is stubbornness with a purpose – Josh Shipp
In this post, I explain the standard tools used to collect performance monitoring counters (PMCs) on modern computing platforms. Furthermore, we explore the standard tools used to collect utilization metrics during the execution of applications on Linux.
Performance Monitoring Counters (PMCs)
Performance events, or performance monitoring counters (PMCs), are special-purpose registers provided in modern microprocessors to store the counts of software and hardware activities. They include software events, which are pure kernel-level counters such as page faults and context switches, and hardware events, which are micro-architectural events originating from the processor and its performance monitoring unit (PMU), such as cache misses and branch instructions. PMCs were developed primarily to aid low-level performance analysis and tuning.
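Software events of this kind are counts maintained by the kernel. As a rough, hedged illustration only, the Python sketch below reads per-process kernel accounting through the POSIX `getrusage` interface (via the standard `resource` module) to count the page faults and context switches incurred by a piece of code. This is not how perf or Likwid read PMCs: it never touches the PMU registers. The helper names `count_software_events` and `touch_memory` are hypothetical.

```python
import resource


def count_software_events(fn):
    """Run fn() and return the change in selected kernel counters.

    getrusage(RUSAGE_SELF) exposes per-process kernel accounting
    (page faults, context switches); it does not read hardware PMCs.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "minor_page_faults": after.ru_minflt - before.ru_minflt,
        "major_page_faults": after.ru_majflt - before.ru_majflt,
        "voluntary_ctx_switches": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_ctx_switches": after.ru_nivcsw - before.ru_nivcsw,
    }


def touch_memory(size=64 * 1024 * 1024, page=4096):
    """Allocate a buffer and write one byte per page so pages get mapped."""
    buf = bytearray(size)
    for i in range(0, size, page):
        buf[i] = 1
    return buf


if __name__ == "__main__":
    print(count_software_events(touch_memory))
```

On a typical Linux system the minor page fault delta is large here, because freshly allocated memory is mapped page by page on first access.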
Significant Properties
The following are some of the significant properties of PMCs.
- PMCs are typically large in number. For example, a typical Intel Haswell CPU has 164 PMCs, whereas Intel Skylake processors have 385 PMCs.
- They cannot all be collected together because only a limited number of registers is dedicated to storing them; typically only 3 or 4 PMCs can be collected in one application run.
- Consequently, collecting all PMCs for an application's execution on a platform is a tedious task that requires many runs.
- PMCs are architecture-specific. PMCs available on an Intel processor may not be available on an ARM processor or a GPU.
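To make the cost concrete: if a platform exposes N events and only k of them can be counted per run, collecting every event once needs ceil(N/k) runs. The Python sketch below assumes k = 4 (the upper end of the 3 or 4 quoted above) and the event totals given for Haswell and Skylake, and batches a list of event names into runs. The function names and placeholder event names are hypothetical, and the sketch ignores events that are restricted to particular counters.

```python
import math


def plan_runs(events, counters_per_run):
    """Split event names into batches that fit the available PMC registers.

    Each batch corresponds to one application run, because only
    `counters_per_run` events can be programmed at the same time.
    """
    if counters_per_run < 1:
        raise ValueError("counters_per_run must be at least 1")
    return [events[i:i + counters_per_run]
            for i in range(0, len(events), counters_per_run)]


def runs_needed(num_events, counters_per_run):
    """Number of runs required to collect every event once."""
    return math.ceil(num_events / counters_per_run)


if __name__ == "__main__":
    # Placeholder names; real names would come from `likwid-perfctr -e`.
    events = [f"EVENT_{i}" for i in range(385)]
    print(runs_needed(164, 4))       # 41 runs for a Haswell-sized list
    print(runs_needed(385, 4))       # 97 runs for a Skylake-sized list
    print(len(plan_runs(events, 4)))  # 97
```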
Tools to obtain PMCs
- Likwid
- Likwid is an abbreviation for Like I Knew What I’m Doing.
- Provides command-line tools and an API to obtain PMCs for both Intel and AMD processors on the Linux OS.
- Recently added support for extracting GPU counters as well.
- PAPI
- Provides a standard API for accessing PMCs available on most modern microprocessors.
- Developed at the University of Tennessee.
- Intel PCM
- Used for reading PMCs of the core and uncore components of an Intel processor.
- Linux Perf
- Also called perf_events; it can be used to gather PMCs for CPUs on Linux.
- CUPTI
- The CUDA Profiling Tools Interface; it can be used to obtain PMCs for Nvidia GPUs.
- NVProf
- Can be used to collect and profile PMCs for Nvidia GPUs.
Collection of PMCs on a Modern Intel-based Server Using the Likwid Tool
Using the Likwid tool, we now explain the collection of PMCs on a dual-socket Intel Haswell multicore CPU with the specifications given below:
The hardware topology can be viewed using the likwid-topology tool.
- likwid-topology reports the following:
- Thread topology: How processor IDs map on physical compute resources
- Cache topology: How processors share the cache hierarchy
- Cache properties: Detailed information about all cache levels
- NUMA topology: NUMA domains and memory sizes
- GPU topology: GPU information
- To get more information about the caches, use likwid-topology with the -c flag.
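The thread topology reported by likwid-topology can also be inspected directly on Linux. As a hedged sketch of the idea (not how Likwid itself is implemented), the Python code below reads the sysfs files `physical_package_id` and `core_id` for each logical CPU and groups logical CPUs that share a physical core, which exposes SMT siblings such as hyper-threads. The function names are hypothetical; the sysfs path is a parameter so the code can be pointed at a copy for testing.

```python
import os
import re
from collections import defaultdict


def read_thread_topology(sysfs_cpu="/sys/devices/system/cpu"):
    """Map each logical CPU to (socket, core) using Linux sysfs."""
    topo = {}
    for name in os.listdir(sysfs_cpu):
        m = re.fullmatch(r"cpu(\d+)", name)
        if not m:
            continue  # skip entries such as "cpufreq" or "online"
        tdir = os.path.join(sysfs_cpu, name, "topology")
        try:
            with open(os.path.join(tdir, "physical_package_id")) as f:
                socket = int(f.read())
            with open(os.path.join(tdir, "core_id")) as f:
                core = int(f.read())
        except FileNotFoundError:
            continue  # offline CPUs may lack a topology directory
        topo[int(m.group(1))] = (socket, core)
    return dict(sorted(topo.items()))


def group_hw_threads(topo):
    """Group logical CPUs that share a physical core (SMT siblings)."""
    cores = defaultdict(list)
    for cpu, key in topo.items():
        cores[key].append(cpu)
    return dict(cores)


if __name__ == "__main__":
    topo = read_thread_topology()
    for (socket, core), cpus in sorted(group_hw_threads(topo).items()):
        print(f"socket {socket} core {core}: logical CPUs {cpus}")
```

Keying on the (socket, core) pair matters because `core_id` values are only unique within one socket.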
PMCs can be collected using likwid-perfctr tool. The following commands are useful to understand what PMCs and metrics can be accessed using Likwid on a given platform.
- List all performance groups:
– likwid-perfctr -a
- List all events and counters:
– likwid-perfctr -e
- List all events containing ‘L2’ in their names, and the counters suitable for them:
– likwid-perfctr -E L2
- Run a command on CPU 2 and measure the performance group TEST:
– likwid-perfctr -C 2 -g TEST ./a.out
A sample Likwid command-line invocation is shown below, where EVENTS represents one or more PMCs that are collected during the execution of the given application APP:
$ likwid-perfctr -f -C S0:0-11,24-35@S1:12-23,36-47 -g EVENTS ./APP
Here, the application (APP) is pinned during its execution to the cores listed after -C, spanning both sockets (S0 and S1) of our platform.
For example, consider the following command:
$ likwid-perfctr -f -C S0:0-11,24-35@S1:12-23,36-47 -g ICACHE_ACCESSES:PMC0,ICACHE_MISSES:PMC1 ./APP
The above command determines the counts of two PMCs: ICACHE_ACCESSES, counted on PMC0, and ICACHE_MISSES, counted on PMC1. Likwid uses likwid-pin for core pinning.
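When many event sets have to be measured (see the run-count problem in the properties above), it helps to generate these command lines programmatically. The Python sketch below builds the argument list for a custom event set in the EVENT:PMCn form used above. The helper names are hypothetical, `max_counters=4` is an assumption based on the 3 or 4 counters quoted earlier, and assigning events to PMC0, PMC1, ... in order assumes that every event may use any general-purpose counter, which is not always true.

```python
import shlex
import shutil
import subprocess


def likwid_perfctr_cmd(cpu_expr, events, app, max_counters=4):
    """Build a likwid-perfctr argument list for a custom event set.

    Events are assigned to PMC0, PMC1, ... in the order given.
    """
    if len(events) > max_counters:
        raise ValueError(f"{len(events)} events exceed {max_counters} counters")
    event_set = ",".join(f"{ev}:PMC{i}" for i, ev in enumerate(events))
    return ["likwid-perfctr", "-f", "-C", cpu_expr, "-g", event_set, *app]


def run_likwid(cmd):
    """Run the command if likwid-perfctr is on PATH and return its stdout."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    cmd = likwid_perfctr_cmd("S0:0-11,24-35@S1:12-23,36-47",
                             ["ICACHE_ACCESSES", "ICACHE_MISSES"], ["./APP"])
    # Prints the same command line as the ICACHE example above.
    print(shlex.join(cmd))
```

Passing an argument list to `subprocess.run`, rather than a single shell string, avoids quoting problems with the comma- and @-separated expressions.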
Resource Pinning
- Pinning is the process of binding a process or thread to a specific core or memory bank.
– It can improve the performance of your code by increasing the percentage of local memory accesses.
– The best application performance can be achieved by executing each application thread on a CPU core that is as close as possible to its memory bank.
– Linux tools for resource pinning include:
- Taskset
- Numactl
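Core pinning of the kind taskset performs can also be done from inside a program. As a hedged, Linux-only sketch, the Python code below expands the basic comma-and-range CPU list form (as in "0-11,24-35", accepted by taskset -c and numactl --physcpubind) and applies it with `os.sched_setaffinity`. The function names are hypothetical, and the parser deliberately ignores extended syntaxes such as strides.

```python
import os


def parse_cpu_list(spec):
    """Expand a CPU list such as "0-11,24-35" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec):
    """Restrict the calling thread to the given CPUs, like taskset -c.

    Called before any threads are started, this pins the whole process,
    because new threads inherit the affinity mask of their creator.
    """
    os.sched_setaffinity(0, parse_cpu_list(spec))
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(sorted(parse_cpu_list("0-3,8")))  # [0, 1, 2, 3, 8]
    first = min(os.sched_getaffinity(0))
    print(pin_current_process(str(first)))
```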
Since Likwid does not provide an option to bind the application to memory:
– We can use numactl, a command-line Linux tool, with its --membind option to pin our applications to memory blocks.
– For our platform, numactl reports 2 memory blocks (NUMA nodes), 0 and 1. As before, the comma-separated list of PMCs is passed with -g.
$ likwid-perfctr -f -C S0:0-11,24-35@S1:12-23,36-47 -g ICACHE_ACCESSES:PMC0,ICACHE_MISSES:PMC1 numactl --membind=0,1 ./APP
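Before choosing node numbers for --membind, it is worth checking which NUMA nodes exist; `numactl --hardware` prints this. As a hedged sketch, the Python code below reads the same kind of information from the Linux sysfs directory /sys/devices/system/node: the CPU list of each node and, where available, its MemTotal from the node's meminfo file. The function name is hypothetical and the path is a parameter so the code can be tested against a copy.

```python
import os
import re


def read_numa_nodes(sysfs_node="/sys/devices/system/node"):
    """Return {node_id: (cpulist, mem_total_kb)} for each NUMA node.

    mem_total_kb is None if the node has no readable meminfo file.
    """
    nodes = {}
    for name in os.listdir(sysfs_node):
        m = re.fullmatch(r"node(\d+)", name)
        if not m:
            continue  # skip files such as "online" or "possible"
        path = os.path.join(sysfs_node, name)
        with open(os.path.join(path, "cpulist")) as f:
            cpulist = f.read().strip()
        mem_kb = None
        meminfo = os.path.join(path, "meminfo")
        if os.path.exists(meminfo):
            with open(meminfo) as f:
                mm = re.search(r"MemTotal:\s+(\d+)\s*kB", f.read())
            if mm:
                mem_kb = int(mm.group(1))
        nodes[int(m.group(1))] = (cpulist, mem_kb)
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    root = "/sys/devices/system/node"
    if os.path.isdir(root):
        for node, (cpus, mem_kb) in read_numa_nodes(root).items():
            print(f"node {node}: CPUs {cpus}, MemTotal {mem_kb} kB")
    else:
        print("no NUMA information in sysfs")
```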
Numactl can also be used to pin the applications to cores as shown below:
$ numactl --physcpubind=0-47 ./APP
Linux tools to monitor processes and threads
The following are some useful Linux tools for monitoring a process during its execution.
- Top (Real-time Linux Process Monitoring)
- Htop (Real-time Linux Process Monitoring)
- Vmstat (Virtual Memory Statistics)
- Iostat (Input/Output Statistics)
- Netstat (Network Statistics)
- Ps (CPU Usage for each process or user)
- Sar (System Activity Reporter)
- The sar command can generate reports, which can then be emailed to the system administrator.
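On Linux, tools such as ps and top derive per-process CPU usage from the kernel's /proc filesystem. As a hedged sketch of that idea (not the source code of either tool), the Python code below reads the utime and stime fields (fields 14 and 15, documented in proc(5)) from /proc/&lt;pid&gt;/stat and converts clock ticks to seconds using SC_CLK_TCK. The function names are hypothetical.

```python
import os


def parse_proc_stat(text):
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself
    contain spaces or ')', so split only after the last ')'.
    """
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n is rest[n - 3].
    return int(rest[11]), int(rest[12])


def cpu_seconds(pid="self"):
    """User and system CPU time consumed so far by a process."""
    with open(f"/proc/{pid}/stat") as f:
        utime, stime = parse_proc_stat(f.read())
    ticks = os.sysconf("SC_CLK_TCK")
    return utime / ticks, stime / ticks


if __name__ == "__main__":
    sum(i * i for i in range(2_000_000))  # burn some CPU time
    user, system = cpu_seconds()
    print(f"user {user:.2f}s  system {system:.2f}s")
```

Sampling these values twice and dividing the difference by the elapsed wall-clock time gives a CPU utilization percentage similar to the one top displays.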
I hope you find this post useful. If you find any errors or feel any need for improvement, let me know in your comments below.
Signing off for today. Stay tuned and I will see you in my next post! Happy learning.