PCPIntro - introduction to the Performance Co-Pilot (PCP)
The Performance Co-Pilot (PCP) is an SGI toolkit designed for monitoring and managing system-level performance. These services are distributed and scalable to accommodate the most complex system configurations and performance problems.
PCP supports many different platforms, including (but not limited to) Linux, Mac OS X, IRIX, AIX, Solaris and Windows. At a high level, PCP can be considered to contain two classes of software utility:
PCP Collector These are the parts of PCP that collect and extract performance data from various sources, e.g. the Linux /proc pseudo filesystem. These are available under GPL/LGPL from http://oss.sgi.com/projects/pcp.
PCP Monitor These are the parts of PCP that display data collected from hosts (or archives) that have the PCP Collector installed. Many monitor tools are available as part of PCP under GPL/LGPL from http://oss.sgi.com/projects/pcp. Other (typically graphical) monitoring tools are available separately in the PCP GUI package.
This manual entry describes the high-level features and options common to most PCP utilities available on all platforms.
The PCP architecture is distributed in the sense that any PCP tool may be executing remotely. On the host (or hosts) being monitored, each domain of performance metrics, whether the kernel, a service layer, a database management system, a web server, an application, etc., requires a Performance Metrics Domain Agent (PMDA) which is responsible for collecting performance measurements from that domain. All PMDAs are controlled by the Performance Metrics Collector Daemon (pmcd(1)) on the same host.
Client applications (the monitoring tools) connect to pmcd(1), which acts as a router for requests, by forwarding requests to the appropriate PMDA and returning the responses to the clients. Clients may also access performance data from a PCP archive (created using pmlogger(1)) for retrospective analysis.
The following performance monitoring applications are primarily console based, are typically run directly from the command line, and are all part of the base PCP package.
Each tool or command is documented completely in its own reference page.
pmstat Outputs an ASCII high-level summary of system performance.
pmie An inference engine that can evaluate predicate-action rules to raise alarms and automate system management tasks.
pminfo Interrogate specific performance metrics and the metadata that describes them.
pmlogger Generates PCP archives of performance metrics suitable for replay by most PCP tools.
pmval Simple periodic reporting for some or all instances of a performance metric, with optional VCR time control.
If the PCP GUI package is installed then the following additional tools are available.
pmchart Displays trends over time of arbitrarily selected performance metrics from one or more hosts.
pmtime Time control utility for coordinating the time between multiple tools (including pmchart and pmval).
Produce ASCII reports for arbitrary combinations of performance metrics.
There is a set of common command line arguments that are used consistently by most PCP tools.
-a archive Performance metric information is retrospectively retrieved from the Performance Co-Pilot (PCP) archive, previously generated by pmlogger(1). The -a and -h options are mutually exclusive.
archive is either the base name common to all of the physical
files created by an instance of pmlogger(1), or any one of the
physical files, e.g. myarchive (base name) or myarchive.meta
(the metadata file) or myarchive.index (the temporal index) or myarchive.0 (the first data volume of archive) or myarchive.0.bz2 or myarchive.0.bz (the first data volume compressed with bzip2(1)) or myarchive.0.gz or myarchive.0.Z or myarchive.0.z (the first data volume compressed with gzip(1)), myarchive.1 or myarchive.3.bz2 or myarchive.42.gz etc.
-a archive[,archive,...] An alternate form of -a for applications that are able to handle multiple archives.
-h hostname Unless directed to another host by the -h option, or to an archive by the -a option, the source of performance metrics will be the Performance Metrics Collector Daemon (PMCD) on the local host. The -a and -h options are mutually exclusive.
-n pmnsfile Normally the distributed Performance Metrics Name Space (PMNS) is used; however, if the -n option is specified, an alternative local PMNS is loaded from the file pmnsfile.
-s samples The argument samples defines the number of samples to be retrieved and reported. If samples is 0 or -s is not specified, the application will sample and report continuously (in real time mode) or until the end of the PCP archive (in archive mode).
-z Change the reporting timezone to the local timezone at the host that is the source of the performance metrics, as identified via either the -h or -a options.
-Z timezone By default, applications report the time of day according to the local timezone on the system where the application is executed. The -Z option changes the timezone to timezone in the format of the environment variable TZ as described in environ(5).
Most PCP tools operate with periodic sampling or reporting, and the -t
and -A options may be used to control the duration of the sample interval and the alignment of the sample times.
-t interval Set the update or reporting interval.
The interval argument is specified as a sequence of one or more elements of the form number[units] where number is an integer or floating point constant (parsed using strtod(3C)) and the optional units is one of: seconds, second, secs, sec, s, minutes, minute, mins, min, m, hours, hour, h, days, day and d. If the unit is empty, second is assumed.
In addition, the upper case (or mixed case) version of any of the above is also acceptable.
Spaces anywhere in the interval are ignored, so 4 days 6 hours
30 minutes, 4day6hour30min, 4d6h30m and 4d6.5h are all equivalent.
Multiple specifications are additive, e.g. ``1hour 15mins
30secs'' is interpreted as 3600+900+30 seconds.
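The interval syntax described above can be illustrated with a minimal Python sketch. This is not the parser used by the PCP libraries; the function name parse_interval is hypothetical, and exponent notation (which strtod would accept) is deliberately left out to keep the example short.

```python
import re

# Map every unit spelling listed above to its length in seconds.
_UNIT_SECONDS = {}
for _names, _factor in (
    ("seconds second secs sec s", 1),
    ("minutes minute mins min m", 60),
    ("hours hour h", 3600),
    ("days day d", 86400),
):
    for _name in _names.split():
        _UNIT_SECONDS[_name] = _factor

# One element of the form number[units]; exponents are not handled here.
_ELEMENT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)([a-z]*)")


def parse_interval(text):
    """Return the interval in seconds (hypothetical helper, not a PCP API)."""
    # Spaces are ignored everywhere and case is not significant.
    compact = "".join(text.split()).lower()
    if not compact:
        raise ValueError("empty interval")
    total = 0.0
    pos = 0
    while pos < len(compact):
        match = _ELEMENT.match(compact, pos)
        if match is None:
            raise ValueError("bad interval: %r" % text)
        number, unit = match.groups()
        if unit and unit not in _UNIT_SECONDS:
            raise ValueError("unknown unit: %r" % unit)
        # An empty unit means seconds; multiple elements are additive.
        total += float(number) * _UNIT_SECONDS.get(unit, 1)
        pos = match.end()
    return total
```

With this sketch, parse_interval("4d6h30m") and parse_interval("4d6.5h") both evaluate to the same number of seconds, as the text above requires.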
-A align By default samples are not necessarily aligned on any natural unit of time. The -A option may be used to force the initial sample to be aligned on the boundary of a natural time unit. For example -A 1sec, -A 30min and -A 1hour specify alignment on whole seconds, half and whole hours respectively.
The align argument follows the syntax for an interval argument described above for the -t option.
Note that alignment occurs by advancing the time as required, and that -A acts as a modifier to advance both the start of the time window (see the next section) and the origin time (if the -O option is specified).
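As a rough illustration of alignment by advancing the time, the following Python sketch rounds a timestamp up to the next multiple of the alignment unit. It assumes times are expressed as seconds since some origin whose boundaries coincide with the natural time unit (timezone offsets are ignored), and the name align_forward is hypothetical.

```python
import math


def align_forward(timestamp, align):
    """Advance timestamp to the next multiple of align (both in seconds).

    A timestamp that already lies on a boundary is left unchanged.
    """
    if align <= 0:
        raise ValueError("alignment must be positive")
    return math.ceil(timestamp / align) * align
```

The time is never moved backwards, which matches the description above of alignment occurring by advancing the time as required.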
Many PCP tools are designed to operate in some time window of interest, e.g. to define a termination time for real-time monitoring or to define a start and end time within a PCP archive log.
In the absence of the -O and -A options to specify an initial sample time origin and time alignment (see above), the PCP application will retrieve the first sample at the start of the time window.
The following options may be used to specify a time window of interest.
-S starttime By default the time window commences immediately in real-time mode, or coincides with time at the start of the PCP archive log in archive mode. The -S option may be used to specify a later time for the start of the time window.
The starttime parameter may be given in one of three forms (interval is the same as for the -t option as described above, ctime is described below):
interval To specify an offset from the current time (in real-time mode) or the beginning of a PCP archive (in archive mode) simply specify the interval of time as the argument. For example -S 30min will set the start of the time window to be exactly 30 minutes from now in real-time mode, or exactly 30 minutes from the start of a PCP archive.
-interval To specify an offset from the end of a PCP archive log, prefix the interval argument with a minus sign. In this case, the start of the time window precedes the time at the end of the archive by the given interval. For example -S -1hour will set the start of the time window to be exactly one hour before the time of the last sample in a PCP archive log.
@ctime To specify the calendar date and time (local time in the reporting timezone) for the start of the time window, use the ctime(3C) syntax preceded by an at sign. For example -S '@ Mon Mar 4 13:07:47 1996'
-T endtime By default the end of the time window is unbounded (in real-time mode) or aligned with the time at the end of a PCP archive log (in archive mode). The -T option may be used to specify an earlier time for the end of the time window.
The endtime parameter may be given in one of three forms (interval is the same as for the -t option as described above, ctime is described below):
interval To specify an offset from the start of the time window simply use the interval of time as the argument. For example -T 2h30m will set the end of the time window to be 2 hours and 30 minutes after the start of the time window.
-interval To specify an offset back from the time at the end of a PCP archive log, prefix the interval argument with a minus sign. For example -T -90m will set the end of the time window to be 90 minutes before the time of the last sample in a PCP archive log.
@ctime To specify the calendar date and time (local time in the reporting timezone) for the end of the time window, use the ctime(3C) syntax preceded by an at sign. For example -T '@ Mon Mar 4 13:07:47 1996'
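The offset forms of the -S and -T arguments can be sketched in Python for the archive case. This is only an illustration under simplifying assumptions: offsets are given as plain numbers of seconds rather than in the full interval syntax, the @ctime form is not handled, no clamping to the archive bounds is attempted, and the names resolve_window and _offset are hypothetical.

```python
def _offset(text):
    """Split an offset into (from_end, seconds).

    A leading minus sign means "back from the end of the archive".
    Strings are used so that "-0" remains distinguishable from "0".
    """
    text = text.strip()
    from_end = text.startswith("-")
    return from_end, float(text.lstrip("-"))


def resolve_window(first, last, starttime=None, endtime=None):
    """Return (start, end) of the time window for an archive.

    first and last are the times of the first and last samples in the
    archive, in seconds.
    """
    start = first
    if starttime is not None:
        from_end, seconds = _offset(starttime)
        # Positive offsets count from the archive start, negative from its end.
        start = last - seconds if from_end else first + seconds
    end = last
    if endtime is not None:
        from_end, seconds = _offset(endtime)
        # Positive offsets count from the start of the window itself.
        end = last - seconds if from_end else start + seconds
    return start, end
```

For an archive spanning 10 hours, resolve_window(0, 36000, starttime="1800", endtime="9000") corresponds to -S 30min -T 2h30m and gives a window that starts 30 minutes into the archive and lasts two and a half hours.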
-O origin By default samples are fetched from the start of the time window (see description of -S option) to the end of the time window (see description of -T option). The -O option allows the specification of an origin within the time window to be used as the initial sample time. This is useful for interactive use of a PCP tool with the pmtime(1) VCR replay facility.
The origin argument accepted by -O conforms to the same syntax and semantics as the starttime argument for the -S option.
For example -O -0 specifies that the initial position should be at the end of the time window; this is most useful when wishing to replay ``backwards'' within the time window.
The ctime argument for the -O, -S and -T options is based upon the calendar date and time format of ctime(3C), but may be a fully specified time string like Mon Mar 4 13:07:47 1996 or a partially specified time like Mar 4 1996, Mar 4, Mar, 13:07:50 or 13:08.
For any missing low order fields, the default value of 0 is assumed for
hours, minutes and seconds, 1 for day of the month and Jan for months.
Hence, the following are equivalent: -S '@ Mar 1996' and -S '@ Mar 1 00:00:00 1996'.
If any high order fields are missing, they are filled in by starting with the year, month and day from the current time (real-time mode) or the time at the beginning of the PCP archive log (archive mode) and advancing the time until it matches the fields that are specified. So, for example if the time window starts by default at ``Mon Mar 4 13:07:47 1996'', then -S @13:10 corresponds to 13:10:00 on Mon Mar 4, 1996, while -S @10:00 corresponds to 10:00:00 on Tue Mar 5, 1996 (note this is the following day).
For greater precision than afforded by ctime(3C), the seconds component may be a floating point number.
Also the 12 hour clock (am/pm notation) is supported, so for example 13:07 and 1:07 pm are equivalent.
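The rule for missing high order fields can be illustrated for the simplest case, a bare hh:mm time of day. The Python sketch below starts from a reference time and advances to the first moment that matches the given fields; it is not the PCP parser, and resolve_time_of_day is a hypothetical name.

```python
from datetime import datetime, timedelta


def resolve_time_of_day(reference, hhmm):
    """Return the first time at or after reference whose clock reads hh:mm:00."""
    hour, minute = (int(field) for field in hhmm.split(":"))
    # Fill in year, month and day from the reference time.
    candidate = reference.replace(hour=hour, minute=minute,
                                  second=0, microsecond=0)
    # If that moment has already passed, advance to the following day.
    if candidate < reference:
        candidate += timedelta(days=1)
    return candidate


# The example from the text: the window starts at Mon Mar 4 13:07:47 1996.
window_start = datetime(1996, 3, 4, 13, 7, 47)
same_day = resolve_time_of_day(window_start, "13:10")
next_day = resolve_time_of_day(window_start, "10:00")
```

Here same_day falls on Mar 4 and next_day on Mar 5, matching the -S @13:10 and -S @10:00 cases described above.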
The number of performance metric names supported by PCP in IRIX is of the order of a few thousand. There are fewer metrics on Linux, but still a considerable number. The PCP libraries and applications use an internal identification scheme that unambiguously associates a single integer with each known performance metric. This integer is known as the Performance Metric Identifier, or PMID. Although not a requirement, PMIDs tend to have global consistency across all systems, so a particular performance metric usually has the same PMID.
For all users and most applications, direct use of the PMIDs would be inappropriate (e.g. this would limit the range of accessible metrics, make the code hard to maintain, force the user interface to be particularly baroque, etc.). Hence a Performance Metrics Name Space (PMNS) is used to provide external names and a hierarchic classification for performance metrics. A PMNS is represented as a tree, with each node having a label and a pointer to either a PMID (for leaf nodes) or a set of descendant nodes in the PMNS (for non-leaf nodes).
A node label must begin with an alphabetic character, followed by zero or more characters drawn from the alphabetics, the digits and character `_' (underscore). For alphabetic characters in a node label, upper and lower case are distinguished.
By convention, the name of a performance metric is constructed by concatenation of the node labels on a path through the PMNS from the root node to a leaf node, with a ``.'' as a separator. The root node in the PMNS is unlabeled, so all names begin with the label associated with one of the descendant nodes below the root node of the PMNS, e.g. kernel.percpu.syscall. Typically (although this is not a requirement) there would be at most one name for each PMID in a PMNS. For example kernel.all.cpu.idle and disk.dev.read are the unique names for two distinct performance metrics, each with a unique PMID.
Groups of related PMIDs may be named by naming a non-leaf node in the PMNS tree, e.g. disk.
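The tree structure of a PMNS can be modelled with nested dictionaries, as in the following Python sketch. It is purely illustrative: the PMID values are made-up integers, disk.dev.write is included only to give the disk subtree a second leaf, and the function names are hypothetical rather than part of any PCP API.

```python
import re

# A label starts with a letter, followed by letters, digits or underscores.
_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def add_metric(root, name, pmid):
    """Insert a dotted metric name into the tree, with pmid at the leaf."""
    labels = name.split(".")
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError("bad node label: %r" % label)
    node = root
    for label in labels[:-1]:
        node = node.setdefault(label, {})
        if not isinstance(node, dict):
            raise ValueError("%r is already a leaf" % label)
    node[labels[-1]] = pmid


def lookup(root, name):
    """Return the PMID for a leaf name, or the subtree for a non-leaf name."""
    node = root
    for label in name.split("."):
        node = node[label]
    return node


def leaf_names(node, prefix):
    """List the full names of all leaves at or below a node."""
    if not isinstance(node, dict):
        return [prefix]
    names = []
    for label in sorted(node):
        names.extend(leaf_names(node[label], prefix + "." + label))
    return names


pmns = {}
add_metric(pmns, "kernel.all.cpu.idle", 1)   # PMIDs here are invented
add_metric(pmns, "disk.dev.read", 2)
add_metric(pmns, "disk.dev.write", 3)
disk_metrics = leaf_names(lookup(pmns, "disk"), "disk")
```

Naming the non-leaf node disk selects every metric below it, which is how a group of related PMIDs can be referred to by a single name.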
There may be PMIDs with no associated name in a PMNS; this is most likely to occur when specific PMIDs are not available in all systems, e.g. if ORACLE is not installed on a system, there is no good reason to pollute the PMNS with names for all of the ORACLE performance metrics.
Note also that there is no requirement for the PMNS to be the same on all systems; however, in practice most applications would be developed against a stable PMNS that was assumed to be a subset of the PMNS on all systems. Indeed the PCP distribution includes a default local PMNS for just this purpose.
The default local PMNS is located at $PCP_VAR_DIR/pmns/root; however, the environment variable PMNS_DEFAULT may be set to the full pathname of a different PMNS which will then be used as the default local PMNS.
Most applications do not use the local PMNS, but rather import parts of the PMNS as required from the same place that performance metrics are fetched, i.e. from pmcd(1) for live monitoring or from a PCP archive for retrospective monitoring.
In configuration files and (to a lesser extent) command line options, metric specifications adhere to the following syntax rules.
If the source of performance metrics is real-time from pmcd(1) then the accepted syntax is host:metric[instance1,instance2,...]
If the source of performance metrics is a PCP archive log then the accepted syntax is archive/metric[instance1,instance2,...]
The host:, archive/ and [instance1,instance2,...] components are all optional.
The , delimiter in the list of instance names may be replaced by white space.
Special characters in instance names may be escaped by surrounding the name in double quotes or preceding the character with a backslash.
White space is ignored everywhere except within a quoted instance name.
An empty instance is silently ignored, and in particular ``[]'' is the same as no instance, while ``[one,,,two]'' is parsed as specifying just the two instances ``one'' and ``two''.
As a special case, if the host is the single character ``@'' then this refers to a PM_CONTEXT_LOCAL source, see pmNewContext(3).
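The rules above can be illustrated with a simplified Python sketch that splits a metric specification into its components. It is a sketch under stated assumptions, not the PCP parser: quoted or backslash-escaped instance names are not handled, host names containing port numbers are not considered, and parse_metric_spec is a hypothetical name.

```python
import re


def parse_metric_spec(spec):
    """Split host:metric[inst,...] or archive/metric[inst,...] into parts."""
    spec = spec.strip()
    instances = []
    if spec.endswith("]"):
        spec, _, inst_text = spec[:-1].partition("[")
        # Instances may be separated by commas or white space;
        # empty instances are silently dropped.
        instances = [i for i in re.split(r"[,\s]+", inst_text) if i]
    # White space outside instance names is ignored.
    spec = "".join(spec.split())
    host = archive = None
    if ":" in spec:
        host, _, metric = spec.rpartition(":")
    elif "/" in spec:
        archive, _, metric = spec.rpartition("/")
    else:
        metric = spec
    return {"host": host, "archive": archive,
            "metric": metric, "instances": instances}
```

Because the host:, archive/ and instance components are all optional, a bare metric name such as disk.dev.read parses with host and archive unset and an empty instance list.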
Since PCP version 2, version information has been associated with pmcd(1) and PCP archives. The version number is used in a number of ways, but most noticeably for the distributed pmns(4). In PCP version 1, the client applications would load the PMNS from the default PMNS file but in PCP version 2, the client applications extract the PMNS information from pmcd(1) or a PCP archive. Thus in PCP version 2, the version number is used to determine if the PMNS to use is from the default local file or from the actual current source of the metrics.
Since PCP version 3, the pmcd(1) hostname specification has been extended to allow an optional pmcd port number, and also optional pmproxy(1) hostname and port number. These supersede (and override) the old-style PMCD_PORT, PMPROXY_HOST and PMPROXY_PORT environment variables.
The following are valid hostname specifications that specify connections to pmcd on host nas1.servers.com with/without a list of ports and with/without a pmproxy(1) connection through a firewall.
$ pcp -h nas1.servers.com:44321,4321@firewall.servers.com:44322
$ pcp -h nas1.servers.com:44321@firewall.servers.com:44322
$ pcp -h nas1.servers.com:44321@firewall.servers.com
$ pcp -h nas1.servers.com@firewall.servers.com
$ pcp -h nas1.servers.com:44321
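A minimal Python sketch of how such a specification might be taken apart follows. It assumes the general shape host[:port[,port...]][@proxyhost[:proxyport]], which is inferred from the description above rather than taken from a formal grammar; the function name parse_host_spec and the proxy host name used in the usage note are illustrative only, and IPv6 literals are not handled.

```python
def parse_host_spec(spec):
    """Split a pmcd host specification into host, ports and proxy parts."""
    # Everything after "@" describes the optional pmproxy connection.
    target, _, proxy = spec.partition("@")
    host, _, ports = target.partition(":")
    proxy_host, _, proxy_port = proxy.partition(":")
    return {
        "host": host,
        # A comma-separated list of pmcd ports is allowed.
        "ports": [int(p) for p in ports.split(",")] if ports else [],
        "proxy_host": proxy_host or None,
        "proxy_port": int(proxy_port) if proxy_port else None,
    }
```

For instance, parse_host_spec("nas1.servers.com:44321") yields a single pmcd port and no proxy, while a specification with an "@firewall.servers.com:44322" suffix would also fill in the proxy host and port.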
In addition to the PCP run-time environment and configuration variables described in the PCP ENVIRONMENT section below, the following environment variables apply to all installations.
PCP_DERIVED_CONFIG When set, this variable defines the path to a file that contains definitions of derived metrics as per the syntax described in pmLoadDerivedConfig(3). Derived metrics may be used to extend the available metrics with new (derived) metrics using simple arithmetic expressions.
If PCP_DERIVED_CONFIG is set, the derived metric definitions are processed automatically as each new source of performance metrics is established (i.e. each time a pmNewContext(3) is called) or when requests are made against the PMNS.
Many PCP tools support the environment variable PCP_STDERR, which can be used to control where error messages are sent. When unset, the default behavior is that ``usage'' messages and option parsing errors are reported on standard error, other messages after initial startup are sent to the default destination for the tool, i.e. standard error for ASCII tools, or a dialog for GUI tools.
If PCP_STDERR is set to the literal value DISPLAY then all messages will be displayed in a dialog. This is used for any tools launched from a desktop environment.
If PCP_STDERR is set to any other value, the value is assumed to be a filename, and all messages will be written there.
This environment variable, previously used by pmlaunch(5), pmgsys(1), pmview(1) and the pmview(1) front-end scripts (such as mpvis(1)), has been deprecated from the PCP 2.0 release onward and replaced by PCP_STDERR.
When attempting to connect to a remote pmcd(1) on a machine that is booting, the connection attempt could potentially block for a long time until the remote machine finishes its initialization. Most PCP applications and some of the PCP library routines will abort and return an error if the connection has not been established after some specified interval has elapsed. The default interval is 5 seconds. This may be modified by setting PMCD_CONNECT_TIMEOUT in the environment to a real number of seconds for the desired timeout. This is most useful in cases where the remote host is at the end of a slow network, requiring longer latencies to establish the connection correctly.
When a monitor or client application loses a connection to a pmcd(1), the connection may be re-established by calling a service routine in the PCP library. However, attempts to reconnect are controlled by a back-off strategy to avoid flooding the network with reconnection requests. By default, the back-off delays are 5, 10, 20, 40 and 80 seconds for consecutive reconnection requests from a client (the last delay will be repeated for any further attempts after the fifth). Setting the environment variable PMCD_RECONNECT_TIMEOUT to a comma separated list of positive integers will re-define the back-off delays, e.g. setting PMCD_RECONNECT_TIMEOUT to ``1,2'' will back-off for 1 second, then attempt another connection request every 2 seconds thereafter.
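The back-off schedule described above can be sketched in Python as an endless sequence of delays. This is an illustration only; the name backoff_delays is hypothetical, and raising ValueError for a non-positive entry is a choice made for the sketch, not documented PCP behavior.

```python
import itertools

DEFAULT_DELAYS = (5, 10, 20, 40, 80)


def backoff_delays(setting=None):
    """Return an iterator of reconnect delays in seconds.

    setting is the value of PMCD_RECONNECT_TIMEOUT, or None if unset.
    """
    delays = list(DEFAULT_DELAYS)
    if setting:
        delays = [int(field) for field in setting.split(",")]
        if any(delay <= 0 for delay in delays):
            raise ValueError("delays must be positive integers")
    # Use each delay once, then repeat the last one for all later attempts.
    return itertools.chain(delays, itertools.repeat(delays[-1]))
```

Taking the first few values with itertools.islice shows the effect: the default schedule ends in a repeated 80, while a setting of "1,2" gives 1 followed by 2 for every later attempt.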
For monitor or client applications connected to pmcd(1), there is a possibility of the application "hanging" on a request for performance metrics or metadata or help text. These delays may become severe if the system running pmcd crashes, or the network connection is lost. By setting the environment variable PMCD_REQUEST_TIMEOUT to a number of seconds, requests to pmcd will time out after this number of seconds. The default behavior is to be willing to wait 10 seconds for a response from every pmcd for all applications.
When pmcd(1) is started from $PCP_RC_DIR/pcp then the primary instance of pmlogger(1) will be started if the configuration flag pmlogger is chkconfig'ed on, some key applications from the pcp.sw.base subsystem are installed and pmcd is running and accepting connections.
The check on pmcd's readiness will wait up to PMCD_WAIT_TIMEOUT
seconds. If pmcd has a long startup time (such as on a very large system), then PMCD_WAIT_TIMEOUT can be set to provide a maximum wait longer than the default 60 seconds.
PMNS_DEFAULT If set, then interpreted as the full pathname to be used as the default local PMNS for pmLoadNameSpace(3). Otherwise, the default local PMNS is located at $PCP_VAR_DIR/pmns/root for base PCP installations.
Many of the performance metrics exported from PCP agents have counter semantics, meaning they are expected to be monotonically increasing. Under some circumstances, one value of these metrics may be smaller than the previously fetched value. This can happen when a counter of finite precision overflows, or when the PCP agent has been reset or restarted, or when the PCP agent is exporting values from some underlying instrumentation that is subject to some asynchronous discontinuity.
The environment variable PCP_COUNTER_WRAP may be set to indicate that all such cases of a decreasing ``counter'' should be treated as a counter overflow, and hence the values are assumed to have wrapped once in the interval between consecutive samples. This ``wrapping'' behavior was the default in earlier PCP versions, but by default has been disabled in PCP releases from version 1.3 onward.
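The difference between the two treatments of a decreasing counter can be shown with a short Python sketch. It assumes an unsigned counter of a known width (32 bits by default here) and returns None when no meaningful difference can be computed; both the function name counter_delta and the use of None are assumptions made for the example, not a description of what any particular PCP tool reports.

```python
def counter_delta(previous, current, bits=32, wrap=False):
    """Return the increase between two samples of a counter.

    With wrap=True a decrease is treated as a single overflow of a
    counter that is `bits` wide; otherwise a decrease yields None.
    """
    if current >= previous:
        return current - previous
    if wrap:
        # Assume the counter passed its maximum exactly once.
        return current + (1 << bits) - previous
    return None
```

For a 32-bit counter sampled at 4294967290 and then at 5, the wrapped interpretation gives an increase of 11, whereas without wrapping the sample pair is treated as a discontinuity.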
The PMDA_PATH environment variable may be used to modify the search path used by pmcd(1) and pmNewContext(3) (for PM_CONTEXT_LOCAL contexts) when searching for a daemon or DSO PMDA. The syntax follows that for PATH in sh(1), i.e. a colon separated list of directories, and the default search path is ``/var/pcp/lib:/usr/pcp/lib'', (or ``/var/lib/pcp/lib'' on Linux, depending on the value of the $PCP_VAR_DIR environment variable).
The TCP/IP port(s) used by pmcd(1) to create the socket for incoming connections and requests was historically 4321 and more recently the officially registered port 44321; in the current release, both port numbers are used by default as a transitional arrangement. This may be overridden by setting PMCD_PORT to a different port number, or a comma-separated list of port numbers. If a non-default port is used when pmcd is started, then every monitoring application connecting to that pmcd must also have PMCD_PORT set in its environment before attempting a connection.
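As an illustration of the PMCD_PORT convention, the Python sketch below derives the list of ports from an environment mapping. The function name pmcd_ports is hypothetical, and the order in which the two default ports are listed is an assumption made for the example.

```python
def pmcd_ports(environ):
    """Return the pmcd port numbers implied by an environment mapping."""
    value = environ.get("PMCD_PORT")
    if not value:
        # Both the registered and the historical port are used by default.
        return [44321, 4321]
    # A single port or a comma-separated list of ports may be given.
    return [int(port) for port in value.split(",")]
```

In a real program the mapping would typically be os.environ; passing a plain dictionary keeps the sketch easy to exercise.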
The following environment variables are relevant to installations in which pmlogger(1), the PCP archive logger, is used.
The environment variable PMLOGGER_PORT may be used to change the base TCP/IP port number used by pmlogger(1) to create the socket to which pmlc(1) instances will try to connect. The default base port number is 4330. When used, PMLOGGER_PORT should be set in the environment before pmlogger is executed.
When pmlc(1) connects to pmlogger(1), there is a remote possibility of pmlc "hanging" on a request for information as a consequence of a failure of the network or pmlogger. By setting the environment variable PMLOGGER_REQUEST_TIMEOUT to a number of seconds, requests to pmlogger will time out after this number of seconds. The default behavior is to be willing to wait forever for a response from each request to a pmlogger. When used, PMLOGGER_REQUEST_TIMEOUT should be set in the environment before pmlc is executed.
If you have the PCP product installed, then the following environment variables are relevant to the Performance Metrics Domain Agents (PMDAs).
Use of this variable has been deprecated and it is now ignored. If the ``proc'' PMDA is configured as a DSO for use with pmcd(1) on the local host then all of the ``proc'' metrics will be available to applications using a PM_CONTEXT_LOCAL context.
The previous behavior was that if this variable was set, then a context established with the type of PM_CONTEXT_LOCAL would have access to the ``proc'' PMDA to retrieve performance metrics about individual processes.
Use of this variable has been deprecated and it is now ignored. If the ``sample'' PMDA is configured as a DSO for use with pmcd(1) on the local host then all of the ``sample'' metrics will be available to applications using a PM_CONTEXT_LOCAL context.
The previous behavior was that if this variable was set, then a context established with the type of PM_CONTEXT_LOCAL would have access to the ``sample'' PMDA if this optional PMDA had been installed locally.
If set, pmieconf(1) will form its pmieconf(4) specification (set of parameterized pmie(1) rules) using all valid pmieconf files found below each subdirectory in this colon-separated list of subdirectories. If not set, the default is $PCP_VAR_DIR/config/pmieconf.
Configuration file for the PCP runtime environment, see pcp.conf(4).
$PCP_RC_DIR/pcp Script for starting and stopping pmcd(1).
$PCP_PMCDCONF_PATH Control file for pmcd(1).
$PCP_PMCDOPTIONS_PATH Command line options passed to pmcd(1) when it is started from $PCP_RC_DIR/pcp. All the command line option lines should start with a hyphen as the first character. This file can also contain environment variable settings of the form "VARIABLE=value".
$PCP_BINADM_DIR Location of PCP utilities for collecting and maintaining PCP archives, PMDA help text, PMNS files etc.
$PCP_PMDAS_DIR Parent directory of the installation directory for Dynamic Shared Object (DSO) PMDAs.
$PCP_RUN_DIR/pmcd.pid If pmcd is running, this file contains an ASCII decimal representation of its process ID.
$PCP_LOG_DIR/pmcd Default location of log files for pmcd(1), current directory for running PMDAs. Archives generated by pmlogger(1) are generally below $PCP_LOG_DIR/pmlogger.
$PCP_LOG_DIR/pmcd/pmcd.log Diagnostic and status log for the current running pmcd(1) process. The first place to look when there are problems associated with pmcd.
$PCP_LOG_DIR/pmcd/pmcd.log.prev Diagnostic and status log for the previous pmcd(1) instance.
$PCP_LOG_DIR/NOTICES Log of pmcd(1) and PMDA starts, stops, additions and removals.
$PCP_VAR_DIR/config Contains directories of configuration files for several PCP tools.
$PCP_VAR_DIR/config/pmcd/rc.local Local script for controlling PCP boot, shutdown and restart actions.
$PCP_VAR_DIR/pmns Directory containing the set of PMNS files for all installed PMDAs.
$PCP_VAR_DIR/pmns/root The ASCII pmns(4) exported by pmcd(1) by default. This PMNS is the superset of all other PMNS files installed in $PCP_VAR_DIR/pmns.
In addition, if the PCP product is installed the following files and directories are relevant.
$PCP_LOG_DIR/NOTICES In addition to the pmcd(1) and PMDA activity, may be used to log alarms and notices from pmie(1) via pmpost(1).
$PCP_PMLOGGERCONTROL_PATH Control file for pmlogger(1) instances launched from $PCP_RC_DIR/pcp and/or managed by pmlogger_check(1) and pmlogger_daily(1) as part of a production PCP archive collection setup.
Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(4).
Also refer to the books Performance Co-Pilot User's and Administrator's
Guide and Performance Co-Pilot Programmer's Guide which can be found at http://techpubs.sgi.com.