Tracking a Partitioned System p Overall Utilization

Description

The purpose of this program is to automate the conversion of nmon csv files into web pages. It provides long-term trend charts, and can aggregate the performance of all partitions residing on a physical server.

Update: February 23, 2011

Changes in this version have been completed by Allan Cano at Itrus.com.

See below for installation instructions.

You MUST MUST MUST update the /path/etc/nmon2web.cfg file, or zero it out, if you choose to update the nmon2web.pl and nmon2web.cgi scripts.

For current users, replace the nmon2web.pl and nmon2web.cgi programs with the ones in the tar.gz file and update paths as needed. You will want to check the remaining files for differences, except find_max_nmon_val.pl, which has not changed. If you update, be sure the customization settings are changed as well.


Update: February 11, 2009

Changes in this version:

See below for installation instructions.

For current users, replace the nmon2web.pl program with the one in the tar.gz file. You may want to check the remaining files for differences. However, no changes have been made to them since June 2008. If you update, be sure the customization settings are changed as well.


This is the third in a series of tips that illustrate how to automate the collection and display of nmon performance data from multiple servers. This tip extends the capability of the previous tips by adding CPU, memory, and virtual I/O aggregation across partitions residing on the same physical server. This tip is targeted primarily at micropartitioned systems. However, it may be of broader interest for the following new features:

  1. More flexibility in choice of charts (replaced the nmon2rrd utility with Perl)
  2. Displays non-default AIX settings for ease of management.
  3. Displays Change control logs for AIX settings and hardware configuration.
  4. Work Load Manager - displays absolute CPU utilization by class (useful for micropartitions, where %utilization is meaningless)
  5. Centralized rrdtool database for easier data extraction so you can write your own programs. (The nmon2rrd tool created a separate rrdtool database for each nmon file.)
  6. Uses less disk space - removed duplicate databases (daily and long term)

I originally had two objectives in creating this tool:

  1. Automate the creation of daily nmon charts
  2. Aggregate CPU and memory usage across multiple partitions on a micropartitioned server.

The scope grew to include the aggregation of virtual I/O. However, there are some limitations in this approach (like how to handle multiple VIO servers), so be sure to understand the limitations listed at the bottom of this page. Otherwise, I've found this to be a very useful tool for tracking performance.

Process Overview

This tool organizes and creates charts on a centralized server using nmon data from multiple servers. Each server (standalone, LPAR, micropartition) uses "nmon -f" to collect daily performance data. At the end of the collection period, the nmon file is transferred to a staging directory on a centralized web server. (I leave the details of the data transfer up to you.) On the web server, the "nmon2web.pl" script organizes the data by server and stores it in an "rrdtool" database. It also creates the daily web pages.

I've tried to automate the process. For example, to add a new server, simply put the new nmon file in the web server's staging directory. The "nmon2web.pl" script will figure out that this is a new server, and will create the necessary directories, rrdtool databases, etc. It will also add the new server to the web page.

  1. All Servers: The "nmon" program collects performance data on AIX LPARs, micropartitions and standalone servers.
  2. Web Server: The "nmon2web.pl" script processes the nmon files
  3. PC Browser: Point browser to the "index.html" page on the web server
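
The "new server" detection described above can be sketched in shell as follows. This is an illustration only, not the actual nmon2web.pl logic. It assumes each nmon file carries a header line of the form "AAA,host,<hostname>"; the function names and both directory arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: NOT the actual nmon2web.pl logic.
# Assumes the nmon file has a header line "AAA,host,<hostname>".

# Print the hostname recorded in an nmon CSV file.
nmon_host() {
    awk -F, '$1 == "AAA" && $2 == "host" { print $3; exit }' "$1"
}

# Move each *.nmon file from a staging directory into a per-host directory,
# creating that directory the first time a host is seen.
# Both directory arguments are hypothetical names for this example.
sort_staged_files() {
    staging_dir=$1
    data_dir=$2
    for f in "$staging_dir"/*.nmon; do
        [ -f "$f" ] || continue          # the glob matched nothing
        host=$(nmon_host "$f")
        [ -n "$host" ] || continue       # skip files without a host line
        if [ ! -d "$data_dir/$host" ]; then
            echo "new server: $host"
            mkdir -p "$data_dir/$host"
        fi
        mv "$f" "$data_dir/$host/"
    done
}
```

Calling sort_staged_files with a staging directory and a data directory files every staged nmon file under its host name and reports each host it has not seen before.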

Installation Steps for Servers

  1. Install nmon performance monitor tool (V11 is preferred, but V10 will work, V9 will work, sorta)
  2. Use cron to automate nmon data collection.
    # following cron entry starts nmon at 00:01; the -x flag records a sample
    # every 15 minutes for 24 hours (equivalent to -ft -s 900 -c 96):
    #
    01 00 * * * (cd /system_dir/nmon/HOSTNAME; /usr/local/bin/nmon -x)
    
  3. Automate upload of the nmon files to web server. (I run ftp as a cron job)
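
As one possible shape for step 3, here is a minimal upload sketch. It swaps in scp for the ftp job mentioned above, and the function name, the COPY_CMD variable and the "sent" subdirectory are all hypothetical. It assumes every *.nmon file in the directory is complete, so schedule it after the day's collection has finished.

```shell
#!/bin/sh
# Sketch only: upload completed nmon files, then set them aside.
# COPY_CMD is a hypothetical override; it defaults to "scp -q".

# upload_nmon SRC_DIR DEST
#   Copies every *.nmon file in SRC_DIR to DEST and moves each file that
#   copied successfully into SRC_DIR/sent so it is not uploaded twice.
upload_nmon() {
    src_dir=$1
    dest=$2
    mkdir -p "$src_dir/sent" || return 1
    for f in "$src_dir"/*.nmon; do
        [ -f "$f" ] || continue          # the glob matched nothing
        # COPY_CMD is left unquoted on purpose so "scp -q" splits into words.
        if ${COPY_CMD:-scp -q} "$f" "$dest"; then
            mv "$f" "$src_dir/sent/"
        else
            echo "upload failed: $f" >&2
        fi
    done
}
```

A call such as upload_nmon /system_dir/nmon/HOSTNAME user@webserver:/staging/dir (the destination is a placeholder) could be run from cron. Files that fail to copy stay in place and are retried on the next run.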

Installation Steps for Web Server

  1. Comment: My test web server is a Linux on Power micropartition. It should work on AIX web servers as well.
  2. Install the "rrdtool"
  3. Unpack the nmon2web.tar.gz (gzip -dc nmon2web.tar.gz | tar -xvf -)
  4. Install nmon2web.cgi
  5. Install nmon2web.pl
  6. Create $HTTP_DIR and $NMON_DIR directories
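
Step 6 might look like the sketch below. The two paths are hypothetical example values chosen only so the snippet runs anywhere; use the locations from your own nmon2web settings. The rrdtool check is an added convenience, not part of the original steps.

```shell
#!/bin/sh
# Hypothetical example locations; substitute the paths from your own settings.
HTTP_DIR=${HTTP_DIR:-/tmp/nmon2web-example/html}
NMON_DIR=${NMON_DIR:-/tmp/nmon2web-example/nmon}

# Create each directory given as an argument and confirm it is writable.
prepare_dirs() {
    for d in "$@"; do
        mkdir -p "$d" || return 1
        [ -w "$d" ] || { echo "not writable: $d" >&2; return 1; }
    done
}

# Warn early if rrdtool (step 2) is not on the PATH.
command -v rrdtool >/dev/null 2>&1 || echo "warning: rrdtool not found on PATH" >&2

prepare_dirs "$HTTP_DIR" "$NMON_DIR"
```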

Comments

Aggregating CPU and memory is relatively straightforward. However, aggregating virtual I/O and ethernet is more challenging. By default the nmon2web.cgi program aggregates virtual I/O utilization by summing all vscsi adapters across all partitions (LPAR and micropartition). For ethernet, the program sums en0 traffic only on micropartitions. The problem is that the program could double count vscsi workload, or assume the wrong ethernet interface.

You can specify your virtual scsi and ethernet configuration by creating the file $SYSTEM_DIR/Shared/sharedpool/virtual.cfg. There's a template file in the same directory that explains how to configure it.

Known Limitations

This program is not backward compatible with Parts 1 & 2. The file systems have been reorganized and the rrdtool databases centralized. The nmon data should be reloaded from scratch.

Adding or deleting servers can cause blank aggregated charts. The underlying "rrdtool" doesn't handle missing data when aggregating data. So if you add or remove a micropartition on a server, you may get blank charts when you try to display aggregated performance over a time period where the server is missing.
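
For readers who write their own graphs against the rrdtool databases, a common general-purpose rrdtool idiom is to replace unknown values with zero before summing, using the RPN expression "x,UN,0,x,IF". The sketch below only builds such a CDEF string. It is not something the nmon2web scripts are known to do, it only helps where the rrd file exists but holds unknown values for the period, and the function name, file names and data source names are hypothetical.

```shell
#!/bin/sh
# Sketch only: build an rrdtool CDEF that sums several DEF names while
# treating unknown (missing) values as 0, so one gap does not blank the sum.

# cdef_sum RESULT_NAME DEF_NAME...
cdef_sum() {
    name=$1
    shift
    expr=""
    for v in "$@"; do
        term="$v,UN,0,$v,IF"             # RPN: if v is unknown then 0 else v
        if [ -z "$expr" ]; then
            expr=$term
        else
            expr="$expr,$term,+"
        fi
    done
    echo "CDEF:$name=$expr"
}

# Hypothetical use in a graph command (file and data source names are made up):
#   rrdtool graph total.png \
#       DEF:lpar1=lpar1.rrd:cpu:AVERAGE \
#       DEF:lpar2=lpar2.rrd:cpu:AVERAGE \
#       "$(cdef_sum total lpar1 lpar2)" \
#       LINE1:total#0000FF:Total
```

The trade-off is that a missing partition is drawn as contributing zero, which can understate the total for that period.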

Adding new devices (scsi, fcsi, ethernet) may cause blank graphs. Same reason as above. New devices will have to be added manually to the appropriate rrd file.

Short nmon sampling intervals increase disk requirements on the web server. For example, a 1 minute interval and 3 year data retention (default) used about 500 MB of disk space (reserved at setup time by the rrdtool). I recommend using an nmon sampling interval of 10-20 minutes.
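
The effect of the sampling interval on disk space can be sketched with a little arithmetic. The assumptions here are mine and not confirmed by the tool: one stored row per sample for the whole retention period, 3 years taken as 1095 days, and 8 bytes per stored value per data source (the size rrdtool uses on typical platforms). The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: rough row and byte counts per data source.

# rra_rows INTERVAL_SECONDS RETENTION_DAYS
#   Number of rows needed to keep one row per sample for the retention period.
rra_rows() {
    echo $(( $2 * 86400 / $1 ))
}

for interval in 60 600 1200; do
    rows=$(rra_rows "$interval" 1095)
    echo "interval ${interval}s: $rows rows, $(( rows * 8 )) bytes per data source"
done
```

Under these assumptions a 10 minute interval needs one tenth of the rows of a 1 minute interval, and the total scales with the number of data sources stored per server.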

Linux: Empty charts for "system calls". (nmon produces negative numbers)

Linux on Power: If running on a micropartitioned system with AIX, the CPU free pool may look 100% busy. Displays as a standalone server (nmon for pLinux doesn't report the serial number, and consequently doesn't get assigned to a server).

Other: Please report other issues. I have a limited "sandbox" and have not tested every combination of hardware/operating system.