Cisco UCS Monitoring Policies

Cisco UCS has two monitoring policies. The first is the fault collection and suppression policy. This global fault policy controls the life cycle of a fault in a Cisco UCS domain, including when faults are cleared, the flapping interval (how long a fault is held in the flapping, or soaking, state after its condition is alleviated before the fault is cleared), and the retention interval (how long a cleared fault is retained in the system), as shown in Figure 13-5.

Figure 13-5 UCS Fault Collection and Suppression Policies

A fault in Cisco UCS has the following life cycle:

1. A condition occurs in the system, and the Cisco UCS Manager raises a fault. This is the active state.

2. When the condition that caused the fault is alleviated, the fault enters a flapping (or soaking) interval. Flapping occurs when a fault is raised and cleared several times in rapid succession, and this interval is designed to prevent it. During the flapping interval, the fault retains its severity for the length of time specified in the global fault policy.

3. If the condition reoccurs during the flapping interval, the fault returns to the active state. If the condition does not reoccur during the flapping interval, the fault is cleared.

4. The cleared fault enters the retention interval. This interval ensures that the fault comes to an administrator's attention even after the condition that caused it has been alleviated, so that the fault is not deleted prematurely. The retention interval retains the cleared fault for the length of time specified in the global fault policy.

5. If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
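The five steps above form a small state machine. The following is a minimal Python sketch of that life cycle, not Cisco's implementation: the names FaultState, FaultPolicy, Fault, condition_alleviated, condition_reoccurred, and tick are hypothetical, and the intervals are expressed in seconds purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FaultState(Enum):
    ACTIVE = "active"        # step 1: the condition is present
    FLAPPING = "flapping"    # step 2: condition alleviated, fault soaking
    CLEARED = "cleared"      # step 4: held for the retention interval
    DELETED = "deleted"      # step 5: removed from the system


@dataclass
class FaultPolicy:
    # Hypothetical stand-in for the global fault policy; values in seconds.
    flapping_interval_s: float = 10.0
    retention_interval_s: float = 3600.0


@dataclass
class Fault:
    policy: FaultPolicy
    state: FaultState = FaultState.ACTIVE
    since: float = 0.0  # time at which the current state began

    def condition_alleviated(self, now: float) -> None:
        # Step 2: an active fault starts its flapping (soaking) interval.
        if self.state is FaultState.ACTIVE:
            self.state, self.since = FaultState.FLAPPING, now

    def condition_reoccurred(self, now: float) -> None:
        # Steps 3 and 5: a recurrence before deletion reactivates the fault.
        # A deleted fault is not revived here (a new fault would be raised).
        if self.state in (FaultState.FLAPPING, FaultState.CLEARED):
            self.state, self.since = FaultState.ACTIVE, now

    def tick(self, now: float) -> None:
        # Step 3: no recurrence during the flapping interval clears the fault.
        flap_end = self.since + self.policy.flapping_interval_s
        if self.state is FaultState.FLAPPING and now >= flap_end:
            self.state, self.since = FaultState.CLEARED, flap_end
        # Step 5: no recurrence during the retention interval deletes it.
        keep_end = self.since + self.policy.retention_interval_s
        if self.state is FaultState.CLEARED and now >= keep_end:
            self.state = FaultState.DELETED
```

In this sketch, timers advance only when tick() is called, which keeps the model deterministic; a single late tick() can move a fault through both the flapping and retention intervals at once.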

The second monitoring policy is the statistics collection policy, which defines how frequently statistics are collected (the collection interval) and how frequently they are reported (the reporting interval). Reporting intervals are longer than collection intervals so that multiple data points are collected during each reporting interval. This gives the Cisco UCS Manager enough data to calculate and report minimum, maximum, and average values.
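To make the relationship between the two intervals concrete, here is a minimal Python sketch that groups one reading per collection interval into reporting intervals and summarizes each group. It assumes the reporting interval is an exact multiple of the collection interval; the function and field names are hypothetical and do not describe how the Cisco UCS Manager stores or computes statistics internally.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass(frozen=True)
class Report:
    minimum: float
    maximum: float
    average: float


def summarize(samples: Sequence[float]) -> Report:
    """Compute the min, max, and average of one reporting interval's samples."""
    if not samples:
        raise ValueError("no samples in this reporting interval")
    return Report(min(samples), max(samples), mean(samples))


def build_reports(readings: Sequence[float], collection_interval_s: int,
                  reporting_interval_s: int) -> List[Report]:
    """Group readings (one per collection interval, oldest first) into reports.

    Assumes the reporting interval is a larger, exact multiple of the
    collection interval. A trailing partial interval is not reported yet.
    """
    if (reporting_interval_s <= collection_interval_s
            or reporting_interval_s % collection_interval_s):
        raise ValueError("reporting interval must be a larger multiple "
                         "of the collection interval")
    per_report = reporting_interval_s // collection_interval_s
    complete = len(readings) - len(readings) % per_report
    return [summarize(readings[i:i + per_report])
            for i in range(0, complete, per_report)]
```

With a 60-second collection interval and a 180-second reporting interval (illustrative values), every three readings yield one report, which is why a reporting interval equal to or shorter than the collection interval would leave nothing to aggregate.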

For NIC statistics, the Cisco UCS Manager displays the average, minimum, and maximum of the change since the last collection of statistics. If the values are 0, there has been no change since the last collection.

As shown in Figure 13-6, statistics can be collected and reported for the following five functional areas of the Cisco UCS system:

Figure 13-6 UCS Collection Policies

Adapter: Statistics related to the adapters

Chassis: Statistics related to the chassis

Host: A placeholder for future support

Port: Statistics related to the ports, including server ports, uplink Ethernet ports, and uplink Fibre Channel ports

Server: Statistics related to servers

Note

The Cisco UCS Manager has one default statistics collection policy for each of the five functional areas. You cannot create additional statistics collection policies, and you cannot delete the existing default policies. You can only modify the default policies.
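The constraint in the note (exactly one default policy per functional area, which can be modified but not created or deleted) can be modeled as a fixed mapping. The Python sketch below is illustrative only: the class names, the modify() method, and the starting interval values are hypothetical and are not the Cisco UCS Manager's defaults or API. It uses the standard library's MappingProxyType to expose a read-only view, so adding or removing a policy raises TypeError.

```python
from dataclasses import dataclass, replace
from enum import Enum
from types import MappingProxyType
from typing import Dict, Mapping, Optional


class Area(Enum):
    ADAPTER = "adapter"
    CHASSIS = "chassis"
    HOST = "host"      # placeholder for future support
    PORT = "port"
    SERVER = "server"


@dataclass(frozen=True)
class StatsPolicy:
    collection_interval_s: int
    reporting_interval_s: int

    def __post_init__(self) -> None:
        # Reporting must span several collections to yield min/max/average.
        if self.reporting_interval_s <= self.collection_interval_s:
            raise ValueError("reporting interval must be longer than "
                             "the collection interval")


class StatsPolicies:
    """One default policy per functional area: modify only, no create or delete."""

    def __init__(self, collection_interval_s: int = 60,
                 reporting_interval_s: int = 900) -> None:
        # Illustrative starting values, not necessarily the UCS Manager defaults.
        self._policies: Dict[Area, StatsPolicy] = {
            area: StatsPolicy(collection_interval_s, reporting_interval_s)
            for area in Area
        }

    @property
    def policies(self) -> Mapping[Area, StatsPolicy]:
        # Live, read-only view: item assignment or deletion raises TypeError.
        return MappingProxyType(self._policies)

    def modify(self, area: Area, collection_interval_s: Optional[int] = None,
               reporting_interval_s: Optional[int] = None) -> None:
        changes = {}
        if collection_interval_s is not None:
            changes["collection_interval_s"] = collection_interval_s
        if reporting_interval_s is not None:
            changes["reporting_interval_s"] = reporting_interval_s
        # replace() re-runs __post_init__, so invalid combinations are rejected
        # before the existing policy is overwritten.
        self._policies[area] = replace(self._policies[area], **changes)
```

Making StatsPolicy frozen means every change goes through modify(), so the validation cannot be bypassed by mutating a policy through the read-only view.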

The delta counter values displayed in the Cisco UCS Manager are calculated as the difference between the last two samples in a collection interval. The Cisco UCS Manager also displays the average, minimum, and maximum delta values of the samples in the collection interval.
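The delta calculation described above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the Cisco UCS Manager's code: samples is a hypothetical list of cumulative counter readings taken within one collection interval (oldest first), and counter wraparound or reset is not handled.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass(frozen=True)
class DeltaStats:
    delta: int        # difference between the last two samples
    average: float    # average of all consecutive-sample deltas
    minimum: int
    maximum: int


def delta_stats(samples: Sequence[int]) -> DeltaStats:
    """Summarize cumulative counter samples from one collection interval.

    Assumes samples are ordered oldest first and that the counter never
    wraps or resets (both are simplifying assumptions for this sketch).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a delta")
    deltas: List[int] = [b - a for a, b in zip(samples, samples[1:])]
    return DeltaStats(delta=deltas[-1], average=mean(deltas),
                      minimum=min(deltas), maximum=max(deltas))
```

For samples of 100, 150, 150, and 220, the deltas are 50, 0, and 70, so the displayed delta is 70 with an average of 40, a minimum of 0, and a maximum of 70. When every delta is 0, as with a NIC counter that has not moved, the counter has not changed since the last collection.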
