Cisco UCS Monitoring Events and Logs

The Cisco UCS Manager generates system log, or syslog, messages to record the following incidents that take place in the Cisco UCS Manager system:

Routine system operations

Failures and errors

Critical and emergency conditions

There are three kinds of syslog entries: Fault, Event, and Audit. Each syslog message identifies the Cisco UCS Manager process that generated the message and provides a brief description of the operation or error that occurred. The syslog is useful in routine troubleshooting, incident handling, and management.

The Cisco UCS Manager collects and logs syslog messages internally. You can also send them to external syslog servers running a syslog daemon. Logging to a central syslog server helps aggregate logs and alerts. Syslog messages report conditions such as DIMM problems, equipment failures, thermal problems, voltage problems, power problems, high availability (HA) cluster problems, and link failures.

Syslog messages contain event codes and fault codes. To monitor syslog messages, you can define syslog message filters. These filters can parse the syslog messages based on the criteria you choose. You can use the following criteria to define a filter:

By event or fault codes: Define a filter with a parsing rule to include only the specific codes that you intend to monitor. Messages that do not match these criteria are discarded.

By severity level: Define a filter with a parsing rule to monitor syslog messages with specific severity levels. You can set syslog severity levels individually for OS functions, to facilitate logging and display of messages ranging from brief summaries to detailed information for debugging.
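The two filter criteria above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Cisco UCS Manager filter mechanism: the SyslogFilter class, the (severity, text) record shape, and the bracketed code pattern such as [F0283] are all assumptions made for the example. The severity numbers follow the standard syslog scale, where 0 is the most severe.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Assumed code format for this sketch: a bracketed fault or event code
# such as [F0283] or [E4195254] somewhere in the message text.
CODE_RE = re.compile(r"\[([FE]\d+)\]")

# A record is a (severity, text) pair; severity uses the standard
# syslog scale (0 = emergency ... 7 = debug).
Record = Tuple[int, str]


@dataclass
class SyslogFilter:
    """Hypothetical filter combining both criteria described above."""
    codes: Optional[Set[str]] = None   # None means "accept any code"
    max_severity: int = 7              # keep messages at least this severe

    def matches(self, severity: int, text: str) -> bool:
        # Severity rule: a larger number is less severe, so drop it.
        if severity > self.max_severity:
            return False
        # Code rule: keep only messages carrying a monitored code.
        if self.codes is None:
            return True
        return bool(set(CODE_RE.findall(text)) & self.codes)


def apply_filter(flt: SyslogFilter, records: List[Record]) -> List[Record]:
    """Return the records that match; everything else is discarded."""
    return [(sev, text) for sev, text in records if flt.matches(sev, text)]


# Illustrative records, not real UCS output.
sample = [
    (2, "[F0283] link down on fabric interconnect A"),
    (6, "[E4195254] user admin logged in"),
    (4, "[F0999] thermal threshold crossed"),
]
critical_link_faults = apply_filter(
    SyslogFilter(codes={"F0283"}, max_severity=3), sample
)
```

With the sample records, only the first message passes both rules; the others are discarded, as described for non-matching messages above.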

Cisco devices can send their log messages to a UNIX-style syslog service. A syslog service simply accepts messages and then stores them in files or prints them according to a simple configuration file. This form of logging is the best available for Cisco devices because it can provide protected long-term storage of logs. Figure 13-2 shows UCS syslog configurations.


Figure 13-2 UCS Syslog Configurations
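When a device sends messages to a UNIX-style syslog service, each message conventionally begins with a priority value in angle brackets that encodes both the facility and the severity (priority = facility x 8 + severity). The following Python sketch decodes that field. The decode_pri function name and the sample line are made up for illustration; only the encoding comes from the syslog standard.

```python
import re
from typing import Tuple

# Severity names in the standard syslog order (0 = most severe).
SEVERITY_NAMES = (
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
)

# The priority field is 1 to 3 digits in angle brackets at the line start.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(line: str) -> Tuple[int, int, str]:
    """Return (facility, severity, severity_name) for a raw syslog line."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError(f"priority {pri} is out of range")
    facility, severity = divmod(pri, 8)
    return facility, severity, SEVERITY_NAMES[severity]


# Illustrative line only: 187 = 23 * 8 + 3, i.e. facility local7, error.
facility, severity, name = decode_pri("<187>link down on port 1/1")
```

A receiving syslog service typically uses the decoded facility and severity to decide which file a message is written to.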

UCS also generates system event logs (SELs). An SEL is used to troubleshoot system health. It records most server-related events, such as instances of over- or under-voltage, temperature events, fan events, and BIOS events. The event types that the SEL supports include BIOS, memory unit, processor, and motherboard events.

The SELs are stored in the CIMC NVRAM and are managed through an SEL log policy. It is best practice to periodically download and clear the SELs. The SEL file is approximately 40 KB in size; once it is full, no further events can be recorded until it is cleared.

You can use the SEL policy to back up the SEL to a remote server and optionally to clear the SEL after a backup operation occurs. Backup operations can be triggered based on specific actions, or they can be set to occur at regular intervals. You can also manually back up or clear the SEL. Figure 13-3 shows UCS SEL configurations for a specific chassis, because SELs are saved in CIMC NVRAM.


Figure 13-3 UCS System Event Logs

The backup file is automatically generated. The filename format is sel-SystemName-ChassisID-ServerID-ServerSerialNumber-Timestamp; for example, sel-UCS-A-ch01-serv01-QCI12522939-20091121160736.
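Because the filename packs several fields into one string, a script that archives SEL backups may want to split it back apart. The Python sketch below does so under two assumptions that the text does not confirm: the timestamp is in YYYYMMDDhhmmss form (which fits the example), and only the system name may contain hyphens. The SelBackupName type and parse_sel_backup_name function are hypothetical names for this example.

```python
from datetime import datetime
from typing import NamedTuple


class SelBackupName(NamedTuple):
    system_name: str
    chassis_id: str
    server_id: str
    serial_number: str
    timestamp: datetime


def parse_sel_backup_name(filename: str) -> SelBackupName:
    """Split an SEL backup filename into its five fields."""
    prefix = "sel-"
    if not filename.startswith(prefix):
        raise ValueError("not an SEL backup filename: " + filename)
    # Split from the right, since the system name itself may contain
    # hyphens (for example "UCS-A"). Assumes the other fields do not.
    parts = filename[len(prefix):].rsplit("-", 4)
    if len(parts) != 5:
        raise ValueError("unexpected number of fields in: " + filename)
    system, chassis, server, serial, stamp = parts
    # Assumed timestamp layout: YYYYMMDDhhmmss.
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return SelBackupName(system, chassis, server, serial, when)


info = parse_sel_backup_name(
    "sel-UCS-A-ch01-serv01-QCI12522939-20091121160736"
)
```

Splitting from the right keeps the hyphen inside UCS-A intact, so info.system_name is "UCS-A" and info.chassis_id is "ch01".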

Like any computer system, UCS records user activity. Audit logs record system events that occurred, where they occurred, and which users initiated them, as shown in Figure 13-4.
