Catalog Basics: SMF Forward Recovery (continued)

Continued from ETJ

By Blair Svihra


…The BCS, by design, contains minimal information. It can best be compared to the old-style Control Volume (CVOL), which was used primarily as a data set locator. Although the BCS holds additional information, its function is essentially the same as that of the CVOL: quick access to the volume containing the data set and its critical information.

A volume’s VVDS contains detailed information about each VSAM data set and each system-managed storage (SMS) non-VSAM data set physically residing on that volume. The VVDS does not describe data sets on other volumes, or the portions of data sets that extend onto other volumes. Data set attributes, extent information, Access Method Data Statistics Block (AMDSB) information, and similar details for VSAM data sets, and SMS information for non-VSAM data sets, are held in VSAM Volume Records (VVRs) and non-VSAM Volume Records (NVRs).

Loss of access to either the BCS or the VVDS can have disastrous effects on your catalog environment. Understanding the function and contents of both components helps in deciding the appropriate course of action. Although the critical information resides in the VVDS, the ability to recover either component is extremely important. If an ICF catalog failure occurs, many recovery options are available, but the first and foremost concern of any recovery procedure is complete synchronization and accountability of all catalog entries.

Recovery of the BCS or VVDS normally involves either extracting and rebuilding from the current contents or restoring from a previous backup. The preferred method is recovery using the “current” contents: rebuilding from up-to-date information tends to reduce the number of catalog discrepancies found during post-recovery validation. Either way, diagnostics should be run after each procedure to verify the integrity of the recovery. Certain failures, including most DASD-related hardware problems, require that the component be restored and up-leveled to the point of failure. Depending on an installation’s recovery procedures, restoring and up-leveling may be the preferred procedure for all ICF catalog recovery.


Up-leveling

Up-leveling of a restored BCS may be accomplished in more than one way. The first is to remove incorrect or extraneous data set entries from the BCS with IDCAMS “DELETE NOSCRATCH” control cards and to recreate missing entries with IDCAMS “DEFINE RECATALOG” control cards. Whether this process is manual or automated, it is time-consuming and is usually associated with recovery validation. The second is to apply copies of BCS changes, recorded by the operating system, directly to the back-leveled BCS or to a backup copy of the catalog. Either way, the intended outcome is a BCS that is completely up-leveled to the point of failure.
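
The first approach boils down to a set comparison. The sketch below is a hypothetical illustration, not a feature of IDCAMS or of any product: it assumes you already have the list of names in the restored BCS and the list of data sets that actually exist (however your installation obtains them), drafts a DELETE ... NOSCRATCH card for each extraneous entry, and only reports the missing names, because a real DEFINE ... RECATALOG card needs attributes this sketch does not have. The function name and sample data set names are illustrative.

```python
# Hypothetical sketch: function name and sample names are illustrative only.
# Compare a restored (back-leveled) BCS with the data sets that actually
# exist and draft IDCAMS DELETE ... NOSCRATCH cards for extraneous entries.


def plan_bcs_fixups(bcs_names, existing_names):
    """Return (delete_cards, names_needing_recatalog)."""
    bcs = set(bcs_names)
    existing = set(existing_names)
    extraneous = sorted(bcs - existing)  # cataloged, but the data set is gone
    missing = sorted(existing - bcs)     # data set exists, but no BCS entry
    # IDCAMS ignores column 1 by default (margins 2-72), so indent each card.
    delete_cards = [f"  DELETE {name} NOSCRATCH" for name in extraneous]
    # A DEFINE ... RECATALOG card needs the data set's attributes (type,
    # volumes, and so on), which this sketch does not have, so missing
    # entries are only reported.
    return delete_cards, missing


if __name__ == "__main__":
    restored_bcs = ["A.B.C", "PROD.OLD.FILE", "PROD.PAYROLL"]
    on_volumes = ["A.B.C", "PROD.PAYROLL", "PROD.NEW.FILE"]
    cards, to_recatalog = plan_bcs_fixups(restored_bcs, on_volumes)
    print("\n".join(cards))
    print("Needs DEFINE RECATALOG:", to_recatalog)
```

Even automated, every generated card still deserves review before it is run against a production catalog.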

Restoration and resynchronization of a VVDS is far more complicated than that of a BCS. Normally, if a VVDS is unusable, the complete volume must be restored. This ensures that the critical information in the VVDS is in sync with the data sets on the volume and all related Volume Table of Contents (VTOC) entries. Correcting discrepancies between the VVDS and the BCS is a much easier task than resolving synchronization problems between the VVDS and the VTOC.

As the capacity of DASD devices increases, back-leveling every data set on a volume because of a VVDS failure and the subsequent full-volume restore becomes a less attractive recovery option. This is especially true for non-SMS volumes with a large number of non-VSAM data sets: back-leveling data sets that have no corresponding VVDS records is extremely counterproductive and labor-intensive. In addition to resynchronizing catalog entries, considerable time may be spent identifying and rerunning the production jobs needed for all restored data sets.

The operating system records updates made to the VVDS by the VVDS Manager, the component of catalog management responsible for VVDS activity. Applying copies of these VVDS changes to a restored VVDS up-levels it to the point of failure. Up-leveling the VVDS without back-leveling the data sets on the volume is the primary goal of VVDS forward recovery.

IBM’s SMF interface records specific BCS and VVDS activity in the form of internal records. Module IGG0CLED handles both BCS and VVDS catalog records written to SMF. SMF records for ICF catalogs are simply copies of BCS or VVDS records appended to a fixed-length header. The header contains the SMF record type, the date and time the SMF record was generated, the system (SMF ID) that recorded it, the name of the BCS or VVDS the catalog record belongs to, and a two-byte indicator specifying whether the catalog record was added, deleted, or replaced. These snapshots of catalog records are produced when catalog changes occur through SVC 26 or catalog requests.
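
A minimal sketch of what a forward recovery utility does with each snapshot, assuming the header fields listed above have already been decoded. The class, field, and function names are hypothetical, and the `Action` values are stand-ins for the two-byte indicator, whose real encoding is defined in IBM's SMF record mappings and is not reproduced here. The catalog is modeled as a simple dictionary keyed by data set name.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    # Stand-ins for the header's two-byte add/delete/replace indicator.
    # The real encoding is defined in IBM's SMF record mappings (not shown).
    ADD = "add"
    DELETE = "delete"
    REPLACE = "replace"


@dataclass(frozen=True)
class CatalogSmfRecord:
    smf_type: int        # record type, e.g. 60 for VVDS activity
    timestamp: datetime  # when the SMF record was generated
    system_id: str       # SMF ID of the recording system
    target: str          # name of the BCS or VVDS the record belongs to
    action: Action       # add, delete, or replace
    key: str             # catalog record key (simplified to a data set name)
    image: bytes         # snapshot of the catalog record as it was written


def apply(catalog: dict, rec: CatalogSmfRecord) -> None:
    """Reapply one snapshot to an in-memory model of a BCS or VVDS."""
    if rec.action is Action.DELETE:
        catalog.pop(rec.key, None)
    else:
        # ADD and REPLACE both leave this snapshot as the current record.
        catalog[rec.key] = rec.image


if __name__ == "__main__":
    bcs = {}
    t = datetime(2024, 1, 1, 10, 0)
    apply(bcs, CatalogSmfRecord(61, t, "SYSA", "CAT.USER",
                                Action.ADD, "A.B.C", b"v1"))
    apply(bcs, CatalogSmfRecord(61, t, "SYSA", "CAT.USER",
                                Action.REPLACE, "A.B.C", b"v2"))
    print(bcs)  # {'A.B.C': b'v2'}
```

Because the snapshot is stored exactly as recorded, the model also shows why a corrupted record is faithfully reapplied: nothing in `apply` validates the image.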

Recording BCS and VVDS record activity via SMF is a relatively straightforward audit process performed by the operating system. All SMF records selected by parmlib member SMFPRMxx are written to the SMF facility. Additional steps are required to offload the SMF records so that the SMF data sets can be reused and recording can continue. The offloaded records are usually kept for a specific period of time in case a BCS or VVDS must be recovered. The person performing recovery does not need to know how the system records this information; interpreting the records is the job of the utility used to perform SMF forward recovery. However, the person(s) responsible for catalog recovery must be aware that the input needed for recovery may reside in the “dumped” SMF files, in the active SMF repository, or in both.


SMF Record Types

Restoring and up-leveling a BCS via SMF records is a relatively simple procedure. IBM records BCS catalog activity in the following internal records:

Type 61 – ICF Define Activity
Type 65 – ICF Delete Activity
Type 66 – ICF Alter Activity

Individual record types do not imply a specific type of catalog activity. For example, records related to Define Activity (Type 61) may reflect either a new record being added or an existing record being rewritten. When an alternate index is defined over a base cluster, Type 61 records are generated for each alternate index record added to the BCS. A Type 61 record is also generated to update the base cluster record already residing in the BCS with the new alternate index information. Even though that catalog record is updated, the update is not recorded as Type 66 Alter activity. Likewise, both Type 65 and Type 66 BCS activity can include a mixture of add, delete, and alter modifications.

Only one type of SMF record exists for VVDS activity: Type 60 (VVDS Activity). An SMF record is generated every time a VSAM data set described in a VVDS is closed, updated, extended, and so on. SMF records are not generated when the number of extents of an SMS non-VSAM data set increases, because extent information for these data sets is recorded only in the VTOC, not in the VVDS as it is for their VSAM counterparts. Since a single record type covers all VVDS activity, additions, deletions, and updates of VVDS records are all recorded together; information in the SMF header tells the recovery utility which type of processing to perform on the VVDS.
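
The record types named in this article can be summarized as a small lookup. This is an illustrative sketch (the function and constant names are hypothetical): it routes a record to the component it updates and deliberately does not infer add, delete, or replace from the type, since, as described above, that comes from the indicator in the SMF header.

```python
# Catalog-related SMF record types named in this article.
SMF_CATALOG_TYPES = {
    36: "ICF catalog export",
    60: "VVDS activity",
    61: "ICF define activity",
    65: "ICF delete activity",
    66: "ICF alter activity",
}

BCS_UPDATE_TYPES = frozenset({61, 65, 66})
VVDS_UPDATE_TYPES = frozenset({60})


def component_for(smf_type: int) -> str:
    """Return which catalog component a record type relates to.

    The type alone never says whether a record was added, deleted, or
    replaced; that comes from the indicator in the SMF header.
    """
    if smf_type in BCS_UPDATE_TYPES:
        return "BCS"
    if smf_type in VVDS_UPDATE_TYPES:
        return "VVDS"
    if smf_type == 36:
        return "BCS export marker"
    raise ValueError(f"not a catalog SMF record type: {smf_type}")


if __name__ == "__main__":
    for t in sorted(SMF_CATALOG_TYPES):
        print(t, SMF_CATALOG_TYPES[t], "->", component_for(t))
```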


Reliability

SMF forward recovery sounds great, but unfortunately it may not be 100% reliable. Many factors can affect it: missing SMF input, CPU clock differences in a multi-CPU environment, system software problems, daylight saving time, time constraints, and so on.

The real problem in SMF forward recovery is not knowing or understanding how updates are applied, but rather managing the SMF input. Identifying the latest BCS or VVDS backup and locating all the SMF input required to up-level it are tedious tasks, and they may be addressed during the actual recovery instead of being implemented ahead of time. Unfortunately, the SMF dump facility does not provide the flexibility to route the SMF records for different BCSs and VVDSs to different output files. Having all catalog SMF records grouped together increases the size of the input file used during SMF forward recovery; depending on the volume of catalog activity, separating out and using only the SMF records for the entity being recovered should decrease processing time. The SMF dump facility also does not keep track of the time period covered by the records generated for each individual BCS and VVDS. These tasks need to be managed for all systems generating SMF records for an ICF environment.
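
Because the SMF dump facility does not do this separation itself, an installation might run a small post-processing step over the dumped records. The sketch below assumes the records have already been decoded into a simplified, hypothetical `Rec` layout; it groups them by the BCS or VVDS name in the header and tracks, per catalog and per system, the time span each group covers.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Rec(NamedTuple):
    # Simplified stand-in for a decoded catalog SMF record (hypothetical).
    timestamp: datetime
    system_id: str
    target: str  # BCS or VVDS name taken from the SMF header


def split_by_target(records):
    """Group records per BCS/VVDS and track the time span per (target, system)."""
    groups = defaultdict(list)
    spans = {}
    for rec in records:
        groups[rec.target].append(rec)
        key = (rec.target, rec.system_id)
        first, last = spans.get(key, (rec.timestamp, rec.timestamp))
        spans[key] = (min(first, rec.timestamp), max(last, rec.timestamp))
    for recs in groups.values():
        recs.sort(key=lambda r: r.timestamp)  # time order within each catalog
    return dict(groups), spans
```

Each group could then be written to its own file, and the spans kept as an index so the input for one BCS or VVDS can be found quickly at recovery time.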

Quick identification of the starting point and the required input can have a profound effect on the integrity and efficiency of catalog recovery. SMF Type 36 (ICF EXPORT) records are generated when an IDCAMS EXPORT of a BCS completes, providing a reference point from which SMF updates should be applied. The Type 36 record is not generated when full-volume backups are performed. The Type 36 record name, “ICF CATALOG,” is somewhat misleading: the phrase “ICF catalog” normally implies a BCS and all related VVDSs, but here it refers only to the BCS. IBM does not support the backup and SMF forward recovery of a VVDS. Dino-Software’s T-REX offering fully supports both BCS and VVDS SMF forward recovery.
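
Using the Type 36 record as the starting point can be sketched as follows. This assumes, for illustration, that the Type 36 header carries the BCS name the same way the other catalog records do, and it uses a hypothetical simplified record layout. The newest EXPORT marker for the BCS becomes the start time, and only Type 61, 65, and 66 records for that BCS generated after it are selected, in time order.

```python
from datetime import datetime
from typing import NamedTuple


class Rec(NamedTuple):
    # Simplified stand-in for a decoded catalog SMF record (hypothetical).
    smf_type: int
    timestamp: datetime
    target: str  # BCS name taken from the SMF header


BCS_UPDATE_TYPES = (61, 65, 66)


def forward_recovery_input(records, bcs_name):
    """Return (export_time, updates to reapply) for one BCS.

    The newest Type 36 record for bcs_name is taken as the backup point;
    only Type 61/65/66 records for that BCS generated after it are kept.
    """
    exports = [r.timestamp for r in records
               if r.smf_type == 36 and r.target == bcs_name]
    if not exports:
        raise LookupError(f"no Type 36 (EXPORT) record for {bcs_name}")
    start = max(exports)
    updates = sorted(
        (r for r in records
         if r.target == bcs_name
         and r.smf_type in BCS_UPDATE_TYPES
         and r.timestamp > start),
        key=lambda r: r.timestamp,
    )
    return start, updates
```

The strict `>` comparison is a simplifying choice; updates made while the EXPORT itself was running may need separate handling.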

Ensuring the availability of SMF data is critical to the forward recovery process. SMF data must be collected from all systems that access a broken BCS or VVDS; it may reside in online SMF repositories, in sequential data sets containing “dumped” SMF records, or in both. The accidental elimination of required input can be disastrous. Clearing the online SMF data sets without first dumping them, or failing to offload SMF data sets when they are full, can contribute to loss of data. Message “IEE316I SMF DATA LOST – NO DATA SETS AVAILABLE” indicates that SMF can no longer record information, and the SMF data being discarded may very well include copies of catalog updates. Logging SMF data to log streams has considerable advantages over using the “MANx” data sets. A condition code of zero from an SMF forward recovery utility that did not use all required SMF input does not mean the recovery was successful: the integrity of a recovered object is determined by the input used to perform SMF forward recovery. Again, the quick identification of the latest backup and the gathering of all required SMF records should be an automated, ongoing process.
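
One piece of that automation is a completeness check run before the recovery utility, so that a clean condition code is not mistaken for a complete recovery. The sketch below is hypothetical: it assumes each SMF input source (a dumped data set or the active repository) has been summarized by the system that wrote it and the time range it covers, and it reports every period between the backup and the failure for which a system has no input at all.

```python
from datetime import datetime
from typing import NamedTuple


class SmfInput(NamedTuple):
    # One source of SMF data (a dumped data set or the active repository),
    # summarized by the system that wrote it and the time range it covers.
    system_id: str
    start: datetime
    end: datetime


def coverage_gaps(inputs, systems, backup_time, failure_time):
    """List (system, gap_start, gap_end) periods with no SMF input."""
    gaps = []
    for system in sorted(systems):
        ranges = sorted((i.start, i.end) for i in inputs if i.system_id == system)
        cursor = backup_time  # covered up to here so far
        for start, end in ranges:
            if start > cursor:
                gaps.append((system, cursor, min(start, failure_time)))
            cursor = max(cursor, end)
            if cursor >= failure_time:
                break
        if cursor < failure_time:
            gaps.append((system, cursor, failure_time))
    return gaps
```

An empty result does not prove the input is intact (data lost inside a covered range, as with IEE316I, would not show up), but any non-empty result means the recovery cannot be complete.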


Time Considerations

Another important point to consider in SMF forward recovery is the potential for clock differences in a multi-CPU environment using shared DASD. If BCS and VVDS updates occur from more than one system, adequate procedures must be implemented to account for these differences. The date and time the SMF record was generated is stored in the SMF header, so if catalog updates (delete/define activity) for a specific data set occur on different CPUs, BCS and VVDS records may not be reapplied in the correct order.

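For example, suppose cluster A.B.C is defined on one system and then deleted a short time later from a second system whose clock runs behind the first system’s clock.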

Given this sequence of events, the SMF record for the delete of cluster A.B.C may carry an earlier timestamp than the record for the initial define. SMF records used by a forward recovery utility are usually sorted by the date and time they were generated, so the delete is reapplied before the define. The result may be that the recovered BCS contains records for cluster A.B.C even though the data set does not exist on the volume. In a multi-CPU environment, the CPU clocks must be synchronized, or catalog updates should be limited to specific systems.
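
The effect is easy to reproduce with illustrative numbers. In this sketch (times and system names are hypothetical), SYSB's clock is assumed to run five minutes behind SYSA's, so the delete recorded on SYSB sorts ahead of the define recorded on SYSA and the replay resurrects A.B.C. Shifting SYSB's timestamps by its offset restores the real order, but offsets are rarely known this precisely after the fact, which is why synchronized clocks are the practical answer.

```python
from datetime import datetime, timedelta

# Illustrative events: (recorded timestamp, system, action, data set).
# SYSB's clock is assumed to run 5 minutes behind SYSA's.
EVENTS = [
    (datetime(2024, 1, 1, 10, 0), "SYSA", "define", "A.B.C"),
    (datetime(2024, 1, 1, 9, 57), "SYSB", "delete", "A.B.C"),  # really 10:02
]


def replay(events):
    """Reapply define/delete events in recorded-timestamp order."""
    catalog = set()
    for _, _, action, name in sorted(events, key=lambda e: e[0]):
        if action == "define":
            catalog.add(name)
        elif action == "delete":
            catalog.discard(name)
    return catalog


def correct_clocks(events, offsets):
    """Shift each system's timestamps by its known clock offset."""
    return [(ts + offsets.get(system, timedelta(0)), system, action, name)
            for ts, system, action, name in events]


if __name__ == "__main__":
    print(replay(EVENTS))  # {'A.B.C'}: the deleted cluster comes back
    fixed = correct_clocks(EVENTS, {"SYSB": timedelta(minutes=5)})
    print(replay(fixed))   # set()
```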

If catalog problems are the result of a software bug, knowing when to stop SMF forward recovery is very important. When record overlays occur, SMF records the corrupted BCS and VVDS records exactly as they were written; SMF does not validate the contents of a BCS or VVDS record as it is appended to the SMF header. These records are snapshots of catalog activity, whether they are correct or not.

A forward recovery utility reapplies each catalog record exactly as it appears in the SMF record. Reapplying the exact catalog update that caused the original failure will therefore produce a recovered BCS or VVDS that is corrupted. If a catalog failure is caused by a software problem, the offending SMF record should be identified. IEC331I, IEC332I, and IEC333I messages issued for a BCS usually indicate some type of catalog problem; noting the time these messages appeared in the SYSLOG should give some indication of the events leading up to the failure and of when SMF forward recovery should be stopped.
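
A stop point can be derived from the log along these lines. The log format here is a hypothetical simplification ("YYYY-MM-DD HH:MM:SS MSGID text"), not real SYSLOG layout, and the records are simple (timestamp, description) tuples. The first IEC331I, IEC332I, or IEC333I message becomes the cutoff, and records generated at or after it are dropped. The message is only an upper bound: the update that actually corrupted the catalog may precede it, so the cutoff may need to move earlier after analysis.

```python
from datetime import datetime

CATALOG_ERROR_MSGS = ("IEC331I", "IEC332I", "IEC333I")


def first_catalog_error(log_lines):
    """Return the timestamp of the first catalog error message, or None.

    Assumes a simplified, hypothetical log format:
    'YYYY-MM-DD HH:MM:SS MSGID text...', in chronological order.
    """
    for line in log_lines:
        parts = line.split(maxsplit=3)
        if len(parts) >= 3 and parts[2] in CATALOG_ERROR_MSGS:
            return datetime.strptime(f"{parts[0]} {parts[1]}",
                                     "%Y-%m-%d %H:%M:%S")
    return None


def stop_point_filter(records, log_lines):
    """Drop (timestamp, ...) records generated at or after the first error."""
    cutoff = first_catalog_error(log_lines)
    if cutoff is None:
        return list(records)
    return [r for r in records if r[0] < cutoff]
```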

Daylight saving time presents an interesting roadblock to SMF forward recovery. In the spring, clocks are moved one hour ahead; the resulting one-hour gap in SMF data should not cause a problem for a forward recovery utility. In the fall, clocks are set back one hour. If processing occurs during the hour before the clocks are set back, the operating system records two hours’ worth of catalog updates during one logical hour, so sorting by recorded time can interleave updates from the two real hours. Although this potential problem exists, very few installations have had to address it. Quiescing the system for the hour before the clocks are adjusted eliminates the problem but is not very practical in today’s 24×7 environment.
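
The fall-back problem can be shown with two timestamps, assuming (as the discussion above implies) that the recorded times are local wall-clock times. Fixed UTC offsets stand in for daylight and standard time; US Eastern offsets are used only as an example. An update made at 01:50 daylight time is followed 20 minutes later by one stamped 01:10 standard time, and a sort on the wall-clock value puts them in the wrong order, while a sort on the actual instant does not.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets stand in for one time zone's daylight and standard time
# (illustrative: UTC-4 and UTC-5, as in US Eastern time).
DAYLIGHT = timezone(timedelta(hours=-4))
STANDARD = timezone(timedelta(hours=-5))

# The night the clocks fall back from 02:00 daylight time to 01:00 standard.
before = datetime(2023, 11, 5, 1, 50, tzinfo=DAYLIGHT)  # 05:50 UTC
after = datetime(2023, 11, 5, 1, 10, tzinfo=STANDARD)   # 06:10 UTC, 20 min later


def local_order(stamps):
    """Sort on wall-clock time only, discarding the UTC offset."""
    return sorted(stamps, key=lambda d: d.replace(tzinfo=None))


def true_order(stamps):
    """Sort on the actual instant (aware datetimes compare in UTC)."""
    return sorted(stamps)


if __name__ == "__main__":
    print([d.strftime("%H:%M %z") for d in local_order([before, after])])
    print([d.strftime("%H:%M %z") for d in true_order([before, after])])
```

Once only the wall-clock value survives, the offset information needed for `true_order` is gone, which is why the ambiguous hour cannot simply be sorted out afterward.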


Planning recovery time

Catalog management generates a very large number of SMF records, especially Type 60 (VVDS Activity) records. How long the recovery of a BCS or VVDS will take using SMF data should be factored into an installation’s recovery plans. If a large number of catalog updates occur on your systems, BCS and VVDS backups should be scheduled more frequently; this reduces the number of SMF records generated between backups and should decrease recovery time. Even though fewer SMF records may then be required for recovery, the SMF input must still be identified and managed, and backups must still be locatable by individual BCS and VVDS name. Adopting and implementing a specific recovery methodology is important for a thorough and complete BCS or VVDS recovery.
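
A back-of-the-envelope estimate makes the backup-frequency trade-off concrete. The sketch assumes the worst case (the failure occurs just before the next backup, so a full interval of records must be reapplied) and uses purely hypothetical rates; substitute measured catalog activity and utility throughput. It ignores the time needed to gather and sort the SMF input.

```python
def estimate_recovery(records_per_hour, hours_between_backups, apply_rate_per_sec):
    """Worst case: failure just before the next backup, so a full interval
    of catalog SMF records must be reapplied.

    Returns (records_to_reapply, apply_seconds). Gathering and sorting the
    SMF input is not included.
    """
    records = records_per_hour * hours_between_backups
    return records, records / apply_rate_per_sec


if __name__ == "__main__":
    # Hypothetical figures; measure your own catalog activity and utility speed.
    for interval in (24, 12, 6):
        records, seconds = estimate_recovery(50_000, interval, 2_000)
        print(f"backup every {interval:>2} h: {records:,} records, "
              f"~{seconds / 60:.1f} min")
```

Halving the backup interval halves the worst-case number of records to reapply, at the cost of more frequent backups to schedule and track.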

SMF data serves as the journal of ICF catalog activity needed to up-level a BCS or VVDS. Although potential problems do exist and may complicate ICF catalog recovery, the use of SMF data should be included in every data center’s recovery plan. When SMF data is managed correctly, it can greatly speed recovery efforts.



Blair Svihra has been in the z/OS industry for over 30 years, having extensive expertise with ICF catalogs. He was the original author of VSAM Mechanic (Catalog Solution) and has contributed to many other products, including T-REX and Universal Data Manager (UDM).  Blair is currently a Sr. Product Development executive with Dino-Software Corp.

About Dino-Software

Dino-Software Corporation develops enterprise-wide solutions for the management, analysis, protection, and repair of complex z/OS mainframe environments.  Dino-Software has long been acknowledged for its superiority in ICF catalog management and technical support, helping organizations ensure their business-critical assets remain online and recoverable in a disaster.  Learn more about Dino-Software and its z/OS mainframe storage solutions at https://dino-software.com.

