In this chapter, you will understand and demonstrate knowledge in the following areas:
|Database recovery without archiving|
|Archiving Redo Logs|
The purpose of this chapter is to present the DBA with the capabilities of Oracle to handle database recovery. The first portion of the discussion will focus on introducing database recovery. The various types of database failure that may occur will be illustrated, along with the need for the DBA to communicate effectively in this time of crisis. The discussion will also present the types of recovery available in Oracle. This discussion will identify the type of recovery, the backup requirements that type of recovery presents, and more. The remainder of the chapter will focus on the methodology required for restoration of a database that has experienced a problem when the redo logs are not archived. Recall that redo logs contain records of all transactions that have occurred on the database. The redo logs can be archived to assist in database recovery only when the DBA uses physical backups as the method for storing copies of the database. Approximately 27 percent of the OCP Exam 3 content is material presented in this chapter.
In this section, you will cover the following topics related to recovery theory:
|Managing database failure|
|Managing database recovery|
|Role of database archiving|
|Types of recovery|
In order to master the art of database recovery, the DBA should be mindful of both the technical requirements of various aspects of database recovery and the need to communicate effectively in times of crisis. This section focuses on failure scenarios, managing database failure, recovering the database without archiving, and the importance of using and startup of database archiving.
There are several different scenarios for failure in the Oracle database. These scenarios can be divided into five general categories, which will be discussed in this section. The five categories include statement failure, user process failure, instance failure, user error, and media failure. Each of these categories has different implications on the need for DBA intervention in order to provide recovery from the situation.
When Oracle cannot process a statement issued by a user, this situation is generally known as statement failure. There are several causes for statement failure. First of all, a user may be attempting to issue a statement referencing a table that does not exist, or to which they are not granted permission to use. In this case, Oracle will issue the user an error that details both the are of the statement that contained the error and a reference to a special code, called the Oracle error message code. Recovery from this error situation consists of Oracle immediately rolling back any changes made by the statement automatically.
The failure of a user process requires more intervention from the Oracle Server. In some cases, the user may cancel or terminate the statement or process they are running with a ctrl-c command from SQL*Plus. If a user process terminates, then the process monitor (PMON) background process has to intervene. When a user process fails, PMON is there to handle automatic process recovery for the database in several areas. Some of those areas are listed below:
|Rollback on the failed process|
|Release of locks on tables and internal latches the process acquired|
|Removal of that process identifier from active processes in the V$ performance views|
There are many different things that may cause an instance to fail, including a problem with the memory of the host system running Oracle, a power outage, or disk corruption. Although instance failure requires the DBA to intervene in order to restart the database, the recovery of the instance is handled by the Oracle background process system monitor. SMON handles the rebuilding of the Oracle SGA, rolling forward all transactions in progress at the time of the crash, and then rolling back all uncommitted work that was in progress at the time the database crashed. When the DBA starts the database, SMON begins the tasks required to restore the instance to full operational capacity. SMON may take a while to recover the instance, depending on the transaction volumes on the database at the time of database failure.
Several situations may arise as a result of mistakes produced by users, such as accidentally deleting data or database objects from the Oracle database. This situation can test the limits of a backup and recovery strategy. If the problem is related to data change, the user may be able to recover using the rollback command. However, the dropping of tables or other objects may require DBA intervention, recovering the entire database to another machine, exporting the dropped object, and then restoring the object to the appropriate environment. To avoid this problem in production, only the DBA should be allowed to create database objects, or the task should be given to a privileged user.
Media failure means the loss of information on a disk that has either been corrupted at a hardware level or suffered some other form of irreparable damage. There are two types of media failure that may occur on the database, temporary and permanent media failure. If the disk is temporarily unavailable because a disk controller card failed, the problem should be easy and fast to correct with simple hardware replacement. Permanent media failure is caused by physical or magnetic damage to the casing in which the disk is stored or other corruption of data stored on disk. Permanent media failure requires the DBA to restore the file lost from backup copies of the database as well as any database changes made after the most recent backup with the application of archived redo log information, if archiving is used. Backup and recovery strategy of an organization is designed to act as an insurance policy against problems in the database that render the database unusable. Database recovery requires a strong understanding of operating-system-specific commands and processes for physical Oracle file manipulation, as well as an understanding of Oracles recovery mechanisms. The time spent recovering the database depends on several critical factors:
|Accessibility of backups Both the physical availability of backups (onsite or offsite) AND the accessibility of backups on hardware storage media. Disk is fastest; tape is slower.|
|Frequency of backups More frequent backups mean faster recovery, because fewer archived redo logs need to be applied for the same amount of recovered data.|
|Type of failure Some types of failure are easier to fix and less time consuming than others. For example, if the database lost one disk drive that contained only a few read only tablespaces, the DBA will spend less time recovering than if the DBA lost several disks of volatile data that were backed up infrequently.|
|Type of backups Physical backups provide better recoverability than logical exports, because archived redo can be applied to handle the changes made after the most recent backup was taken.|
Database recovery is the most critical skill a DBA can master to provide value to an organization. However, it is not a skill that is easily encapsulated into one or two commands or parameters in the way that dropping database objects can be reduced to a statement. Of all areas of Oracle database administration, the mastery of database recovery is the trickiest. Not only must a DBA have the experience it takes to develop seasoned, instinctive approaches to recovery of a database, but the DBA must also be a facilitator of good communication. Most DBAs in larger IT organizations with many databases will know the intricacies of every application using the database. Knowing who to contact is as important as technical skill. For recovery requiring a lot of downtime, the decision to recover must be made in consensus. In crisis situations, users and managers generally have a few specific questions they need to have answered. Those questions are as follows:
|What caused the problem?|
|Who is affected by this problem?|
|What steps are required to correct the problem?|
|How long will it take to correct this problem?|
|Can this problem be prevented?|
A few cases of hands-on experience provide the best instruction for DBAs who want to know the right things to communicate in the organization when a database crisis arises. However, the DBA with the technical expertise and strong communication skills will be effective in crisis situations quickly.
With physical backups, it is an excellent idea to use the archivelog option of Oracle. The method for turning archiving on is covered shortly. Archiving of redo logs should be used in any production database environment where data changes are frequent and complete recovery to the point of failure is required. With archived redo logs, there are more options for database recovery. Data changes made after the most recent backup are recorded by redo logs that can then be applied to the database after that backup is restored, for full database recovery. Without the archived redo logs, the data changes for a database are lost every time a log switch occurs. Recall the process used to write online redo logs. Two or more redo logs store online redo taken from the redo log buffer and copied to disk by LGWR. Each redo log may consist of, ideally, more than one member, mirrored (or "multiplexed") on multiple disks to minimize the Oracle databases dependency on any one disk. If a disk drive fails that contains the only member of the current online redo log, the Oracle instance will fail.
When an online redo log fills, a log switch occurs. Old redo log will be archived automatically by ARCH if the DBA has issued the archive log start statement from Server Manager or if the LOG_ARCHIVE_START initialization parameter is set to TRUE before instance startup. LGWR or CKPT will write the current redo log sequence information to the datafile headers and to the control file, then start writing redo log entries to the other online redo log. A checkpoint also occurs. LGWR tells DBWR to flush the dirty buffer write queue of the database buffer cache in the SGA. Flushing the dirty buffers means that DBWR will write the data changes made to database blocks in the buffer cache out to the datafiles on disk that contain copies of the blocks. Data in the online redo log is no longer required.
Situations where archiving may not be required include read only databases, or databases that experience a low frequency of data changes. Databases in this category may be better off using logical exports for backup, or offline physical backups. Either approach will work to store the data once changes are made, providing needed recoverability for the system with only nominal effort and maintenance on the backup process by the DBA. However, any type of online transaction processing system with data volatility should archive redo logs. By saving all the data changes made to the database with the use of archived redo logs, the DBA can recover all committed data changes to the database with minimal requirements for end users to reenter data they may have placed into the trust of the database environment.
In general, there are two categories for database recovery: recovery with archiving and recovery without. Without archiving, the database can be recovered to the point in time where the last database backup took place. With archiving enabled, the DBA has more options in the event of a disaster. Usually, the DBA will perform a database recovery to the point of failure on the database. This is complete recovery. To have complete recovery, the DBA needs the most recent database backup and a full set of archived redo logs for that backup to the time of failure. There are other options for incomplete recovery as well. Incomplete recovery is the recovery option available when the DBA does not have a copy of every archived redo log present in order to make a full recovery, or not every redo log up to the last committed transaction will be used for the recovery of the database. In this situation, the DBA can only recover the damaged data to the point in time prior to the missing archived redo log.
Database recovery is handled with the recover command. The recover command is available from the Server Manager utility or in conjunction with the alter database statement. There are three different types of media recovery that can be performed on the database with respect to the scope of recoverability required for the database. The first type of recovery available in situations where archiving is enabled is database-wide recovery. The command for database-wide recovery is the recover database command. A database that requires database-wide recovery cannot be available for use by the applications and users. When the recover database command is issued, Oracle will examine all tablespaces and datafiles of the database and apply archived redo log information to those areas that require recovery. The instance must be mounted in exclusive mode. Otherwise, the database should remain closed. An example for specifying the database recovery command appears in the following code block. Note that the database name need not be specified either in the shutdown or the recover command, as Oracle will operate on the database that is mounted to the instance. From Server Manager, the following commands may be issued:
STARTUP PFILE=orgdb01.ora MOUNT EXCLUSIVE orgdb01
ALTER DATABASE orgdb01 RECOVER DATABASE;
A more localized option is the recovery of a tablespace. This option is accomplished with the recover tablespace command within Server Manager. Recovering a tablespace allows the database to be open for use while the recovery takes place. The only area of the database that is not available during the period of tablespace recovery is the tablespace itself, which must be offline for the duration of the restoration. A tablespace can be brought offline using the alter tablespace offline SQL statement, either from Server Manager or from SQL*Plus. Even more localized is the recovery of an individual datafile for a tablespace. This option is accomplished with the recover datafile command from Server Manager, and can be compared to laser surgery in that it offers the DBA a highly selective option to restore lost data while the rest of the database stands untouched. This operation, along with the overall requirements for recovery while leaving the database online, will be discussed in Chapter 15, in the section on minimizing the downtime of a database. The following code block demonstrates the online recovery of a tablespace:
ALTER TABLESPACE data_301 OFFLINE;
ALTER DATABASE orgdb01 RECOVER TABLESPACE data_301;
Database recovery is handled in an interactive manner when archived redo logs are involved. Oracle expects that all archived redo logs required for database recovery will be located in the LOG_ARCHIVE_DEST location specified as part of the init.ora parameter file. If the redo logs are restored from archive at another location due to space constraints, the DBA should specify the new location.
There are three types of incomplete database recovery: change-based, time-based, and cancel-based recovery. All three of these types of recovery are incomplete forms of recovery because they will limit the number of archived redo logs that are applied to the database, either by time, by change, or by cancellation. These three types of recovery will take place generally only when the database has been mounted in exclusive mode and otherwise left closed.
In change-based recovery, Oracle will restore database changes made up to the change that is specified by the DBA as part of the command that initiates the recovery. The DBA may, in some cases, use the change-based recovery option to identify the last known good transaction to have taken place for the purpose of recovering the database to the point just after that transaction committed. The change that indicates the end of the recovery to Oracle is defined with the use of a system change number, or SCN. The SCN is a number that Oracle assigns to every transaction that occurs in the database. This "transaction ID number" is then attached to every statement that executes as part of the transaction. The change information can then be tracked both in the rollback segments and the online redo logs, gathering all statements that make changes together as one transaction. The SCN is specified as an integer. To find the last SCN archived by Oracle or what the SCN that was written at the last checkpoint was, the DBA can query the V$DATABASE dynamic performance view. The following code block demonstrates the specification of change-based recovery:
ALTER DATABASE orgdb01 RECOVER DATABASE UNTIL CHANGE 4043;
The incomplete database recovery may be conducted to a certain point in time. This type of recovery is considered to be a time-based recovery. Time-based and change-based recoveries are similar in that Oracle will restore data to the database to some point in the past. The appropriate format for the time a DBA can specify is a four-digit year, followed by a two-digit month, followed by two-digit date. These three items are separated by a hyphen. Immediately following the date should be a colon, followed by the hour of the day in 24-hour format, followed by the minute, followed by the second. The time components should be separated by colons. A sample alter database statement with recovery to a point in time identified is listed in the following code block:
ALTER DATABASE orgdb01 RECOVER UNTIL 1999-12-05:14:34:00;
The final type of incomplete recovery considered is the cancel-based recovery. This type allows the DBA to conduct database recovery until a cancel command is issued. Cancel-based recovery requires the DBA to monitor the recovery process. In order to demonstrate the usage of cancel-based recovery, the following code block illustrates an alter database statement that specifies cancel-based recovery. In order to cancel recovery, the DBA can issue the cancel command as Oracle interactively requests the DBA to confirm the application of suggested redo logs.
ALTER DATABASE orgdb01 RECOVER UNTIL CANCEL;
During normal operation of the database when archiving is enabled, the archived redo entries are moved to a location specified by the LOG_ARCHIVE_DEST parameter. This parameter is declared in the init.ora file, which is used at instance startup. The archived redo logs will then be taken out of the LOG_ARCHIVE_DEST and stored on offline storage media, usually tape. When database recovery is required, then the DBA must put the required redo logs back into the LOG_ARCHIVE_DEST. However, there may be situations where all the archived redo information required for the database recovery will not fit into the location specified for LOG_ARCHIVE_DEST. In this case, the DBA may specify an alternate location for putting archived redo information with the set logsource command from within Server Manager before the issuance of the alter database recover command. An example for using this command within Server Manager at the command prompt is listed below:
SET LOGSOURCE /u01/oracle/archive/bkp
Alternately, the DBA can specify a location for the archived redo log information in the statement used to recover the database. The syntax for this command used is the from keyword. An example of this clause appears in the following code block:
ALTER DATABASE orgdb01
RECOVER FROM /u01/oracle/archive/bkp
DATABASE UNTIL CANCEL;
Once incomplete recovery is complete, the database will reflect the changes made to it up to the specified point in time in the past. All archives of changes made to the database after that point in time must be discarded. To dispose unneeded redo information after database recovery has taken place, open the database with resetlogs option. The redo log number is reset to one , and Oracle starts writing a new set of redo logs according to LOG_ARCHIVE_FORMAT. Examples for discarding and not discarding redo logs at the time the database is opened appear in the following code block:
ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE OPEN NORESETLOGS;
The course of conducting database recovery is an interactive one within Oracle. As the recovery proceeds, Oracle prompts the DBA to specify which archived redo log to apply next. Based on information supplied to the database by LOG_ARCHIVE_DEST for the location of the archived redo logs and LOG_ARCHIVE_FORMAT for the naming convention of the archived redo log files, Oracle can recommend to the DBA the next archived redo log to apply as part of the database recovery. However, the DBA must give some manual intervention, either by confirming that "suggestion" or specifying the appropriate file. It is during this interaction between Oracle and the DBA that the DBA may also choose to cancel the recovery altogether in the case of cancel-based incomplete database recovery. However, if desired, the DBA can also bypass the interaction between Oracle and the DBA entirely with the use of automatic recovery. When automatic recovery is used, Oracle will prompt the DBA with its suggested next archived redo log to apply, and then Oracle will automatically apply that redo log suggestion. This utility reduces the need for the DBA to interact with the database, especially when all archived redo logs are in place specified by LOG_ARCHIVE_DEST and Oracle can correctly identify them using LOG_ARCHIVE_FORMAT. The following code block identifies the correct syntax and semantics for using automatic recovery:
ALTER DATABASE orgdb01
DATABASE UNTIL CHANGE 39455;
ALTER DATABASE orgdb01
DATABASE UNTIL TIME 1999-01-20:20:04:00;
In this section, you will cover the following topics related to database recovery without archiving:
|Using a full physical backup|
|Recovering read only tablespaces|
This discussion focuses on the tasks required to recover a database using a full physical backup without archiving. The advantage to not use archiving is simplicity in database recovery. Only the most recent full backup need be applied to the database. On the other hand, if archiving is not enabled on the database, then all changes that have taken place since the most recent full backup will not appear in the database after the recovery takes place. Additionally, the requirements for recovering a read only tablespace will be addressed.
The DBA must conduct a complete recovery from full backup if the database does not have archiving enabled to resolve problems of media failure. The database cannot be open for use during complete recovery. If the database is not shut down already, the DBA can issue shutdown abort to shut it down. At the command line within Server Manager, the following statement can be executed by the DBA while connected as a user with the sysdba privilege granted.
Media failure usually requires some sort of hardware repair or replacement. In order to restore the database to its full functionality, the DBA should ensure that the disk hardware that was damaged--the initial cause for the media failure--is fixed. Alternately, the DBA may choose to circumvent the problem by restoring the database using another disk to store the different files of the database. Assuming the disk has been replaced, the following steps are necessary to perform complete recovery:
ALTER DATABASE OPEN RESETLOGS;
If the disk cannot be replaced, then the DBA may need to move the files to other disks, and update the control file accordingly. The following steps can be used for this purpose.
ALTER DATABASE orgdb01
RENAME FILE /u01/oracle/data/data_301a.dbf
ALTER DATABASE orgdb01
RENAME FILE /u01/oracle/ctl/rdorgdb01
ALTER DATABASE OPEN RESETLOGS;
The read and write status of the tablespace is recorded in the control file for the database. Changing the status of a tablespace from read-write to read only or vice versa should always be followed by a full backup of the tablespace. If the backup of the database was taken after the status of the tablespace was changed from read only to read-write, there should be no problem with database recovery because the status of the tablespaces for the database is reflected accurately in the control file. The DBA should be sure to recover the control file along with the other database files that are used for the complete database recovery. However, if the tablespace status was changed after the most recent backup and no subsequent backup was taken, then the DBA may have to restore the tablespace in the state it was in when the backup was taken, and then change the status manually. This method is consistent with the overall requirement in restoring databases where archiving is disabled, in that all changes that took place after the most recent backup must be reentered.
Oracle behaves a little differently in situations where recovery of read only tablespaces takes place on databases that archive their redo logs. As with databases that leave archiving disabled, the DBA should always back up the database, particularly the control file for that database, after the status change is made to a tablespace. However, with archiving enabled, there are some steps that must be taken in order to recover the database. If the database has been backed up since the status was changed, the backup taken can be used to aid recovery. After recovering the datafile that comprise the tablespace, which were lost in the media failure, the DBA should apply redo log information. Although no data will be changed in the read only tablespace, Oracle may need to set other information in the database appropriately. In any case, Oracle will prompt the DBA with the suggested redo log. The same process of restoring the datafile backup and applying archived redo information is required even when the status of the tablespace has changed since the most recent backup was taken. This is because the archived redo information for that tablespace will contain, among other things, the status change of the tablespace. The steps required for database recovery of a read only tablespace with the original control file in place, regardless of whether a backup of the database was taken before or after the tablespace write-ability status was changed, is listed:
The importance of backing up control files after the status of a tablespace is changed becomes more pronounced if read only tablespace recovery is required due to the failure of a disk that also contained a control file. There are some factors to remember with respect to the control file and recovery of a read only tablespace. If the recovery is using a backed-up control file, Oracle cannot recover a read only tablespace. The DBA must take the read only tablespace offline in order to conduct the recovery. After the archived redo logs are applied, the DBA must apply the current online redo log as the last step in recovery, and then the rest of the database recovery completes. After that, the DBA can bring the read only tablespace online and things should work well. The steps required for recovery with a backed-up control file are as follows:
In this section, you will cover the following topics related to enabling archiving:
|Displaying archive mode|
|Using the ARCH background process|
|Displaying a list of archived redo logs|
|Selectively archiving redo logs|
|Taking backups after enabling archiving|
The archiving or backup of redo information is Oracles innate method for backing up the database. If archiving is enabled and all archives are stored properly, the emphasis on backups for database recovery that are present in both the logical and the offline physical (full) backup strategies (presented in Chapters 11 and 12) is reduced for a backup and recovery strategy. Database recovery options are vastly more flexible when archiving of redo log information is enabled in Oracle. Every transaction in the database writes redo log entries. These entries are stored in the SGA until LGWR copies the redo entry to the online redo log, which resides on disk and is mirrored on several disks, ideally. There are at least two sets of mirrored redo log files on the Oracle database that are online or available for LGWR to write entries at any time. When a redo log fills with redo information, a log switch occurs. LGWR starts writing redo to another set of online redo logs. A checkpoint also occurs at a log switch. LGWR tells DBWR to flush the buffer caches dirty buffer write queue of all buffers containing database changes. LGWR also writes log switch information to datafile headers and to the control file. The last thing that optionally happens if the DBA has set up automatic archiving is that the ARCH background process will copy the online redo log to the archive log destination as denoted by LOG_ARCHIVE_DEST. Automatic archiving is enabled with LOG_ARCHIVE_START set to TRUE in the init.ora parameter file or with the log archive start command in Server Manager. Starting archiving will be discussed in this section.
The first and most basic step to understanding whether or not archiving is enabled on the Oracle database, the DBA must display the status of archiving on the Oracle database. There are a number of different methods that may be used for displaying archiving information. The most prevalent for DBAs is a command run out of Server Manager called archive log list. In addition to whether Oracle is archiving redo logs, this command displays extensive information about the actual redo log being written and whether automatic movement of archives to the archive destination is being done. The following code block illustrates the use of this command:
SVRMGR> ARCHIVE LOG LIST
Database log mode NOARCHIVELOG
Automatic archival DISABLED
Archive destination /DISK01/Oracle/home/arch/
Oldest online log sequence 20
Next log sequence to archive 21
Current log sequence 21
Note several different things about the list of information for archiving. First of all, the database log mode is the item that denotes whether the database has archiving of redo logs enabled. This information is supplied in the database log mode line. Another important feature of information displayed by this command is whether the DBA has set up Oracle to move archives to their archiving destination automatically or whether the DBA has chosen to move archives to their destination manually. A complete discussion of automatic vs. manual movement of archived redo logs to the archive destination is presented later in this section. If archiving is enabled, there will be a destination detailed in the archive log destination line of the output for archive log list. This pathname is taken from a parameter in the init.ora file of the database called LOG_ARCHIVE_DEST. The last three lines of output detail information about the sequence number of the online redo logs that have been written or are being written in the database. When the DBA opens a database using the resetlogs option, the numbers being reset are the values for these three lines. Also, when a log switch occurs, the number for the current log sequence is incremented.
Another method that may be used to obtain only the status of archiving on the Oracle database comes from using a dynamic performance view called V$DATABASE. This view contains several columns detailing system change number information. Its use has been mentioned in the context of presenting the requirements of change-based incomplete recovery, a method of database recovery that will be discussed in greater depth in Chapter 14. V$DATABASE has several columns, including one called LOG_MODE that contains the archive log mode for the database, either noarchivelog or archivelog mode. The following code block contains a query and the output that can be used to determine the archiving information for a database:
SELECT name, log_mode
The information for this query, indeed all information for the V$DATABASE view, comes from the control file of the database. One other method that can be used for determining what archiving mode the database is in is to use the control file for the database itself. If the DBA was in a pinch and none of the other options were available, such as in the situation where the instance was shut down, the DBA may look in the control file to determine the archiving status of the database as well. However, this method is really only useful as a last resort since the other two methods detailed above should suffice in almost all situations.
One last area that should be presented here is the use of V$LOG to identify whether specific online redo log groups have been archived. The V$LOG performance view identifies several items of information about the current status of the online redo log, including the SCN range for the online redo log and the archiving status of the redo log. The ARCHIVED column of the V$LOG performance view will identify whether or not the redo log has been archived.
If a physical backup strategy is in place on the database and the DBA needs to ensure several different recovery options are in place, including recovery to the final committed transaction before the media failure occurred, then the DBA should enable archiving on the Oracle database in question. Archiving makes all sorts of recovery strategies possible, including complete recovery and several forms of incomplete recovery. Without archiving, recovery is only possible to the point in time of the last backup, whether it is complete in the case of a full offline physical backup or partial as in the use of the backup strategy presented for use with EXPORT.
If the methods described for determining whether archiving is in use on the database denote that archiving is not in use on the database, and the DBA determines that archiving should be activated for the database, then the DBA can enable archiving with the following methods. There are two points in time that the DBA can enable archiving on the Oracle database. The first point is at the time the database is created. If the DBA determines that archiving should be used, the DBA can create the database with archiving already in place. The DBA can set archiving to be enabled as part of the create database statement. At some point in this statement, the DBA should include the keyword archivelog to enable archiving. The placement of this keyword is not important, so long as its appearance does not disrupt the syntax of other options such as the specification for datafiles or redo log files. A sample create database statement with archiving enabled at database creation appears in the following code block:
CREATE DATABASE newdb01
LOGFILE GROUP 2 (log01,log02)
DATAFILE (users01.dbf, data01.dbf, index01.dbf)
If the archivelog option is not included in the database creation statement, Oracle defaults to disabling the archiving of redo logs, or noarchivelog. It is possible to switch the archiving status of a database after the database has been created. This operation is essentially an alteration of the database, and as such it uses the alter database statement. Altering the database to begin archiving redo logs can be executed along with any other database alteration, so long as the syntax is correct for the statement when issued. An example for altering the database to archive redo log information without altering any other aspect of the database appears in the following code block:
ALTER DATABASE orgdb01 ARCHIVELOG;
In general, it is recommended that the DBA back up the database when archiving is enabled or disabled. A more complete discussion of the implications of this recommendation is presented later in the section.
When archiving is enabled on the Oracle database, there must be some process by which archived redo logs are copied to an archive destination. The DBA can handle the process manually using operating system copy commands to move the files to the destination determined to be appropriate. The archiving of online redo logs must be handled as the redo logs fill. In order to archive an online redo log manually, the DBA must issue the alter system archive log statement. The usage of this statement appears in the following code block. The statement above archives all full online redo logs that have not already been archived. The archive is specified in the pathname that is provided in quotes in the to clause of the statement.
ALTER SYSTEM ARCHIVE LOG ALL TO /u01/Oracle/arch
Manual archiving of redo logs may work fine for the DBA who manages a smaller or more static database who wants a hands-on approach to managing archived redo information. However, this method can be time consuming, and ultimately a performance problem. Since the DBA must manually enter this statement or develop and test a script that enters this statement, there is some effort involved. Also, if the DBA does not handle the manual archival of online redo logs before all online redo logs fill, the DBA will create a performance problem on the database. If archiving is enabled, archiving is handled manually, and when all online redo logs are full, then Oracle will not be able to continue running until the DBA archives the data in the online redo logs so that LGWR can switch to writing redo to another log. Figure 13-1 demonstrates the result of Oracle filling all online redo logs and the DBA failing to manually archive the redo information.
Figure 1: When all online redo logs fill
In order to reduce any problems that may come about as a result of manually archiving redo log information, the DBA can have the database automatically handle archiving of redo information. This process is called automatic archiving. When Oracle handles archiving of online redo logs automatically, there is little chance that the database will cease to run due to the fact that an online redo log was not archived. Automatic archiving creates a new background process that runs in the Oracle database. This process is called ARCH, for archiver. The ARCH process executes the copying and movement of recently filled online redo logs to the archive destination specified. To enable automatic archiving, the DBA must execute another alter system statement, this time with the start option. The proper syntax for this option appears in the first statement of the following code block. To stop the automatic archival of online redo logs, the second statement in the following code block may be used.
ALTER SYSTEM ARCHIVE LOG START;
ALTER SYSTEM ARCHIVE LOG STOP;
When this statement is issued, a parameter within the instance is set to TRUE. That parameter is called LOG_ARCHIVE_START. If the DBA so desires, she may also set the parameter within the init.ora parameter file. In this manner, the automatic archiving of redo log entries will be in effect every time the instance is started. There are two major points to consider in the configuration of automatic archiving for redo information. These points include defining the destination for all archived copies of redo information and the format ARCH should automatically apply when moving redo log files to the archive destination. These two items are handled with parameters called LOG_ARCHIVE_DEST and LOG_ARCHIVE_FORMAT, respectively. These parameters can be identified in the init.ora initialization parameter file, which is used at the startup of the database instance.
The location of the database archive for redo log information is specified to Oracle using the LOG_ARCHIVE_DEST. There must be a value for this initialization parameter at instance startup. If the DBA should wish to specify another archive destination, the DBA can issue the alter system archive log start with a destination specified at the time the statement is issued. The following code block illustrates this point:
ALTER SYSTEM ARCHIVE LOG START /u02/Oracle/archive;
In order to specify an archive destination for the LOG_ARCHIVE_DEST parameter, the DBA should use the absolute pathname for the destination directory or area of the filesystem. An example for specifying this parameter in the init.ora file is listed in the following code block:
This parameter is also specified in the init.ora file. It handles the identification of a naming convention for ARCH to use when moving online redo logs to the archived redo log location. The format for redo information is arbitrary, and dependent on the filename formats supported by the operating system of the machine hosting the Oracle database. However, since the archived redo logs will essentially be a sequence of files, there are some formatting options available with Oracle that can be used in the LOG_ARCHIVE_FORMAT parameter to ensure that the sequence of numbers for each redo log archived will be consistent and in a manner prescribed by the DBA. The four formatting conventions that are usable with LOG_ARCHIVE_FORMAT are listed below:
|%T Thread number for the redo logs of that instance within the Oracle Parallel Server architecture, padded with zeros to the left. A thread is a running set of redo log information for one instance within a parallel database server.|
|%t Thread number for the redo logs of that instance within the Oracle Parallel Server architecture, not padded to the left with zeros.|
|%S Log sequence number, a sequential number representing the number of redo logs that have been written and archived since archiving began, instance started, or sequence reset by resetlogs, padded to the left with zeros.|
|%s Log sequence number, not padded to the left with zeros.|
The format specified for the LOG_ARCHIVE_FORMAT must conform to any naming conventions required for files at the operating system. It will generate archived redo logs that are similar in form to the sample filenames listed after the specification for the variable. Typically, the archived redo logs will carry the .arc extension as well.
LOG_ARCHIVE_FORMAT = Log%S.arc
LOG_ARCHIVE_FORMAT = Arch-%t-%S.arc
During database recovery, the suggestions that Oracle formulates for the DBA to confirm are based on the value of the LOG_ARCHIVE_FORMAT parameter. The automatic archiving destination specified by LOG_ARCHIVE_DEST can be overridden during a manual archive by specifying a destination in the alter system statement. However, the automatic archiving destination will still default to the LOG_ARCHIVE_DEST location.
Once the redo logs are archived either automatically or manually, there are several ways to identify which redo logs exist and what their contents are. The V$LOG_HISTORY dynamic performance view provides a listing of all archived redo logs for the database, along with detailed information about the transactions in each of the redo logs identified by system change number. The ARCHIVE_NAME column of the V$LOG_HISTORY performance view will identify the name of the archived redo log according to the format specified by the LOG_ARCHIVE_FORMAT parameter or by the filename used when the DBA manually archived the online redo log file. The following SQL query allows the DBA to identify the filename, the transactions in the redo log by SCN, and the sequence and thread numbers for the archived redo log:
to_char(low_change#) || .. || to_char(high_change#) SCN_RANGE,
In addition, the DBA can use the archive log list command to display detailed information about the archiving of redo logs. The usage of this feature is detailed earlier in the section--refer to the discussion of identifying the mode for archiving that was presented in the section.
In general, the DBA will find the use of automatic archiving with the ARCH background process to be sufficient for archiving purposes. Likewise, the DBA may want to manage manual archiving with the alter system archive log all statement to get all online redo logs archived that have not been archived already. However, archiving of redo logs is actually a full-featured function, complete with many different options that allow for the selective archiving of online redo logs. Several options are available for selectively archiving online redo log information. Some of the options that are available for selectively archiving redo logs with the alter system archive log statement include specifying a particular online redo log sequence, system change number, current redo log group, and several others. In addition to archiving all redo logs with the alter system archive log all statement identified above, the different options available with this statement are as follows: seq, change, current, group, logfile, and next. Each term will now be explained.
This option is used to manually archive online redo logs according to sequence number. The sequence number for a given current log group is displayed when the DBA executes the archive log list command. Alternately, the DBA can determine current redo log sequence and thread information using the V$LOG dynamic performance view. The sequence number for the redo log to archive is specified after the SEQ keyword option. An example for archiving according to the sequence number appears in the following code block:
ALTER SYSTEM ARCHIVE LOG SEQ 39;
This option is used to manually archive logs according to a system change number, or SCN. To obtain the range of transaction system change numbers for a given set of online redo logs, the DBA can query the V$LOG dynamic performance view. The columns that define the range of system change numbers available on the database are LOW_CHANGE# and HIGH_CHANGE# in V$LOG, respectively. If the DBA wants to archive a specific redo log based on a system change number, the following statement can be used for that purpose:
ALTER SYSTEM ARCHIVE LOG CHANGE 450394;
This option is used to manually archive the redo log that is currently being written by the instance. In effect, issuing the alter system archive log statement with the current option forces Oracle to conduct a log switch. The same effect can be created if the DBA uses automatic archiving of redo log entries and issues the alter system switch logfile statement. For an example of using this option, the following statement illustrates the syntax of this alter system option. Its usage is straightforward.
ALTER SYSTEM ARCHIVE LOG CURRENT;
This option is used to manually archive redo log information according to the online redo log group that has the value specified by the integer that follows this option in the alter system archive log statement. The DBA can obtain group information from the V$LOG performance view in the GROUP# column. The syntax and semantics for using this option are illustrated by the following alter system statement:
ALTER SYSTEM ARCHIVE LOG GROUP 3;
This option is used to manually archive redo log information according to the online redo log group containing the member identified by the filename specified as part of the logfile option. The value specified for logfile must be a filename for one of the members of a log file group. The members of a log file group are specified as part of the create database command, but that information can be obtained from the V$LOGFILE dynamic performance view. Meanwhile, an illustration for usage of the logfile option appears in the code block below:
ALTER SYSTEM ARCHIVE LOG LOGFILE LOG1A;
This option is used to manually archive the most recent online redo log that has filled but not been archived. If all online redo logs have been filled, then there is nothing to archive. Usage of this option appears in the following example:
ALTER SYSTEM ARCHIVE LOG NEXT;
Archiving offers the added protection of knowing that all changes in the database are recorded for posterity in the online redo log, and all online redo logs that are created by Oracle are archived for protection. However, the DBA cannot simply rely on archived redo logs to save the day. Recovery with archiving is slow. In almost all cases, the more recent the backup a DBA has to recover lost information and the fewer the number of archived redo logs there are to apply, the faster database recovery will run. For this reason, backing up the database frequently is still an important feature of database recovery, even when the added protection of archiving is used.
There is an important time for backups in the operation of a database for the DBA to be aware of when using archiving. The DBA must always remember that a backup must be taken of the database immediately after the status of archiving is changed from enabled to disabled, or archivelog to noarchivelog. This step is important mainly because Oracle writes special information to the backup of a database related to archiving or not archiving. If archiving is enabled but the most recent backup used to recover the database was taken at a time when archiving was not enabled, then archived redo logs may not be applied to the database as part of recovery from that backup. So, in essence, if the DBA enables archiving on a database but does not create a backup of the database when it is in archivelog mode status, the archived redo log information is worthless for use in recovery. The DBA should remember to always take a backup of the database after the archiving status of the database changes.
Finally, it may be valuable to the DBA to keep backups of the archived redo logs. Recall earlier in the chapter that if a redo log archive is lost, the database can only be recovered to the point in time just prior to the lost redo log. Therefore, the only choice a DBA will have in this situation is a forced incomplete recovery. Once archived redo logs are placed in the LOG_ARCHIVE_DEST, the DBA is free to move the archives offline or copy them in any manner permitted by the operating system copy commands available on the host machine running Oracle.
This chapter covers some important areas of Oracle database recovery. There are three main areas this chapter presents, including an introduction to database recovery, the requirements of database recovery without the use of archival of online redo logs, and enabling archivelog mode on the Oracle database. The introduction to database recovery covers several different areas, including a review of the various failure scenarios occurring in Oracle and the role of communication in crisis situations. The role of archiving and the various types of database recovery are also presented. This chapter also offers guidance on the steps required for the recovery of a database using full database backups taken offline when archiving is disabled. A special presentation on recovering read only tablespaces is also included. Finally, the requirements for initiating archiving on the database are presented. Overall, this chapter is a lengthy one, comprising 27 percent of test material on OCP Exam 3.
The first discussion in this chapter reviewed failure scenarios. There are five different types of failure scenarios in an Oracle database. They are statement failure, process failure, user error, instance failure, and media failure. Statement failure is something that generally occurs as a result of a syntax error on the part of someone issuing a statement against the database. There is usually no need for intervention on this problem, as people must correct the issue for themselves.
Process failure arises as the result of cancellation of a user SQL statement that otherwise was running. The process may have been canceled either by the user, manually, or by Oracle automatically. In order to handle user process failures, Oracle has a built-in process called PMON that monitors user processes. If one should fail, PMON handles cleanup activities like releasing locks the process acquired, rolling back the incomplete transaction, and taking the process off the active process list.
User error is a unique situation on the database whereby some user of the database inadvertently changes data or eliminates a table. Depending on whether the organization has made a commitment to support this situation, the DBA may need to formulate a strategy for handling user error. Generally, the solution for user errors requires the use of backup and recovery. Whatever the commitment for resolving user error, the DBA should ensure that it fits into the overall strategy for backup and recovery on the database.
Instance failure arises when, for some reason, the Oracle instance cannot continue running. Some potential causes for this situation include hardware failure or a power outage. When instance failure occurs on the database, the DBA must identify the root cause of the failure and resolve it before restarting the instance. However, once the instance is restarted, the DBA neednt put that much effort into the instance recovery because the SMON background process will handle the recovery automatically.
Media failure is the problem to watch out for. It is related to the failure of a disk drive on the machine that hosts Oracle. There are two categories for media failure: temporary and permanent. Temporary media failure occurs when something happens that prevents access to database files on a particular disk, but no permanent damage to those files has occurred. For example, the failure of a disk controller device may cause the disk to be inaccessible temporarily. But, when the problem is corrected, the data will again be available for use by Oracle. Permanent media failure occurs when the database disk drive becomes damaged or destroyed and the data on the drive is rendered inaccessible. In this case, the DBA must usually recover the data that was lost from a backed-up version of the database.
The importance of communication, though not critical knowledge for certification, is a real-world skill that a DBA should master. During the crisis event, which produces the need for a database recovery, tempers often flare and there is a serious need for strong communication skills. The DBA should be able to address communication issues in the situation of a crisis so that the users of the database are not left wondering what is happening to recover the damaged system. The importance of speaking to the level of technical understanding for other people in the organization, such as users and managers, was also identified. These groups of people usually have a technical language all their own, which is not similar to the technical language of the DBA. The DBA must bear the information needs of the audience in mind when dealing with crises that create a need for database recovery.
The importance of archiving was again highlighted as an integral part of database recovery. The overall architecture of archiving was detailed for review in this discussion. The architecture for redo log archiving consists of many different components. The first component for redo log archiving is the redo log buffer, a memory area designed to capture the redo information produced by statements changing data in the database. This redo information is written to a disk file, called the online redo log, by a special background process called LGWR. There must be at least two of these online redo log groups in order for Oracle to function properly. LGWR writes data to the redo log group, switching back and forth between them. When one log group fills, LGWR writes to the other log group. This design forces LGWR to overwrite redo information unless the DBA chooses to archive redo log information. When archiving is enabled, then Oracle allows for the archiving, or saving, of redo information. The movement of this data is handled either manually by the DBA or automatically by the ARCH background process.
The DBA should strongly consider archiving redo log entries because it allows for many different options when the need arises to make database recovery happen with the backups available. When archiving is used and recovery is required, Oracle makes the recovery an interactive process with the DBA. Along the way, Oracle suggests archived redo logs to apply based on the contents of its V$ performance views for redo logs, namely V$LOG and V$LOG_HISTORY. The DBA can confirm these suggestions or supply her own recommendation, which Oracle will then use to execute the recovery. In addition, the DBA can use automatic recovery, whereby Oracle will simply apply its suggestions for recovery, bypassing the interactive portion of the recovery.
There are several different types of recovery available to the DBA. One division of recovery options into categories is complete and incomplete recovery. Within complete recovery, there is generally only one type of recovery that is complete. That is the complete recovery made possible with archived redo logs to recover database changes to the moment of the database failure.
Forms of incomplete recovery include any type of recovery that takes place by applying archived redo logs to a point in time in the past. There are three types of incomplete recovery: change-based, time-based, and cancel-based. These three types of database recovery are designed to allow the DBA to recover the database to a point in time in the past, specified by a variety of different methods. The first of these methods is cancel-based recovery. A cancel-based recovery is one that runs based on the interactive part of database recovery with archived redo logs. The recovery will continue to apply archives until Oracle makes a suggestion for the next redo log and the DBA issues the cancel command.
Another type of incomplete recovery is the change-based recovery. This recovery is based on the system change numbers (SCN) that Oracle applies to every transaction that runs on the database. When issuing the command for change-based recovery, the DBA must supply a system change number that Oracle will use to determine the stopping point for the recovery. When Oracle completes application of the redo log information for the SCN for the log containing that SCN, the recovery will end automatically. The result is a database that is recovered to a point in time in the past at which the final transaction is the one whose SCN is the one defined at recovery time by the DBA.
The final type of incomplete recovery that uses archiving is the time-based recovery. Perhaps the most straightforward of all incomplete recoveries, the time-based recovery requires the DBA to specify a time to which the database should be recovered. When Oracle has applied enough archived redo information to recover the committed transactions to the point in time specified for time-based recovery, Oracle will automatically cease the recovery effort.
Although there are only the three incomplete recoveries described that work in conjunction with archived redo logs, other types of database recovery that do not work with archived redo also may be considered "incomplete" because they are only capable of recovering data to a point in time in the past. These types of recovery are database imports, described in Chapter 12, and physical recovery based on the full physical offline backups that were explained in Chapter 11. A full recovery from complete backup is handled in the following way. The database must first be shut down. The DBA replaces all files in the Oracle database with the backup versions. If need be, the name and/or location of the control file may be changed in the init.ora file before mounting the database. At this point, the database can be mounted, but if there was some movement of datafiles or redo log files from one disk to another, the DBA should use the alter database statement at this time to ensure Oracles awareness of the change. Finally, the DBA should open the database using the resetlogs option to discard all online redo information.
There are special considerations the DBA must make for recovery of read only tablespaces. The golden rule for facilitating recovery on read-only tablespaces is to always make a backup of the control file after the status of a tablespace changes from read only to read-write, or vice versa. Without this backup of the control file, there are always extra steps to manage the recovery of read only tablespaces.
The mechanisms to enable archiving were also covered in this chapter. There are several aspects to setting up the database to handle archived redo log information. The first of these items covered is determining whether or not archiving is enabled on the database. One method for handling this step is to look in the V$DATABASE dynamic performance view. Another is to issue the archive log list command from within Server Manager. The output from this command not only gives the DBA information about the archiving status, but also gives a great deal of information about the archived redo logs themselves. Finally, if archiving is enabled, the V$LOG view gives information about whether the individual online redo logs have been archived.
Putting the database into archivelog mode is perhaps the greatest and simplest step a DBA can take to ensure the recovery of the Oracle database. Archiving is enabled in two ways. First, at the creation of a database with the create database statement, the DBA can include the archivelog option to begin archiving of redo log entries right away. Alternately, the DBA can issue the alter database archivelog statement to change a previously nonarchiving database into one that archives its redo logs. To change the database into noarchivelog mode, the DBA can issue the alter database noarchivelog statement. It is important that the DBA take a complete backup of the entire database after changing the archiving status of the database to archivelog mode in order to ensure that the database archived redo logs are useful in database recovery. Archived redo logs cannot be applied to a database that has been recovered from a backup taken of the database when it was not in archivelog mode.
Once the DBA has set the database to archivelog mode, she can either handle the archiving of redo logs manually with the alter system archive log all statement or by setting Oracle up to handle automatic archiving with the use of the ARCH background process. Automatic archiving is started using the alter system archive log start statement. When enabled, the ARCH background process archives redo information automatically every time a log switch occurs, placing them into a destination specified by the value for the LOG_ARCHIVE_DEST in the init.ora file. The name ARCH will file the archive under is contingent on the value specified for the LOG_ARCHIVE_FORMAT parameter, also specified in the init.ora file.
To view the archived log list, the DBA may look in the V$LOG_HISTORY performance view. The ARCHIVE_NAME column contains the name assigned to the archived redo log sequence and thread, based on automatic archiving using the LOG_ARCHIVE_FORMAT parameter. Finally, the DBA can selectively archive redo log information by manually archiving redo with the alter system archive log statement. There are several manual options available for selective archiving. They are seq, change, current, group, logfile, and next. The seq option allows the DBA to archive redo logs according to sequence number. Each redo log is assigned a sequence number as LGWR fills the online redo log. The change option can be used to archive a redo log that contains a certain SCN. The current option archives the redo log that is currently being written by LGWR, which forces Oracle to perform a log switch. The group option allows the DBA to specify a redo log group for archiving. The logfile option allows the DBA to archive redo logs by named redo log member files. The next option allows the DBA to archive redo information based on which redo log is next to be archived. Also, it should be noted that thread is also an option for archiving redo logs. A thread is a number representing the redo information for a single instance in a multi-instance parallel server setup using Oracles Parallel Server Option. The thread option can be set for any of the options for manually or automatically archiving redo log information using the alter system archive log statement.
|There are two types of media failure: temporary and permanent.|
|Temporary media failure is usually the result of hardware failure of something other than the actual disk drive. After it is corrected, the database can access its data again.|
|Permanent media failure is usually the result of damage to the disk drive itself. Usually, the drive will need to be replaced and the DBA will need to recover the data on the disk from backup.|
|In crisis situations, it is beneficial to the DBA to have strong communication skills to facilitate important decisions in tough situations with input from users and managers.|
|There are several different types of recovery. With archiving in place, the DBA has more options to choose from.|
|Two categories of recovery exist: recovery with archiving and recovery without.|
|In recovery with archiving, there are two categories: complete recovery to the point in time of the database failure, and incomplete recovery to some point in time before the failure occurred.|
|There are three types of incomplete recovery when the database runs in archivelog mode: change-based, cancel-based, and time-based.|
|Change-based recovery is where the DBA specifies a system change number that Oracle should use to denote the end of database recovery.|
|Time-based recovery is where the DBA specifies a date and time that Oracle should use to determine the end of database recovery.|
|Cancel-based recovery runs until the DBA issues the cancel command, taking advantage of the interactive process that happens as Oracle restores archived redo log information.|
|Automatic recovery can be used to reduce the amount of interaction required for database recovery. When enabled, Oracle will apply its suggestions for archive logs to apply automatically.|
|When the DBA opens the database after recovery, the resetlogs option can be used to discard online redo logs and to reset the sequence number.|
|Database recovery can be accomplished from full offline backups. The DBA should ensure that all files are restored from backup, not just damaged ones, to ensure that the database is consistent to a single point in time.|
|When restoring read only tablespaces, it is important that a backup of the control file be made after the status of the tablespace was changed to read-write or to read only.|
|There are two methods for determining the archive status of the database: the DBA can look in V$DATABASE or execute archive log list from Server Manager.|
|The DBA can set archiving on or off using the archivelog or noarchivelog options in create database or alter database statements.|
|There are two methods available for archiving redo logs: manual and automatic.|
|Automatic archiving is started with the alter system archive log start statement. Substitute stop for start to shut off automatic archiving.|
|If manual archiving is used, the DBA must make sure to archive redo logs before LGWR runs out of online redo logs to write information to. If archiving is used, LGWR will not overwrite an online redo log until it has been archived. If Oracle runs out of online redo logs for LGWR to write redo information to, no user can make database changes until archiving happens.|
|Automatic archiving needs the LOG_ARCHIVE_DEST and LOG_ARCHIVE_FORMAT parameters to be set in init.ora.|
|LOG_ARCHIVE_DEST determines where redo log archives will be placed.|
|LOG_ARCHIVE_FORMAT determines the nomenclature for the archived redo information.|
|Information about the archived redo log files is listed in the V$LOG_HISTORY dynamic performance view.|
|Selective archiving of redo information is possible with the use of several options for manual archiving. Those options are seq, change, current, group, logfile, next, thread, and all.|
|seq allows the DBA to archive redo logs according to sequence number. Each redo log is assigned a sequence number as LGWR fills the online redo log.|
|change can be used to archive a redo log that contains a certain SCN.|
|current archives the redo log that is currently being written by LGWR, which forces Oracle to perform a log switch.|
|group allows the DBA to specify a redo log group for archiving.|
|logfile allows the DBA to archive redo logs by named redo log member files.|
|next allows the DBA to archive redo information based on which redo log is next to be archived.|
|thread is also an option for archiving redo logs. A thread is a number representing the redo information for a single instance in a multi-instance parallel server setup using Oracles Parallel Server Option.|
|The thread option can be set for any of the options for manually or automatically archiving redo log information using the alter system archive log statement.|
|The all option specifies archival of all redo logs that are currently in need of being archived.|