AD Disaster Recovery
Having a good disaster-recovery plan is vital
Active Directory (AD) is the foundation for Windows 2000’s new technologies. AD serves as a central storage location and access point for a tremendous amount of information, making its health and availability crucial to your network’s ongoing operations. If AD fails, chances are that your IT services infrastructure is pretty much useless, so having a good AD recovery plan is vital.
Proper disaster-recovery planning begins long before a failure occurs. An AD recovery plan can include clustering, RAID, and backup and restore procedures as well as ensuring that you have an adequate number of domain controllers (DCs) for each of your domains and geographical locations. Knowing the minimum information that you need to back up so that you can perform a successful restore will help you determine which recovery method will work for your situation and will help you get AD back online.
DCs Are Key
Each DC in a domain maintains a copy of that domain’s AD partition, so having an AD disaster-recovery plan is really a matter of being prepared to recover or replace one or more DCs. Each Win2K AD-enabled DC has a read/write copy of the directory and uses multimaster replication to ensure that changes to its copy of the directory are transferred to other DCs in the domain. Because your Win2K network isn’t dependent on one particular PDC in each domain, Win2K is more scalable and fault tolerant than Windows NT 4.0 is. However, the multimaster replication model makes performing certain types of recovery operations fairly complicated, requiring an understanding of the AD replication process.
What to Back Up
For a successful restore operation, you must have backups of the right data. Because AD resides on your network’s DCs, fully recovering AD requires a full backup of every DC. At a minimum, you should have a backup of all the Global Catalog (GC) servers and all Operations Masters for each domain in your forest. Although a full backup of these machines is ideal, the specific data that you need to back up for each DC is in the system partition and the System State. The System State on a DC includes AD and the files that AD-dependent services require (e.g., the registry, boot files, Sysvol, the Certificate Services database, Microsoft Cluster service). DNS is crucial to AD, but only AD-integrated zones are included in the System State. This detail is one reason to also back up the system partition, which will include the zone files for standard zones. (For more information about backing up the System State, see “Related Articles.”)
Although you have your choice of several third-party Win2K-certified backup applications, you don’t need anything more than the backup application that ships with Win2K. Microsoft has improved Windows’ backup application in Win2K, giving it the ability to back up to media other than an attached tape device and to schedule backups at regular intervals without using a batch file. To use this application to back up a DC’s System State, on the DC, click Start, Programs, Accessories, Backup. Alternatively, you can click Start, Run, and type
ntbackup
in the Open text box. In the resulting Backup window, select the System State check box in the left pane of the Backup tab, which Figure 1 shows. To back up the System State, you must be a member of the Backup Operators or Administrators group; to restore the System State, you must be a member of the local Administrators group. The only backup type that the System State supports is Normal (i.e., you can’t use the Copy, Daily, Differential, or Incremental backup types). Although Normal backups take longer than other types to create, you can create the backup while the server is online. However, Win2K’s backup application has a limitation that doesn’t plague third-party programs: Win2K’s backup application lets you perform only a local AD backup; you can’t back up the System State on a remote machine.
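You can also start the same System State backup from a command line, which is handy for scheduled jobs. The following Python sketch only assembles an ntbackup command line; it doesn't run anything. The job name and backup file path are hypothetical, and the switches (/J, /F, /M) are the ones the ntbackup command-line help documents as far as I know, so confirm them with ntbackup /? on your DC before you rely on them.

```python
import subprocess


def systemstate_backup_command(backup_file, job_name="DC System State"):
    """Return an ntbackup argument list for a Normal System State backup.

    backup_file and job_name are illustrative values, not from the article.
    """
    return [
        "ntbackup", "backup", "systemstate",
        "/J", job_name,      # job name recorded in the backup log
        "/F", backup_file,   # back up to a file instead of a tape device
        "/M", "normal",      # Normal is the only type the System State supports
    ]


if __name__ == "__main__":
    # Render the command for a batch file or scheduled task. Run it on the
    # DC itself, because the built-in application backs up only the local
    # System State.
    command = systemstate_backup_command(r"D:\Backup\systemstate.bkf")
    print(subprocess.list2cmdline(command))
```

Building the command as a list keeps the quoting of names that contain spaces in one place.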
In addition to calling for backups, a good disaster-recovery plan provides up-to-date information about which DCs are serving as GC servers and Operations Masters. GC servers maintain a partial copy (a subset of attributes) of every object in a forest, enabling searches of your entire AD without requiring communication with a DC in every domain. You can use the Microsoft Management Console (MMC) Active Directory Sites and Services snap-in to determine which DCs host the GC. (In the console, right-click a server’s NTDS Settings object and select Properties. In the resulting dialog box, the Global Catalog check box will be selected if the server is a GC server.)
Operations Masters are DCs that play specific roles within their domains or forests for crucial AD functions that don’t use AD’s multimaster replication functionality. For example, the AD schema defines what types of objects you can create, and the Schema Master DC is the only DC in the forest that can accept schema updates. (For more information about Operations Masters, see “Related Articles.”) You can use the MMC Active Directory Users and Computers snap-in to determine which DCs hold the domainwide roles (RID Master, PDC Emulator, and Infrastructure Master): Right-click the domain name in the console’s left pane and select Operations Masters from the resulting menu. The two forestwide roles require other tools: The Active Directory Domains and Trusts snap-in shows the Domain Naming Master, and the Schema Manager snap-in shows the Schema Master. Alternatively, you can use the Ntdsutil command-line utility.
Even if the first DC implemented in your root domain (i.e., the first DC implemented in the forest) isn’t a GC server or an Operations Master, it probably plays an important role in time synchronization, which is necessary for Kerberos. Therefore, be sure that you also back up this system.
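One way to keep this information current is to record it in a small inventory and derive the minimum backup list from it. The following Python sketch shows the idea; the DC names and the role assignments are hypothetical, and the selection rule is simply the one described above (GC servers, Operations Masters, and the first DC in the forest).

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    name: str
    is_gc: bool = False
    fsmo_roles: set = field(default_factory=set)
    first_dc_in_forest: bool = False


def minimum_backup_set(dcs):
    """Return the names of the DCs that the minimum backup plan must cover."""
    return sorted(
        dc.name
        for dc in dcs
        if dc.is_gc or dc.fsmo_roles or dc.first_dc_in_forest
    )


if __name__ == "__main__":
    # Hypothetical inventory for one domain.
    inventory = [
        DomainController("DC1", is_gc=True, first_dc_in_forest=True),
        DomainController("DC2", fsmo_roles={"Schema Master", "PDC Emulator"}),
        DomainController("DC3"),
    ]
    print(minimum_backup_set(inventory))
```

A full backup of every DC remains the ideal; the function reports only the systems you can't afford to skip.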
Restoring a Failed DC
After you know what you must back up on a DC and which DCs in your network you need to back up at minimum, you can plan a restore method. Depending on the situation, you can recover AD and restore a failed DC through reinstallation or through a nonauthoritative or authoritative restore from a backup.
Restoring AD Through Reinstallation
If you’re having problems with a DC that has data or database corruption, the easiest way to restore AD is to reinstall Win2K, then run the Active Directory Installation Wizard (DCPromo). During the DCPromo process, the system contacts another DC in the domain to obtain an up-to-date copy of AD.
Although this process basically creates a new DC, you need to address a few considerations to avoid problems. If the problems you were experiencing with the DC were severe enough to warrant a reinstallation, you probably won’t be able to demote the DC before you reinstall Win2K. Thus, information in AD that references the DC won’t be correct, and naming conflicts might result. An easy way to work around this problem is to give your reinstalled DC a different name from that of the failed DC. If this solution is acceptable, you can simply delete the original server from the appropriate site in the Active Directory Sites and Services snap-in and from the DC’s organizational unit (OU) in the Active Directory Users and Computers snap-in.
However, if you must give the reinstalled DC the same name as the failed DC (e.g., if this DC hosts an application that users connect to by name or through mapped drives), you must remove the failed machine’s ntdsDSA object from AD at one of the remaining DCs. To remove this object, you can use Ntdsutil, a command-line utility that performs AD store database maintenance, manages Flexible Single-Master Operation (FSMO) roles, and cleans up metadata that failed DCs leave behind. To use this utility, click Start, Run and type
ntdsutil
in the Open text box. In the resulting window, type the command sequence that Listing 1 shows.
Recovering AD from a Backup
Recovering AD by reinstalling Win2K is appropriate for some situations, but other scenarios require you to rely on your DC backups. For example, restoring AD from a backup is more appropriate than reinstalling Win2K if you’re working with the only DC at a site and the replication traffic associated with DCPromo across a WAN link is unacceptable. This situation calls for a nonauthoritative restore. Alternatively, if you’re trying to restore information that was erroneously deleted from AD (and therefore isn’t available from other DCs in the domain), you need to perform an authoritative restore of AD.
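The choice among the three methods comes down to two questions, which the following Python sketch encodes. The function and parameter names are hypothetical, and the rule is only a summary of the scenarios described above, not a complete decision procedure.

```python
def choose_recovery_method(deleted_data_must_return, full_replication_acceptable):
    """Suggest a recovery method for a failed or damaged DC.

    deleted_data_must_return: True if you need objects that were deleted
        from AD and are therefore gone from the other DCs as well.
    full_replication_acceptable: True if the DC can pull a complete copy
        of AD from a replication partner (e.g., no slow WAN link).
    """
    if deleted_data_must_return:
        # Only the backup still holds the data, so it must win replication.
        return "authoritative restore"
    if not full_replication_acceptable:
        # Restore from backup, then replicate only the changes made since.
        return "nonauthoritative restore"
    return "reinstall Win2K and run DCPromo"


if __name__ == "__main__":
    print(choose_recovery_method(False, False))
```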
Nonauthoritative restore. To use Win2K’s backup application to perform any type of restore, you must boot into Directory Services Restore Mode by selecting this mode from the advanced startup options menu, which you reach by pressing F8 during the boot process. This mode boots the DC without starting AD so that you can overwrite the existing files. The system will prompt you to log on as you usually would. But rather than providing domain credentials, you must provide the local Administrator account name and password. After you log on, launch the backup application. You can then restore the system from the Backup window’s Restore tab.
After restoring, reboot the system, which will contact its replication partners and receive any AD changes that have been made since the backup. This method eliminates much of the replication traffic that a reinstallation with DCPromo creates, because only the updates need to be replicated.
Authoritative restore. Use an authoritative restore whenever you need the AD data on your backup to take precedence over existing (and more recent) AD data on your network. For example, you’d use this method after someone has mistakenly deleted information that would be difficult to recreate.
Suppose you accidentally delete the Marketing OU from your AD domain. Several minutes later, you realize your mistake, but replication has already occurred. In this situation, performing a nonauthoritative restore to the DC won’t restore the OU. When the restored DC reboots out of Directory Services Restore Mode and begins the usual replication process, the system’s replication partners will update the restored DC’s copy of AD by removing the Marketing OU, and the lost objects will remain lost. To fix this problem, your backup must replace the more recent version of AD.
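A toy model makes the difference between the two restore types easier to see. The following Python sketch is a deliberate simplification, not AD's actual replication algorithm: each object carries one version number, the higher version wins during replication, and marking an object as authoritative raises its version. The bump value of 100,000 and the object layout are assumptions for illustration.

```python
def replicate(local, partner):
    """Merge two replicas; for each object, the higher version wins."""
    merged = {}
    for dn in set(local) | set(partner):
        candidates = [replica[dn] for replica in (local, partner) if dn in replica]
        merged[dn] = max(candidates, key=lambda obj: obj["version"])
    return merged


def mark_authoritative(replica, dn, bump=100_000):
    """Return a copy of the replica with one object's version raised."""
    updated = {name: dict(obj) for name, obj in replica.items()}
    updated[dn]["version"] += bump  # illustrative bump, see the lead-in
    return updated


if __name__ == "__main__":
    ou = "OU=Marketing,DC=Win2000,DC=com"
    restored = {ou: {"version": 1, "deleted": False}}  # state in the backup
    partner = {ou: {"version": 2, "deleted": True}}    # deletion replicated

    # Nonauthoritative: the partner's newer deletion overwrites the backup.
    print(replicate(restored, partner)[ou]["deleted"])

    # Authoritative: the restored object now outranks the deletion.
    print(replicate(mark_authoritative(restored, ou), partner)[ou]["deleted"])
```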
You perform an authoritative restore just as you perform a nonauthoritative restore, but you include an additional step: After performing the restore, use Ntdsutil to mark the portion of the directory that you want to restore as authoritative. For example, to restore the Marketing OU, you would enter the following commands at a command prompt:
ntdsutil
Authoritative restore
Restore subtree OU=Marketing,DC=Win2000,DC=com
The authoritative restore commands are available in Ntdsutil only when you’re in Directory Services Restore Mode. You can use these commands to restore any portion of AD by supplying the portion’s proper distinguished name (DN).
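Getting the DN right is the step most likely to go wrong. The following Python sketch builds the DN for an OU from its path and the domain's DNS name, then lists the Ntdsutil commands in the order you'd type them. The helper names are hypothetical, and the sketch doesn't escape special characters (such as commas) in names, so treat it as a starting point only.

```python
def ou_distinguished_name(ou_path, dns_domain):
    """Build an OU's DN.

    ou_path lists the OUs from outermost to innermost, e.g.
    ["Marketing", "Events"] for an Events OU inside Marketing.
    """
    ou_part = [f"OU={ou}" for ou in reversed(ou_path)]
    dc_part = [f"DC={label}" for label in dns_domain.split(".")]
    return ",".join(ou_part + dc_part)


def authoritative_restore_commands(dn):
    """Return the commands to type, in order, in Directory Services Restore Mode."""
    return [
        "ntdsutil",
        "authoritative restore",
        f"restore subtree {dn}",
    ]


if __name__ == "__main__":
    dn = ou_distinguished_name(["Marketing"], "Win2000.com")
    for command in authoritative_restore_commands(dn):
        print(command)
```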
To perform an authoritative restore of the entire AD, you can use the Restore Database command. However, use this command with extreme caution because all changes made to AD after the most recent backup will be lost.
If you’re restoring AD from a backup, the more recent the backup the better, and the backup must not be older than AD’s tombstoneLifetime setting. TombstoneLifetime, which is set to 60 days by default, is the number of days that AD stores a deleted object before permanently removing any record of the object. AD uses tombstone tags to mark objects that have been deleted and to replicate the deletion to other DCs in the domain. A backup older than this value can contain objects whose tombstones no longer exist anywhere, so restoring it could reintroduce objects that were deleted. The tombstoneLifetime setting has a large value to account for replication latencies that can occur when a DC is down for an extended period of time, but you can set the tombstoneLifetime value to as few as 2 days. You can use the MMC ADSI Edit snap-in, which Figure 2 shows, to view and change the tombstoneLifetime value.
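The age check is simple date arithmetic, as the following Python sketch shows. The function name is hypothetical; the 60-day default is the tombstoneLifetime default described above, and you'd pass your own value if you've changed the setting.

```python
from datetime import date


def backup_is_usable(backup_date, today, tombstone_lifetime_days=60):
    """Return True if the backup is no older than tombstoneLifetime."""
    age_days = (today - backup_date).days
    return 0 <= age_days <= tombstone_lifetime_days


if __name__ == "__main__":
    # A backup taken 31 days before the restore date.
    print(backup_is_usable(date(2001, 1, 1), date(2001, 2, 1)))
```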
Even if your backup isn’t older than the tombstoneLifetime setting, you can encounter time-related problems regarding computer-account and trust-relationship passwords. Win2K renegotiates these passwords every 7 days by default. Thus, if you perform an authoritative restore of a portion of your AD that affects either the computer-account or trust-relationship password, you might need to manually reset the passwords. Failure to do so can affect replication and users’ ability to log on to the domain. To reset a trust-relationship password, you must remove and recreate the trust through the MMC Active Directory Domains and Trusts snap-in. To reset a computer account, open the Active Directory Users and Computers snap-in, right-click the account, and select Reset Account.
Restoring to Dissimilar Hardware
The restore-from-backup methods assume that you’re restoring AD on the same machine on which you created the backup. But in many disaster-recovery situations, a hardware failure causes the need for the restore, so restoring to the same machine isn’t possible. To restore AD to a new machine, first ensure that the new system’s hard disk configuration matches that of the original machine and that the partitions are at least as large as the partitions on the original machine. In addition, be sure that you restore to a machine that uses the same hardware abstraction layer (HAL) as the original machine. If the new system uses a different video card or contains multiple NICs, uninstall them before performing the restore. After you reboot, Win2K’s Plug-and-Play (PnP) functionality should reinstall these items.
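Before you start a restore to new hardware, you can compare the two machines against the requirements above. The following Python sketch checks the HAL and the partition layout; the Machine structure, the HAL strings, and the sizes are hypothetical, and you'd have to collect the real values yourself.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    hal: str
    partitions_mb: dict  # drive letter -> partition size in MB


def restore_blockers(original, target):
    """List the reasons the target can't take the original's backup."""
    problems = []
    if original.hal != target.hal:
        problems.append(f"HAL mismatch: {original.hal} vs. {target.hal}")
    for drive, size in sorted(original.partitions_mb.items()):
        new_size = target.partitions_mb.get(drive)
        if new_size is None:
            problems.append(f"partition {drive} is missing")
        elif new_size < size:
            problems.append(f"partition {drive} is smaller than the original")
    return problems


if __name__ == "__main__":
    old = Machine("ACPI Uniprocessor PC", {"C": 4096, "D": 8192})
    new = Machine("ACPI Uniprocessor PC", {"C": 8192, "D": 8192})
    print(restore_blockers(old, new))
```

An empty list means the checks passed; it doesn't cover the video card and NIC differences, which you still handle by hand.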
Performing Remote Backups and Restores of AD
As I mentioned, Win2K’s built-in backup program supports only local backups of AD, so you can’t use the program to back up and restore the System State on a remote system. However, you can use Win2K Server Terminal Services to overcome this limitation—simply perform the restore through a terminal window as if you were at the machine.
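To perform a remote recovery of a DC that’s accessible through a terminal connection, access the DC’s boot.ini file through the connection and add the /safeboot:dsrepair switch at the end of the default operating-system entry, which begins with the default Advanced RISC Computing (ARC) path, as Figure 3 shows. Then restart the DC so that it boots into Directory Services Restore Mode, reconnect, and proceed with the restore operations. When the restore is complete, remove the switch from boot.ini before you reboot so that the DC starts normally.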
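If you make this change often, you might script it. The following Python sketch adds the switch to the operating-system entry that matches the default ARC path in the text of a boot.ini file. It works on a string and leaves reading and writing the file (and clearing the file's read-only attribute) to you; the sample contents in the usage example are a typical layout, not taken from a particular system.

```python
def add_dsrepair_switch(boot_ini_text, switch="/safeboot:dsrepair"):
    """Return boot.ini text with the switch added to the default OS entry."""
    lines = boot_ini_text.splitlines()

    # Find the ARC path named by the default= line in [boot loader].
    default_path = None
    for line in lines:
        if line.strip().lower().startswith("default="):
            default_path = line.split("=", 1)[1].strip()
            break
    if default_path is None:
        raise ValueError("no default= entry found in boot.ini")

    result = []
    in_os_section = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("["):
            in_os_section = stripped.lower() == "[operating systems]"
        elif (in_os_section
              and stripped.lower().startswith(default_path.lower() + "=")
              and switch.lower() not in stripped.lower()):
            line = f"{line.rstrip()} {switch}"  # append to the default entry
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = (
        "[boot loader]\n"
        "timeout=30\n"
        "default=multi(0)disk(0)rdisk(0)partition(1)\\WINNT\n"
        "[operating systems]\n"
        "multi(0)disk(0)rdisk(0)partition(1)\\WINNT="
        "\"Microsoft Windows 2000 Server\" /fastdetect\n"
    )
    print(add_dsrepair_switch(sample))
```

The function leaves an entry that already carries the switch unchanged, so running it twice is harmless.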
Plan for the Worst
AD recovery includes several considerations that you should be aware of. The best way to save time and prevent stress when you need to recover AD is to have a prepared plan. As anyone who has ever had to restore a failed server knows, the process is never as easy as it’s supposed to be. However, having an AD disaster-recovery plan will make it a little easier.
LISTING 1: Ntdsutil Cleanup Command Sequence
Metadata cleanup
Connections
Connect to server
Quit
Select operation target
List domains
Select Domain
List Site
Select Site
List servers in site
Select server
Quit
Remove selected server
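In practice, several commands in this sequence take an argument: the name of a working DC to connect to, and the numbers that the list commands display for the domain, site, and failed server. The following Python sketch fills in those arguments and returns the sequence with the commands spelled out in full. The server name and the numbers are hypothetical; because the numbers come from the output of the list commands, you'd normally run the sequence interactively rather than feed it to Ntdsutil blindly.

```python
def metadata_cleanup_commands(live_dc, domain_number, site_number, server_number):
    """Return the Ntdsutil metadata cleanup commands with arguments filled in.

    live_dc is a working DC; the numbers are the ones that the preceding
    list commands display for the failed server's domain, site, and entry.
    """
    return [
        "metadata cleanup",
        "connections",
        f"connect to server {live_dc}",
        "quit",
        "select operation target",
        "list domains",
        f"select domain {domain_number}",
        "list sites",
        f"select site {site_number}",
        "list servers in site",
        f"select server {server_number}",
        "quit",
        "remove selected server",
    ]


if __name__ == "__main__":
    # Hypothetical values: DC2 is healthy; every list shows the target as 0.
    for command in metadata_cleanup_commands("DC2", 0, 0, 0):
        print(command)
```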