How do we recover from an AD domain that is no longer responding as a DC at all?

active-directory domain-controller windows-server-2012

Windows Server 2012 is the primary server environment.
All machines are located in a server room that services a network testing lab. There are over 400 servers in the lab, most of which are virtualized, and give us a working range of 3500-4000 machines to be used by the test lab.

Scenario:
Our backup DC went down about a month ago and no one realized it, since the previous admin had just left and our new infrastructure guy was learning our network on the spot. The DC had a hardware failure and is a complete loss.

Our PDC started having issues replicating and eventually gave us a message that it declared itself invalid due to not completing the role of managing the DCs. This task could not be validated (duh!).

About a week ago, we ran a PowerShell script to add 400 machines to the AD domain. About 25% of the way through, the script quit recognizing the PDC as the domain controller. Ever since then, we have been unable to add machines to the domain, and the other clients are running out of cached-logon time, since they are logged into a domain with no actual PDC.

We were unable to remove the damaged DC from the PDC, and because of this we can't back up the AD and migrate the data.

What we have done so far

  • Created two new DCs from scratch.
  • The original PDC is now disconnected from the lab while we try to get the new one to take over.

Problems

  • We are unable to back up our GPOs and unable to export our AD list.
  • We can't demote the original PDC, it errors out.

Is there a way to avoid completely rebuilding our AD from scratch?

Are there options for recovering our GPOs without manually recreating them?

dcdiag data

After running dcdiag, all tests but CheckSDRefDom failed. All LDAP-dependent tests failed with LDAP Error 0x3a (58). FSMO checks succeeded. DNS failed to respond and was reported as not started, even though the service was started.

I think we will take Mathius's suggestion and take this as an opportunity to redesign and learn.

Best Answer

I'm afraid to sound elitist and patronizing, but from the information you've provided and the way you've worded your question, I think the most feasible long-term solution is to

  • Create a new domain in a new forest with a new FQDN on a "new" (freshly installed) server
  • Unjoin all the clients from the existing domain
  • Reboot and join the new domain
  • Re-install the operating system on the existing Domain Controllers
  • Install AD DS on the servers and join the new domain
  • Use this opportunity to revisit your understanding of Active Directory, what it is and how it works

Microsoft has published a number of guides to approaching Active Directory design and deployment. Definitely worth mentioning are:

The AD DS Design Guide on TechNet
The Active Directory Domain Services guide from Microsoft's Infrastructure and Planning Guide Series

Good luck!


Backing up your GPOs:

From your recent updates it sounds like AD DS is not currently operational, so here is a last resort GPO backup-and-recover solution, not including Links and WMI Filters.

A Group Policy Object consists of 2 parts, a Group Policy Container and a Group Policy Template.

The Container is an object in Active Directory that holds the Group Policy links that are used to apply the given GPO to a given OU. If the DSA is unavailable to you at this point, you won't be able to retrieve these without mounting and exploring an offline copy of your NTDS database (not as easy as it may sound).

The Template on the other hand, contains the meat and potatoes of the GPO, all the settings, the name, version information and so on, and is stored in the SYSVOL folder on the filesystem.
With the default configuration you'll be able to find all your GPTs in C:\Windows\SYSVOL\domain\policies\. With a file-level backup of the GPTs, you'll be able to recreate the GPOs in the new domain, preferably using PowerShell as demonstrated below:

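$gptBackupFilePath = "C:\backup\policies\"
$ServerName = $env:COMPUTERNAME

Import-Module GroupPolicy

# GPT folders are named after the GPO GUID, wrapped in curly braces
$GPTs = Get-ChildItem $gptBackupFilePath -Directory | Where-Object { $_.Name -imatch '^\{[0-9a-f-]{36}\}$' }

foreach ($GPT in $GPTs)
{
    # The two default GPOs already exist in the new domain, so skip them
    if ("{31B2F340-016D-11D2-945F-00C04FB984F9}" -eq $GPT.Name.ToUpper())
    {
        Write-Host "Skipping Default Domain Policy"
        continue
    }

    if ("{6AC1786C-016F-11D2-945F-00C04FB984F9}" -eq $GPT.Name.ToUpper())
    {
        Write-Host "Skipping Default Domain Controllers Policy"
        continue
    }

    $GPTPath = $GPT.FullName

    # Read the GPO name from the "displayName=" line in GPT.ini
    $GPOName = Get-Content (Join-Path $GPTPath "GPT.ini") |
        Where-Object { $_ -match "^displayName=" } |
        Select-Object -First 1
    if ($GPOName) { $GPOName = $GPOName.Substring(12).Trim() }   # 12 = length of "displayName="
    if (-not($GPOName))
    {
        Write-Warning "Unable to read GPO name from $GPTPath"
        continue
    }

    $newGPO = New-GPO -Name $GPOName -Server $ServerName
    if (-not($?))
    {
        Write-Warning "Unable to create new GPO $GPOName"
        continue
    }

    $GPOGuid = $newGPO.Id.ToString()

    # New-GPO creates an empty GPT folder for the new GUID in SYSVOL
    $Destination = "C:\Windows\SYSVOL\domain\policies\{" + $GPOGuid + "}"
    if (-not(Test-Path -LiteralPath $Destination))
    {
        Write-Warning "Unable to access new GPT for GPO $GPOName"
        continue
    }

    # Copy everything except gpt.ini into the new GPT, preserving the folder structure
    try
    {
        foreach ($item in Get-ChildItem -LiteralPath $GPTPath -Recurse)
        {
            if ($item.Name -ieq "gpt.ini") { continue }

            $relativePath = $item.FullName.Substring($GPTPath.Length).TrimStart('\')
            $target = Join-Path $Destination $relativePath
            if ($item.PSIsContainer)
            {
                New-Item -Path $target -ItemType Directory -Force -ErrorAction Stop | Out-Null
            }
            else
            {
                Copy-Item -LiteralPath $item.FullName -Destination $target -Force -ErrorAction Stop
            }
        }
        Write-Host "Successfully recreated GPO $GPOName as $GPOGuid"
    }
    catch
    {
        Write-Warning "Failed to copy GPT contents for GPO ${GPOName}: $_"
    }
}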

I doubt this is a supported solution, and unlike a regular GPO import with migration tables, you'll need to update UNC paths and other domain-specific references by hand.
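To get a list of the references that need manual attention, something along these lines could be run against the backed-up GPT folder. This is only a rough sketch: Find-DomainReference is a hypothetical helper name, and the search strings in the usage example (an old domain name and an old server's UNC prefix) are placeholders for your own values. It only finds matches in text-based files such as scripts and .ini/.inf/.xml files; binary files like Registry.pol would most likely need to be inspected another way.

```powershell
# Hypothetical helper: lists lines in backed-up GPT files that mention
# the old domain or old servers, so they can be fixed by hand.
function Find-DomainReference
{
    param(
        [Parameter(Mandatory)] [string]   $Path,     # folder holding the GPT backup
        [Parameter(Mandatory)] [string[]] $Pattern   # literal strings to look for
    )

    # -SimpleMatch treats the patterns as plain text, so backslashes in
    # UNC paths do not have to be escaped. Matching is case-insensitive.
    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Select-String -Pattern $Pattern -SimpleMatch |
        Select-Object Path, LineNumber, Line
}

# Example (placeholder values, adjust to your environment):
# Find-DomainReference -Path 'C:\backup\policies' -Pattern 'olddomain.example', '\\olddc\'
```

Each result shows the file, the line number and the offending line, which makes for a reasonable checklist when editing the recreated GPOs.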

The above example is intended to be run on a Domain Controller in your new forest, with $gptBackupFilePath changed to the folder containing the contents of [..]\policies on the old Domain Controller.
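If you still need to take that file-level copy from the old Domain Controller, a minimal sketch could look like the following. Backup-GptFolder is a hypothetical helper name, and the paths in the usage example assume the default SYSVOL location and an arbitrary backup folder; neither is confirmed for your environment.

```powershell
# Hypothetical helper: copies every GPT folder (named after its GPO GUID)
# from a SYSVOL policies folder to a backup location.
function Backup-GptFolder
{
    param(
        [Parameter(Mandatory)] [string] $PoliciesPath,   # e.g. the old DC's policies folder
        [Parameter(Mandatory)] [string] $BackupPath      # where the copy should go
    )

    if (-not (Test-Path -LiteralPath $PoliciesPath -PathType Container))
    {
        throw "Policies folder not found: $PoliciesPath"
    }

    # Create the backup folder if it does not exist yet
    New-Item -Path $BackupPath -ItemType Directory -Force | Out-Null

    # Copy only folders whose name looks like {GUID}, including all contents,
    # and output the name of each folder that was copied
    Get-ChildItem -LiteralPath $PoliciesPath -Directory |
        Where-Object { $_.Name -match '^\{[0-9a-fA-F-]{36}\}$' } |
        ForEach-Object {
            Copy-Item -LiteralPath $_.FullName -Destination $BackupPath -Recurse -Force
            $_.Name
        }
}

# Example (paths are assumptions, adjust to your environment):
# Backup-GptFolder -PoliciesPath 'C:\Windows\SYSVOL\domain\policies' -BackupPath 'C:\backup\policies'
```

Since this only reads from SYSVOL on the filesystem, it should not depend on AD DS being operational on the old server.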


The only other current answer to this question suggests that you've lost the Domain Controller holding the RID Master FSMO role and have exhausted the current RID pool. That is in all probability entirely correct, and you may very well be able to recover the forest from its current state.

My recommendation to start from scratch is not an easy-frag default go-to response, but a carefully chosen one, based on personal experience with AD disaster recovery and, more importantly, with cleaning up after other people's disastrous disaster recovery efforts.

If you don't fully understand what to expect from a healthy Active Directory environment, and by trial and error do what you're told needs to be done (FSMO seizing, metadata cleanup etc.), underlying unresolved issues may still be present, but too elusive to be noticed by the untrained eye.

Any inconsistency introduced during the last 30 or 60 days might not manifest itself right here and now, but if and when it does, you're gonna wish you had started from scratch when you had the opportunity.