This post is part two of several posts focussing on the Azure Active Directory connector. In part one I introduced the connector. In this post I will discuss an approach to multi-forest directory synchronisation, focussing on information relevant to the success of the project (as opposed to a sequence of steps to implement).
Multi-forest directory synchronisation: a definition
For the purpose of this post I’d like to define multi-forest directory synchronisation as the synchronisation of two or more on-premises Active Directory Domain Services (AD DS) forests, each forest containing an Exchange organisation. This is important, as there are two very different topologies: multiple forests with multiple Exchange organisations, and multiple forests with one Exchange organisation, i.e. mailboxes vs. linked mailboxes. I won’t be covering the hybrid of mailboxes and linked mailboxes. You’ll have to engage us for that filth.
For the purpose of this post we’ll also assume that Global Address List (GAL) synchronisation (GALSYNC) is performed by Forefront Identity Manager (FIM) 2010 R2. Free/busy sharing is provided by federated delegation. The GALSYNC solution is live and running and looks like this:
Note that I have a succinct post, GALSYNC and DirSync in harmony, that discusses a couple of configurations and mitigations to ensure GALSYNC (custom or FIM) is working properly with DirSync.
You want to implement Exchange Online and SharePoint Online using federated authentication, ergo you need to synchronise mail objects and users from the three on-premises infrastructures into your Azure Active Directory (AAD) tenant, like this:
I’m specifically calling out Exchange Online (EXO) and SharePoint Online (SPO) here because SPO is easy. EXO adds a number of extra requirements and requires 100% on-premises remediation (otherwise people don’t get email).
Multi-forest directory synchronisation: an overview
The Directory Synchronization appliance, DirSync, synchronises all contact, group and user objects from your on-premises AD DS forest into your AAD tenant. Filtering is hugely limited by default; only some system objects are excluded. You have the ability to tweak the connector filter(s) and in-scope containers (see Configure filtering for directory synchronization). You are not allowed to do anything else. DirSync uses the on-premises objectGUID value as the immutable ID. The objectGUID attribute is a binary attribute – a byte array – and is therefore base64 encoded as part of the SAML token issued by Active Directory Federation Services (AD FS). DirSync therefore converts the byte array into a base64 string. You cannot change this (well, you cannot change this and be in a supported state).
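To make the objectGUID conversion concrete, here is a minimal Python sketch of the byte array to base64 step. The function names are mine and purely illustrative; the only thing it relies on is that Python's `bytes_le` produces the same mixed-endian byte order that AD stores in objectGUID (and that .NET's `Guid.ToByteArray()` returns).

```python
import base64
import uuid


def object_guid_to_immutable_id(object_guid: str) -> str:
    """Convert an objectGUID in its string form to a base64 string."""
    # bytes_le is the mixed-endian layout AD uses for the binary objectGUID.
    raw = uuid.UUID(object_guid).bytes_le
    return base64.b64encode(raw).decode("ascii")


def immutable_id_to_object_guid(immutable_id: str) -> str:
    """Reverse the conversion, e.g. to find the AD object behind an AAD error."""
    raw = base64.b64decode(immutable_id)
    return str(uuid.UUID(bytes_le=raw))


if __name__ == "__main__":
    guid = "00112233-4455-6677-8899-aabbccddeeff"  # made-up example value
    encoded = object_guid_to_immutable_id(guid)
    print(encoded, immutable_id_to_object_guid(encoded))
```

The reverse function is the one I would expect to use most in practice, since AAD reports the base64 form and on-premises tools show the GUID form.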
Multi-forest directory synchronisation requires FIM. There isn’t really a readily available solution – you have to create your own. I used DirSync as my starting point and then through several iterations refined it. For what attributes of what objects are synchronised, see the List of Attributes that are Synced by the Windows Azure Active Directory Sync Tool.
Note. The AAD connector documentation focuses on a common scenario (and Exchange deployment pattern) of one resource forest and one Exchange organisation and multiple account forests. This post, as stated in the introduction, is not focussing on this architecture. This post focuses on the multiple account forest, each with its own Exchange organisation architecture.
With DirSync there is a choice to enable Hybrid mode or not. In the context of DirSync the hybrid option is a configuration option that enables export attribute flow (EAF) on the Active Directory Management Agent (ADMA), i.e. “write-back” is enabled (I discuss the attributes and required permissions in this post). In a multi-forest deployment there isn’t really a choice – you have to enable “hybrid mode”, or at least you have to EAF proxyAddresses. The reason for this is the Outlook recipient cache. You need to write the X500 proxy address that is the cloudLegacyExchangeDN back to on-premises so that non-migrated mailboxes can still reply to the recipient, even if they’re in a different forest using a GALSYNC’d contact. Typically you also need it to facilitate off-boarding, i.e. moving mailboxes back to on-premises.
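As a rough illustration of the write-back described above, the following Python sketch builds the X500 proxy address from a cloudLegacyExchangeDN value and adds it to an existing proxyAddresses list if it isn't already there. The function name and the sample DN are hypothetical, and the real flow would live in the export attribute flow of the ADMA, not in a script.

```python
def add_x500_proxy(proxy_addresses, cloud_legacy_exchange_dn):
    """Return proxyAddresses with an X500 entry for the cloud legacy DN."""
    candidate = "X500:" + cloud_legacy_exchange_dn
    # Compare case-insensitively so an existing x500:/X500: entry is not duplicated.
    existing = {address.lower() for address in proxy_addresses}
    if candidate.lower() in existing:
        return list(proxy_addresses)
    return list(proxy_addresses) + [candidate]


if __name__ == "__main__":
    # Illustrative value only; real DNs come from the cloud object.
    dn = "/o=ExchangeLabs/ou=Exchange Administrative Group/cn=Recipients/cn=example-user"
    print(add_x500_proxy(["SMTP:example.user@contoso.com"], dn))
```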
The scope of synchronisation is therefore:
- on-premises data is pushed to the cloud; and
- a small set of cloud attributes are written to the on-premises infrastructure.
What to import
For SharePoint and/or Office click to run or CRM you can just synchronise user (or inetOrgPerson) objects. For Exchange Online and Lync Online you’ll also need contact and group object types.
On a per ADMA basis you choose which containers are in-scope. I typically use Declared (Import Filter) connector filters to exclude out of scope objects that reside within in-scope containers. I exclude any user objects where the UPN doesn’t end with a verified domain; I exclude UPNs with bad characters. The configuration of the connector filters is quite specific to the environment into which you’re deploying, which is why I prefer to do this using declarative (UI-based) connector filters as opposed to rules-based filters.
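The decision those connector filters make can be sketched in a few lines of Python. This only models the logic; in FIM I configure it declaratively in the UI. The verified domain list and the set of characters treated as acceptable are assumptions for illustration, so check them against your own tenant and the current service rules.

```python
import re

# Hypothetical verified domains for the tenant.
VERIFIED_DOMAINS = {"contoso.com", "fabrikam.com"}

# Assumed set of acceptable characters for the UPN prefix (illustrative only).
_ALLOWED_PREFIX = re.compile(r"^[A-Za-z0-9._'\-]+$")


def is_filtered(upn):
    """Return True when a user object should be excluded from synchronisation."""
    if not upn or upn.count("@") != 1:
        return True
    prefix, suffix = upn.split("@")
    if suffix.lower() not in VERIFIED_DOMAINS:
        return True  # UPN does not end with a verified domain
    if not _ALLOWED_PREFIX.match(prefix):
        return True  # UPN contains a bad character
    if prefix.startswith(".") or prefix.endswith("."):
        return True  # leading or trailing dot, assumed to be invalid
    return False


if __name__ == "__main__":
    for upn in ("joe.bloggs@contoso.com", "joe@contoso.local", "joe bloggs@contoso.com"):
        print(upn, "filtered" if is_filtered(upn) else "in scope")
```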
It is important that the GALSync target folder is included in the scope of FIM DIRSYNC. Without GALSync-authoritative contact objects, distribution list membership will be incomplete in EXO, which is a major integrity concern. These contacts need special handling. You cannot project these, only join them.
What to join and project
Each ADMA will project contact, group and user objects. If GALSync is in place then you need to build logic into the solution to not project contact objects that are not mastered in the forest, i.e. the contact objects created by GALSync. You do want to join these to the correct contact, group or user object however. If you don’t you’ll have inconsistent distribution list memberships! I drive this logic from my rules extension configuration file (XML). I specify authoritative SMTP suffixes and only project contact objects with an authoritative SMTP suffix.
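Here is a small Python sketch of that projection decision. My real implementation is a FIM rules extension, so treat this as a model of the logic only. The XML element names, attribute names and management agent names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file content; element names are illustrative.
CONFIG_XML = """
<rules-extension>
  <management-agent name="ADMA-ForestA">
    <authoritative-smtp-suffix>contoso.com</authoritative-smtp-suffix>
  </management-agent>
  <management-agent name="ADMA-ForestB">
    <authoritative-smtp-suffix>fabrikam.com</authoritative-smtp-suffix>
  </management-agent>
</rules-extension>
"""


def load_suffixes(xml_text):
    """Map each management agent name to its set of authoritative SMTP suffixes."""
    root = ET.fromstring(xml_text)
    return {
        ma.get("name"): {
            s.text.strip().lower() for s in ma.findall("authoritative-smtp-suffix")
        }
        for ma in root.findall("management-agent")
    }


def should_project(ma_name, object_type, mail, suffixes):
    """Groups and users always project; contacts only when mastered in this forest."""
    if object_type != "contact":
        return True
    if not mail or "@" not in mail:
        return False
    domain = mail.rsplit("@", 1)[1].lower()
    return domain in suffixes.get(ma_name, set())


if __name__ == "__main__":
    suffixes = load_suffixes(CONFIG_XML)
    print(should_project("ADMA-ForestA", "contact", "jane@fabrikam.com", suffixes))
```

A contact in forest A with a fabrikam.com address was created by GALSync, so it is left to the join rules instead of being projected.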
I join using several criteria: on immutable ID (sourceAnchor), but also on something unique like email (mail) or employee ID (employeeID). Again, if GALSync is in play then you need to join the GALSync’d contact objects to the real identities, so email makes quite a lot of sense.
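The ordered join criteria can be modelled like this in Python. It is a sketch under my own assumptions: objects are plain dictionaries, the criteria are tried in the order listed, and an ambiguous match is treated as an error for someone to resolve.

```python
# Criteria are tried in order; the attribute names follow the text above.
JOIN_CRITERIA = ("sourceAnchor", "mail", "employeeID")


def _normalise(attribute, value):
    # Email addresses compare case-insensitively; base64 anchors must not.
    return value.lower() if attribute == "mail" else value


def find_join_target(cs_object, metaverse, criteria=JOIN_CRITERIA):
    """Return the single metaverse object to join to, or None if nothing matches."""
    for attribute in criteria:
        value = cs_object.get(attribute)
        if not value:
            continue
        wanted = _normalise(attribute, value)
        matches = [
            mv for mv in metaverse
            if mv.get(attribute) and _normalise(attribute, mv[attribute]) == wanted
        ]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise ValueError(f"ambiguous join on {attribute}={value!r}")
    return None


if __name__ == "__main__":
    metaverse = [{"sourceAnchor": "abc=", "mail": "Joe.Bloggs@contoso.com"}]
    galsync_contact = {"mail": "joe.bloggs@contoso.com"}  # no anchor of its own
    print(find_join_target(galsync_contact, metaverse))
```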
The general sequence of synchronisation is as follows:
- Delta Import all ADMAs
- Delta Synchronise each ADMA
- Export to AAD
- Import from AAD
- Synchronise AAD
- Export each ADMA where hybrid write-back is enabled
Unless there’s a compelling reason not to, I start with the DirSync schedule of once every three hours. You run this according to your environment, i.e. how long does synchronisation take, how often are new users added to AD, how fast do you need to synchronise deletions, etc.
I utilise a configuration file to specify whether or not the hybrid write-back configuration is enabled and only export to on-premises AD objects when the XML element for the MA in question is enabled. This is particularly helpful during pilot and early deployment. In my experience this is one of the last things to get enabled, as it has more impact and the deployment of permissions is more formal.
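The sequence and the per-MA write-back switch can be put together in a short Python sketch. The management agent names and step labels are placeholders of mine, and the write-back dictionary stands in for whatever the configuration file says; this only produces the ordered list of steps and does not call FIM.

```python
def build_run_sequence(ad_mas, aad_ma, writeback_enabled):
    """Return the ordered (management agent, step) pairs for one sync cycle."""
    steps = []
    steps += [(ma, "Delta Import") for ma in ad_mas]
    steps += [(ma, "Delta Synchronise") for ma in ad_mas]
    steps.append((aad_ma, "Export"))
    steps.append((aad_ma, "Import"))
    steps.append((aad_ma, "Synchronise"))
    # Only export to the forests where hybrid write-back is switched on.
    steps += [(ma, "Export") for ma in ad_mas if writeback_enabled.get(ma, False)]
    return steps


if __name__ == "__main__":
    sequence = build_run_sequence(
        ["ADMA-ForestA", "ADMA-ForestB"],      # hypothetical MA names
        "AAD",
        {"ADMA-ForestA": True, "ADMA-ForestB": False},
    )
    for ma, step in sequence:
        print(f"{ma}: {step}")
```

Keeping the switch per MA means one forest can go live with write-back while the others are still in pilot.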
For the most part precedence is irrelevant as any given object has, logically, two connectors – one for the authoritative on-premises directory and one for AAD. With GALSync in play that statement isn’t true, but the implication of the statement is true, i.e. you’ll get additional connectors in the other connector spaces for contact objects but no IAF or EAF. For user and group objects there is no flow from contact objects, the join simply maintains referential integrity of reference attributes (importantly, the member attribute of group objects). For contact objects this isn’t true. Now you have a precedence problem that has to be handled using manual precedence. Utilise the same logic as for join and projection (described above) – only flow data from the primary, authoritative object and reject flows from the downstream contact objects managed by GALSync. I do this using the SMTP suffix.
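A minimal Python sketch of that manual precedence decision for contact objects follows. It assumes each connector is a dictionary carrying the name of its MA and its mail value, and that the authoritative suffixes per MA come from configuration; all of those names are hypothetical.

```python
def resolve_contact_flow(connectors, authoritative_suffixes, attribute):
    """Return the attribute value from the authoritative connector, else None."""
    for connector in connectors:
        mail = connector.get("mail") or ""
        domain = mail.rsplit("@", 1)[-1].lower() if "@" in mail else ""
        if domain in authoritative_suffixes.get(connector["ma"], set()):
            return connector.get(attribute)
        # Otherwise this is a downstream GALSync copy: reject its flow.
    return None


if __name__ == "__main__":
    suffixes = {"ADMA-ForestA": {"contoso.com"}, "ADMA-ForestB": {"fabrikam.com"}}
    connectors = [
        {"ma": "ADMA-ForestB", "mail": "joe@contoso.com", "displayName": "Stale copy"},
        {"ma": "ADMA-ForestA", "mail": "joe@contoso.com", "displayName": "Joe Bloggs"},
    ]
    print(resolve_contact_flow(connectors, suffixes, "displayName"))
```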
There are two sides to deployment: installing and configuring FIM, and on-boarding in-scope objects and turning the service on. Installation and configuration is trivial. You install, patch, import the configuration (from a non-production environment), apply the correct credentials and select your in-scope containers. Easy. On-boarding the initial population and transitioning to live service is the real challenge. The directory synchronisation solution is reasonably simple in terms of configuration, therefore I generally take a two or three phased approach, depending on the project timelines and scope:
- Proof. Utilise an aggressive connector filter so that a very small set of objects are in-scope of the solution, i.e. five to ten user objects, a contact or two and a group or two. These can all be test users, i.e. created for this purpose and deleted soon after go-live. Here we synchronise and prove provisioning; export to AAD and prove credentials and permissions; and then test SSO which proves that we’re pushing the right data in-line with the AD FS design and deployment.
- Pilot. When the test users are working, the connector filters are relaxed to allow remediated objects to be synchronised. If you’re lucky you have some easy way of capturing remediated objects and can filter non-remediated objects. This is often unlikely, but you might be fortunate enough to take any and all in-scope contact and group objects and remediated user objects. That might help in some (but not all) scenarios. This process will take several hours and will result in a lot of exports to AAD. It is highly important that you are one of the recipients of the AAD technical notification emails throughout the deployment, as the AAD connector doesn’t generate many errors or write much to the log – everything is handled by AAD and emailed to the configured recipients.
- Deployment. Here everything else goes up, assuming you only synchronised a subset during pilot. Get ready for synchronisation errors and data errors from AAD. Work through them fast and efficiently and try hard to root-cause each issue and categorise it so that the project team can feed back to the customer and operations team(s) to mitigate recurrence.
The ability to pilot or fully deploy is largely dependent on what services are being enabled. For SharePoint Online pilot is easy, as we’re only really concerned about SSO (security groups can go last). For Exchange Online it’s harder as you can’t really migrate objects to EXO until you have a complete GAL, which means remediation must be complete or close to it.
Try hard to remediate ahead of enabling synchronisation. If the customer is pushing to enable FIM and SSO, push back and agree on a scoped pilot where possible. Dropping a large number of mail objects into scope ahead of remediation activities is painful. There are often synchronisation errors, but the real issue is the number of AAD errors on export. This isn’t so bad now the export loop bug is fixed, but it can still be difficult to distinguish each real on-premises data error from failures due to linkage, i.e. if a group member fails to export because of a duplicate proxyAddresses value there will be exported-change-not-reimported errors for the group (writing member).
Lastly, provide input where possible to the operations team. While you should and usually will attain 100% remediation of in-scope objects, that won’t last. Errors will start again in a week or two and those errors need to be understood because there’s likely a fundamental process change required. This isn’t truly the job of the IdM consultant; however, FIM resource is typically scarce and the DirSync piece is often looked upon as a black box. If you want to leave without looking back, you need people on the ground picking up those technical notification emails, understanding them, and ultimately fixing the issue that is causing the error and not just the error itself.
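Duplicate proxyAddresses values are a good example of something you can look for before anything is exported. The Python sketch below assumes you have already dumped the in-scope objects from every forest into a list of dictionaries with `dn` and `proxyAddresses` keys; that input shape is my assumption and not something FIM or DirSync gives you directly.

```python
from collections import defaultdict


def find_duplicate_proxy_addresses(objects):
    """Return each proxy address held by more than one object, with its owners."""
    owners = defaultdict(set)
    for obj in objects:
        for address in obj.get("proxyAddresses", []):
            # Lower-case so SMTP:/smtp: variants of one address collide.
            owners[address.lower()].add(obj["dn"])
    return {addr: sorted(dns) for addr, dns in owners.items() if len(dns) > 1}


if __name__ == "__main__":
    sample = [  # made-up objects from two forests
        {"dn": "CN=Joe,OU=Users,DC=contoso,DC=com",
         "proxyAddresses": ["SMTP:joe@contoso.com"]},
        {"dn": "CN=Joe Old,OU=Users,DC=fabrikam,DC=com",
         "proxyAddresses": ["smtp:joe@contoso.com", "SMTP:joe.old@fabrikam.com"]},
    ]
    for address, dns in find_duplicate_proxy_addresses(sample).items():
        print(address, dns)
```

GALSync'd contacts will legitimately share addresses with the real object in the source forest, so exclude them (or expect them) when reading the results.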
I’ll keep this short and sweet. Here’s a succinct list of recommendations for those of you planning to implement this. These aren’t true FIM recommendations, but the lines blur and as the IdM consultant responsible you will be a key member of the O365 deployment team.
- Discuss the immutable ID as early on in the engagement as possible. Don’t push it to one side or forget about it. Remember what immutable means and don’t let the customer choose an almost-but-not-quite immutable ID. It really needs to remain unchanged for the lifetime of the identity. If it doesn’t you have an operational nightmare ahead of you!
- Choose the immutable ID wisely. Don’t let the customer lead this discussion and don’t believe their claims of immutability or uniqueness. Prove it first! Also consider the termination of contractors rehired as permanent employees (and vice-versa), and delays in process that result in expired contractors being extended after expiration (because the contract was extended at the last minute or because the process failed). Check and recheck whether there are migrations ahead and whether that matters, and whether there are other processes built around the immutable ID candidate (I used employee ID only to find out that employee ID was also used to link room and equipment mailboxes to identities!).
- Don’t turn FIM on until the domains used in UPNs are properly verified in O365. If you do, the UPN will be successfully exported to AAD Web Services (AWS) but will fail validation between MSODS and the “front-end” interfaces. The UPN will then be the default @tenant.onmicrosoft.com, with no sign of an exported-change-not-reimported error because it was actually successfully exported, ergo you have to fix the UPNs yourself (via PS).
- Remediate AD before exporting to AAD. Don’t think you can shortcut remediation and handle it on an ad-hoc basis. Clean up as much as you can upfront, and expect and therefore leave time for synchronisation errors and AAD errors. Don’t rush the first mailbox migrations, as the GAL will take time to become complete.
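On the "prove it first" point for the immutable ID, a quick audit of the candidate attribute is cheap to run. This Python sketch assumes the users from all forests have been exported to a list of dictionaries with a `dn` key; the attribute name and sample data are hypothetical. It proves uniqueness and population at a point in time only, not immutability, which still needs the process conversations described above.

```python
from collections import Counter


def audit_immutable_id_candidate(users, attribute):
    """Return (DNs missing the attribute, values used by more than one object)."""
    missing = [u["dn"] for u in users if not u.get(attribute)]
    counts = Counter(u[attribute] for u in users if u.get(attribute))
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return missing, duplicates


if __name__ == "__main__":
    users = [  # made-up data: a room mailbox linked to its owner's employee ID
        {"dn": "CN=Joe,DC=contoso,DC=com", "employeeID": "1001"},
        {"dn": "CN=Room 1,DC=contoso,DC=com", "employeeID": "1001"},
        {"dn": "CN=Contractor,DC=fabrikam,DC=com", "employeeID": ""},
    ]
    missing, duplicates = audit_immutable_id_candidate(users, "employeeID")
    print("missing:", missing)
    print("duplicates:", duplicates)
```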
It is my hope that I have sufficiently provided an overview of what multi-forest directory synchronisation to AAD looks like. I’ve defined the scenario – multiple account forests, each with or without an Exchange organisation and no resource forest with linked mailboxes – and described what data needs to flow from each AD forest and the typical synchronisation pattern. I’ve talked at a high level about object scope, join and projection, the immutable ID problem (which isn’t a FIM specific problem, but an important AAD sign-on problem) and wrapped things up with some titbits of experience.
In my next post I was planning to talk about the initial on-boarding and go-live piece, which basically means on-premises data troubleshooting and remediation, however I feel a discussion on immutable ID should probably come first.
I hope this post has been helpful.
Post-summary appendix-style piece of information
I’ve mentioned Technical notification emails. To see what recipients are defined run the following two commands from the AAD PS console (or add the module to your shell or, if you have Win 8, just type the commands and let the PS lazy load feature work its magic).
Connect-MsolService -Credential (Get-Credential firstname.lastname@example.org)
(Get-MsolCompanyInformation).TechnicalNotificationEmails
The actual value that you read from and write to is a .NET List<String> however I’m not truly convinced the system is working with multiple recipients, so utilise a distribution list. And remember to get yourself removed from the DL when your project ends…