FIM Reporting Extract, Transform and Load (ETL) process

Forefront Identity Manager (FIM) Reporting and System Center Service Manager (SCSM) Data Warehouse Extract, Transform and Load (ETL) process

Or… how data moves from the FIM Service to the data warehouse (otherwise known as: what on earth is all that disk activity, and why is the Service Manager transaction log that size?!)

FIM Reporting

FIM Reporting makes a number of changes to the FIM Service.  It introduces new tables and schema, and a SQL Server Agent job.  It also updates the SCSM configuration with a set of pre-defined management packs, which describe the FIM schema and reports to the Data Warehouse.
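
As a quick sanity check that those management packs landed in SCSM, you can list them.  Here's a minimal sketch using the community SMLets module (my assumption – the display-name filter is illustrative and may need adjusting for your deployment):

```powershell
# Rough sketch: list the FIM-related management packs registered in SCSM.
# Assumes the community SMLets module is installed on the SCSM management server.
Import-Module SMLets

# The display-name filter is an assumption; adjust it to match the FIM
# Reporting management packs in your deployment.
Get-SCSMManagementPack |
    Where-Object { $_.DisplayName -like "*Identity Management*" } |
    Select-Object Name, DisplayName, Version
```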

FIM Reporting flows data from the FIM Service to SCSM using one of three ETL processes (arguably they are not ETL processes but rather extract processes).  Once the data resides in SCSM, the System Center Data Warehouse ETL processes archive the data for long-term storage and reporting.

FIM Reporting ETL

The FIM Reporting ETL process is run by the FIM Reporting service – a thread within the FIM Service.  msidmReportingJob resources created in the FIM Service invoke the ETL.  An msidmReportingJob resource can be one of three types – Initial, Initial-Partial, or Incremental – each described below, followed by a sketch of creating such a resource.  Initial and Initial-Partial are deployment processes, whereas Incremental is a continual process that runs throughout the normal operating conditions of the FIM Service.

  • Initial.  The initial process reads data directly from the FIM Service database and stages it to SCSM.  Initial synchronisation is performed during deployment, after Reporting has been successfully installed, and must complete in order for the other processes to be viable.  It provides SCSM with a baseline and defines the watermark for the subsequent incremental runs.
  • Initial-Partial.  This process moves new schema and bindings into SCSM.  Every time the FIM Service schema is changed and the changes need to be reflected in FIM Reporting, this process must be run to add the new schema to the Data Warehouse configuration (the changes don't have to be defined in FIM Reporting, so this is optional depending on whether you wish to report on the data[1]).
  • Incremental.  This process reads data from the FIM Reporting export log, which maintains all deltas since the last successful run, and stages the changes to SCSM.  The process runs every eight hours by default, starting at 1200 local time.  The schedule can be modified; generally the eight-hour interval is kept, but the start times are adjusted to fit better around other scheduled processes.
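
For illustration, here's a minimal sketch of invoking a run by creating an msidmReportingJob resource with the FIMAutomation snap-in, following the usual ImportObject pattern.  The service URI is an assumption, and the job type can be swapped for Initial or Initial-Partial:

```powershell
# Rough sketch: trigger a FIM Reporting run by creating an msidmReportingJob
# resource via the FIMAutomation snap-in. Run on the FIM Service server under
# an account permitted to create these resources.
Add-PSSnapin FIMAutomation -ErrorAction SilentlyContinue

$job = New-Object Microsoft.ResourceManagement.Automation.ObjectModel.ImportObject
$job.ObjectType             = "msidmReportingJob"
$job.SourceObjectIdentifier = [Guid]::NewGuid().ToString()

# msidmReportingJobType is one of Initial, Initial-Partial or Incremental.
$change = New-Object Microsoft.ResourceManagement.Automation.ObjectModel.ImportChange
$change.Operation      = 1                       # 1 = replace the attribute value
$change.AttributeName  = "msidmReportingJobType"
$change.AttributeValue = "Incremental"
$change.FullyResolved  = 1
$change.Locale         = "Invariant"
$job.Changes = @($change)

# The URI is an assumption: point it at your FIM Service instance.
$job | Import-FIMConfig -Uri "http://localhost:5725/ResourceManagementService"
```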

All of the FIM Reporting processes stage data in SCSM via the Data Access Layer – the interface into SCSM (basically the API).  The actual data is written into the ServiceManager database.  Once the data is staged, FIM Reporting's work is complete until the next incremental run.  The System Center Data Warehouse (SCDW) jobs perform the rest of the process.

Figure 1 illustrates the flow of data from FIM to SCSM.

Figure 1: FIM Reporting ETL process

System Center Data Warehouse

SCDW has its own ETL process, which takes the FIM Service data staged in SCSM and adds it to the data warehouse for long-term archival and reporting.  SCDW comprises three physical databases, one for each phase of the ETL.

SCDW jobs

The data warehouse deploys with six jobs: two administrative jobs and four ETL jobs.  The jobs are:

Name                     Type                    Default schedule
DWMaintenance            Admin: Maintenance      Hourly
Extract_DW_FIMReporting  ETL: Extract            Every 5 minutes
Extract_FIMReporting     ETL: Extract            Every 5 minutes
Load.Common              ETL: Load               Hourly
MPSyncJob                Admin: Synchronisation  Hourly
Transform.Common         ETL: Transform          Every 30 minutes

Table 1: SCDW jobs and their schedules

The administrative jobs should be left in their default state; they are vital to the successful functioning of the DW.  The Extract_DW_FIMReporting extract job can, however, be disabled.  It is only required for FIM Reporting topologies that integrate with an existing SCSM deployment; if SCSM is deployed purely for FIM Reporting, the job is irrelevant and should be disabled.  The remaining jobs – Extract_FIMReporting, Transform.Common and Load.Common – make up the ETL process.
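
By way of illustration, the Service Manager data warehouse cmdlets can list the jobs and disable that extract.  A hedged sketch – the module path and server name are assumptions and vary by SCSM version:

```powershell
# Rough sketch: inspect the SCDW jobs and disable the extract job that is only
# needed when integrating with an existing SCSM deployment. Run against the
# data warehouse management server; the path and server name are assumptions.
Import-Module "C:\Program Files\Microsoft System Center\Service Manager 2010\Microsoft.EnterpriseManagement.Warehouse.Cmdlets.psd1"

# Show all six jobs with their current status and schedule.
Get-SCDWJob -ComputerName "scsm-dw01"

# Pure FIM Reporting topology: stop the existing-SCSM extract from running.
Disable-SCDWJobSchedule -JobName "Extract_DW_FIMReporting" -ComputerName "scsm-dw01"
```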

Extract, Transform and Load

The ETL process begins with the extract job.  The relevant data (all changes since the last run) is extracted from the SCSM (ServiceManager) database and written to the SCDW staging (DWStagingAndConfig) database, where it retains essentially the same format.

Next, the Transform.Common job runs.  It takes the raw data from the staging (DWStagingAndConfig) database, performs the aggregation, cleansing, and reformatting required for reporting, and writes the result to the transformed (DWRepository) database.

Finally, the Load.Common job runs, writing the transformed data (DWRepository) to the reporting (DWDatamart) database.

When the Load.Common job completes, the new data is available for reporting.  SQL Server Reporting Services (SSRS) targets the reporting (DWDatamart) database.
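
If you don't want to wait for the schedules – after an initial synchronisation, say – the same cmdlets can drive a full ETL pass on demand.  A rough sketch, with the server name assumed and each phase polled to completion before the next starts:

```powershell
# Rough sketch: run the three ETL phases in order, waiting for each to finish.
$dw = "scsm-dw01"   # assumed data warehouse management server name

foreach ($jobName in "Extract_FIMReporting", "Transform.Common", "Load.Common") {
    Start-SCDWJob -JobName $jobName -ComputerName $dw

    # Poll until the job is no longer running; the interval is arbitrary.
    do {
        Start-Sleep -Seconds 60
        $status = (Get-SCDWJob -JobName $jobName -ComputerName $dw).Status
    } while ($status -eq "Running")
}
```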

Figure 2 illustrates the process (this is a re-drawing of the image in the Service Manager blog post in the references section).  Green arrows are extract; orange, transform; blue, load.

Figure 2: SCDW ETL process

Viewing reports

Once the data is loaded into the DWDatamart database it is available to any permitted consumer – in the case of FIM Reporting, SQL Server Reporting Services.  Figure 3 depicts this simply.

Figure 3: SSRS data access

Technically, SQL Server Analysis Services (SSAS) also surfaces the data via an online analytical processing (OLAP) cube; however, that is managed by the SCDW and isn't wholly relevant to this discussion.

Summary

Figure 4 depicts the overall FIM Reporting and SCDW ETL architecture.

Figure 4: Overall ETL process

References

In addition to nosing at configuration and sending a couple of e-mails, the principal sources of information for this post were:

Any mistakes are my own, and I’m grateful for comments and feedback.

I hope this is useful!

[1] You only define new schema in the data warehouse if you wish to report on that data.  This does not affect historical reporting: by default the request report captures modifications to both the FIM Service schema and attribute changes in FIM, regardless of whether or not you have pushed this schema into Reporting.  The reason to define new schema and bindings is to report on current values – instead of trawling through the request history to ascertain current values, you extend the schema and move the current values into the data warehouse.  This is an important point; hopefully I have explained it properly.

Responses to FIM Reporting Extract, Transform and Load (ETL) process

  1. sami says:

    Very informative – thanks for posting.

  2. Mannn says:

    Good and very helpful article!

  3. JLj says:

    Thanks for a good article. How do you make a highly available installation of FIM Reporting? Is it possible to install FIM Reporting on two servers and have them both active?

    • You can install FIM Reporting on more than one FIM Service instance. You can also install the System Center Service Manager management service on multiple servers. Such a topology allows either FIM Service node to process msidmReportingJob resources and write to the local SCSM data access service, giving you resilience in the event of a FIM Service node being offline.

      At this point, however, we're into System Center territory, and I'm just learning this stuff too. I need to find out if more than one management service will process the ETL WFs…

      One thing worth stating at this point is that there don't seem to be any HA options for the data warehouse, but this is also expected as you're not going to have super-critical SLAs on running reports for historic audit data.

      SSRS can be made available via NLB.

  4. Srinwantu says:

    Very helpful article.
