Forefront Identity Manager (FIM) Reporting and System Center Service Manager (SCSM) Data Warehouse Extract, Transform and Load (ETL) process
Or…How data moves from the FIM Service to the data warehouse (otherwise known as what on earth is all that disk activity and why is the Service Manager transaction log that size!!)
FIM Reporting adds a number of changes to the FIM Service. It introduces new tables and schema, and a SQL Server agent job. It also updates the SCSM configuration with a set of pre-defined management packs which describe FIM schema and reports to the Data Warehouse.
FIM Reporting flows data from the FIM Service to SCSM using one of three ETL processes (arguably they are not ETL processes but rather Extract processes). Once the data resides in SCSM the System Center Data Warehouse ETL processes archive the data for long term storage and reporting.
FIM Reporting ETL
The FIM Reporting ETL process is processed by the FIM Reporting Service – a thread within the FIM Service. msidmReportingJob resources created in the FIM Service invoke the ETL. An msidmReportingJob resource can be one of three types: Initial, Initial-Partial, or Incremental. Initial and Initial-Partial are deployment processes, whereas Incremental is a continual process that runs throughout the normal operating conditions of the FIM Service.
- Initial. The initial process reads data directly from the FIM Service database and stages it to SCSM. Initial synchronisation is the step performed during deployment after Reporting has been successfully installed. This process must complete in order for the other processes to be viable. This process provides SCSM with a baseline and defines the watermark for the subsequent incremental runs.
- Initial-Partial. This process moves new schema and bindings into SCSM. Every time the FIM Service schema is changed and the changes need to be reflected in FIM Reporting (the changes don’t necessarily need to be defined in FIM Reporting so this is optional depending on whether you wish to report on this data or not) this process must be run to add the new schema to the Data Warehouse configuration.
- Incremental. This process reads data from the FIM Reporting export log. The export log maintains all deltas since the last successful run. The changes are staged to SCSM. The process runs every eight hours by default, starting at 1200 local time. The schedule can be modified. Generally the timeline of eight hours remains, but the start times are modified to better fit with other scheduled processes.
All of the FIM Reporting processes stage data in SCSM via the Data Access Layer – the interface into SCSM (basically the API). The actual data is written into the ServiceManager database. Once the data is staged the FIM Reporting work is complete until the next incremental process. The System Center Data Warehouse (SCDW) jobs perform the rest of the process.
Figure 1 illustrates the flow of data from FIM to SCSM.
Figure 1: FIM Reporting ETL process
System Center Data Warehouse
SCDW has an ETL process that results in the FIM Service data staged to SCSM being added to the SCDW for long term archival and reporting. SCDW has three physical databases, one for each phase of the ETL.
The data warehouse deploys with six jobs, two of which are administrative jobs, the other four are ETL jobs. The jobs are:
|Extract_DW_FIMReporting||ETL: Extract||5 minutes|
|Extract_FIMReporting||ETL: Extract||5 minutes|
Table 1: SCDW jobs and their schedules
The administrative jobs should be left in their default state. The jobs are vital to the successful functioning of the DW. The Extract_DW_FIMReporting extract job can be disabled. This job is only required for FIM Reporting topologies that integrate with an existing SCSM deployment. If SCSM is deployed purely for FIM Reporting this job is irrelevant and should be disabled. The remaining jobs Extract_FIMReporting, Transform.Common and Load.Common are the ETL process.
Extract, Transform and Load
The ETL process first processes the extract job. The relevant data (all changes since last run) is extracted from the SCSM (ServiceManager) database and written to the SCDW staging (DWStagingAndConfig) database. The basic format is the same in the staging database.
Next the Transform.Common job runs. This job takes the raw data from the staging (DWStagingAndConfig) database and performs the necessary aggregation, cleansing, and reformatting required for reporting and writes the data to the transformed (DWRepository) database.
Finally the Load.Common job runs. The transformed data (DWRepository) is written to the reporting (DWDatamart) database.
When the Load.Common job completes new data is available to reporting. SQL Server Reporting Services (SSRS) targets the reporting (DWDatamart) database.
Figure 2 illustrates the process (this is a re-drawing of the image in the Service Manager blog post in the references section). Green arrows are extract; orange transform; blue load.
Figure 2: SCDW ETL process
Once the data is loaded into the DWDatamart database it is available to any permitted consumer. In the case of FIM Reporting this is SQL Server Reporting Services. Illustrated simply, Figure 3 depicts this.
Figure 3: SSRS data access
Technically SQL Server Analysis Services (SSAS) surfaces the data via an online analytical processing (OLAP) cube, however I don’t think that is wholly relevant to this discussion and is managed by the SCDW.
Figure 4 depicts the overall FIM Reporting and SCDW ETL architecture.
Figure 4: Overall ETL process
In addition to nosing at configuration and sending a couple of e-mails the principal sources of information for this post were:
- System Center: Service Manager Engineering Blog – Data Warehouse – Anatomy of Extract, Transform, Load (ETL)
- Deployment Guide for Forefront Identity Manager 2010 R2 – Reporting
Any mistakes are my own, and I’m grateful for comments and feedback.
I hope this is useful!
 You only define new schema in the data warehouse if you wish to report on that data. This does not include historical reporting. By default the request report handles modifications to both the FIM Service schema and attribute changes in FIM, regardless of whether or not you have pushed this schema into Reporting. The reason you define new schema and bindings is to report on current values, e.g. instead of trawling through the request history and ascertaining current values you extend the schema and move current values into the data warehouse. This is an important point, hopefully I have explained it properly.