Identifying updates/changes in database delta views with nullable columns

When working with database management agents such as the SQL Server or Oracle MAs there will come a time when you’ll need a delta view. Thus far I’ve been lucky and haven’t had many instances whereby I’ve had to use the multivalued table or view, however when you do you’ll quickly realise that delta tables or views can be quite handy even with small data sets (the multi-value table is horrifically slow). Most people build the delta view using the “snapshotting” idea documented here. Carol also has a nice succinct description here.

My reason for this post is that I was essentially doing the same as Carol. Here’s the “Modify” section of my view:

SELECT
    c.[EMAIL_ADDRESS], 
    c.[SURNAME], 
    c.[FIRST_NAME], 
    c.[MIDDLE_NAMES], 
    c.[INITIALS], 
    c.[GENDER], 
    c.[TITLE], 
    c.[PREFERRED_NAME], 
    c.[MANAGEMENT_AREA], 
    c.[CATEGORY_OF_PERSON], 
    c.[MANAGER_PERSON_ID], 
    c.[PERSON_ID], 
    'Modify' AS [CHANGETYPE]
FROM [FIMHRStage].[dbo].[CurrentDataSet] AS C 
    INNER JOIN [FIMHRStage].[dbo].[CurrentDataSnapshot] AS S 
ON c.[PERSON_ID] = s.[PERSON_ID] 
    WHERE 
        c.[EMAIL_ADDRESS] <> s.[EMAIL_ADDRESS] OR 
        c.[SURNAME] <> s.[SURNAME] OR 
        c.[FIRST_NAME] <> s.[FIRST_NAME] OR 
        c.[MIDDLE_NAMES] <> s.[MIDDLE_NAMES] OR 
        c.[INITIALS] <> s.[INITIALS] OR 
        c.[GENDER] <> s.[GENDER] OR 
        c.[TITLE] <> s.[TITLE] OR 
        c.[PREFERRED_NAME] <> s.[PREFERRED_NAME] OR 
        c.[MANAGEMENT_AREA] <> s.[MANAGEMENT_AREA] OR 
        c.[CATEGORY_OF_PERSON] <> s.[CATEGORY_OF_PERSON] OR 
        c.[MANAGER_PERSON_ID] <> s.[MANAGER_PERSON_ID] OR 
        c.[PERSON_ID] <> s.[PERSON_ID]; 

Now I wasn’t seeing modifications using this technique. The reason was because most of my columns are nullable, and in a number of instances the modification was the setting of a previously null value. It took a while to remember and realise that an expression of column <> column yields UNKNOWN if one of the values is NULL. A query filter will not return a row for which the filter expression evaluates to UNKNOWN –more information here and here.

I initially worked around this by combining the not equal (<>) expression and the IS NOT NULL predicate as follows:

SELECT
    c.[EMAIL_ADDRESS], 
    c.[SURNAME], 
    c.[FIRST_NAME], 
    c.[MIDDLE_NAMES], 
    c.[INITIALS], 
    c.[GENDER], 
    c.[TITLE], 
    c.[PREFERRED_NAME], 
    c.[MANAGEMENT_AREA], 
    c.[CATEGORY_OF_PERSON], 
    c.[MANAGER_PERSON_ID], 
    c.[PERSON_ID], 
    'Modify' AS [CHANGETYPE] 
FROM [FIMHRStage].[dbo].[CurrentDataSet] AS C 
    INNER JOIN [FIMHRStage].[dbo].[CurrentDataSnapshot] AS S 
ON c.[PERSON_ID] = s.[PERSON_ID] 
    WHERE 
        (c.[EMAIL_ADDRESS] <> s.[EMAIL_ADDRESS] OR (c.[EMAIL_ADDRESS] IS NOT NULL AND s.[EMAIL_ADDRESS] IS NULL)) OR 
        (c.[SURNAME] <> s.[SURNAME] OR (c.[SURNAME] IS NOT NULL AND s.[SURNAME] IS NULL)) OR 
        (c.[FIRST_NAME] <> s.[FIRST_NAME] OR (c.[FIRST_NAME] IS NOT NULL AND s.[FIRST_NAME] IS NULL)) OR 
        (c.[MIDDLE_NAMES] <> s.[MIDDLE_NAMES] OR (c.[MIDDLE_NAMES] IS NOT NULL AND s.[MIDDLE_NAMES] IS NULL)) OR 
        (c.[INITIALS] <> s.[INITIALS] OR (c.[INITIALS] IS NOT NULL AND s.[INITIALS] IS NULL)) OR 
        (c.[GENDER] <> s.[GENDER] OR (c.[GENDER] IS NOT NULL AND s.[GENDER] IS NULL)) OR 
        (c.[TITLE] <> s.[TITLE] OR (c.[TITLE] IS NOT NULL AND s.[TITLE] IS NULL)) OR 
        (c.[PREFERRED_NAME] <> s.[PREFERRED_NAME] OR (c.[PREFERRED_NAME] IS NOT NULL AND s.[PREFERRED_NAME] IS NULL)) OR 
        (c.[MANAGEMENT_AREA] <> s.[MANAGEMENT_AREA] OR (c.[MANAGEMENT_AREA] IS NOT NULL AND s.[MANAGEMENT_AREA] IS NULL)) OR 
        (c.[CATEGORY_OF_PERSON] <> s.[CATEGORY_OF_PERSON] OR (c.[CATEGORY_OF_PERSON] IS NOT NULL AND s.[CATEGORY_OF_PERSON] IS NULL)) OR 
        (c.[MANAGER_PERSON_ID] <> s.[MANAGER_PERSON_ID] OR (c.MANAGER_PERSON_ID IS NOT NULL AND s.MANAGER_PERSON_ID IS NULL)) OR 
        (c.[PERSON_ID] <> s.[PERSON_ID] OR (c.[PERSON_ID] IS NOT NULL AND s.[PERSON_ID] IS NULL)); 

However I felt that was a little messy. I reached out for some clarification of why my first option wasn’t working and received some good advice that meant that I eventually went with the following, more elegant solution:

SELECT
    c.[EMAIL_ADDRESS], 
    c.[SURNAME], 
    c.[FIRST_NAME], 
    c.[MIDDLE_NAMES], 
    c.[INITIALS], 
    c.[GENDER], 
    c.[TITLE], 
    c.[PREFERRED_NAME], 
    c.[MANAGEMENT_AREA], 
    c.[CATEGORY_OF_PERSON], 
    c.[MANAGER_PERSON_ID], 
    c.[PERSON_ID], 
    'Modify' AS [CHANGETYPE] 
FROM [FIMHRStage].[dbo].[CurrentDataSet] AS C 
    INNER JOIN [FIMHRStage].[dbo].[CurrentDataSnapshot] AS S 
ON c.[PERSON_ID] = s.[PERSON_ID] 
    WHERE ISNULL(c.[EMAIL_ADDRESS] , '') <> ISNULL(s.[EMAIL_ADDRESS] , '') 
        OR ISNULL(c.[SURNAME] , '') <> ISNULL(s.[SURNAME] , '') 
        OR ISNULL(c.[FIRST_NAME] , '') <> ISNULL(s.[FIRST_NAME] , '') 
        OR ISNULL(c.[MIDDLE_NAMES] , '') <> ISNULL(s.[MIDDLE_NAMES] , '') 
        OR ISNULL(c.[FORENAMES] , '') <> ISNULL(s.[FORENAMES] , '') 
        OR ISNULL(c.[INITIALS] , '') <> ISNULL(s.[INITIALS] , '') 
        OR ISNULL(c.[GENDER] , '') <> ISNULL(s.[GENDER] , '') 
        OR ISNULL(c.[TITLE] , '') <> ISNULL(s.[TITLE] , '') 
        OR ISNULL(c.[PREFERRED_NAME] , '') <> ISNULL(s.[PREFERRED_NAME] , '') 
        OR ISNULL(c.[MANAGEMENT_AREA] , '') <> ISNULL(s.[MANAGEMENT_AREA] , '') 
        OR ISNULL(c.[CATEGORY_OF_PERSON] , '') <> ISNULL(s.[CATEGORY_OF_PERSON] , '') 
        OR ISNULL(c.[MANAGER_PERSON_ID] , '') <> ISNULL(s.[MANAGER_PERSON_ID] , '') 
        OR ISNULL(c.[PERSON_ID] , '') <> ISNULL(s.[PERSON_ID] , ''); 

Obviously toggling ANSI_NULLS is an option but this was strongly discouraged by the SQL server community thus the ISNULL function, which returns an empty string in the above example when the specified column is null, was the approach I eventually took.

Advertisements

About Paul Williams

IT consultant working for Microsoft specialising in Identity Management and Directory Services.
This entry was posted in FIM, SQL Server, Troubleshooting and tagged , , , , , , . Bookmark the permalink.

One Response to Identifying updates/changes in database delta views with nullable columns

  1. I do all this inside a custom XMA, but you might look at the MERGE operator if you’re on SQL Server 2008+. You could probably handily build something more effective (adds and deletes included) with less code.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s