How Do You Ensure Data Integrity in Cubes?

A friend of mine recently asked for assistance on creating a custom DLL. I’m a total n00b at this. But, i asked him, why he was pulling his hair over this.

He wants the DLL to implement data integrity in the cube. For example, in a customer dimension with SSN as an attribute, he was hoping, a DLL can be created to ensure that SSN is always 9 digit number.

I had no clue. However, I’ve suggested him to use constraints in the sql table, source data warehouse itself. Check constraint on SSN column can get the job done.

There are many benefits of using constraints in the data warehouse for this particular case:

  • Check constraints are simple. Even people like me, with no knowledge on DLL’s can maintain them.
  • They ensure that data in the warehouse is clean. You don’t want to allow 15 digit number for SSN, for example.
  • Only the data warehouse contains source data for cubes. And if data here is clean, there is no way bad data is entered into the cube. At least in the perfect world.
  • With clean data in the warehouse, reports can consume data from either warehouse or cubes.

What’s your approach?

What would be your approach to this problem? Did you create a DLL? What other solutions do you have?

You are welcome to post a comment below.

~Sam.

One thought on “How Do You Ensure Data Integrity in Cubes?

  1. Definitely agree that data integrity should be handled before the cube (i.e. preferably on the way into the data warehouse).

    Even if bad data (e.g. 15 digit ssn) was detected in the cube, there’s no easy/good way to correct it there. Dimension writeback would be the only way (I can think of before my first cup of coffee this morning) to change bad data in the cube…but very few clients (e.g. excel) support that feature and I believe it has been deprecated.

    Table constraints are a good practice, but a MDS/DQS solution will be better in the long run for a number of reasons.

Leave a comment