Distinct Count in Analysis Services

Business users are often interested in finding distinct counts in addition to other aggregations. They may want to know the number of distinct customers calling customer support, the number of distinct employees selling business, or the number of distinct products that are being sold. Note the difference between distinct count and the other aggregations usually modeled in OLAP cubes; think sum of sales, count of orders.

Requirement:

I’ll use an online forum website as an example. Assume *SiteX* has several users, and these users are allowed to post on forums daily. Given this, let’s suppose, we want to find the count of distinct users that are posting in forums on any given day.

Star Schema:

I have created a star schema with two dimension tables, DimUser to store all the users, and DimDate to store all the dates. Fact table FactForum records user activity on forums.

clip_image001

DimData and DimUser has five rows each. The following image shows the Fact data – all the posts were posted by three distinct users.

DateKey     UserKey     TotalPosts
———– ———– ———–
20120124    1           15
20111217    1           9
20110921    1           16
20120115    3           5
20111119    3           9
20111119    5           17

Cube:

Next, I’ve created a cube with dimensions and measure group from the star schema.

clip_image002

The following is a snapshot from the cube browser, showing the total posts sliced by user and date.

clip_image003

It’s a walk in the park thus far. ain’t it?

Distinct Count:

Next, I created a new measure from the measures pane, configured usage to count of rows, and source table to DimUser. This creates a measure group from a dimension, in other words, dimension DimUsers now acts as a dimension as well as a measure.

clip_image004

clip_image006

clip_image008

The above dimension usage shows that DimUsers measure group is related to DimUsers dimension. The relationship is a fact relationship.

Lastly, I added a many to many relationship between DimUsers and DimDate. Many to many requires an intermediary measure group which in this case is FactForums.

clip_image009

The following image from the cube browser shows total posts and number of distinct users posting over calendar date (dimension).

clip_image010

Conclusion:

This is one way of modeling distinct count in analysis services. I find this useful for several reasons, one of which is performance.

@SamuelVanga

Further Reading:

SSAS Distinct Count Optimization White Paper.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s