Monday, March 19, 2012

Composite Primary Key in fact table

I have a sales fact table with dimensions: dimDate, dimCustomer, dimSalesRep, dimProduct. I am using the surrogate key from each dimension as the composite primary key in my Sales fact table. I had a problem in that the same customer may order the same product multiple times from the same sales rep on the same date. In other words I had a situation where my composite primary key would not have been unique.

It was suggested in another thread that I add the Sales Order Number, which is a bigint identity field in the OLTP, as an additional field in the PK in order to guarantee uniqueness. This I've done.

My question is this: The Sales Order Number will guarantee a unique factSales row so do I need to include all of the surrogate keys in my composite primary key or just just Sales Order Number as the primary key and maintain the surrogate keys as attributes for use in loading data?

Thanks.

Logically speaking, the fact table does not have a composite key. You may have a degenerate dimension in the fact table (such as a Sales Order Number) but logically the primary key is really associated with that dimension. Lots of folks try to put a primary key constraint on their fact tables but you do not necessarily need it from a logical modeling perspective.

In your specific situation, I'd recommend your fact table include the Sales Order Number which would have a PK on it. Foreign key references to your other dimensions would reside in your fact table as well.

B.

|||

Hello! Have a look at www.kimballgroup.com for help with general design issues.

Normally you will create foreign keys from each dimension table to the fact table. Not a composite key.

If the customer order the same product several times on the same date from the sales representative you have the option to aggregate that information to the date level(you will only see totals for one date) or to use the time part of the date column.

It is an design issues for your business requirement.

HTH

Thomas Ivarsson

|||Thanks. I'm pulling data from 8 or 10 different OLTP tables so it'd me next to impossible to aggregate. The Sales Order ID is an identity field in the OLTP so it will always be unique. I'm using that as my primary key in my fact table.

I'm not sure where I got the idea for the composite Primary Key.......must be relational hangover from my OLTP design days......


I've used the Kimball group vid on the SSIS forum as the basis of my ETL design. I'll check out their design ideas. (I'm trying to work through the Melomed, et. all. SSAS book but I get the impression that the authors never heard of Visual Studio....Call me lazy but I'm not going to design my enterprise DW solution writing several hundred thousand lines of XMLA code.)

|||

The Melomed et al. book is really an "Inside SSAS"-type exploration of the technology. BIDS/VS really is just hiding the XMLA stuff from you with a slick interface.

Regarding the Kimball-standards, check out Chapter 5 of "The Data Warehouse Toolkit". I think it covers the exact star-schema you are attempting to create.

Good luck,

B.

|||I use the code veiw when I have to. I just wish they could relate the inside information to modern tools, perhaps refer to both the actual desing interface AND the underlying code. They could also refer to all the examples in binary but we have modern tools these days.....|||

From "The Microsoft Data Warehouse Toolkit" by Kimball Group, page 42: "The key to the fact table is a multi-part key made up of a subset of the foreign keys from each dimension table involved in the business event."

That would seem to say something different.

|||

Kimball also says you should model your data at the lowest level of granularity available.

So, if you assemble a model at the transaction level but do not include the transaction's primary key elements in your dimensions, setting a primary key on your foreign key may result in constraint violations. The only way to avoid that is to pre-aggregate your data to the level supported by your dimensions but you don't want to do that because of data lose (hence Kimball's mandate).

If you also review the models in The Data Warehouse Toolkit, you will not find many fact tables with primary keys defined on them.

B.

No comments:

Post a Comment