Monday, March 19, 2012

composite key structure

I'm just looking to get suggestions as to what the best way to handle this
key structure is, in terms of performance. At present, the current pk is on
an identity value. I 'inherited' this and am sing to change post haste
for many reasons. These three columns combined equate to the primary key
from a business-perspective: tradetime,endpoint,ordernumber
Because of this, I have created a clustered compound primary key with the
columns in this order: tradetime,endpoint,ordernumber
Tradetime has the most selectivity, as it nearly always montonically
increases.
Endpoint is used first in most of the ad hoc where clauses.
All my procedures, however, are very date-specific. So this, too, puts
tradetime in the where clauses a lot.
tradetime is datetime, endpoint is varchar(8) and ordernumber is varchar(32)
hence, the thing is kind of wide. but, as i said before, it is the only
uniqueID from a business perspective. the db structure is both current and
historical. current is about 300-600K and nearly all continuous inserts,
historical averages around 38M and it's where the reporting and analysis is
done. the data is rarely ever updated in either.
So, should it be clustered or non? Are the columns ordered correctly in the
compound key? And any thoughts as to how this will impact insertion
performance?
thank you in advance,
-- LynnLynn,

Base on the this, you should already have a unique index by these three
columns. I think the key is to wide to be used as a clustered index. Remembe
r
that the key of a clustered index is used by the rest of nonclustered indexe
s.
Tips on Optimizing SQL Server Clustered Indexes
http://www.sql-server-performance.c...red_indexes.asp
AMB
"Lynn" wrote:
> I'm just looking to get suggestions as to what the best way to handle this
> key structure is, in terms of performance. At present, the current pk is
on
> an identity value. I 'inherited' this and am sing to change post haste
> for many reasons. These three columns combined equate to the primary key
> from a business-perspective: tradetime,endpoint,ordernumber
> Because of this, I have created a clustered compound primary key with the
> columns in this order: tradetime,endpoint,ordernumber
> Tradetime has the most selectivity, as it nearly always montonically
> increases.
> Endpoint is used first in most of the ad hoc where clauses.
> All my procedures, however, are very date-specific. So this, too, puts
> tradetime in the where clauses a lot.
> tradetime is datetime, endpoint is varchar(8) and ordernumber is varchar(3
2)
> hence, the thing is kind of wide. but, as i said before, it is the only
> uniqueID from a business perspective. the db structure is both current an
d
> historical. current is about 300-600K and nearly all continuous inserts,
> historical averages around 38M and it's where the reporting and analysis i
s
> done. the data is rarely ever updated in either.
> So, should it be clustered or non? Are the columns ordered correctly in t
he
> compound key? And any thoughts as to how this will impact insertion
> performance?
> thank you in advance,
> -- Lynn|||ok, amb, but are you suggesting i do not create a compound pk on these three
columns and that a unique index or unique constraint will suffice, or are yo
u
suggesting i should do the compound pk but that it should be non-clustered?
Lynn
"Alejandro Mesa" wrote:
> Lynn,
>
>
> Base on the this, you should already have a unique index by these three
> columns. I think the key is to wide to be used as a clustered index. Remem
ber
> that the key of a clustered index is used by the rest of nonclustered inde
xes.
> Tips on Optimizing SQL Server Clustered Indexes
> http://www.sql-server-performance.c...red_indexes.asp
>
> AMB
> "Lynn" wrote:
>|||> ok, amb, but are you suggesting i do not create a compound pk on these
> three
> columns and that a unique index or unique constraint will suffice, or are
> you
> suggesting i should do the compound pk but that it should be
> non-clustered?
Do you have related tables (like OrderDetails) that need referential
integrity back to this table? Do you want to store OrderNumber there, or do
you want to repeat the tradetime,endpoint,ordernumber combo in every related
table?|||Actually, no, there are very few related tables. It's a trading repository
that we continually load with trades. The other tables are specific to
accounts and bi, but not to the ordernumbers themselves. There is a need to
go back in a report on/query the data regularly, but it's done against the
historical table, which is structured identically.
--
Lynn
"Aaron Bertrand [SQL Server MVP]" wrote:

> Do you have related tables (like OrderDetails) that need referential
> integrity back to this table? Do you want to store OrderNumber there, or
do
> you want to repeat the tradetime,endpoint,ordernumber combo in every relat
ed
> table?
>
>|||Lynn,
What I meant was that an identity column as pk, does not asure that there
will not be duplicated rows by (tradetime,endpoint,ordernumber) and because
these columns, as you said, equate to the primary key from a
business-perspective then it should exists already a constraint to force the
uniqueness by these columns. The question is if this unique index should be
clustered or not?. I will also consider what Aaron is asking you for (is
there any other table that is referencing this table?). Well, there are othe
r
things to consider like if there are other columns that you use for range
queries or are used in the "group by" clause, what is more important for you
"select" or "insert" performance, etc. Take a look to the article in the
link, it can help you to understand what are the columns in your table prope
r
for a clustered index.
AMB
"Lynn" wrote:
> ok, amb, but are you suggesting i do not create a compound pk on these thr
ee
> columns and that a unique index or unique constraint will suffice, or are
you
> suggesting i should do the compound pk but that it should be non-clustered
?
> --
> Lynn
>
> "Alejandro Mesa" wrote:
>|||Yes, I understand that the identity column pk didn't prevent dupes, nor is i
t
portable at all. All it does is sequentially number the records and it has
no pertinence whatsoever to the actual data value. Except for very few
maintenace scripts, the data is never queried looking on identity value.
Hence, I'm changing it. But I'm not done yet. That's the reason for my
inquiry. I'm trying to find the best way to get us where we need to be. I
understand the three columns already imply a constraint to force the
uniqueness. But that constraint does not yet exist. I intend to drop the
existing pk and replace with one that is actually meaningful, and I feel it
should be based upon these three fields, which uniquely identify our data.
I've already done this in my dev bed as clustered, and I've actually read
that article before several times - it is what prompted me to cluster the pk
in the first place because of the remark near the bottom about wide indices.
But today i've been researching and have seen the nonclustered composite pk
referenced several times, so now I'm debating whether I've done it properly
and I thought I'd s a little advice.
--
Lynn
"Alejandro Mesa" wrote:
> Lynn,
> What I meant was that an identity column as pk, does not asure that there
> will not be duplicated rows by (tradetime,endpoint,ordernumber) and becaus
e
> these columns, as you said, equate to the primary key from a
> business-perspective then it should exists already a constraint to force t
he
> uniqueness by these columns. The question is if this unique index should b
e
> clustered or not?. I will also consider what Aaron is asking you for (is
> there any other table that is referencing this table?). Well, there are ot
her
> things to consider like if there are other columns that you use for range
> queries or are used in the "group by" clause, what is more important for y
ou
> "select" or "insert" performance, etc. Take a look to the article in the
> link, it can help you to understand what are the columns in your table pro
per
> for a clustered index.
>
> AMB
> "Lynn" wrote:
>|||>> Except for very few
maintenace scripts, the data is never queried looking on identity
value.
Hence, I'm changing it.<<
usually we need a stronger reason for changing a working system. What
are your other reasons?|||We have no unique ID. And many, many holes because of this.
--
Lynn
"AK" wrote:

> maintenace scripts, the data is never queried looking on identity
> value.
> Hence, I'm changing it.<<
> usually we need a stronger reason for changing a working system. What
> are your other reasons?
>|||> We have no unique ID. And many, many holes because of this.
AK is merely suggesting that you could easily add a UNIQUE CONSTRAINT or
INDEX to your three columns, and leave the rest as is. The IDENTITY doesn't
have to be the PK but you don't have to remove it completely if the system
will continue to work while it is still there. And you never know, you may
later want to use a skinnier pk than you three wide columns for related
tables (given that such things don't exist now, think about trying to undo
this change later).

No comments:

Post a Comment