Sql Comparing two tables: clustered

Showing posts with label clustered. Show all posts

Tuesday, March 20, 2012

Compound Primary Key - order not as expected

Hello,

if you create this table:

create table hello (
int a
, int b
constraint pk_hello primary key clustered ( a, b )
)

and then insert the following records

a,b
1,1
1,2
1,3
2,1
2,2
2,3
3,1
3,2
3,3

and then do

select a,b from hello

the output seems to be:

a,b
1,1
2,1
3,1
1,2
2,2
3,2
1,3
2,3
3,3

which is wrong and (i think) is reflecting the actual index order
and physical order on disk

it should be:

a,b
1,1
1,2
1,3
2,1
2,2
2,3
3,1
3,2
3,3

i have tested this on a table with 500,000 records

and sure enough if you declare the clustered primary key fields in
reverse order:

constraint pk_hello primary key clustered ( b, a )

two things happen:

- the select with no order by returns the records in the expected order
- queries relying on that order run MUCH FASTER

has anyone else seen / noticed this?John Rivers wrote:
> Hello,
> if you create this table:
> create table hello (
> int a
> , int b
> constraint pk_hello primary key clustered ( a, b )
> )
> and then insert the following records
> a,b
> 1,1
> 1,2
> 1,3
> 2,1
> 2,2
> 2,3
> 3,1
> 3,2
> 3,3
> and then do
> select a,b from hello
> the output seems to be:
> a,b
> 1,1
> 2,1
> 3,1
> 1,2
> 2,2
> 3,2
> 1,3
> 2,3
> 3,3
> which is wrong and (i think) is reflecting the actual index order
> and physical order on disk

This is not wrong at all. As long as you do not have an "ORDER BY"
clause the RDBMS is free to return records in *any* order.

> it should be:
> a,b
> 1,1
> 1,2
> 1,3
> 2,1
> 2,2
> 2,3
> 3,1
> 3,2
> 3,3
> i have tested this on a table with 500,000 records
> and sure enough if you declare the clustered primary key fields in
> reverse order:
> constraint pk_hello primary key clustered ( b, a )
> two things happen:
> - the select with no order by returns the records in the expected order

Again: you have to adjust your expectations.

> - queries relying on that order run MUCH FASTER
> has anyone else seen / noticed this?

Yes.

Cheers

robert|||Order is not guaranteed unless you include an ORDER BY. This is by design.

--
Tom

----------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Toronto, ON Canada
..
"John Rivers" <first10@.btinternet.com> wrote in message
news:1146048739.469710.138210@.e56g2000cwe.googlegr oups.com...
Hello,

if you create this table:

create table hello (
int a
, int b
constraint pk_hello primary key clustered ( a, b )
)

and then insert the following records

a,b
1,1
1,2
1,3
2,1
2,2
2,3
3,1
3,2
3,3

and then do

select a,b from hello

the output seems to be:

a,b
1,1
2,1
3,1
1,2
2,2
3,2
1,3
2,3
3,3

which is wrong and (i think) is reflecting the actual index order
and physical order on disk

it should be:

a,b
1,1
1,2
1,3
2,1
2,2
2,3
3,1
3,2
3,3

i have tested this on a table with 500,000 records

and sure enough if you declare the clustered primary key fields in
reverse order:

constraint pk_hello primary key clustered ( b, a )

two things happen:

- the select with no order by returns the records in the expected order
- queries relying on that order run MUCH FASTER

has anyone else seen / noticed this?|||Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; there is no sequential access or
ordering in an RDBMS, so "first", "next" and "last" are totally
meaningless. If you want an ordering, then you need to have a column
that defines that ordering. You must use an ORDER BY clause on a
cursor or in an OVER() clause.

You need to read a book on RDBMS; you are still locked into a file
system mind set.|||Hello,

when a clustered index is present the records *are* physically ordered
on disk to match the index

that is the whole point of a clustered index

and by default a select statement with no ORDER BY will always return
data in the order of the clustered index (when present)

this can easily be proved by watching the Execution Plan

the issue i am trying to highlight concerns the order of the records on
disk when a *compound* clustered index is present

i have seen cases when it is not as expected

maybe you can enjoy reading that RDBMS book :-)

best wishes,

john|||John Rivers wrote:
> Hello,
> when a clustered index is present the records *are* physically ordered
> on disk to match the index
> that is the whole point of a clustered index
> and by default a select statement with no ORDER BY will always return
> data in the order of the clustered index (when present)
Um. No. I've seen it return them out of order with only a few hundred
rows. As soon as the table is occupying more than one page, the query
optimizer *can* decide to produce a parallel plan. You'll see the
result as chunks of output which are in clustered index order, but no
deterministic ordering between the chunks. e.g. it'll look like:

1
2
3
4
5
11
12
13
14
15
6
7
8
9
10

The *only* way to guarantee the order of output is to put an order by
clause on your select statement.

Damien|||John Rivers wrote:
> and by default a select statement with no ORDER BY will always return
> data in the order of the clustered index (when present)

Not true at all. As Joe says, tables are not logically ordered. There
is no guarantee that any queries will match the physical order on disk
or in a clustered index.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/...US,SQL.90).aspx
--|||On 28 Apr 2006 05:41:26 -0700, John Rivers wrote:

>Hello,
>when a clustered index is present the records *are* physically ordered
>on disk to match the index
>that is the whole point of a clustered index

Hi John,

Correct.

>and by default a select statement with no ORDER BY will always return
>data in the order of the clustered index (when present)

Incorrect. Damien already pointed out the risk of parallellism.

Another potential issue is an optimization technique MS employs called
"piggybacking" - if a query on another connection is in the middle of a
tbale scan on the table you need, the DB will use the values coming in
for your query as well, then (when the first query's table scan is
finished) restart the scan from start up to where it started to
piggyback. The results would be like 6 - 7 - 8 - 9 - 10 - 1 - 2 - 3 - 4
- 5

This is almost impossible to reproduce in a test environment, but it
MIGHT happen intermittently in a heavily used production DB. Tough lluck
if your app expects the rows to be in order, even without ORDER BY.

>the issue i am trying to highlight concerns the order of the records on
>disk when a *compound* clustered index is present
>i have seen cases when it is not as expected

How did you "see" those cases? Using a query reallly doesn't prove
anything. Did you issue DBCC PAGE commands to inspect the actual
contents of the index and data pages?

--
Hugo Kornelis, SQL Server MVP|||John Rivers (first10@.btinternet.com) writes:
> when a clustered index is present the records *are* physically ordered
> on disk to match the index
> that is the whole point of a clustered index

Actually, they are ordered if you follow the page links. But if pages
are in disorder, the physical order on disk may be yet another one.

> and by default a select statement with no ORDER BY will always return
> data in the order of the clustered index (when present)

No. This may have been true by chance for SQL Server up version 6.5. It is
definitely not correct for SQL 7 and later.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Thanks for your knowledgable answers

I will check out DBCC PAGE

Monday, March 19, 2012

Composite Index and Query Optimizer

Hi
I have a question. If I have a composite primary key on table (in essence I
will have a composite clustered index as PK by default creates a clustered
index) when would the Query optimzer would not use the index on it?. Is it
like if the Query optimzer thinks that a table scan would be much faster tha
n
using a index, it would not use the index?.
MVPs, please enlighten me on this scenario.
Thanks
MIf you have a clustered index on a table, you wouldn't see a table scan, bu
t
a clustered index scan.
Whether the optimizer picks a scan of a s

depends on a lot of things,
like what columns the select statement retreives, the nature of the WHERE
condition (exact matches are more likely to result in a s

; operators like
LIKE or BETWEEN a scan), and the number of rows that statistics on the table
indicate would be returned by the SELECT. (For a query that statistics
indicate would return a large number of rows, the optimizer may choose an
index scan, even when you would think a s

would be better).
If you also have nonclustered indexes on the table, you could also see a
s

on the nonclustered index with a bookmark lookup to the clustered index
.
Again, where the optimizer chooses that path depends on the same things
mentioned above.
"Meher" wrote:

> Hi
> I have a question. If I have a composite primary key on table (in essence
I
> will have a composite clustered index as PK by default creates a clustered
> index) when would the Query optimzer would not use the index on it?. Is it
> like if the Query optimzer thinks that a table scan would be much faster t
han
> using a index, it would not use the index?.
> MVPs, please enlighten me on this scenario.
> Thanks
> M|||On Thu, 2 Feb 2006 12:20:28 -0800, Meher wrote:

>Hi
>I have a question. If I have a composite primary key on table (in essence I
>will have a composite clustered index as PK by default creates a clustered
>index) when would the Query optimzer would not use the index on it?. Is it
>like if the Query optimzer thinks that a table scan would be much faster th
an
>using a index, it would not use the index?.
Hi Meher,
Mark already gave some answers, but he was mistaken on one point and he
forgot one point.
Here's all you ever wanted to know but were afraid to ask:
- Search for equality (Col01 = 'Abdef'), and you get an index s

for
an exact match. For a composite index, this can still result in more
than one row, so it's technically still an index s

for a range of
rows.
- Search for a range, for instance with BETWEEN, with < and/or >
comparisons operator, with LIKE (but only if the first character is not
a wildcard - LIKE 'abc%' is good; LIKE '%abc' isn't), and you get an
index s

for a range of matching values.
- An index scan will be used if the indexed value is embedded in an
expression (WHERE Col02 + 17 = 34), for a LIKE that starts with a
wildcard (LIKE '%abc' or LIKE '_abc'), and you get a clustered index
scan (which is basically the same as a table scan).
- Also, if you search for a value in the second column of a composite
index, a table scna has to be used. Compare this to finding everyone
living on Elm Street in a phone book that is sorted by last name, with
street as secondary sort order for people with equal last name.
Hugo Kornelis, SQL Server MVP

Composite clustered index - column order

Want to check my thinking with you folks...

I have a table with a clustered composite index, consisting of 3 columns, which together form a unique key. For illustration, the columns are C1, C2 & C3.

Counts of distinct values for columns are C1 425, C2 300,000 & C3 4,000,000

C3 is effectively number of seconds since 01/01/1970.

The usage of the table is typically, insert a row, do something else, then update it.

Currently, the index columns are ordered C3,C1,C2. Fill factor of 90%.

My thinking is that this composite index is better ordered C1,C2,C3.

My reasoning is that having C3 as the leading column, biases all the inserts towards one side of the indexes underlying B-tree, causing page splits. Also, there'll be a bunch of "wasted" space across the tree, as the values going into C3 only ever get bigger (like an identity), so the space due to the fill factor in lower values never gets used.

Welcome your thoughts.

What are the data types of these columns? If C3 is a datetime or a bigint, updating it with a larger value (more seconds since 1970) should not be causing page splits. That usually happens with varchars that are updated to a larger value for example. You are usually better off to have a narrow clustered index.

What are you trying to accomplish here? Are you worried about SELECT performance, INSERT/UPDATE performance, or about index size and maintenance?

If C3 is being updated a lot, you might be better off to have the clustered index on C1, C2, and then have a non-clustered index on C3.

|||

"What are the data types of these columns"

char(4),Char(4) and int

"If C3 is a datetime or a bigint, updating it with a larger value (more seconds since 1970) should not be causing page splits"

Together the 3 columns provide unique key, and none of the columns are updated. The page splitting aspect I'm considering is, if the first column in the clustered is effectively an identity (so the next value inserted can only ever be bigger than the last), does this bias the inserts to one side of the tree - page splits being necessary there, because a fill factor spreads the free space throughout the tree?

"You are usually better off to have a narrow clustered index"

Yes. I appreciate that, because it gets tagged onto all non-clustered indexes. Let's assume that space isn't an issue.

Looking for best pewrformance for select \ insert & update. Index size & maint not an issue.

Thanks

Tuesday, March 20, 2012

Compound Primary Key - order not as expected

Monday, March 19, 2012

Composite Index and Query Optimizer

Composite clustered index - column order

Sql Comparing two tables

Blog Archive

About Me