Showing posts with label socreate. Show all posts
Showing posts with label socreate. Show all posts

Monday, March 19, 2012

composite indexes - a subtle question

I have a table like so
CREATE TABLE [dbo].[account] (
[pty_id] [int] NOT NULL ,
[sort_code] [int] NOT NULL ,
[account_no] [int] NOT NULL ,
[account_open_dt] [datetime] NULL ,
[account_close_dt] [datetime] NULL ,
[account_nm] [varchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[market_sector] [char] (2) COLLATE Latin1_General_CI_AS NULL ,
[market_segment] [char] (2) COLLATE Latin1_General_CI_AS NULL ,
[category_code] [smallint] NULL ,
[prime] [char] (1) COLLATE Latin1_General_CI_AS NULL
CONSTRAINT PK_ACCOUNT PRIMARY KEY CLUSTERED (sort_code, account_no)
) ON [PRIMARY]
as you can see there's a primary key clustered index on account_no and
sort_code
I can link it up to any other table efficiently which also has sort_code and
account_no as part of a single index (primary or otherwise)
ON (a.sort_code= b.sortCode) ANd (a.account_no = b.account_no)
etc...
and that will work fine...
Here's my question
if account_no is part or the sort_code/Account_no composite key (and
therefore index)
could I link another table to only the account_no column and still get some
indexing benefit
i.e can you join on a single column of a multiple column index and still get
some help from that composite index or does that composite index only work a
s
a single entity and in fact you need to create an extra nonclustered index
for that column on top of the other index it's participating in to get the
benefit of indexing on that column...
phew
I hope this makes sense
any help would be great appreciated
Regards and thanks in advance,
CharlesACharles,
When joining to the first column of a multi-column index, some
of the same optimizations are available--for example, a merge
join (if the other table also has a supporting index) or nested loop
joins with index ss. Depending on the exact query, the two-column
index may be just as useful as a separate one-column index or a
little less useful (typically because the two-column index takes up
more data pages because of duplications in the first column and the
presence of the second column even when duplicates are few), but it
should still help in most cases. The only real way to be sure is
to create the additional index and measure the performance.
Steve Kass
Drew University
CharlesA wrote:

>I have a table like so
>CREATE TABLE [dbo].[account] (
> [pty_id] [int] NOT NULL ,
> [sort_code] [int] NOT NULL ,
> [account_no] [int] NOT NULL ,
> [account_open_dt] [datetime] NULL ,
> [account_close_dt] [datetime] NULL ,
> [account_nm] [varchar] (50) COLLATE Latin1_General_CI_AS NULL ,
> [market_sector] [char] (2) COLLATE Latin1_General_CI_AS NULL ,
> [market_segment] [char] (2) COLLATE Latin1_General_CI_AS NULL ,
> [category_code] [smallint] NULL ,
> [prime] [char] (1) COLLATE Latin1_General_CI_AS NULL
> CONSTRAINT PK_ACCOUNT PRIMARY KEY CLUSTERED (sort_code, account_no)
> ) ON [PRIMARY]
>
>
>as you can see there's a primary key clustered index on account_no and
>sort_code
>I can link it up to any other table efficiently which also has sort_code an
d
>account_no as part of a single index (primary or otherwise)
>ON (a.sort_code= b.sortCode) ANd (a.account_no = b.account_no)
>etc...
>and that will work fine...
>Here's my question
>if account_no is part or the sort_code/Account_no composite key (and
>therefore index)
>could I link another table to only the account_no column and still get some
>indexing benefit
>i.e can you join on a single column of a multiple column index and still ge
t
>some help from that composite index or does that composite index only work
as
>a single entity and in fact you need to create an extra nonclustered index
>for that column on top of the other index it's participating in to get the
>benefit of indexing on that column...
>phew
>I hope this makes sense
>any help would be great appreciated
>Regards and thanks in advance,
>CharlesA
>|||Charles,
Can you recreate your PK as (account_no, sort_code)? What are the
cardinalities of these 2 columns?|||Charles,
I know you'd like an immediate answer, but it might really help to
watch Kimberly Tripp's (SQL Server MVP) presentation at the following
website. I'm in the process of designing indexes and it was incredibly
helpful to watch the presentation. She keeps it simple but I finally
got it...
http://www.microsoft.com/uk/technet...aspx?videoid=29
Other links:
http://msevents.microsoft.com/CUI/E...e=en-U
S
http://www.sqlskills.com/blogs/kimb...
3-a58ba7b1265e
(The presentation took me a while to download and it is an hour long
but totally worth it.)|||Hi,
Whether you get performance benifit or not depends on the location of the
column in the order of the index key.
You want to join Account_no from tbl1 with Account no in tbl2 with
sort_code/Account_no as an index.
If the left most column in the index is sort_code i.e
Index(sort_code,Account_no) , you don't find any use of the index, it will
still go for an idex scan.
But if your index had been Index(Account_no,sort_code)
then this index is sufficient for joining Account_no alone or Account_no and
sort_code.
Hope I made sense :)|||Thanks everyone for your answers, much appreciated...
I will download that presentation when I'm at home, I've been dying to
understand indexing properly for ages now
as I say, thanks All
Regards,
CharlesA

Sunday, March 11, 2012

complicated update (for me)

hello everyone,
just having trouble with an update statement. so i started off with a
table like so:
create table foo (
id uniqueidentifier not null primary key nonclustered,
name varchar(50) not null,
type int not null,
datecreated datetime not null default getdate(),
constraint foo1 unique (name)
)
go
then i added a new column, like so:
alter table foo add column rank int null
go
and here's where the tricky update comes in. my goal is to populate
this new "rank" column with incremental values for each distinct "type"
value.
for example, i the original table looks like so:
select name, type from foo
go
name type
-- --
john 1
jane 1
jim 1
jake 2
jeff 2
kyle 2
keli 2
kim 2
i would want the update statement to populate the table so that AFTER
the update, it looks like so:
select name, type, rank from foo
name type rank
-- -- --
john 1 1
jane 1 2
jim 1 3
jake 2 1
jeff 2 2
kyle 2 3
keli 2 4
kim 2 5
it's important to note that the original "rank" really doesn't matter,
it is just something that needs to be kept track of moving forward, so
i don't even particularly have to rank them alphabetically, or any
other way. just need an update statement that can put in incremental
values, but that sort of "resets" the incrementation for each changing
value of another field.
if that's possible.
thanks in advance for any help!
jasonYou have a few problems with this. First, a uniqueidentifier cannot be
a relational key, so this is not a table by definition. Where does it
occur in the reality of the data model? Next, Rows are not records;
fields are not columns; tables are not files. Totally differerent
concepts. You are building a sequential file in SQL
DEFAULT comes after the data type in STANDARD SQL, and you can use
CURRENT_TIMESTAMP instead of the proprietary getdate(). But you should
not put audit trail information in the table (ask your accountant about
proper procedures).
There is no sequential access or ordering in an RDBMS, so "first",
"next" and "last" are totally meaningless. So what are you using to
assign the rank values? The data model should have some rule so you
can validate the data.
Granted this is a sample table, but all of the column names are
incomplete (type of what? Name of what? Etc.) Your sample design
should look more like this:
CREATE TABLE NewFoo
(foo_name VARCHAR(50) NOT NULL PRIMARY KEY,
foo_type INTEGER NOT NULL,
bar_rank INTEGER NOT NULL,
UNIQUE (foo_type, bar_rank)); -- is this the PK?
Now, let's kill the old one and get things into an RDBMS:
INSERT INTO NewFoo (foo_name, foo_type, bar_rank)
SELECT name, type,
(SELECT COUNT(F1.*)
FROM Foo AS F1
WHERE F1.type = Foo.type
AND F1.name = Foo.name)
FROM Foo;
Then drop Foo and rename NewFoo.|||> First, a uniqueidentifier cannot be a relational key, so this
> is not a table by definition. Where does it occur in the
> reality of the data model?
i call bs on this one, CELKO. a uniqueidentifier is a perfectly valid
artificial key for any record. what makes you think it can't be a
relational key?

> But you should not put audit trail information in the table
i call bs on this one too, i'm afraid. row-level auditing is a
perfectly valid practice for a number of reasons.
as for the rest, yeah, i was lazy, i didn't actually try the create
table statement before posting :) and i'll try to be more descriptive
in future examples.
as for the rank, it's definitely not an attempt to make the table a
sequential file, so you guessed wrong there i'm afraid. i use clustered
indexes if order is important for something, which has specific,
uncommon purposes in my opinion.
the rank is a meaningful value that a subscriber application needs to
know. you can think of it like your standing in a contest, you're
either in first, second, third, etc. place. the rank can't actually be
determined by any values in the database, it is determined by the
providers of the data, and we just store the rank. it's meaningful, not
physically sequential, etc.
and lastly, regarding the insert statement: THANK YOU. exactly what i
was looking for. i look forward to trying it out when i get back to the
office tomorrow.
thankful as always,
jason|||>> call bs on this one, CELKO. a uniqueidentifier is a perfectly valid artificial
key for any record [sic]. what makes you think it can't be a relational key? <<
The very definition of a relationall key, the stuff Dr. Codd wrote and
basic data model concepts. A relational key is subset of the
attributes of an entity that is unique. A uniqueidentifier is derived
from physical storage and has nothing whatsoever to do with the entity.
Newbies who do not know that a row (logical construct) is nothing like
a record (physical storage) constantly make this mistake and build file
systems in SQL.
An artificial key has to have validation and verification rules, and a
uniqueidentifier does not.
That is fine, but putting the audit trail into the table that is being
audited is not a proper accounting practice. The changes need to be
caught outside of the table. Talk to the accounting department or the
SOX guy for your company. This is like letting developers do their own
QA.|||assuming (name, type) is unique
update foo set rank = (select count(*) from foo f1 where f1.type =
foo.type and f1.name<foo.name)+1
if (name, type) is NOT unique, it it still doable but more complex|||> A uniqueidentifier is derived from physical storage
well, I guess whoever says this at a job interview is less likely to
get hired ;)|||> A relational key is subset of the
> attributes of an entity that is unique.
actually, if i'm reading you correctly, that's a called NATURAL key.
relational keys do not have the necessary condition of being natural
elements of an entity. the only necessary condition of a relational key
is that it be a column or columns whose values are gauranteed to be
unique across all occurrences in a given table. that's it.
by this definition, a relational key can be natural OR artificial. what
you're describing is a natural relational key, and good for you, that's
totally fine. and so are artificial relational keys.

> An artificial key has to have validation and verification rules, and a
> uniqueidentifier does not.
the only "validation and verification" rule required to act as a
relational key is that it be UNIQUE. and uniqueidentifiers, when
properly used, are certainly that.
i presume that you would have just as many objections about using an
identity integer as a relational key? if that's true, then you're
grossly misrepresenting your argument. you're not arguing the
definition of a relational key, you're arguing the validity of
artificial versus natural keys AS relational keys. totally different
argument.

> That is fine, but putting the audit trail into the table that is being
> audited is not a proper accounting practice.
i don't see your logic here. what is the difference between attaching
such a column to the entity it is auditing and putting it in another
table, and relating it to the entity it is auditing? the only
difference i can think of is that you could apply different user
permissions to each table. that's fine and well, but there are plenty
of other places to handle security, and other considerations, such as
performance.|||this worked like a charm, thank you very much!
jason|||yeah, i wasn't sure where that was coming from either. aren't they
derived from like a bunch of crazy variables? datetime, cpu serial
number, mac address, your mother's maiden name, the position of the
every valence electron in your body ...|||>> what is the difference between attaching such a column to the entity it i
s auditing and putting it in another table, and relating it to the entity it
is auditing? <<
You do not have to put the audit information in the schema at all. It
can be in an external file system or other RDBMS.
Separation is a basic accounting principle, like double entry
bookkeeping. For example, when I submit an article to a publisher, I
send the editor one copy of the invoice and another copy to Accounts
Payable. The A/P clerk has to match both copies of the invoice before
they issue a check to me.
My editor deletes their copies of my invoice. The A/P clerk now has an
invoice without a mate at the end of the payment cycle, so they know to
start calling people.
The A/P department deletes their copies of my invoice. The editor now
has an invoice which was no paid at the end of the payment cycle, so
they know to call the A/P department.
Both editor and A/P delete their copies of my invoice. Accounting sees
a missing invoice number at the end of the payment cycle, because they
designed an invoice number that can be validated and verified rather
than a meaningless, hardware generated number. Accounting makes life
hell for everyone until they can trace that missing invoice number.
.

complicated select

through no fault of my own, there exists a table with varchar(50)
column, the values of which are comma separated integers. something
like so:
create table foo (
fookey int primary key not null,
foointegerlist1 varchar(50) null,
foointegerlist2 varchar(50) null,
fooprice money null,
foosize int null
)
insert into foo values (1, ',2,3', '1', 10, 100)
insert into foo values (2, '3', '4', null, 100)
insert into foo values (3, ',1,11', ',1,5,6', 10, 100)
insert into foo values (4, ',3,5', ',11', 10, null)
(yes, the leading comma will randomly appear)
unpleasant? yes. scheduled to be redesigned? yes. but for now, this is
what i have to contend with.
the problem i'm having is that there is another table with a very
similar column, like so:
create table foomatch (
foomatchkey int primary key not null,
foomatchintegerlist1 varchar(50) null
foomatchintegerlist2 varchar(50) null,
foomatchprice money null,
foomatchsize int null
)
insert into foomatch values (1, ',2', ',4,5,6', 10, 100)
insert into foomatch values (2, ',1,3,6', ',1', null, null)
the goal of the select statement is to capture all of the rows in the
table foo that "match" the values of a given single row in the foomatch
table. the rules for matching are as follows:
foomatch.foomatchprice = foo.fooprice AND
foomatch.foomatchsize = foo.foosize AND
(this is where it gets dicey for me)
at least ONE of the integer values in foomatch.foomatchintegerlist1
must occur in the foo.foointegerlist1 column AND
at least ONE of the integer values in foomatch.foomatchintegerlist2
must occur in the foo.foointegerlist1 column
where foomatch.foomatchkey = 1 (for example)
i have a user defined function that can turn a comma separated string
value into a single column table. i mention this in case building
tables of the values will make this easier / possible (perhaps with
relational math?)
thanks in advance for any help, and just let me know if you need
clarification,
jasonhi jason,
its quite a long story
use string manipulations to meet the desired solution.
we have several string functions that you can use
thanks,
Jose de Jesus Jr. Mcp,Mcdba
Data Architect
Sykes Asia (Manila philippines)
MCP #2324787
"jason" wrote:

> through no fault of my own, there exists a table with varchar(50)
> column, the values of which are comma separated integers. something
> like so:
> create table foo (
> fookey int primary key not null,
> foointegerlist1 varchar(50) null,
> foointegerlist2 varchar(50) null,
> fooprice money null,
> foosize int null
> )
> insert into foo values (1, ',2,3', '1', 10, 100)
> insert into foo values (2, '3', '4', null, 100)
> insert into foo values (3, ',1,11', ',1,5,6', 10, 100)
> insert into foo values (4, ',3,5', ',11', 10, null)
> (yes, the leading comma will randomly appear)
> unpleasant? yes. scheduled to be redesigned? yes. but for now, this is
> what i have to contend with.
> the problem i'm having is that there is another table with a very
> similar column, like so:
> create table foomatch (
> foomatchkey int primary key not null,
> foomatchintegerlist1 varchar(50) null
> foomatchintegerlist2 varchar(50) null,
> foomatchprice money null,
> foomatchsize int null
> )
> insert into foomatch values (1, ',2', ',4,5,6', 10, 100)
> insert into foomatch values (2, ',1,3,6', ',1', null, null)
> the goal of the select statement is to capture all of the rows in the
> table foo that "match" the values of a given single row in the foomatch
> table. the rules for matching are as follows:
> foomatch.foomatchprice = foo.fooprice AND
> foomatch.foomatchsize = foo.foosize AND
> (this is where it gets dicey for me)
> at least ONE of the integer values in foomatch.foomatchintegerlist1
> must occur in the foo.foointegerlist1 column AND
> at least ONE of the integer values in foomatch.foomatchintegerlist2
> must occur in the foo.foointegerlist1 column
> where foomatch.foomatchkey = 1 (for example)
> i have a user defined function that can turn a comma separated string
> value into a single column table. i mention this in case building
> tables of the values will make this easier / possible (perhaps with
> relational math?)
> thanks in advance for any help, and just let me know if you need
> clarification,
> jason
>|||thanks for the reply Jose.
i spent some time looking at string manipulation approaches, i'm not
sure string manipulation will do the trick here. even if we boiled it
down to a single comparison, how would you use string manipulation to
answer the question are any of the comma separated elements in ',1,2,3'
also in the comma separated list ',3,4,5'? i've spent some time going
down this path, and it didn't yield anything close to sane results.
though if you can post something a little more specific, i'd be happy
to entertain the notion.
and yes, my explanations are rarely brief. this isn't even close to one
of my longer ones :)|||If you use tables or table valued user defined functions for the integer
lists it is reasonably straightforward:
Assuming there are tables/udfs foolist1 (fookey, foointeger1) ,
foomatchlist1 (foomatchkey, foomatchinteger1) and foomatchlist2
(foomatchkey, foomatchinteger2)
SELECT f.<column list>
FROM foo f
INNER JOIN foomatch fm
ON fm.foomatchprice = f.fooprice
AND fm.foomatchsize = f.foosize
WHERE EXISTS
(
SELECT NULL FROM foolist1 fl1
INNER JOIN foomatchlist1 fml1
ON fl1.foointeger1 = fml1.foomatchinteger1
WHERE fl1.fookey = f.fookey AND fml1.foomatchkey = fm.foomatchkey
)
AND EXISTS
(
SELECT NULL FROM foolist1 fl1
INNER JOIN foomatchlist2 fml2
ON fl1.foointeger1 = fml2.foomatchinteger2
WHERE fl1.fookey = f.fookey AND fml2.foomatchkey = fm.foomatchkey
)
Jacco Schalkwijk
SQL Server MVP
"jason" <iaesun@.yahoo.com> wrote in message
news:1128432811.427135.37460@.g44g2000cwa.googlegroups.com...
> through no fault of my own, there exists a table with varchar(50)
> column, the values of which are comma separated integers. something
> like so:
> create table foo (
> fookey int primary key not null,
> foointegerlist1 varchar(50) null,
> foointegerlist2 varchar(50) null,
> fooprice money null,
> foosize int null
> )
> insert into foo values (1, ',2,3', '1', 10, 100)
> insert into foo values (2, '3', '4', null, 100)
> insert into foo values (3, ',1,11', ',1,5,6', 10, 100)
> insert into foo values (4, ',3,5', ',11', 10, null)
> (yes, the leading comma will randomly appear)
> unpleasant? yes. scheduled to be redesigned? yes. but for now, this is
> what i have to contend with.
> the problem i'm having is that there is another table with a very
> similar column, like so:
> create table foomatch (
> foomatchkey int primary key not null,
> foomatchintegerlist1 varchar(50) null
> foomatchintegerlist2 varchar(50) null,
> foomatchprice money null,
> foomatchsize int null
> )
> insert into foomatch values (1, ',2', ',4,5,6', 10, 100)
> insert into foomatch values (2, ',1,3,6', ',1', null, null)
> the goal of the select statement is to capture all of the rows in the
> table foo that "match" the values of a given single row in the foomatch
> table. the rules for matching are as follows:
> foomatch.foomatchprice = foo.fooprice AND
> foomatch.foomatchsize = foo.foosize AND
> (this is where it gets dicey for me)
> at least ONE of the integer values in foomatch.foomatchintegerlist1
> must occur in the foo.foointegerlist1 column AND
> at least ONE of the integer values in foomatch.foomatchintegerlist2
> must occur in the foo.foointegerlist1 column
> where foomatch.foomatchkey = 1 (for example)
> i have a user defined function that can turn a comma separated string
> value into a single column table. i mention this in case building
> tables of the values will make this easier / possible (perhaps with
> relational math?)
> thanks in advance for any help, and just let me know if you need
> clarification,
> jason
>|||ahh! i think i see the solution here. yes, i think i can modify my
udf's slightly to make this work. i will give that a shot.
thanks very much
jason