Tuesday, March 27, 2012

Computing hash values

Hi,
In hash joins, how the hash value is computed? For example in this query:
SET SHOWPLAN_ALL ON
select c.customerid ,o.orderid, o.shipcountry from
customers c right outer join orders o
on c.customerid=o.customerid
and o.shipcountry='germany'
How the fields those appear in HASH) predicate help to create hash values?
I think my problem is that I don't know that what the hash value is.
Thanks,
Leila
Hi Leila
For your query tuning, it shouldn't matter what the actual hash values are.
If possible, you should try to build an index that will allow SQL Server to
perform a different join technique than hashing.
Microsoft does not document any details of the hash functions they use for
processing hash join operations. If you want to know more about hashing in
general, read "The Art of Computer Programming -- Volume 3: Sorting and
Searching" by Donald Knuth.
HTH
Kalen Delaney
SQL Server MVP
www.SolidQualityLearning.com
"Leila" <lelas@.hotpop.com> wrote in message
news:%23W3h0HPoEHA.324@.TK2MSFTNGP11.phx.gbl...
> Hi,
> In hash joins, how the hash value is computed? For example in this query:
> SET SHOWPLAN_ALL ON
> select c.customerid ,o.orderid, o.shipcountry from
> customers c right outer join orders o
> on c.customerid=o.customerid
> and o.shipcountry='germany'
> How the fields those appear in HASH) predicate help to create hash
> values?
> I think my problem is that I don't know that what the hash value is.
> Thanks,
> Leila
>
|||Hi Kalen,
Thanks for your suggestion.
I'm a little confused about the difference between Hash Match and Nested
Loops. As far as I learned from BOL, in Hash Match, the hash values are
moved from the base table to a new place in memory(called hash table), then
an operation like nested loop happens between hash table and another table.
In nested loops, no value is moved from the base table, instead the loop
begins (with no hash table in between) directly with other table.
It seems the only difference is the existence of hash table in between, is
that true?
Thanks again,
Leila
"Kalen Delaney" <replies@.public_newsgroups.com> wrote in message
news:euCtqpPoEHA.3460@.TK2MSFTNGP10.phx.gbl...
> Hi Leila
> For your query tuning, it shouldn't matter what the actual hash values
are.
> If possible, you should try to build an index that will allow SQL Server
to[vbcol=seagreen]
> perform a different join technique than hashing.
> Microsoft does not document any details of the hash functions they use for
> processing hash join operations. If you want to know more about hashing in
> general, read "The Art of Computer Programming -- Volume 3: Sorting and
> Searching" by Donald Knuth.
> --
> HTH
> --
> Kalen Delaney
> SQL Server MVP
> www.SolidQualityLearning.com
>
> "Leila" <lelas@.hotpop.com> wrote in message
> news:%23W3h0HPoEHA.324@.TK2MSFTNGP11.phx.gbl...
query:
>
|||>Leila" <lelas@.hotpop.com> wrote in message
news:%23t9lNSQoEHA.2340@.TK2MSFTNGP10.phx.gbl...

> I'm a little confused about the difference between Hash Match and Nested
> Loops. As far as I learned from BOL, in Hash Match, the hash values are
> moved from the base table to a new place in memory(called hash table),
then
> an operation like nested loop happens between hash table and another
table.
> In nested loops, no value is moved from the base table, instead the loop
> begins (with no hash table in between) directly with other table.
> It seems the only difference is the existence of hash table in between, is
> that true?
In a nested loop, the inner loop is executed once for each outer loop. In a
hash match, the top ("build") input is created, then the bottom ("probe")
input is matched against it. This means that the bottom table is only
scanned once.
|||The 'only' difference is a very expensive one.
If you have an index, SQL Server can take a value from the outer table and
use the index to find matching rows in the inner table.
With a hash match, which is used because there IS no useful index, the data
in the inner table is organized into a hash table, so that SQL Server can
find matching rows using the hash table instead of an index.
Al though the inner table is scanned only once, the process of building the
hash table is resource intensive, and the hash table uses a lot of memory
for a big table.
You're better off building a good index to make the nested loops possible.
HTH
Kalen Delaney
SQL Server MVP
www.SolidQualityLearning.com
"Leila" <lelas@.hotpop.com> wrote in message
news:%23t9lNSQoEHA.2340@.TK2MSFTNGP10.phx.gbl...
> Hi Kalen,
> Thanks for your suggestion.
> I'm a little confused about the difference between Hash Match and Nested
> Loops. As far as I learned from BOL, in Hash Match, the hash values are
> moved from the base table to a new place in memory(called hash table),
> then
> an operation like nested loop happens between hash table and another
> table.
> In nested loops, no value is moved from the base table, instead the loop
> begins (with no hash table in between) directly with other table.
> It seems the only difference is the existence of hash table in between, is
> that true?
> Thanks again,
> Leila
>
> "Kalen Delaney" <replies@.public_newsgroups.com> wrote in message
> news:euCtqpPoEHA.3460@.TK2MSFTNGP10.phx.gbl...
> are.
> to
> query:
>
>
|||Hi Mark,
I cannot understand that how the matching can be performed with one scan?
Maybe because yet I don't know about the real contents of hash table (hash
values).
Could you please help me.
Thanks,
Leila
"Mark Wilden" <mark@.mwilden.com> wrote in message
news:saGdncQrlOIWgc_cRVn-jA@.sti.net...[vbcol=seagreen]
> news:%23t9lNSQoEHA.2340@.TK2MSFTNGP10.phx.gbl...
> then
> table.
is
> In a nested loop, the inner loop is executed once for each outer loop. In
a
> hash match, the top ("build") input is created, then the bottom ("probe")
> input is matched against it. This means that the bottom table is only
> scanned once.
>
|||"Leila" <lelas@.hotpop.com> wrote in message
news:%23ipgFuQoEHA.2140@.TK2MSFTNGP11.phx.gbl...

> I cannot understand that how the matching can be performed with one scan?
> Maybe because yet I don't know about the real contents of hash table (hash
> values).
Kalen is a much better person to explain this than I am. However, to
clarify, there are two scans - one scan to create the hash table in the
first place from the contents of the upper or outer table, and another scan
to match the lower or inner table against this hash table.
For example:
select id from A join B on A.something = B.somethingElse
If a hash match were used, this would look at all the A.something values and
create a hash table from them. For example, if A.something = "Mark's the
best", there might be a hash value created from it like 123. Another row
might contain "Kate's better", and that might hash to a different number,
like 342.
Having created the "build" hash table from A, B is then scanned, creating
hash values from the B.somethingElse column. Each of those values is used as
a "probe" into the original hash table to see if there is a match - i.e., if
A.something really does equal B.somethingElse.
To answer your question, it doesn't matter what hash value is generated for
"Mark's the best". Hashing is just a way of reducing a large number of
possible values to a smaller number.
I hope this didn't make it worse!
|||"Leila" <lelas@.hotpop.com> wrote in message
news:uvRR81QoEHA.1160@.tk2msftngp13.phx.gbl...

> I think I got it! You mean the bottom table is scanned once (for creating
> hash table) and then nested loop is needed for matching rows. Is that
true?
We're so close - and I'm honestly looking forward to what Kalen has to say.

The top table is scanned once to create the table, then the bottom table is
scanned once (not in a nested loop) and matched against the table.
|||Thanks Kalen!
You mentioned 'the data in the inner table is organized into a hash table'.
I read in BOL 'the smaller of the two inputs is the build input'.
Are they different?
"Kalen Delaney" <replies@.public_newsgroups.com> wrote in message
news:ODudYpQoEHA.260@.TK2MSFTNGP10.phx.gbl...
> The 'only' difference is a very expensive one.
> If you have an index, SQL Server can take a value from the outer table and
> use the index to find matching rows in the inner table.
> With a hash match, which is used because there IS no useful index, the
data
> in the inner table is organized into a hash table, so that SQL Server can
> find matching rows using the hash table instead of an index.
> Al though the inner table is scanned only once, the process of building
the[vbcol=seagreen]
> hash table is resource intensive, and the hash table uses a lot of memory
> for a big table.
> You're better off building a good index to make the nested loops possible.
> --
> HTH
> --
> Kalen Delaney
> SQL Server MVP
> www.SolidQualityLearning.com
>
> "Leila" <lelas@.hotpop.com> wrote in message
> news:%23t9lNSQoEHA.2340@.TK2MSFTNGP10.phx.gbl...
is[vbcol=seagreen]
Server
>
|||It really clarified the issue. Thank you very much indeed!
"Mark Wilden" <mark@.mwilden.com> wrote in message
news:yvCdnQJbXZiYtM_cRVn-jA@.sti.net...[vbcol=seagreen]
> "Leila" <lelas@.hotpop.com> wrote in message
> news:%23ipgFuQoEHA.2140@.TK2MSFTNGP11.phx.gbl...
scan?[vbcol=seagreen]
(hash
> Kalen is a much better person to explain this than I am. However, to
> clarify, there are two scans - one scan to create the hash table in the
> first place from the contents of the upper or outer table, and another
scan
> to match the lower or inner table against this hash table.
> For example:
> select id from A join B on A.something = B.somethingElse
> If a hash match were used, this would look at all the A.something values
and
> create a hash table from them. For example, if A.something = "Mark's the
> best", there might be a hash value created from it like 123. Another row
> might contain "Kate's better", and that might hash to a different number,
> like 342.
> Having created the "build" hash table from A, B is then scanned, creating
> hash values from the B.somethingElse column. Each of those values is used
as
> a "probe" into the original hash table to see if there is a match - i.e.,
if
> A.something really does equal B.somethingElse.
> To answer your question, it doesn't matter what hash value is generated
for
> "Mark's the best". Hashing is just a way of reducing a large number of
> possible values to a smaller number.
> I hope this didn't make it worse!
>

No comments:

Post a Comment