Friday, February 10, 2012

Comparing Two Tables Without a Cursor...HELP!

I have a project I am working on that has two tables. One is a reference
table, and the other is a table that stores incoming data from an outside
vendor. The customer wants us to write a "solution" that will compare the
incoming data table to the lookup table, and find records that don't match.
Any records in the incoming data table should have corresponding records in
the lookup table, and specific columns need to match. The "solution" will
create a list of records with anomolies.
I had planned to write a cursor to march through the lookup table and check
for records in the incoming data table, but the customer feels cursors are a
bad idea, and does not want us to use them.
Are there alternatives to cursors?
BV.Please post DDL along with
1) definition of matching rows (e.g., incoming.col1 = lookup.col1 and
incoming.col2 = lookup.col2...)
2) sample data [match and mismatch]
3) which data you're returning - in most cases, the desired result is
the incoming data w/o corresponding lookup values. however, you're
initial approach sounds like you want lookups that aren't in the
incoming data. whichever, a cursor is indeed unnecessary and probably bad
w/o the above all you'll get is guesses, like this
-- use NOT EXISTS
select * -- list columns in real code
from incoming i
where not exists (
select *
from lookup
where col1 = i.col1
and col2 = i.col2
)
BenignVanilla wrote:
> I have a project I am working on that has two tables. One is a reference
> table, and the other is a table that stores incoming data from an outside
> vendor. The customer wants us to write a "solution" that will compare the
> incoming data table to the lookup table, and find records that don't match
.
> Any records in the incoming data table should have corresponding records i
n
> the lookup table, and specific columns need to match. The "solution" will
> create a list of records with anomolies.
> I had planned to write a cursor to march through the lookup table and chec
k
> for records in the incoming data table, but the customer feels cursors are
a
> bad idea, and does not want us to use them.
> Are there alternatives to cursors?
> BV.
>|||"Trey Walpole" <treypole@.newsgroups.nospam> wrote in message
news:Odbq1JkFGHA.516@.TK2MSFTNGP15.phx.gbl...
> Please post DDL along with
> 1) definition of matching rows (e.g., incoming.col1 = lookup.col1 and
> incoming.col2 = lookup.col2...)
> 2) sample data [match and mismatch]
> 3) which data you're returning - in most cases, the desired result is the
> incoming data w/o corresponding lookup values. however, you're initial
> approach sounds like you want lookups that aren't in the incoming data.
> whichever, a cursor is indeed unnecessary and probably bad
Trey, thanks for helping me clarify. Here goes...In the sample data below, I
have a lookup table with a list of people, and an Incoming table, with new
records. I need to be able to run a query whereby I search for either name,
address, or phone in the lookup table, and compare the records in the
Incoming table to ensure the incoming data is accurate.
Using the data below, let's assume I am searching by name. The results of
this run of the "solution" would return a hit on the Bob row, as the
incoming address does not match what is in the lookup table.
My plan was to cursor through the lookup table, fetching rows from the
incoming data table, and generate my report output into a temp table, then
return the contents of the temp table.
Does this help clarify my issue?
Lookup Table
Name Address Phone
Bob Maine 410-555-1212
Tom New Jersey 908-555-1234
Jane Delware 402-555-4392
Incoming Table
Name Address Phone
Bob Georgia 410-555-1212
Tom New Jersey 908-555-1234
Jane Delware 402-555-4392|||You're approaching the problem from a procedural perspective; try to
think relationally. You do not need a cursor for this; a simple SQL
join will do it for you.
SELECT inc.Name, inc.Address, inc.Phone
FROM Incoming inc LEFT JOIN Reference ref
ON inc.Name = ref.Name AND inc.Address = ref.Address AND inc.Phone =
ref.Phone
WHERE ref.Name is NULL
You can use NOT EXISTS as suggested by Trey above, and it may perform
faster; I'm just used to reading LEFT JOIN's.
HTH,
Stu|||"Stu" <stuart.ainsworth@.gmail.com> wrote in message
news:1136951096.202363.67240@.z14g2000cwz.googlegroups.com...
> You're approaching the problem from a procedural perspective; try to
> think relationally. You do not need a cursor for this; a simple SQL
> join will do it for you.
> SELECT inc.Name, inc.Address, inc.Phone
> FROM Incoming inc LEFT JOIN Reference ref
> ON inc.Name = ref.Name AND inc.Address = ref.Address AND inc.Phone =
> ref.Phone
> WHERE ref.Name is NULL
> You can use NOT EXISTS as suggested by Trey above, and it may perform
> faster; I'm just used to reading LEFT JOIN's.
>
Stu, this was our option #2, I just wasn't sure it was a good idea from
performance angle, so I thought I'd avoid it. My concern is that the
customer wants to be able to do the comparison via different comparisons of
fields, by name, by address, etc. So I'll need to write different
statements, which is not a big deal, I am just worried that if one of these
fields is not indexed, and I am joining millions of rows to millions of
rows...what will that do to the server?
BV.|||Stu wrote:
> You're approaching the problem from a procedural perspective; try to
> think relationally. You do not need a cursor for this; a simple SQL
> join will do it for you.
> SELECT inc.Name, inc.Address, inc.Phone
> FROM Incoming inc LEFT JOIN Reference ref
> ON inc.Name = ref.Name AND inc.Address = ref.Address AND inc.Phone =
> ref.Phone
> WHERE ref.Name is NULL
> You can use NOT EXISTS as suggested by Trey above, and it may perform
> faster; I'm just used to reading LEFT JOIN's.
> HTH,
> Stu
instead of joining the tables and compare the cols it is sometimes
easieer (especially with many many columns to compare) to use the
checksum() functions of SQL Server.
you can do things like
select * from incoming x where checksum(name,address,phone) in (select
checksum(name,address,phone) from lookup)
to compare many many rows very easily. I dont know if this violates some
best practices or something, but checksum() helped me out very often.
hth
Gregor Stefka|||As long as you are joining on the key fields every time, performance should
not be an issue with this approach. I am assuming that the key fields have
to be equal before you begin looking at any other criteria. Adding extra
criteria should not prevent you from using indexes (assuming the extra
criteria does not include an additional table).
"BenignVanilla" <bvanilla@.tibetanbeefgarden.com> wrote in message
news:JOudnY6-w907uFjeRVn-sw@.giganews.com...
> "Stu" <stuart.ainsworth@.gmail.com> wrote in message
> news:1136951096.202363.67240@.z14g2000cwz.googlegroups.com...
> Stu, this was our option #2, I just wasn't sure it was a good idea from
> performance angle, so I thought I'd avoid it. My concern is that the
> customer wants to be able to do the comparison via different comparisons
of
> fields, by name, by address, etc. So I'll need to write different
> statements, which is not a big deal, I am just worried that if one of
these
> fields is not indexed, and I am joining millions of rows to millions of
> rows...what will that do to the server?
> BV.
>|||Performance tuning is always an issue, especially with a very large
database; however, a JOIN will ALWAYS perform better than a cursor.
I'm not saying that cursors don't have their place; this just ain't one
of them.
If you're worried about performance, be sure that your indexes are
onthe appropriate joining columns, and you may consider reducing your
isolation level (since it sounds as if you are reporting on batched
data, rather than live data).
Stu

No comments:

Post a Comment