Friday, February 10, 2012

Comparing two tables

I have two tables of "trade" data, one comes from out accounting
system, the other from text files sent to us by the counterparties
(brokers). I attacked the problem by creating two views that map the
data into a common format. I then used both of these in a UNION
query...
SELECT *
FROM vCounterpartyTrades AS c LEFT JOIN vOurTrades AS p
ON c.portfolioId = p.portfolioId AND c.cusip = p.cusip AND
c.custodianId = p.custodianId
WHERE p.quantity IS NULL OR c.quantity <> p.quantity
UNION
SELECT *
FROM vCounterpartyTrades AS c RIGHT JOIN vOurTrades AS p
ON c.portfolioId = p.portfolioId AND c.cusip = p.cusip AND
c.custodianId = p.custodianId
WHERE p.quantity <> 0 AND (c.quantity IS NULL OR c.quantity = 0)
The problem is that the UNION causes a temp table to be created. When
I try to add filters to reduce the number of rows that are returned,
the filter is applied to the temp table, not the views, so every
single query takes the same amount of time -- about 20 seconds, and
that's simply too long.
In the past I have had excellent results by replacing views with the
SQL that makes the view. So in this case, "vCounterpartyTrades as c"
is replaced by "( all of the SQL in that view ) as c". That way I can
apply any WHERE filters directly in the SQL by splicing in the WHERE
in VBA, and everything gets a lot faster.
But the problem here is that both views appear twice, on either side
of the UNION. That not only dramatically expands the resulting SQL
(both the views are pretty long as it is) but means both are run
twice. So to really speed this up I think I need to remove the UNION
and replace it with some sort of join...
SELECT *
FROM *a whole bunch of SQL* AS c, *another whole bunch of SQL* AS p
WHERE c.portfolioId = p.portfolioId AND c.cusip = p.cusip AND
c.custodianId = p.custodianId
AND (p.quantity IS NOT null AND (c.quantity IS NULL OR c.quantity = 0)
OR ( p.quantity IS NULL OR c.quantity <> p.quantity))
...is the obvious solution, but this eliminates anything where one
side or the other is null. I know there's a simple solution to this
using other join styles, but I have very little familiarity with them.
Can someone help me out?
Maury
p.s. Is anyone else having problems with MS's interface to these
groups? I can no longer post via their web site, because the post form
will not pop up.> But the problem here is that both views appear twice, on either side
> of the UNION. That not only dramatically expands the resulting SQL
> (both the views are pretty long as it is) but means both are run
> twice. So to really speed this up I think I need to remove the UNION
> and replace it with some sort of join...
I don't fully understand your requirements but is it possible to use UNION
ALL instead of just UNION? I would expect that to reduce the amount of work
that needs to be performed. Another option is to use a FULL JOIN instead of
the UNION of the LEFT/RIGHT JOINs. Maybe something like:
SELECT *
FROM vCounterpartyTrades AS c
FULL JOIN vOurTrades AS p ON
c.portfolioId = p.portfolioId AND
c.cusip = p.cusip AND
c.custodianId = p.custodianId
WHERE
p.quantity IS NULL OR
c.quantity <> p.quantity OR
(p.quantity <> 0 AND
(c.quantity IS NULL OR c.quantity = 0)
)
Hope this helps.
Dan Guzman
SQL Server MVP
"Maury Markowitz" <maury.markowitz@.gmail.com> wrote in message
news:a7f0ed7f-982e-4c41-9123-671e1958e37b@.f10g2000hsf.googlegroups.com...
>I have two tables of "trade" data, one comes from out accounting
> system, the other from text files sent to us by the counterparties
> (brokers). I attacked the problem by creating two views that map the
> data into a common format. I then used both of these in a UNION
> query...
> SELECT *
> FROM vCounterpartyTrades AS c LEFT JOIN vOurTrades AS p
> ON c.portfolioId = p.portfolioId AND c.cusip = p.cusip AND
> c.custodianId = p.custodianId
> WHERE p.quantity IS NULL OR c.quantity <> p.quantity
> UNION
> SELECT *
> FROM vCounterpartyTrades AS c RIGHT JOIN vOurTrades AS p
> ON c.portfolioId = p.portfolioId AND c.cusip = p.cusip AND
> c.custodianId = p.custodianId
> WHERE p.quantity <> 0 AND (c.quantity IS NULL OR c.quantity = 0)
> The problem is that the UNION causes a temp table to be created. When
> I try to add filters to reduce the number of rows that are returned,
> the filter is applied to the temp table, not the views, so every
> single query takes the same amount of time -- about 20 seconds, and
> that's simply too long.
> In the past I have had excellent results by replacing views with the
> SQL that makes the view. So in this case, "vCounterpartyTrades as c"
> is replaced by "( all of the SQL in that view ) as c". That way I can
> apply any WHERE filters directly in the SQL by splicing in the WHERE
> in VBA, and everything gets a lot faster.
> But the problem here is that both views appear twice, on either side
> of the UNION. That not only dramatically expands the resulting SQL
> (both the views are pretty long as it is) but means both are run
> twice. So to really speed this up I think I need to remove the UNION
> and replace it with some sort of join...
> SELECT *
> FROM *a whole bunch of SQL* AS c, *another whole bunch of SQL* AS p
> WHERE c.portfolioId = p.portfolioId AND c.cusip = p.cusip AND
> c.custodianId = p.custodianId
> AND (p.quantity IS NOT null AND (c.quantity IS NULL OR c.quantity => 0)
> OR ( p.quantity IS NULL OR c.quantity <> p.quantity))
> ...is the obvious solution, but this eliminates anything where one
> side or the other is null. I know there's a simple solution to this
> using other join styles, but I have very little familiarity with them.
> Can someone help me out?
> Maury
> p.s. Is anyone else having problems with MS's interface to these
> groups? I can no longer post via their web site, because the post form
> will not pop up.|||Ok, I actually found a great guide on all of this and got it working.
The key was to use "FULL OUTER JOIN" in the FROM. This makes all rows
from either side appear. After that there was a lot of tweaking of the
WHERE, but I did manage to get it working in the end. I still have to
use the two subqueries, but by moving the filters into their WHEREs
the performance is excellent, basically instant.
Maury

No comments:

Post a Comment