Sql Comparing two tables: Complex query problem, help needed

Hi All,
We have a table in our database (MS SQL) where in no primary key defined.
Table name is NAMELINK.
Structure is as follows :
NameId1 int 4 not null
Category char 5 nullable
Relation12 char 20 nullable
NameId2 int 4 not null
Relation21 char 21 nullable
Remarks char 30 nullable
Dependent smallint 2 not null

While conversion we are assuming that NameId1, NameId2 is a
composite primary key (looking at the data majorly it is unique). But
still there are some cases where it can not be unique.
Now we want a select query to fetch only those records where the
combination of NameId1, NameId2 is unique. We tried self join but
somehow it's not working.
Please help about this.

Regards,
PrashantTry this query

select * from NAMELINK inner join ( select distinct NameId1, NameId2 from NAMELINK) NL on NAMELINK.NameId1 = NL.NameId1 and NAMELINK.NameId2 = NL.NameId2

I have written the query on the fly without testing it. So test it and post your reply.

Originally posted by dahalkar_p
Hi All,
We have a table in our database (MS SQL) where in no primary key defined.
Table name is NAMELINK.
Structure is as follows :
NameId1 int 4 not null
Category char 5 nullable
Relation12 char 20 nullable
NameId2 int 4 not null
Relation21 char 21 nullable
Remarks char 30 nullable
Dependent smallint 2 not null

While conversion we are assuming that NameId1, NameId2 is a
composite primary key (looking at the data majorly it is unique). But
still there are some cases where it can not be unique.
Now we want a select query to fetch only those records where the
combination of NameId1, NameId2 is unique. We tried self join but
somehow it's not working.
Please help about this.

Regards,
Prashant|||Hi Mohamed,
Sorry to say that this is not going to work for me..

I need to remove the records which are duplicating this way:

Nameid1 Nameid2
1 2
2 1

I need just one of the following record not both..

Prashant|||Hmm... Seems to be pretty interesting problem. I had been struggling like hell to find a solution to it for about 4 hours continously. My objectives were to avoid temporary tables and avoid cursors; To solve by a single query. Unfortunately I could not avoid correlated query (in the second part of UNION keyword) which I hate because of performance reasons. If number the duplicates as u said were low then the performance should be good. Performance decreases as the number of duplicates increases. Anyway here goes the solution.

SELECT NAMELINK.* FROM NAMELINK LEFT JOIN
(
SELECT nl1.* FROM NAMELINK nl1
INNER JOIN NAMELINK nl2 on nl1.Nameid2 = nl2.Nameid1 and nl1.Nameid1= nl2.Nameid2
) MyNameLink ON Namelink.Nameid1 = MyNameLink.Nameid1 and Namelink.Nameid2 = MyNameLink.Nameid2
WHERE MyNameLink.Nameid1 IS NULL AND MyNameLink.Nameid2 IS NULL

UNION ALL

SELECT nl2.* FROM NAMELINK nl1
INNER JOIN NAMELINK nl2 on nl1.Nameid2 = nl2.Nameid1 and nl1.Nameid1= nl2.Nameid2
WHERE
nl2.Nameid1 =
( SELECT TOP 1 nl3.Nameid1 FROM NAMELINK nl3 INNER JOIN NAMELINK nl4 ON nl3.Nameid2 = nl4.nameid1 and nl3.Nameid1 = nl4.Nameid2 WHERE (nl3.Nameid1 + nl3.Nameid2) = (nl1.Nameid1 + nl1.Nameid2) ) and
nl2.Nameid2 =
( SELECT TOP 1 nl3.Nameid2 FROM NAMELINK nl3 INNER JOIN NAMELINK nl4 ON nl3.Nameid2 = nl4.nameid1 and nl3.Nameid1 = nl4.Nameid2 WHERE (nl3.Nameid1 + nl3.Nameid2) = (nl1.Nameid1 + nl1.Nameid2) )

I tested it thoroughly and validated the solution. Just copy, paste it for execution and tell me the result.

Good Luck.

Originally posted by dahalkar_p
Hi Mohamed,
Sorry to say that this is not going to work for me..

I need to remove the records which are duplicating this way:

Nameid1 Nameid2
1 2
2 1

I need just one of the following record not both..

Prashant|||Cheers!!!!
You have done it..
Thank's a lot man...

Prashant|||I would be glad if you could tell me the number of rows in the namelink table and your comments on the performance of the solution. ( I am much worried about performance in all my solutions).

Originally posted by dahalkar_p
Cheers!!!!
You have done it..
Thank's a lot man...

Prashant|||Right now i have a database with small number of records in the table.
They are around 250. The performance test can not be done on such small db.
I will let you know when it is done on a live database with lot's of records.

-Prashant|||select NameId1, NameId2 from NAMELINK group by NameId1, NameId2
having count(*) = 1

run this query - it should bring back some positive information!!!

Enjoy,

Neil de Later,
Johannesburg - South Africa

e-mail : neil@.bex.co.za

Originally posted by dahalkar_p
Hi All,
We have a table in our database (MS SQL) where in no primary key defined.
Table name is NAMELINK.
Structure is as follows :
NameId1 int 4 not null
Category char 5 nullable
Relation12 char 20 nullable
NameId2 int 4 not null
Relation21 char 21 nullable
Remarks char 30 nullable
Dependent smallint 2 not null

While conversion we are assuming that NameId1, NameId2 is a
composite primary key (looking at the data majorly it is unique). But
still there are some cases where it can not be unique.
Now we want a select query to fetch only those records where the
combination of NameId1, NameId2 is unique. We tried self join but
somehow it's not working.
Please help about this.

Regards,
Prashant|||Hi bex,

This will return all the rows in the table..
I think you have a confusion on the way the question is asked.
It's not just duplicate in both the tables.
It's combination should not repeat even vice versa.

Nameid1 Nameid2
1 2
2 1

Here i need any one record.

-Prashant|||--assume invalid PK NameId1, NameId2 <-> NameId2, NameId1

--without duplicities
select n1.*
from NAMELINK n1
where 1=(
select count(*) from NAMELINK n2 where
(n1.NameId1=n2.NameId1 and n1.NameId2=n2.NameId2)
or
(n1.NameId1=n2.NameId2 and n1.NameId2=n2.NameId1)
)
order by NameId1, NameId2

--only duplicities, correctly ordered
select n1.*
from NAMELINK n1
inner join NAMELINK n2
on n1.NameId1=n2.NameId1 and n1.NameId2=n2.NameId2
where 1<(
select count(*) from NAMELINK n2 where
(n1.NameId1=n2.NameId1 and n1.NameId2=n2.NameId2)
or
(n1.NameId1=n2.NameId2 and n1.NameId2=n2.NameId1)
)
order by
case when n1.NameId1>n1.NameId2
then n1.NameId2
else n1.NameId1
end
,case when n1.NameId1>n1.NameId2
then n1.NameId1
else n1.NameId2
end|||Hi,

How can we delete the duplicates in the similar scenario?

Regards,|||--Try this:

-- Coping source table, adding PK "Id"
select "Id"=IDENTITY(int,1,1),*
into dbo.TempNameLink
from dbo.NameLink
order by
case when NameId1>NameId2
then NameId2
else NameId1
end
,case when NameId1>NameId2
then NameId1
else NameId2
end
GO
-- accelerating by indexes on computed columns
alter table dbo.TempNameLink
add "compNameId1" as
case when n2.NameId1>n2.NameId2
then n2.NameId2
else n2.NameId1
end
,"compNameId2" as
case when n2.NameId1>n2.NameId2
then n2.NameId1
else n2.NameId2
end
GO
create unique clustered index IC_TempNameLink on TempNameLink ("compNameId1","compNameId2","Id")
GO
alter table dbo.TempNameLink
add constraint PK_TempNameLink
primary key nonclustered ("Id")
GO

--inserting unique values
--Information from Category,Relation12,Relation21,Remarks,Dependent columns in duplicities is lost.
create table dbo.NewNameLink (
NameId1 int not null
,NameId2 int not null
,Category char(5) null
,Relation12 char(20) null
,Relation21 char(21) null
,Remarks char(30) null
,Dependent smallint not null
,constraint PK_NewNameLink
primary key ( NameId1,NameId2 )
,constraint CK_NewNameLink
check ( NameId1<NameId2 )
)
GO
insert dbo.NewNameLink(NameId1,NameId2,Category,Relation1 2,Relation21,Remarks,Dependent)
select n1.NameId1,n1.NameId2,n1.Category,n1.Relation12,n1 .Relation21
,n1.Remarks,n1.Dependent
from dbo.TempNameLink n1
join (
select "Id"=min("Id")
from dbo.TempNameLink n2
group by n2."compNameId1",n2."compNameId2"
) XXX on n1."Id"=XXX."Id"|||Thank's for your help but Sorry to say , this will eliminate all the duplicates.

We want to keep one of the duplicate and eliminate the rest of the values.

I think i was not clear in my question.

Regards,|||All,

Here's the fix for the complex situation:

select "Id"=IDENTITY(int,1,1),*
into dbo.TempNameLink
from dbo.NameLink
order by
case when NameId1>NameId2
then NameId2
else NameId1
end
,case when NameId1>NameId2
then NameId1
else NameId2
end
GO

CREATE TABLE [NameLink1] (
[NameID1] [int] NOT NULL DEFAULT (0),
[Category] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL DEFAULT (''),
[Relation12] [char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL DEFAULT (''),
[NameID2] [int] NOT NULL DEFAULT (0),
[Relation21] [char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL DEFAULT (''),
[Remarks] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL DEFAULT (''),
[Dependent] [smallint] NOT NULL DEFAULT (0),
[UpdateDate] [datetime] NULL ,
[UpdatedByID] [int] NOT NULL DEFAULT (0),
[rowguid] uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT (newid())
) ON [PRIMARY]
GO

CREATE procedure CLEANLINK(@.tempID integer output) as
begin
declare @.Nameid1 integer,
@.Nameid2 integer,
@.ID integer,
@.COUNTER integer,
@.TOTAL integer
declare TEMPCURSOR cursor dynamic scroll for select N.ID, N.NameID1, N.NameID2 from TEMPNAMELINK as N
select @.TOTAL = COUNT(*) from TEMPNAMELINK
select @.COUNTER=1
open TEMPCURSOR
while @.COUNTER <= @.TOTAL
begin
fetch next from TEMPCURSOR into @.ID, @.Nameid1, @.Nameid2
Begin
IF not exists (Select 1 from namelink1 where Nameid1 = @.Nameid1 and Nameid2 = @.Nameid2)
Begin
INSERT INTO dbo.NameLink1(NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid) SELECT NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid from TEMPNAMELINK where nameid1 = @.Nameid1 and nameid2 = @.Nameid2 and ID = @.ID
end
end
select @.COUNTER=@.COUNTER+1
end
DEALLOCATE TEMPCURSOR
end
GO

execute cleanlink 1
go

delete from NAMELINK
go

INSERT INTO dbo.NameLink(NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid)
SELECT NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid from NAMELINK1
go

drop table NAMELINK1
go

DROP TABLE TEMPNAMELINK
go

DROP PROCEDURE CLEANLINK
go

This script does it all.

Thanks to all,|||--So simply

select "Id"=IDENTITY(int,1,1),NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid
into dbo.TempNameLink
from dbo.NameLink
order by NameId1,NameId2
GO
create clustered index IC_TempNameLink on dbo.TempNameLink (NameId1,NameId2)
GO
alter table dbo.TempNameLink
add constraint PK_TempNameLink primary key ("Id")
GO
delete dbo.NameLink
insert dbo.NameLink(NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid )
select NameID1, Category, Relation12, NameID2, Relation21, Remarks, Dependent, UpdateDate, UpdatedByID, rowguid
from dbo.TempNameLink t
join (
select "Id"=min("Id")
from dbo.TempNameLink
group by NameID1, NameID2
) XXX on t."Id"=XXX."Id"
drop table dbo.TempNameLink

Wednesday, March 7, 2012

Complex query problem, help needed

No comments:

Post a Comment

Sql Comparing two tables

Blog Archive

About Me