Wednesday, March 28, 2012

Mirrored Records?

Hi I need some SQL script help. Need script to delete all table rows that
are duplicates in mirror image. Table has 2 columns, ColumnA and ColumnB.
Row1: ColumnA = x, ColumnB = y
Row2: ColumnA = y, ColumnB = x
Those 2 rows are exactly the same for me. Need a script that will delete
Row2 and all other rows where they're mirrored duplicates.
Thanks DarenDaren,
Once you do this, you should consider putting a constraint on the
table to enforce ColumnA <= ColumnB so this doesn't happen again.
It's a bad model if there are two ways of representing the same facts.
delete from T as T1
where ColumnA > ColumnB
and exists (
select * from T as T2
where T2.ColumnA = T1.ColumnB
and T2.ColumnB = T1.ColumnA
)
If you want to delete only these "mirror duplicates"
when they exist for the same customer/transaction/whatever,
you will need something like
delete from T as T1
where ColumnA > ColumnB
and exists (
select * from T as T2
where T2.customer = T1.customer
and T2.ColumnA = T1.ColumnB
and T2.ColumnB = T1.ColumnA
)
Steve Kass
Drew University
Daren Hawes wrote:

>Hi I need some SQL script help. Need script to delete all table rows that
>are duplicates in mirror image. Table has 2 columns, ColumnA and ColumnB.
>Row1: ColumnA = x, ColumnB = y
>Row2: ColumnA = y, ColumnB = x
>Those 2 rows are exactly the same for me. Need a script that will delete
>Row2 and all other rows where they're mirrored duplicates.
>Thanks Daren
>
>|||Thanks. I am a little with T and T2
Do I need to create a new Table? I have added the actual names below.
Column a = FromID ; Column b = ToID
CREATE TABLE [dbo].[tbl_Matrix] (
[FareID] [int] IDENTITY (1, 1) NOT NULL ,
[FromID] [int] NULL ,
[ToID] [int] NULL ,
[PriceCode] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tbl_Matrix] ADD
CONSTRAINT [PK_tbl_Matrix] PRIMARY KEY CLUSTERED
(
[FareID]
) ON [PRIMARY]
GO
"Steve Kass" <skass@.drew.edu> wrote in message
news:%23eKPQBxZFHA.2212@.TK2MSFTNGP14.phx.gbl...
> Daren,
> Once you do this, you should consider putting a constraint on the
> table to enforce ColumnA <= ColumnB so this doesn't happen again.
> It's a bad model if there are two ways of representing the same facts.
> delete from T as T1
> where ColumnA > ColumnB
> and exists (
> select * from T as T2
> where T2.ColumnA = T1.ColumnB
> and T2.ColumnB = T1.ColumnA
> )
> If you want to delete only these "mirror duplicates"
> when they exist for the same customer/transaction/whatever,
> you will need something like
> delete from T as T1
> where ColumnA > ColumnB
> and exists (
> select * from T as T2
> where T2.customer = T1.customer
> and T2.ColumnA = T1.ColumnB
> and T2.ColumnB = T1.ColumnA
> )
> Steve Kass
> Drew University
>
> Daren Hawes wrote:
>|||Try the following untested DELETE statement:
DELETE FROM tbl_Matrix
WHERE EXISTS
(SELECT *
FROM tbl_Matrix AS T
WHERE T.fromid = tbl_Matrix.fromid
AND T.toid = tbl_Matrix.toid
AND T.fareid < tbl_Matrix.fareid)
Now make FromID and ToID not nullable and add a unqiue constraint on those
two columns.
David Portas
SQL Server MVP
--|||Thanks for the reply Dave (I'm working with Daren),
Tryed that 1, deleted nothing, modified it to this...
DELETE FROM tbl_Matrix
WHERE EXISTS
(SELECT *
FROM tbl_Matrix AS T
WHERE T.fromid = tbl_Matrix.toid
AND T.toid = tbl_Matrix.fromid)
...and it deleted everything.
This is a copy of the script we've used to create the table's data, perhaps
if we wrote this script a bit better, wouldn't need another script to clean
it up...
declare @.Counter int
declare @.Counter2 int
select @.Counter=1
select @.Counter2=1
while @.Counter < 136
begin
while @.Counter2 < 136
Begin
Insert into dbo.tbl_Matrix (FromID,ToID)
Values (@.Counter,@.Counter2)
set @.Counter2 = @.Counter2 + 1
End
set @.Counter = @.Counter + 1
set @.Counter2 = 1
end
Regards,
Offal Eater
"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:xZ6dndpRhvr8_gPfRVn-sw@.giganews.com...
> Try the following untested DELETE statement:
> DELETE FROM tbl_Matrix
> WHERE EXISTS
> (SELECT *
> FROM tbl_Matrix AS T
> WHERE T.fromid = tbl_Matrix.fromid
> AND T.toid = tbl_Matrix.toid
> AND T.fareid < tbl_Matrix.fareid)
> Now make FromID and ToID not nullable and add a unqiue constraint on those
> two columns.
> --
> David Portas
> SQL Server MVP
> --
>|||Thanks for the reply Dave (I'm working with Daren),
Not having any luck with these scripts, I've included the script we've used
to create the table. Perhaps if we wrote this differently, wouldn't have
need for the 'clean-up' script.
declare @.Counter int
declare @.Counter2 int
select @.Counter=1
select @.Counter2=1
while @.Counter < 136
begin
while @.Counter2 < 136
Begin
Insert into dbo.tbl_Matrix (FromID,ToID)
Values (@.Counter,@.Counter2)
set @.Counter2 = @.Counter2 + 1
End
set @.Counter = @.Counter + 1
set @.Counter2 = 1
end
Any feedback would be great.
Thx,
Offal Eater
"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:xZ6dndpRhvr8_gPfRVn-sw@.giganews.com...
> Try the following untested DELETE statement:
> DELETE FROM tbl_Matrix
> WHERE EXISTS
> (SELECT *
> FROM tbl_Matrix AS T
> WHERE T.fromid = tbl_Matrix.fromid
> AND T.toid = tbl_Matrix.toid
> AND T.fareid < tbl_Matrix.fareid)
> Now make FromID and ToID not nullable and add a unqiue constraint on those
> two columns.
> --
> David Portas
> SQL Server MVP
> --
>|||T1 and T2 are table aliases. You need to compare each
row of T with the rest of T, and to distinguish the "this row"
table and the "rest of" table, you need the aliases:
Since FareID is the primary key, this will do
delete from tbl_Matrix as T1
where ToID > FromID
and exists (
select * from tbl_Matrix as T2
where T2.FromID = T1.ToID
and T2.ToID = T1.FromID
)
This will arbitrarily remove the one of the two mirror duplicates
for which ToID < FromID. If you want to remove, say, the
one with the smaller FareID, you could do this instead:
delete from tbl_Matrix as T1
where exists (
select * from tbl_Matrix as T2
where T2.FromID = T1.ToID
and T2.ToID = T1.FromID
and T2.FareID > T1.FareID
)
Seeing your table now, I'd recommend you change the primary
key to (FromID, ToID), and add a table constraint to enforce
FromID < ToID, or at least put a UNIQUE constraint on
(FromID, ToID), and make those two columns NOT NULL (if
not all columns). I don't see much use to the column FareID,
actually, but removing it might mess up other thing you rely on.
There could be other problems, too. You could have many rows
with the same FromID, ToID pair in the same order, but with
different FareID values. So you might want to do this:
delete from tbl_Matrix as T1
where exists (
select * from tbl_Matrix as T2
where (
T2.FromID = T1.ToID
and T2.ToID = T1.FromID
) or (
T2.FromID = T1.FromID
and T2.ToID = T1.ToID
)
and T2.FareID > T1.FareID
)
If you have rows where FromID = ToID, you can delete them
separately. I assume those would make no sense, but you have
nothing to prevent that from occurring right now.
SK
Daren Hawes wrote:

>Thanks. I am a little with T and T2
>Do I need to create a new Table? I have added the actual names below.
>Column a = FromID ; Column b = ToID
>CREATE TABLE [dbo].[tbl_Matrix] (
> [FareID] [int] IDENTITY (1, 1) NOT NULL ,
> [FromID] [int] NULL ,
> [ToID] [int] NULL ,
> [PriceCode] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL
> ) ON [PRIMARY]
>GO
>ALTER TABLE [dbo].[tbl_Matrix] ADD
> CONSTRAINT [PK_tbl_Matrix] PRIMARY KEY CLUSTERED
> (
> [FareID]
> ) ON [PRIMARY]
>GO
>
>"Steve Kass" <skass@.drew.edu> wrote in message
>news:%23eKPQBxZFHA.2212@.TK2MSFTNGP14.phx.gbl...
>
>
>|||> Tryed that 1, deleted nothing, modified it to this...
I ran your script and it didn't generate any duplicates so there is nothing
to delete - the combination of (FromID, ToID) is already unique (18225
rows). If you have something different in your data then please post a few
INSERT statements to generate some sample data that we can use to test it
out.

> This is a copy of the script we've used to create the table's data,
> perhaps if we wrote this script a bit better, wouldn't need another script
> to clean it up...
I posted a different solution to this in reply to Daren's post earlier
today.
David Portas
SQL Server MVP
--|||Here's a much better way to create the table. It uses
just one insert and no loops, and has much less chance of
off-by-one errors:
-- create a temporary table full of integers
declare @.Ints table (
IntVal int primary key
)
insert into @.Ints
select OrderID-10247
from Northwind..Orders
where OrderID-10247 between 1 and 136
-- Insert everything at once:
insert into dbo.tbl_Matrix(FromID, ToID)
select N1.IntVal, N2.IntVal
from @.Ints as N1 join @.Ints as N2
on N1.IntVal between 1 and 136
and N2.IntVal between N1.IntVal and 136
-- or N2.IntVal between N1.IntVal+1 and 136
If you want loops, but no mirror duplicates, replace
set @.Counter2 = 1
with
set @.Counter2 = @.Counter
This will continue to give you rows where FromID = ToID. If you don't
want those, start @.Counter2 one higher. Here's a guess, cleaned up
a bit, but I'd never trust this as much as the first suggestion I gave.
declare @.Counter int
declare @.Counter2 int
select @.Counter=0
while @.Counter < 135
begin
set @.Counter = @.Counter + 1
set @.Counter2 = @.Counter
while @.Counter2 < 135
Begin
set @.Counter2 = @.Counter2 + 1
Insert into dbo.tbl_Matrix (FromID,ToID)
Values (@.Counter,@.Counter2)
End
end
SK
Offal Eater wrote:

>Thanks for the reply Dave (I'm working with Daren),
>Not having any luck with these scripts, I've included the script we've used
>to create the table. Perhaps if we wrote this differently, wouldn't have
>need for the 'clean-up' script.
>declare @.Counter int
>declare @.Counter2 int
>select @.Counter=1
>select @.Counter2=1
>while @.Counter < 136
>begin
> while @.Counter2 < 136
> Begin
> Insert into dbo.tbl_Matrix (FromID,ToID)
> Values (@.Counter,@.Counter2)
> set @.Counter2 = @.Counter2 + 1
> End
>set @.Counter = @.Counter + 1
>set @.Counter2 = 1
>end
>Any feedback would be great.
>Thx,
>Offal Eater
>"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
>news:xZ6dndpRhvr8_gPfRVn-sw@.giganews.com...
>
>
>|||David,
I don't think the OP wants both (a, b) and (b, a). The 18225 rows
from the script are all (a,b) where a and b are between 1 and 135.
One way to eliminate the "mirror duplicates" is to generate only pairs
where a <= b (or perhaps a < b). There should be either 9180 or
9045 rows, depending on whether = is included.
SK
David Portas wrote:

>I ran your script and it didn't generate any duplicates so there is nothing
>to delete - the combination of (FromID, ToID) is already unique (18225
>rows). If you have something different in your data then please post a few
>INSERT statements to generate some sample data that we can use to test it
>out.
>
>
>I posted a different solution to this in reply to Daren's post earlier
>today.
>
>

No comments:

Post a Comment