Monday, March 26, 2012

MINUS operation between table date ranges, is my algorithm sound?

Hi everyone, i was hoping someone could help me.
i am looking to implement a MINUS operation between two tables which contain
date ranges.
table a:
[start] [finish] [group]
1 10 0
18 19 1
23 26 2
28 31 3
table b:
[start] [finish]
4 5
18 18
25 28
Result of table a - table b:
[start] [finish]
1 3
6 10
19 19
23 24
29 31
Would anyone know how to implement this? I have done this by firstly
performing an AND operation between the two tables:
table a AND table b = table c (here the grouping colums tells me which group
from table a the date range is associated with
[start] [finish] [group]
4 5 0
18 18 1
25 26 2
28 28 3
Then I throw table a and c into another table (d). Here I assign:
start date of table a: status = 1
end date of table a: status = 0
start date of table b: status = 0
end date of table b: status = 1
start of date range in table a: boundary = 0
end of date range for table a: boundary = 2
boundary = 1 for all data coming from table b
table d:
[date] [status] [group] [boundary]
1 1 0 1
10 0 0 3
18 1 1 1
19 0 1 3
23 1 2 1
26 0 2 3
28 1 3 1
31 0 3 3
4 0 0 2
5 1 0 2
18 0 1 2
18 1 1 2
25 0 2 2
26 1 2 2
28 0 3 2
28 1 3 2
I then sort using the SQL statement: order by 3,1,2 to get:
table d:
[date] [status] [group] [boundary]
1 1 0 1
4 0 0 2
5 1 0 2
10 0 0 3
18 1 1 1
18 0 1 2
18 1 1 2
19 0 1 3
23 1 2 1
25 0 2 2
26 1 2 2
26 0 2 3
28 1 3 1
28 0 3 2
28 1 3 2
31 0 3 3
I then update the table according to:
case
when boundary = 2 and status = 0 then data = date -1
when boundary = 2 and status = 1 then data = date +1
end
and I get:
table d:
[date] [status] [group] [boundary]
1 1 0 1
3 0 0 2
6 1 0 2
10 0 0 3
18 1 1 1
17 0 1 2
19 1 1 2
19 0 1 3
23 1 2 1
24 0 2 2
27 1 2 2
26 0 2 3
28 1 3 1
27 0 3 2
29 1 3 2
31 0 3 3
I then filter the table and look for adjacient row pairs where row(i)=1 and
row(i+1)=0 :
table d:
[date] [status] [group] [boundary]
1 1 0 1
3 0 0 2
6 1 0 2
10 0 0 3
18 1 1 1
17 0 1 2
19 1 1 2
19 0 1 3
23 1 2 1
24 0 2 2
27 1 2 2
26 0 2 3
28 1 3 1
27 0 3 2
29 1 3 2
31 0 3 3
I delete those pairs where (date(i) < date(i+1) and boundary(i) >
boundary(i+1) to get:
table d:
[date] [status] [group] [boundary]
1 1 0 1
3 0 0 2
6 1 0 2
10 0 0 3
19 1 1 2
19 0 1 3
23 1 2 1
24 0 2 2
29 1 3 2
31 0 3 3
This then gives me the resultant table im looking for:
table d recast:
[start] [finish]
1 3
6 10
19 19
23 24
29 31
Is my approach sound? I am yet to encode the above in SQL. Maybe there is a
faster approach to getting this result?
Any help most appreciated!
cheers, peter"peter walker" <pwalker@.nospam.com> wrote in message
news:ucoD$XLGGHA.3056@.TK2MSFTNGP09.phx.gbl...
> Hi everyone, i was hoping someone could help me.
> i am looking to implement a MINUS operation between two tables which
> contain date ranges.
> table a:
> [start] [finish] [group]
> 1 10 0
> 18 19 1
> 23 26 2
> 28 31 3
>
> table b:
> [start] [finish]
> 4 5
> 18 18
> 25 28
> Result of table a - table b:
> [start] [finish]
> 1 3
> 6 10
> 19 19
> 23 24
> 29 31
> Would anyone know how to implement this? I have done this by firstly
Read my signature to see how much information you left out.
Let's assume your tables look like this:
CREATE TABLE a (start INTEGER NOT NULL PRIMARY KEY /* ' Was not specified
*/, finish INTEGER NOT NULL, CHECK (start<=finish) /* ? Was not specified
*/, grp INTEGER NOT NULL /* "GROUP" is a reserved word - not a good column
name */);
CREATE TABLE b (start INTEGER NOT NULL PRIMARY KEY /* ' */, finish INTEGER
NOT NULL, CHECK (start<=finish));
Your sample data:
INSERT INTO a (start,finish,grp)
SELECT 1, 10, 0 UNION ALL
SELECT 18, 19, 1 UNION ALL
SELECT 23, 26, 2 UNION ALL
SELECT 28, 31, 3;
INSERT INTO b (start, finish)
SELECT 4, 5 UNION ALL
SELECT 18, 18 UNION ALL
SELECT 25, 28;
I'll also assume you have a table of numbers - all integers from 0 to some
arbitrarily large number. One way to get the missing numbers would be like
this:
SELECT n.num
FROM numbers AS n
LEFT JOIN b
ON n.num BETWEEN b.start AND b.finish
WHERE b.start IS NULL
AND n.num BETWEEN
(SELECT MIN(start)
FROM a) AND
(SELECT MAX(finish)
FROM a);
I don't quite understand the significance of the "group" column here. Please
give us a better spec if you need more help.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Hi, thanks for the post. Im the original poster of this topic (at work
using a friend's account).
Please disregard the [group] column in my initial description of the
problem. I used a [group] column within my algorithm. In effect I would
like to know the best way to implement a subtraction of date ranges as
follows:
table a:
[start] [finish]
1 10
18 19
23 26
28 31
table b:
[start] [finish]
4 5
18 18
25 28
Result of table a - table b:
[start] [finish]
1 3
6 10
19 19
23 24
29 31
in effect, if a date range in table_b intersects a date range in
table_a, then that data in the intersection is removed from the date
range in table_a
for example, the following are example cases where we subtract from a
range in table_a where there are ranges in table_b which intersect with
that range in table_a:
1. {23....37} - {25...29} = {23...24}
2. {23....37} - {26...32} = {23...25}, {33...37}
3. {23....37} - [ {25...27} , {31...33} ] = {23...24}, {28...30},
{34...37}
Any help on this would be great!
Many thanks.
peter|||Peter,
this is very ugly :)
but the SELECT statement should work
i use a table of natural numbers proposed by David
David, pardon me for the use of your idea :)
SELECT MIN(U.seq) as start, MAX(U.seq) as finish
FROM (SELECT G.seq, G.seq - COUNT(*)
FROM (SELECT GS.seq
FROM TableA AS GA, Sequence AS GS
WHERE GS.seq BETWEEN GA.start AND GA.finish
AND NOT EXISTS(SELECT *
FROM TableB AS GB, Sequence AS GSS
WHERE GSS.seq BETWEEN GB.start AND
GB.finish
AND GSS.seq = GS.seq)) AS G,
(SELECT LS.seq
FROM TableA AS LA, Sequence AS LS
WHERE LS.seq BETWEEN LA.start AND LA.finish
AND NOT EXISTS(SELECT *
FROM TableB AS LB, Sequence AS LSS
WHERE LSS.seq BETWEEN LB.start AND
LB.finish
AND LSS.seq = LS.seq)) AS L
WHERE L.seq <= G.seq
GROUP BY G.seq) AS U(seq, gb)
GROUP BY U.gb;
Andrey Odegov
avodeGOV@.yandex.ru
(remove GOV to respond)|||On Sat, 14 Jan 2006 12:55:36 +1000, peter walker wrote:

>Hi everyone, i was hoping someone could help me.
(snip)
Hi Peter,
I just posted a reply to your first thread about this issue.
Hugo Kornelis, SQL Server MVP|||Hello There,
I hope this might solve your problem.
Create Table TableA
(
[start] int,
[finish] int,
[group] int
)
Go
Insert into TableA
Select 1 ,10 ,0
Union All
Select 18, 19 ,1
Union All
Select 23, 26 ,2
Union All
Select 28 ,31 ,3
Go
Create Table TableB
(
[start] int,
[finish] int
)
Go
Insert into TableB
Select 4 ,5
Union All
Select 18 ,18
Union All
Select 25 ,28
Go
Create View vwTmpData
As
Select * From (
Select 1 N
Union All
Select 2
Union All
Select 3
Union All
Select 4
Union All
Select 5
Union All
Select 6
Union All
Select 7
Union All
Select 8
Union All
Select 9
Union All
Select 10
Union All
Select 11
Union All
Select 12
Union All
Select 13
Union All
Select 14
Union All
Select 15
Union All
Select 16
Union All
Select 17
Union All
Select 18
Union All
Select 19
Union All
Select 20
Union All
Select 21
Union All
Select 22
Union All
Select 23
Union All
Select 24
Union All
Select 25
Union All
Select 26
Union All
Select 27
Union All
Select 28
Union All
Select 29
Union All
Select 30
Union All
Select 31
Union All
Select 32
Union All
Select 33
) Seq Inner Join TableA T1 On N Between T1.start and T1.finish
Where N Not In (Select Start From tableB Union Select Finish From
TableB)
Go
Select identity(int,1,1) N1,* into tmpData From vwTmpData
Update tmpData Set [group] = [group] + 1
Where N - N1> 0
Select Min(N) Start,Max(N) Finish From tmpData Group by [group]
Drop Table tmpData
Drop View vwTmpData
Drop Table TableA
Drop Table TableB
With Warm regards
Jatinder Singh

No comments:

Post a Comment