Showing posts with label algorithm. Show all posts
Showing posts with label algorithm. Show all posts

Monday, March 26, 2012

MINUS operation between table date ranges, is my algorithm sound?

Hi everyone, i was hoping someone could help me.
i am looking to implement a MINUS operation between two tables which contain
date ranges.
table a:
[start] [finish] [group]
1 10 0
18 19 1
23 26 2
28 31 3
table b:
[start] [finish]
4 5
18 18
25 28
Result of table a - table b:
[start] [finish]
1 3
6 10
19 19
23 24
29 31
Would anyone know how to implement this? I have done this by firstly
performing an AND operation between the two tables:
table a AND table b = table c (here the grouping colums tells me which group
from table a the date range is associated with
[start] [finish] [group]
4 5 0
18 18 1
25 26 2
28 28 3
Then I throw table a and c into another table (d). Here I assign:
start date of table a: status = 1
end date of table a: status = 0
start date of table b: status = 0
end date of table b: status = 1
start of date range in table a: boundary = 0
end of date range for table a: boundary = 2
boundary = 1 for all data coming from table b
table d:
[date] [status] [group] [boundary]
1 1 0 1
10 0 0 3
18 1 1 1
19 0 1 3
23 1 2 1
26 0 2 3
28 1 3 1
31 0 3 3
4 0 0 2
5 1 0 2
18 0 1 2
18 1 1 2
25 0 2 2
26 1 2 2
28 0 3 2
28 1 3 2
I then sort using the SQL statement: order by 3,1,2 to get:
table d:
[date] [status] [group] [boundary]
1 1 0 1
4 0 0 2
5 1 0 2
10 0 0 3
18 1 1 1
18 0 1 2
18 1 1 2
19 0 1 3
23 1 2 1
25 0 2 2
26 1 2 2
26 0 2 3
28 1 3 1
28 0 3 2
28 1 3 2
31 0 3 3
I then update the table according to:
case
when boundary = 2 and status = 0 then data = date -1
when boundary = 2 and status = 1 then data = date +1
end
and I get:
table d:
[date] [status] [group] [boundary]
1 1 0 1
3 0 0 2
6 1 0 2
10 0 0 3
18 1 1 1
17 0 1 2
19 1 1 2
19 0 1 3
23 1 2 1
24 0 2 2
27 1 2 2
26 0 2 3
28 1 3 1
27 0 3 2
29 1 3 2
31 0 3 3
I then filter the table and look for adjacient row pairs where row(i)=1 and
row(i+1)=0 :
table d:
[date] [status] [group] [boundary]
1 1 0 1
3 0 0 2
6 1 0 2
10 0 0 3
18 1 1 1
17 0 1 2
19 1 1 2
19 0 1 3
23 1 2 1
24 0 2 2
27 1 2 2
26 0 2 3
28 1 3 1
27 0 3 2
29 1 3 2
31 0 3 3
I delete those pairs where (date(i) < date(i+1) and boundary(i) >
boundary(i+1) to get:
table d:
[date] [status] [group] [boundary]
1 1 0 1
3 0 0 2
6 1 0 2
10 0 0 3
19 1 1 2
19 0 1 3
23 1 2 1
24 0 2 2
29 1 3 2
31 0 3 3
This then gives me the resultant table im looking for:
table d recast:
[start] [finish]
1 3
6 10
19 19
23 24
29 31
Is my approach sound? I am yet to encode the above in SQL. Maybe there is a
faster approach to getting this result?
Any help most appreciated!
cheers, peter"peter walker" <pwalker@.nospam.com> wrote in message
news:ucoD$XLGGHA.3056@.TK2MSFTNGP09.phx.gbl...
> Hi everyone, i was hoping someone could help me.
> i am looking to implement a MINUS operation between two tables which
> contain date ranges.
> table a:
> [start] [finish] [group]
> 1 10 0
> 18 19 1
> 23 26 2
> 28 31 3
>
> table b:
> [start] [finish]
> 4 5
> 18 18
> 25 28
> Result of table a - table b:
> [start] [finish]
> 1 3
> 6 10
> 19 19
> 23 24
> 29 31
> Would anyone know how to implement this? I have done this by firstly
Read my signature to see how much information you left out.
Let's assume your tables look like this:
CREATE TABLE a (start INTEGER NOT NULL PRIMARY KEY /* ' Was not specified
*/, finish INTEGER NOT NULL, CHECK (start<=finish) /* ? Was not specified
*/, grp INTEGER NOT NULL /* "GROUP" is a reserved word - not a good column
name */);
CREATE TABLE b (start INTEGER NOT NULL PRIMARY KEY /* ' */, finish INTEGER
NOT NULL, CHECK (start<=finish));
Your sample data:
INSERT INTO a (start,finish,grp)
SELECT 1, 10, 0 UNION ALL
SELECT 18, 19, 1 UNION ALL
SELECT 23, 26, 2 UNION ALL
SELECT 28, 31, 3;
INSERT INTO b (start, finish)
SELECT 4, 5 UNION ALL
SELECT 18, 18 UNION ALL
SELECT 25, 28;
I'll also assume you have a table of numbers - all integers from 0 to some
arbitrarily large number. One way to get the missing numbers would be like
this:
SELECT n.num
FROM numbers AS n
LEFT JOIN b
ON n.num BETWEEN b.start AND b.finish
WHERE b.start IS NULL
AND n.num BETWEEN
(SELECT MIN(start)
FROM a) AND
(SELECT MAX(finish)
FROM a);
I don't quite understand the significance of the "group" column here. Please
give us a better spec if you need more help.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Hi, thanks for the post. Im the original poster of this topic (at work
using a friend's account).
Please disregard the [group] column in my initial description of the
problem. I used a [group] column within my algorithm. In effect I would
like to know the best way to implement a subtraction of date ranges as
follows:
table a:
[start] [finish]
1 10
18 19
23 26
28 31
table b:
[start] [finish]
4 5
18 18
25 28
Result of table a - table b:
[start] [finish]
1 3
6 10
19 19
23 24
29 31
in effect, if a date range in table_b intersects a date range in
table_a, then that data in the intersection is removed from the date
range in table_a
for example, the following are example cases where we subtract from a
range in table_a where there are ranges in table_b which intersect with
that range in table_a:
1. {23....37} - {25...29} = {23...24}
2. {23....37} - {26...32} = {23...25}, {33...37}
3. {23....37} - [ {25...27} , {31...33} ] = {23...24}, {28...30},
{34...37}
Any help on this would be great!
Many thanks.
peter|||Peter,
this is very ugly :)
but the SELECT statement should work
i use a table of natural numbers proposed by David
David, pardon me for the use of your idea :)
SELECT MIN(U.seq) as start, MAX(U.seq) as finish
FROM (SELECT G.seq, G.seq - COUNT(*)
FROM (SELECT GS.seq
FROM TableA AS GA, Sequence AS GS
WHERE GS.seq BETWEEN GA.start AND GA.finish
AND NOT EXISTS(SELECT *
FROM TableB AS GB, Sequence AS GSS
WHERE GSS.seq BETWEEN GB.start AND
GB.finish
AND GSS.seq = GS.seq)) AS G,
(SELECT LS.seq
FROM TableA AS LA, Sequence AS LS
WHERE LS.seq BETWEEN LA.start AND LA.finish
AND NOT EXISTS(SELECT *
FROM TableB AS LB, Sequence AS LSS
WHERE LSS.seq BETWEEN LB.start AND
LB.finish
AND LSS.seq = LS.seq)) AS L
WHERE L.seq <= G.seq
GROUP BY G.seq) AS U(seq, gb)
GROUP BY U.gb;
Andrey Odegov
avodeGOV@.yandex.ru
(remove GOV to respond)|||On Sat, 14 Jan 2006 12:55:36 +1000, peter walker wrote:

>Hi everyone, i was hoping someone could help me.
(snip)
Hi Peter,
I just posted a reply to your first thread about this issue.
Hugo Kornelis, SQL Server MVP|||Hello There,
I hope this might solve your problem.
Create Table TableA
(
[start] int,
[finish] int,
[group] int
)
Go
Insert into TableA
Select 1 ,10 ,0
Union All
Select 18, 19 ,1
Union All
Select 23, 26 ,2
Union All
Select 28 ,31 ,3
Go
Create Table TableB
(
[start] int,
[finish] int
)
Go
Insert into TableB
Select 4 ,5
Union All
Select 18 ,18
Union All
Select 25 ,28
Go
Create View vwTmpData
As
Select * From (
Select 1 N
Union All
Select 2
Union All
Select 3
Union All
Select 4
Union All
Select 5
Union All
Select 6
Union All
Select 7
Union All
Select 8
Union All
Select 9
Union All
Select 10
Union All
Select 11
Union All
Select 12
Union All
Select 13
Union All
Select 14
Union All
Select 15
Union All
Select 16
Union All
Select 17
Union All
Select 18
Union All
Select 19
Union All
Select 20
Union All
Select 21
Union All
Select 22
Union All
Select 23
Union All
Select 24
Union All
Select 25
Union All
Select 26
Union All
Select 27
Union All
Select 28
Union All
Select 29
Union All
Select 30
Union All
Select 31
Union All
Select 32
Union All
Select 33
) Seq Inner Join TableA T1 On N Between T1.start and T1.finish
Where N Not In (Select Start From tableB Union Select Finish From
TableB)
Go
Select identity(int,1,1) N1,* into tmpData From vwTmpData
Update tmpData Set [group] = [group] + 1
Where N - N1> 0
Select Min(N) Start,Max(N) Finish From tmpData Group by [group]
Drop Table tmpData
Drop View vwTmpData
Drop Table TableA
Drop Table TableB
With Warm regards
Jatinder Singh

Friday, March 23, 2012

Mining Model Prediction.

Mining Model Prediction... what for?

Which Data Mining algorithm use Mining Model Prediction?

Every algorithm has to use Mining Model Prediction for a final goal of a Data Mining Project?

In a very general sense...

Mining Model Prediction is typically the part of the UI which allows querying a mining model via DMX. What the model does depends on the algorithm, this part of the UI allows authoring DMX queries specific to that model.

Now, a mining model is not required to make predictions to be useful. Examples;

- algorithms may be used just to explore data, not to make predictions. Think the dependency net in naive bayes or Decision trees

- algorithms may assign unpredictive labels (e.g Clustering)

- algorithms may actually transform data for use in a different model (think of a decision tree model using as input the cluster label from a different model -- the cluster modell effectively transforms the input for decision trees)

|||

In case of Associantion Rules what should I show to final user? The same answer that Mining Model View gives or it′s necessary to make a Mining Model Prediction to a final answer?

If yes, it′s necessary to make a Mining Model Prediction to a final answer, how I do that?

|||

In general prediction isn't always the final outcome of a data mining models. Some models are only used for descriptive analysis. All of the algorithms in Analysis Services can be used for both predictive and descriptive analysis so it really is up to the application you require for your business problem as to how you want to use this.

If you want to get a predictive result from a model, you should always do a prediction and not try to impute the results from the algorithm content. The way to do so is to use a DMX query with a prediction join. To provide further details, I would need more information on what you were trying to accomplish.

|||Thanks

Mining Model Parameter

I have one main mining structure and several mining models that are using the MS Decision Trees algorithm. I have a Reporting Services report that displays the model using a table. I want to set it up so that in the DMX query, I can use a parameter to specify which model to use. That way I just have one report that can reference multiple models. For example I would like to do something like this:

SELECT FLATTENED * FROM [@.model].CONTENT

This doesn't work because I get the error message:

Preparation of DMX query failed.
Error (Data mining): Either the user, user, does not have permission to access the referenced mining model, @.model, or the object does not exist. (Microsoft SQL Server 2005 Analysis Services)

We are running SSAS 2005 SP2. Is there a way to accomplish this?

DMX does not support parameterization of the name of the object being queried.
One of the reasons is that a statement, once prepared, is supposed to have a well defined response schema (column definition) and this cannot be enforced as the value of the parameter changes. For example, the query above would return various results depending on the model being queried.

That aside, you should be able to construct the query by concatenating the DMX fragments with the report parameter, something like
"SELECT FLATTENED * FROM [" + @.model + "].CONTENT"|||

Thanks for your response. I actually tried that and it didn't seem to work. But I did figure out a way to do it. I created a stored procedure that looks like this:

Code Snippet

AS

BEGIN

DECLARE @.OPENQUERY nvarchar(4000), @.TSQL nvarchar(4000), @.LinkedServer nvarchar(4000)

SET @.LinkedServer = 'DMSERVER'

SET @.OPENQUERY = 'SELECT * FROM OPENQUERY('+ @.LinkedServer + ','''

SET @.TSQL = 'SELECT FLATTENED * FROM ['+@.miningModel+'].CONTENT'')'

EXEC (@.OPENQUERY+@.TSQL)

END

The linked server connects to the instance of SSAS 2005 and then I just call the stored procedure from reporting services and pass in the miningModel variable. I had to set it up this way becase OPENQUERY doesn't accept variables for its arguements. I found this article that explains how to get around it. You'll see that I just copied what they did. It might be a little bit of a hack but it works!