
Monday, March 26, 2012

Mining XML from file

Right now, when I want to load the contents of an XML file, I have to use OPENXML, which looks like this:

declare @idoc int
declare @doc varchar(max)
set @doc = '<?xml version="1.0" encoding="windows-1250"?>
<root>
<tag>
<content1>xxccxcx</content1>
...
<contentXY>sf</contentXY>
</tag>
</root>'
exec sp_xml_preparedocument @idoc output, @doc
select *
from openxml (@idoc, '/root/tag', 3)
-- (and here is the whole select procedure)
insert into [table name]
select *
from openxml (@idoc, '/root/tag', 3)
-- (and here it is again)

This is quite "useless" for daily use, unless someone keeps filling the text variable with new data.

Is it possible to load content directly from an XML file located somewhere on the drive? Is there any other way to open an XML file and load its contents into a table?|||

What "text field" are you talking about?

Have you looked at SQL Server Integration Services?

|||The whole content is loaded as text. It has "something" to do with this part:

declare @idoc int
declare @doc varchar(max)
set @doc = '<?xml version="1.0" encoding="windows-1250"?>
<root>
<tag>
<content1>xxccxcx</content1>
...
<contentXY>sf</contentXY>
</tag>
</root>'
exec sp_xml_preparedocument @idoc output, @doc

If I wanted a Visual Studio solution, which I already have, I would have posted this in the Visual Studio forum.

But an Integration Services package is still another, artificially created file. I don't need those.|||You could use bulk load:
1. Load all data into one field:
insert into your_table
select * from openrowset(bulk 'c:\yourfile.txt', single_blob) as t
2. Split the data into several rows using a format file:
insert into your_table
select * from openrowset(bulk 'c:\yourfile.txt', formatfile = 'c:\bcpformat.xml') as t

More info: http://msdn2.microsoft.com/en-us/library/ms190312.aspx

You could also load the data into an xml column and use the nodes() method instead of OPENXML to split the XML data into rows and fields.|||
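As an aside for readers who want to see the row-splitting step outside SQL Server: this small Python sketch parses an XML document into one row per element, analogous to what OPENXML or nodes() does server-side. The `<root>/<tag>` layout mirrors the hypothetical document from the question; all names here are illustrative.

```python
import xml.etree.ElementTree as ET

def xml_to_rows(xml_text, row_xpath="./tag"):
    """Parse an XML document and return one dict per matched element,
    mirroring what OPENXML / nodes() does on the server side."""
    root = ET.fromstring(xml_text)
    rows = []
    for tag in root.findall(row_xpath):
        # each child element becomes a column in the row
        rows.append({child.tag: child.text for child in tag})
    return rows

doc = """<?xml version="1.0"?>
<root>
  <tag><content1>xxccxcx</content1><contentXY>sf</contentXY></tag>
</root>"""
print(xml_to_rows(doc))
```

To read from a file on disk (the original question), replace `ET.fromstring(xml_text)` with `ET.parse(path).getroot()`.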

I have the same issue as errpop, although I am running SQL 2005 Standard (1 CPU, SP1) on Windows 2000 Server SP4 (both fully patched, clean installs). The errors I am getting in the SQL Server logs are below; they occur immediately after the SQL Server service starts and the number of CPUs is detected and written to the SQL logs.

Error: 8313, Severity: 16, State: 1.

Error in mapping SQL Server performance object/counter indexes to object/counter names. SQL Server performance counters are disabled.

Error: 3409, Severity: 16, State: 1.

Performance counter shared memory setup failed with error -1. Reinstall sqlctr.ini for this instance, and ensure that the instance login account has correct registry permissions.

I have followed the instructions in KB300956 and nothing changed. I noticed that executing lodctr sqlctr.ini runs, but I don't get any messages on screen; it is as if it is not working. The reason I am so interested in the performance counters is that I am trying to troubleshoot a local HP SureStore DLT1 tape drive (14 hours to back up 70 GB) and I need the SQL backup device counter. I don't know if this is related or not.

Any help is greatly appreciated as this is entirely new territory to me.


Mining Structures

I am using the wizard to create a new mining structure, and I get the following error when selecting the data mining technique (Microsoft Decision Trees, Microsoft Time Series, etc.):

"Unable to retrieve a list of supported data mining algorithms. Make sure you are connected to the correct Analysis Services server instance and the Analysis Services server named localhost is running and configured properly. You can continue with a default list of data mining algorithms."

What does that mean?

Thanks in advance,

Got it. I need to start Analysis Services from Control Panel > Administrative Tools before doing this.

I think the service's startup type defaults to Manual.

|||Right-click on the Analysis Services project and select Properties. Under Configuration Properties > Deployment, check what you have under Target > Server. The value of the Server property should be the name of the Analysis Services instance that you want the project to connect and deploy to. When you try to select a data mining technique, we try to connect to the target server to fetch the list of supported techniques. If you have "localhost" in this property, check Control Panel > Administrative Tools > Services to confirm that the "SQL Server Analysis Services" service is running. If you specified a named instance (servername\instancename), check that the corresponding instance ("SQL Server Analysis Services (instancename)") as well as the "SQL Server Browser" service is running.

Mining structure Processing Event Notification

Hello,

Can anyone spot what I am missing here? The problem is that I am getting a null object for e.TextData in the t_OnEvent(object sender, TraceEventArgs e) handler below. I am trying to get event notifications while processing the data mining structure.

thanks

anil

//-code below-

using Microsoft.AnalysisServices;

private void ProcessMiningStructure(MiningStructure mStruct)
{
    Trace t;
    TraceEvent e;

    t = server.Traces.Add();
    e = t.Events.Add(TraceEventClass.ProgressReportCurrent);
    e.Columns.Add(TraceColumn.TextData);

    t.OnEvent += new TraceEventHandler(t_OnEvent);
    t.Update();
    try
    {
        t.Start();
        mStruct.Process(ProcessType.ProcessFull);
        t.Stop();
    }
    catch (Exception ex)
    {
    }
    t.Drop();
}

void t_OnEvent(object sender, TraceEventArgs e)
{
    SetText(e.TextData);
}

//

Hey Anil,

Your code is almost correct; the only part missing is the definition of the event's columns. See below:

using System;
using Microsoft.AnalysisServices;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Server srv = new Server();
            srv.Connect("localhost");

            Trace t = srv.Traces.Add();
            TraceEvent ev = t.Events.Add(TraceEventClass.ProgressReportCurrent);
            ev.Columns.Add(TraceColumn.ObjectID);
            ev.Columns.Add(TraceColumn.TextData);
            ev.Columns.Add(TraceColumn.EventClass);
            ev.Columns.Add(TraceColumn.EventSubclass);

            t.OnEvent += new TraceEventHandler(Progress_Event_Report);
            t.Update();
            t.Start();

            srv.Databases["Adventure Works DW"].MiningStructures["Market Basket"].MiningModels["Market Basket"].Process(ProcessType.ProcessFull);

            t.Stop();
            t.Drop();
            srv.Disconnect();
        }

        static void Progress_Event_Report(object sender, TraceEventArgs e_args)
        {
            Console.WriteLine("{0}\t{1}\t{2}\t{3}", e_args.ObjectID, e_args.EventClass, e_args.EventSubclass, e_args.TextData);
        }
    }
}

Hope this helps,

--

Raymond

|||

Raymond

This helped fix the issue

thanks for your response

-anil


Mining structure has to contain at least 2 mining models to see their accuracy charts?

Hi, all.

Thank you very much for your kind attention.

I just found that I am not able to view the accuracy chart for my mining model. The error message is: "No mining models are selected for comparison." Which is quite strange.

Any guidance? thank you very much.

With best regards,

Yours sincerely,

Can you provide more details? The only scenario I can think of where this would happen is where the first model is a clustering model that has no predictable attributes.|||

Hi, Jamie,

Yes, my mining model is a clustering model. So, as you mentioned, is viewing the accuracy chart restricted for clustering models? If so, what is the workaround? Thank you very much for your further guidance.

With best regards,

Yours sincerely,

|||

The accuracy chart is only useful if your model has a predictable attribute (column), and the accuracy of the prediction can be evaluated against some test data. If your clustering model does not have a predictable attribute, the accuracy chart cannot be used.

|||

Hi, Bogdan, thank you very much for your guidance.

So in what way can we verify whether the clustering model is good enough?

With best regards,

Yours sincerely,

Friday, March 23, 2012

Mining Query Builder

On pages 123-124 of Data Mining with SQL Server 2005, I can't finish the example. When I add 'Monthly' to the end of the expression [Home Owners].[Theater Freq] in the Criteria/Argument column, I get an error message saying the syntax is incorrect when I switch to the Results view. If I place a dot after the original expression, like this: [Home Owners].[Theater Freq].Monthly, I get the error "The dot expression is not allowed in the context at line 4, column 23. Use sub-SELECT instead." What am I doing wrong?

Thanks!

Try formatting your query as:

SELECT t.CustomerID, [Movie Trees].[Theater Freq],
PredictProbability([Movie Trees].[Theater Freq], 'Monthly') as ProbMonthly FROM ...

This returns the probability of the state 'Monthly' of [Movie Trees].[Theater Freq].



Mining Model?

Hi,

I'm working on a project to create a mining model. I have one "flattened" table with a ProductID (key) and related attributes. Example columns are weight, color, price, units sold last year, product category, product rating (top seller, etc.), and similar columns, about 40 in total. There are about 15,000 products, and the same number of rows in the table.

The objective is to pass a ProductID (that also exists in the table) and get back the top N products that most resemble the source product based on all the attributes.

Any advice on which models I should test and how to set up the models would be much appreciated. Also, if there is a similar example/sample out there, please let me know. I downloaded the MovieClick sample, but that doesn't work in my case, as I only have the equivalent of the Customers table.

Thanks in Advance.

This is actually a k-nearest-neighbor problem, for which we don't ship an implementation in SQL Server Analysis Services (although someone may have written a plug-in). I can imagine that you could use the fuzzy matching transform in Integration Services to do this, however. To do so, you would use your product table as the reference table, send in the fields of the product of interest, and then sort the results by the match confidence.

HTH

-Jamie

|||

Thanks Jamie, I'll try fuzzy matching in SSIS. I'm also thinking of writing my own code to calculate KNN in T-SQL. I'm not sure the performance will be acceptable, though, especially as I need to pass in up to 5 products, evaluate based on their combined attributes, and get a result set back with the top N matches.

Is there a third party software out there that can do this?

|||When using the fuzzy components in SSIS, do be aware that they are string-matching components, evaluating the edit distance between two strings. So, for example, "911" will be pretty much as close a match to "999" as "199" is; for any columns where you need to evaluate the similarity of numeric values, you may need to use other components, such as the Derived Column transform, to perform the calculation.|||
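Donald's point can be made concrete with a plain dynamic-programming edit distance (a sketch only; the SSIS fuzzy components use their own, more elaborate similarity measure). Note how "999" and "199" both score as a small edit distance from "911" even though numerically they differ from it by 88 and 712 respectively:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# small edit distances, despite very different numeric distances
print(levenshtein("911", "999"), levenshtein("911", "199"))
```

This is why string-matching components are a poor fit for evaluating closeness of numeric attributes.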

Hi Donald,

Can you explain what you mean by Derived Column? How would I make it work to calculate distances for numeric columns/attributes?

Thanks.

|||I don't know of a third-party KNN, but you would likely get better performance implementing your own algorithm using our plug-in interfaces in C#.|||

Jamie/Donald

Thanks for your responses. My C# is not that strong. For now I'll write something in SQL, and if I can create a model that works, I'll convert it into a plug-in.

This is the plan: normalize all attributes using the mean and standard deviation, then apply the nearest-neighbour algorithm using Euclidean distance. I'll also experiment with arbitrary weights for attributes, or maybe use correlation to filter out some of the weaker attributes; I'm not sure how that will work at this point.
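The plan described here (z-score normalization followed by Euclidean nearest neighbours) can be sketched in a few lines. This is an illustration only, not the poster's actual implementation; names like `top_n_similar` and the toy product data are made up:

```python
import math

def zscore_normalize(rows):
    """Normalize each attribute column to mean 0, std dev 1."""
    cols = list(zip(*rows))
    norm_cols = []
    for col in cols:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        norm_cols.append([(v - mean) / std for v in col])
    return [list(r) for r in zip(*norm_cols)]

def top_n_similar(rows, source_idx, n):
    """Indices of the n rows nearest (Euclidean) to rows[source_idx]."""
    norm = zscore_normalize(rows)
    src = norm[source_idx]
    dist = [(math.dist(src, r), i) for i, r in enumerate(norm) if i != source_idx]
    return [i for _, i in sorted(dist)[:n]]

# toy data: two numeric attributes per product
products = [[10.0, 5.0], [11.0, 5.5], [50.0, 90.0], [10.5, 5.2]]
print(top_n_similar(products, 0, 2))
```

Attribute weights (the experiment mentioned above) would just multiply each normalized column by a chosen factor before the distance computation.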

Any suggestions are welcome. Also, is there a sample plug-in in C# out there that I can use as a template? Is there any way I can mold this table to use Association Rules or any other built-in algorithm?

Regards,

Asim.

|||

There are tutorials here for writing managed plug-in algorithms: http://www.sqlserverdatamining.com/DMCommunity/Tutorials/default.aspx

You may need to register (free!) to get access to the links.

mining model with time

I want to create a model to predict which products I should advertise to a customer at a given time.

For example: in summer I should show drinks (Coca-Cola, Pepsi), ice cream, bikinis, sandals, and glasses; in winter I should show shoes, coats, hot food, and so on.

I have these tables:

order(orderid, time, cusid, ...)

product(productid, ...)

orderdetails(orderid, productid, ...)

customers(cusid, ...)

You can add the season as an input to the model, along with whatever other inputs you are interested in.
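Deriving that season input from the order timestamp is a one-line transform. A sketch, assuming northern-hemisphere meteorological seasons (the mapping is an assumption, not part of any SQL Server feature):

```python
from datetime import date

def season(d):
    """Map a date to a meteorological season (northern hemisphere assumed)."""
    return {12: "winter", 1: "winter", 2: "winter",
            3: "spring", 4: "spring", 5: "spring",
            6: "summer", 7: "summer", 8: "summer",
            9: "autumn", 10: "autumn", 11: "autumn"}[d.month]

print(season(date(2012, 7, 14)))  # summer
```

The same derivation could be done in the DSV as a computed column on order.time.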

mining model viewer error

I am developing a multiple regression model for supermarket site selection with 11 predictors.

As soon as I press the model viewer tab, the following message appears:

The server content appears to be out of date.
Would you like to build and deploy the project first?

I opt for No, and this error message pops up:

The tree graph cannot be created because of the following error:

'Either the user, ......., does not have access to the ....... database, or the database does not exist.'.

Could anyone mine out the cause, please?

saji

Did you deploy the model at least once? The project (containing the data source, data source view, mining structure, and model) must be deployed to the server for processing to happen. Once processing is complete, the viewer will work. It seems the project was not deployed to the server.

Or, short answer: try answering 'Yes' to the "Would you like to build and deploy first?" question, and let us know if it worked.


Mining Model predicts same set of products for all cases

Hi

I have developed a product basket mining model as follows.

DSV:

SELECT C.CustomerId, C.CustomerIdName, P.ProductId, P.ProductIdName
FROM Customer AS C
INNER JOIN CustomerProduct AS P
ON C.OpportunityId = P.OpportunityId

Mining structure:

CREATE MINING MODEL ProductBasket
(
    CustomerId TEXT KEY,
    CustomerIdName PREDICT,
    ProductId PREDICT,
    ProductRecommend TABLE PREDICT
    (
        ProductId TEXT KEY,
        ProductIdName PREDICTONLY
    )
)
USING Microsoft_Association_Rules

Prediction query. Since I want the output in the following format:

ProductID | Items (nested table)
Product A | Product B, Product C
Product B | Product A, Product C

I have written the prediction query as follows:

SELECT
    t.[ProductId],
    PredictAssociation([Association].[ProductId], 3)
FROM
    [Association]
PREDICTION JOIN
OPENQUERY([Adventure Works Cycle MSCRM],
    'SELECT DISTINCT [ProductId]
     FROM (SELECT ProductId FROM ProductBase) AS [Product]') AS t
ON [Association].[ProductId] = t.[ProductId]

The model predicts the same set of products for every case. Even changes in the algorithm parameter values have no impact on the result.

What is the reason for this, and how can I rectify it?

If you're doing a simple recommendation model (i.e., a model that provides recommendations based on what's in the user's shopping basket), there is no need to have the product ID at both the case and nested levels. Your modeling seems unnecessarily complex for this scenario; it also explains why you're seeing the same recommendations for all cases: based on the DSV you've listed, your training cases don't include the basket contents at all for each transaction.

If you use a model definition similar to the one I described in an earlier thread on this topic (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=649759&SiteID=1), your query would look like this:


SELECT t.CustomerID, PredictAssociation(AssociationModel.Products, 3) FROM AssociationModel
PREDICTION JOIN
SHAPE {
OPENQUERY(ds, 'SELECT CustomerID FROM CustomerProducts ORDER BY CustomerID')
}
APPEND ( {
OPENQUERY(ds, 'SELECT CustomerID, ProductName FROM CustomerProducts ORDER BY CustomerID') }
RELATE CustomerID TO CustomerID ) AS Products AS t
ON AssociationModel.Products.ProductName = t.Products.ProductName

The core idea is that the input case (for prediction as well as training) is the set of products in the customer's basket and the prediction is the list of recommended products. Both the input and prediction map to the same nested table in the mining structure/model.

|||

The first query should be SELECT DISTINCT CustomerID ...

|||

Also, if you simply want a list of recommendations per product, and you first created a model based on customer baskets, you could then create a query like this to get the complete list of product recommendations:

SELECT t.ProductName, PredictAssociation(Products, 3) FROM PM
PREDICTION JOIN
SHAPE {
OPENQUERY(ds, 'SELECT DISTINCT ProductName FROM CustomerProducts ORDER BY ProductName')
}
APPEND ( {
OPENQUERY(ds, 'SELECT DISTINCT ProductName FROM CustomerProducts ORDER BY ProductName') }
RELATE ProductName to ProductName) AS Products as t
ON PM.Products.ProductName = t.Products.ProductName

This works because you are specifying a basket containing a single item as input for the prediction on a model that was created from customer baskets. Using the product name at the top level of the input query for the prediction is just a shortcut, since we know we want baskets of size 1 anyway. Using an input query like this for model training would cause no rules to be found, since only baskets larger than size 1 imply any patterns.
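The basket-based recommendation idea above can be illustrated with a toy co-occurrence counter: a crude stand-in for the association rules engine, not what Analysis Services actually computes (it mines rules with support and probability thresholds). All names and data here are made up:

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(baskets):
    """For each product, count how often every other product appears with it."""
    co = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, product, n):
    """Top-n products most often bought together with `product`."""
    return [p for p, _ in co.get(product, Counter()).most_common(n)]

baskets = [["bread", "butter"], ["bread", "butter", "milk"], ["bread", "milk"]]
co = build_cooccurrence(baskets)
print(recommend(co, "butter", 2))
```

Feeding a single product in, as in the DMX query above, corresponds to looking up one row of this co-occurrence structure.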

Mining Model Prediction.

Mining Model Prediction... what for?

Which data mining algorithms use Mining Model Prediction?

Does every algorithm have to use Mining Model Prediction as the final goal of a data mining project?

In a very general sense...

Mining Model Prediction is typically the part of the UI that allows querying a mining model via DMX. What the model does depends on the algorithm; this part of the UI allows authoring DMX queries specific to that model.

Now, a mining model is not required to make predictions to be useful. Examples:

- Algorithms may be used just to explore data, not to make predictions. Think of the dependency network in Naive Bayes or Decision Trees.

- Algorithms may assign unpredictive labels (e.g., Clustering).

- Algorithms may actually transform data for use in a different model (think of a decision tree model using as input the cluster label from a different model; the cluster model effectively transforms the input for the decision trees).

|||

In the case of Association Rules, what should I show to the final user? The same answer that the Mining Model Viewer gives, or is it necessary to run a Mining Model Prediction for the final answer?

If a Mining Model Prediction is necessary for the final answer, how do I do that?

|||

In general, prediction isn't always the final outcome of a data mining model. Some models are only used for descriptive analysis. All of the algorithms in Analysis Services can be used for both predictive and descriptive analysis, so how you use them really depends on the application your business problem requires.

If you want to get a predictive result from a model, you should always do a prediction and not try to impute the results from the algorithm content. The way to do so is to use a DMX query with a prediction join. To provide further details, I would need more information on what you were trying to accomplish.

|||Thanks

Mining Model Parameter

I have one main mining structure and several mining models that use the Microsoft Decision Trees algorithm. I have a Reporting Services report that displays the model using a table. I want to set it up so that in the DMX query I can use a parameter to specify which model to use; that way I have just one report that can reference multiple models. For example, I would like to do something like this:

SELECT FLATTENED * FROM [@model].CONTENT

This doesn't work because I get the error message:

Preparation of DMX query failed.
Error (Data mining): Either the user, user, does not have permission to access the referenced mining model, @model, or the object does not exist. (Microsoft SQL Server 2005 Analysis Services)

We are running SSAS 2005 SP2. Is there a way to accomplish this?

DMX does not support parameterization of the name of the object being queried.
One of the reasons is that a statement, once prepared, is supposed to have a well-defined response schema (column definitions), and this cannot be enforced if the value of the parameter changes. For example, the query above would return different results depending on the model being queried.

That aside, you should be able to construct the query by concatenating the DMX fragments with the report parameter, something like
"SELECT FLATTENED * FROM [" + @.model + "].CONTENT"|||

Thanks for your response. I actually tried that and it didn't seem to work, but I did figure out a way to do it. I created a stored procedure that looks like this:

Code Snippet

AS
BEGIN
    DECLARE @OPENQUERY nvarchar(4000), @TSQL nvarchar(4000), @LinkedServer nvarchar(4000)

    SET @LinkedServer = 'DMSERVER'
    SET @OPENQUERY = 'SELECT * FROM OPENQUERY(' + @LinkedServer + ','''
    SET @TSQL = 'SELECT FLATTENED * FROM [' + @miningModel + '].CONTENT'')'

    EXEC (@OPENQUERY + @TSQL)
END

The linked server connects to the instance of SSAS 2005, and then I just call the stored procedure from Reporting Services and pass in the miningModel variable. I had to set it up this way because OPENQUERY doesn't accept variables for its arguments. I found an article that explains how to get around that; you'll see that I just copied what they did. It might be a bit of a hack, but it works!

Mining cubes vs relational

I've not found much guidance so far about the pros and cons of mining OLAP cubes vs their underlying fact and dimension tables. Can anyone offer advice in this area, or point me to more info about this?

Thanks,
Kevin

Generally, I recommend mining OLAP cubes when you are mining results that OLAP is good at producing; otherwise you should mine relational tables. OLAP is not particularly good at returning large amounts of detail data. However, if you were mining stores, for example, and your inputs were total sales of products for particular months/categories, or even the sales acceleration of such by store, those measures are very difficult and slow to obtain through relational means, and I would recommend OLAP.

Another issue with OLAP mining models is that there really isn't any way to split the data into training and validation sets. If this is a requirement, and you need the aggregation/calculation facilities of OLAP, you can get the OLAP results into a table, split it into training and validation sets, and then mine the resulting relational table.
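One common way to do the split mentioned above is to hash the case key, so the same case always lands in the same partition regardless of row order. A sketch (the 70/30 ratio and function names are arbitrary choices, not anything SQL Server prescribes):

```python
import hashlib

def partition(case_id, train_pct=70):
    """Deterministically assign a case to 'train' or 'validate' by hashing its key."""
    h = int(hashlib.md5(str(case_id).encode()).hexdigest(), 16) % 100
    return "train" if h < train_pct else "validate"

buckets = [partition(i) for i in range(1000)]
print(buckets.count("train"), buckets.count("validate"))
```

Being deterministic, the assignment is reproducible and stable if the table is reloaded from the cube later.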


Mining Content Viewer for Linear Regression: Node Distribution output

With the number of threads here, it is difficult to know if this has been posted before. If I use the Mining Content Viewer for Linear Regression, under Node Distribution there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about PredictProbability in Decision Trees. My questions:

1. How should these fields be interpreted?

2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?

Thanks for your help with this!

Sam

The interpretation of the NODE_DISTRIBUTION rows depends mainly on the VALUE_TYPE column.

To exemplify the values, here is the distribution of one node from applying regression to the Iris data set. The target is PetalWidth, with SepalLength, SepalWidth, and PetalLength as regressors:

- Two rows of the distribution describe the continuous target attribute. They can be recognized by their value type. The row with value type 1 (Missing) represents the statistics for the Missing state of the target attribute in the current node, while the row with value type 3 (Continuous) represents the statistics for the Existing state. If you do not have gaps in your data, you can ignore the row with value type 1. For the row with value type 3, ATTRIBUTE_NAME is the name of the target attribute (PetalWidth in my example) and ATTRIBUTE_VALUE is the mean of PetalWidth. You also get the support and variance: Support is the number of training cases in this node, and the mean and variance are computed only over the training cases that ended up in this node.

- For each regressor, there are 3 distribution rows, with value types 7 (Coefficient), 8 (Score Gain), and 9 (Statistics). For all 3 of these rows, ATTRIBUTE_NAME is the name of the regressor. Then:

For the row with value type 7 (Coefficient), ATTRIBUTE_VALUE is the regression coefficient associated with the regressor ('a' in y = ax + b).
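For a single regressor, the quantities surfaced in NODE_DISTRIBUTION (support, mean, variance of the target, and the coefficient 'a' in y = ax + b) can be reproduced with ordinary least squares. A sketch on made-up data; this mirrors the statistics described above but is not the server's actual code:

```python
def node_stats(xs, ys):
    """Support, mean and (population) variance of the target, plus the
    least-squares slope and intercept of y = a*x + b."""
    n = len(ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    mean_x = sum(xs) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return {"support": n, "mean": mean_y, "variance": var_y,
            "coefficient": a, "intercept": b}

# toy regressor/target pairs where y = 2x exactly
print(node_stats([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

With several regressors, the server fits them jointly, so the per-regressor coefficients will generally differ from these one-at-a-time slopes.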

mining association rules using SQL

Hello everybody,

I am interested in the field of mining association rules.

I have found some posted statements, but I didn't understand them. I need your help badly.

Just tell me: where should I start?

Thanks a lot.

There is a tutorial using association rules for market basket analysis at http://msdn2.microsoft.com/en-us/library/ms345326.aspx

This is a good place to start.

Mining Approach?

I have a scenario in mind and was wondering if anyone had any suggestions on an approach.

Let's say I have a dataset where I have captured various attributes about blade servers in a rack. I have internal temperature, fan speed, disk reads, CPU temperature and failure events. Each of these is a continuous variable except for the failure event, which has only two states, 'true' or 'false', indicating whether a failure occurred at that point in time.

The table looks like this:

MachineID Timestamp Temperature FanSpeed Reads CPUTemp Failure

At the time of failure, only the MachineID, Timestamp and Failure values will be populated in the table. The failures at this time are not categorized or discriminated from each other in any way although they will be in the future. We're looking to use a mining algorithm to determine which variable is the best predictor of failure, or combination of variables. What do you think is the right approach? How might this approach change once the failures do become categorized and differentiated?

Thanks,

Frank

This is an interesting problem, and I would do some data preparation beforehand to solve it.

Basically the problem comes down to defining your "case". In this problem I would define a case as a "Machine/Time Period". The next problem is what you want to predict. For example, is predicting failure at the current timestamp useful? Probably not. You probably want to predict failures ahead of time. This leads us to the problem of granularity. Is the timestamp too granular to be interesting? Should you summarize the data to a higher level first?

Let's say you want to predict if a failure will occur within the next five minutes based on minute-by-minute data for the last five minutes. In this situation you would need to aggregate your data at the minute level and create a "FailureInTheNextFiveMinutes" column. The previous five minutes would be represented as a nested table e.g.

Case Table

MachineID/Minute FailureInNextFiveMinutes

NestedTable

MachineID/Minute Minute Temperature FanSpeed Reads CPUTemp

In the nested table the first column is the reference to the case id, and the "Minute" column is an integer 1-5 representing how many minutes ago the row was captured (e.g. 1 minute ago, 2 minutes ago). You would use the Minute column as the key.

Such a model would give you insights such as "If there are X number of reads 3 minutes ago and the temperature is above Y 1 minute ago you have a 60% chance of failure in the next five minutes".

Obviously you should tune this to fit your needs, but that's the general idea.
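As a sketch, the case/nested-table layout described above could be declared in DMX roughly like this (the model name, column names, and choice of Microsoft_Decision_Trees are illustrative assumptions, not from the original post):

```sql
-- Hypothetical DMX: case table keyed by MachineID/Minute, with a
-- nested table holding the previous five minutes of sensor readings.
CREATE MINING MODEL [FailurePrediction]
(
    [CaseID] TEXT KEY,                      -- MachineID/Minute
    [FailureInNextFiveMinutes] TEXT DISCRETE PREDICT,
    [History] TABLE
    (
        [Minute]      LONG KEY,             -- 1-5 = minutes ago
        [Temperature] DOUBLE CONTINUOUS,
        [FanSpeed]    DOUBLE CONTINUOUS,
        [Reads]       DOUBLE CONTINUOUS,
        [CPUTemp]     DOUBLE CONTINUOUS
    )
)
USING Microsoft_Decision_Trees
```

The nested [History] table is bound to the case table through the case key when you train the model, which is what lets the algorithm learn patterns like "high reads 3 minutes ago".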

HTH

-Jamie

|||

Thanks for the reply.

Your suggestion confirms the direction we decided to take; in fact, we probably simplified it a little bit.

We're feeding a Neural Network with some 7 day aggregates (min,max,avg,std,var) on the different attributes we're capturing 14 days before failures. The hope is that we'll be able to predict issues 7 days before they actually occur, hence the historical snapshot.
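The kind of 7-day aggregates Frank describes could be computed with T-SQL along these lines (the SensorLog table, its columns, and the failure date are assumptions based on the thread, shown for one attribute only):

```sql
-- Hypothetical: per-machine aggregates over the window from 14 to 7
-- days before a given failure date, for the Temperature attribute.
DECLARE @FailureDate datetime
SET @FailureDate = '20070601'

SELECT MachineID,
       MIN(Temperature)   AS MinTemp,
       MAX(Temperature)   AS MaxTemp,
       AVG(Temperature)   AS AvgTemp,
       STDEV(Temperature) AS StdTemp,
       VAR(Temperature)   AS VarTemp
FROM SensorLog
WHERE [Timestamp] >= DATEADD(day, -14, @FailureDate)
  AND [Timestamp] <  DATEADD(day,  -7, @FailureDate)
GROUP BY MachineID
```

The same pattern would be repeated for FanSpeed, Reads and CPUTemp, giving the full set of min/max/avg/std/var inputs for the neural network.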

I'm happy to report that the algorithm is performing fine; we're in the process of tweaking what we feed it to come up with the best prediction accuracy.

Frank

|||

I have another point of view on this problem, assuming you can accurately collect training data sets.

Suppose you can get the following row periodically:

MachineID Minute Temperature FanSpeed Reads CPUTemp CollectData Failure

where CollectData means the current date/time and Failure has the default value "No".

When a failure occurs, take the last row collected before the crash and set its Failure column to "Yes". After you have accumulated an appropriate training data set (in terms of volume), you can build a data mining model with Failure as the predictable column.
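The labeling step could be sketched in T-SQL like this (SensorLog and its columns are hypothetical names standing in for the table described above, and this assumes at most one failure per machine in the data):

```sql
-- Hypothetical: flag the last reading collected before each machine's
-- crash so it can serve as a positive training example.
UPDATE s
SET    s.Failure = 'Yes'
FROM   SensorLog AS s
JOIN  (SELECT MachineID, MAX(CollectData) AS LastCollect
       FROM   SensorLog
       GROUP  BY MachineID) AS latest
  ON   s.MachineID   = latest.MachineID
 AND   s.CollectData = latest.LastCollect
```

After this update, the table can be used directly as a training set with Failure as the predictable column.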

Gigi Ciubuc

www.sqlserver.ro

Mining Algorithms

I would like to use SQL Server 2005 data mining to look at some engineering problems. In order to really do that, I would like to see some reference material on how the different algorithms work. Does anyone know where I could find some? I have looked hard through BOL and can't find any there.

Hi!

You can find a lot of useful material about data mining algorithms at www.sqlserverdatamining.com, and also in the SQL Server 2005 data mining tutorials, which are part of the SQL Server 2005 documentation.

|||As I wrote in another thread, I usually need only the online documentation to learn new features of applications, but I found the online materials inadequate for grasping the data mining part of SQL 2005, even after spending a lot of time on them. I am halfway through the book Data Mining with SQL Server 2005 and have found it quite helpful for learning the concepts and practical steps, even though it has lots of editorial errors (fortunately they are so obvious that they do not pose much risk of misleading readers). I (a scientist in R&D) am using DM to explore real problems now.|||

The online documents are sufficient to understand the various features of SQL Server. If you want to learn more about mining algorithms to find the best one for your purpose, here is a good site:

http://www.kdnuggets.com/
