Migrating to SQL 2005: productid

Friday, March 23, 2012

Mining Model?

Hi,

I'm working on a project to create a mining model.I have one "flattened" table with a productID (key) and related attributes. Example columns are weight, color, price, units sold last year, product category, product rating (top seller, etc) and similar columns, about 40 in total. There are about 15,000 products and same number of rows in the table.

The objective is to pass a ProductID (that also exists in the table) and get back the top N products that most resemble the source product based on all the attributes.

Any advice on which models I should test and how to set up the models would be much appreciated. Also if there is a similar example/sample out there, please let me know. I downloaded the MovieClick sample, but that doesn't work in my case as I only have the equivalent of the Customers table.

Thanks in Advance.

This is actually a K-Nearest Neighbor problem, for which we don't ship an implementation for SQL Server Analysis Services (although someone may have written a plug in). I could imagine that you could use the fuzzy matching transform in Integration Services to do this, however. To do so you would just use your product table as the reference table, send in the fields of the product of interest, then sort the results by the match confidence.

HTH

-Jamie

|||

Thanks Jamie, I'll try fuzzy matching in SSIS. I'm also thinking of writing my own code to calculate based on KNN in TSQL. I'm not sure if the performance will be acceptable though especially as I need to pass in up to 5 products, evaluate based on their combined attributes and get a resultset back with top N matches.

Is there a third party software out there that can do this?

|||When using the fuzzy components in SSIS, do be aware that they are string matching components, evaluating the edit distance between two string. So, for example, "911" will be pretty much as close a match to "999" as "199" - so for any columns where you need to evaluate similarity of values you may need to use other components such as the Derived Column to perform the calculation.|||

Hi Donald,

Can you explain what you mean by Derived column. How would I make it work to calculate distances for numeric columns/attributes.

Thanks.

|||I don't know of a 3rd party KNN, but you would likely get better performance implementing your own algorithm using our plug-in interfaces in C#.|||

Jamie/Donald

Thanks for your responses. My C# is not that strong. For now I'll write something in SQL and if I can create a model that works, I'll convert that into a plug-in.

This is the plan. Normalize all attributes using mean and std dev. Apply the nearest neighbour algorithm using Euclidean distance. Also experiment with arbitrary weights for attributes or maybe use correlation to filter out some of the weaker attributes, not sure how that will work though at this point.

Any suggestions are welcome. Also, Is there a sample plug-in in C# out there I can use as template. Is there any way I can mold this table to use Association Rules or any other built-in algorithm.

Regards,

Asim.

|||

There are tutorials here for writing managed plug-in algorithms: http://www.sqlserverdatamining.com/DMCommunity/Tutorials/default.aspx

You may need to register (free!) to get access to the links.

Mining Model predicts same set of products for all cases

I have developed a product

basket mining model as follows

DSV

SELECT C.CustomerId,C.CustomerIdName,P.ProductId,P.ProductIdName

FROM Customer INNER

JOIN CustomerProduct

ON C.OpportunityId

= P.OpportunityId

Mining Structure

CREATE MINING MODEL ProductBasket

(

CustomerId

TEXT KEY,

CustomerIdName

PREDICT,

ProductId

PREDICT,

ProductRecommend

TABLE PREDICT

(

ProductId TEXT KEY

ProductIdName PREDICTONLY

)

USING Microsoft_Association_Rules

Prediction Query

Since I want the output in the following format

Product

Items (nested Table)

Product

product B

Product C

Product

Product A

Product C

I have written the prediction query as follows

SELECT

t.[ProductId],

PredictAssociation([Association].[ ProductId],3)

From

[Association]

PREDICTION JOIN

OPENQUERY([Adventure Works Cycle MSCRM],

'SELECT DISTINCT

[ProductId]

FROM

(SELECT ProductId FROM ProductBase)

as [Product]

') AS t

[Association].[Product Id] = t.[ProductId]

The model is predicting the

same set of products for every case. Even changes in the algorithm parameter

value do not have any impact on the result.

What is the reason for this and

how can u rectify it?

If you're doing a simple recommendation model (i.e. a model that provides recommendations based on what's in the user's shopping basket, there is no need to have the product id at both case and nested level. Your modeling seems unneccessarily complex for this scenario - it also explains why you're seeing the same recommendations for all cases: based on the DSV you've listed, your training cases don't include the basket contents at all for each transaction.

If you use a model definition similar to the one I described in an earlier thread on this topic (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=649759&SiteID=1), your query would look like this:

SELECT t.CustomerID, PredictAssociation(AssociationModel.Products, 3) FROM PM
PREDICTION JOIN
SHAPE {
OPENQUERY(ds, 'SELECT CustomerID FROM CustomerProducts ORDER BY CustomerID')
}
APPEND ( {
OPENQUERY(ds, 'SELECT CustomerID, ProductName FROM CustomerProducts ORDER BY CustomerID') }
RELATE CustomerID to CustomerID ) AS Products as t
ON AssociationModel.Products.ProductName = t.Products.ProductName

The core idea is that the input case (for prediction as well as training) is the set of products in the customer's basket and the prediction is the list of recommended products. Both the input and prediction map to the same nested table in the mining structure/model.

|||

The first query should be SELECT DISTINCT CustomerID ...

|||

Also, if you want simply a list of recommendations per product, and you first created a model based on customer baskets you could then create a query like this to get the complete list of product recommendations:

SELECT t.ProductName, PredictAssociation(Products, 3) FROM PM
PREDICTION JOIN
SHAPE {
OPENQUERY(ds, 'SELECT DISTINCT ProductName FROM CustomerProducts ORDER BY ProductName')
}
APPEND ( {
OPENQUERY(ds, 'SELECT DISTINCT ProductName FROM CustomerProducts ORDER BY ProductName') }
RELATE ProductName to ProductName) AS Products as t
ON PM.Products.ProductName = t.Products.ProductName

This works because you are specifying a basket containing a single item as input for the prediction on a model that was created on customer baskets. Using the productname on the top level of the input query for the prediction is just a shortcut since we know we want baskets of size 1 anyway. Doing an input query like this for the model training would cause no rules to be found since only baskets greater than size 1 imply any patterns.

Friday, March 23, 2012

Mining Model?

Mining Model predicts same set of products for all cases

Migrating to SQL 2005

Blog Archive

About Me