Showing posts with label threads. Show all posts

Friday, March 23, 2012

Mining Content Viewer for Linear Regression: Node Distribution output

With the number of threads it is difficult to know if this has been posted. If I use the Mining Content Viewer for Linear Regression, under Node Distribution, there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about Predict Probability in Decision Trees. My questions:

1. How should these fields be interpreted?

2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?

Thanks for your help with this.

Sam

The interpretation of the NODE_DISTRIBUTION rows depends mainly on the VALUE TYPE column.

To exemplify the values, here is the distribution of one node from applying regression to the Iris data set. The target is PetalWidth, with SepalLength, SepalWidth and PetalLength as regressors:

- Two rows of the distribution describe the continuous target attribute. They can be recognized by their value type. The row with value type 1 (Missing) represents the statistics for the Missing state of the target attribute in the current node, while the row with value type 3 (Continuous) represents the statistics for the Existing state of the target attribute. If you do not have gaps in your data, then you can ignore the row with value type 1. For the row with value type 3, ATTRIBUTE_NAME is the name of the target attribute (PetalWidth in my example) and ATTRIBUTE_VALUE is the mean of PetalWidth. You also get the support and variance. Support is the number of training cases in this node; the mean and variance are computed only over the training cases that ended up in this node.

- For each regressor, there are 3 distribution rows, with value types 7 (Coefficient), 8 (Score gain) and 9 (Statistics), respectively. For all 3 rows, ATTRIBUTE_NAME is the name of the regressor. Then:

For the row with value type 7 (Coefficient), ATTRIBUTE_VALUE is the regression coefficient associated with the regressor ('a' in y=ax+b).
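
To make the value-type rules above concrete, here is a minimal Python sketch that splits the rows of one node into target statistics and per-regressor coefficients. It assumes the NODE_DISTRIBUTION rows have already been fetched into dictionaries keyed by column name; the function name `summarize_node` and the numbers in `SAMPLE_NODE` are made up for illustration and are not real Iris output.

```python
# Value type codes as described in the reply above.
VALUE_TYPES = {
    1: "Missing",
    3: "Continuous",
    7: "Coefficient",
    8: "Score gain",
    9: "Statistics",
}


def summarize_node(rows, target):
    """Split one node's distribution rows into target stats and coefficients.

    `rows` is a list of dicts keyed by NODE_DISTRIBUTION column names.
    This helper is hypothetical, not part of any Microsoft API.
    """
    summary = {"target": None, "coefficients": {}}
    for row in rows:
        name = row["ATTRIBUTE_NAME"]
        value_type = row["VALUETYPE"]
        if value_type == 3 and name == target:
            # Value type 3: ATTRIBUTE_VALUE holds the mean of the target.
            summary["target"] = {
                "mean": row["ATTRIBUTE_VALUE"],
                "support": row["SUPPORT"],
                "variance": row["VARIANCE"],
            }
        elif value_type == 7:
            # Value type 7: ATTRIBUTE_VALUE holds the regressor's coefficient.
            summary["coefficients"][name] = row["ATTRIBUTE_VALUE"]
    return summary


# Placeholder rows with invented numbers, for illustration only.
SAMPLE_NODE = [
    {"ATTRIBUTE_NAME": "PetalWidth", "ATTRIBUTE_VALUE": None,
     "SUPPORT": 0, "VARIANCE": 0.0, "VALUETYPE": 1},
    {"ATTRIBUTE_NAME": "PetalWidth", "ATTRIBUTE_VALUE": 2.0,
     "SUPPORT": 10, "VARIANCE": 0.5, "VALUETYPE": 3},
    {"ATTRIBUTE_NAME": "SepalLength", "ATTRIBUTE_VALUE": 0.1,
     "SUPPORT": 0, "VARIANCE": 0.0, "VALUETYPE": 7},
    {"ATTRIBUTE_NAME": "SepalWidth", "ATTRIBUTE_VALUE": 0.2,
     "SUPPORT": 0, "VARIANCE": 0.0, "VALUETYPE": 7},
    {"ATTRIBUTE_NAME": "PetalLength", "ATTRIBUTE_VALUE": 0.3,
     "SUPPORT": 0, "VARIANCE": 0.0, "VALUETYPE": 7},
]

if __name__ == "__main__":
    print(summarize_node(SAMPLE_NODE, "PetalWidth"))
```

Rows with value types 8 and 9 are skipped here because only the coefficient row is explained above.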

Monday, March 19, 2012

Min/Maximum Grouping Query

I know this has been posted before, but I can't find the previous threads so please bear with me...

I want to grab the very 1st record of each product in a table like this

ID CLIENTID PRODID
1 a 1
2 b 1
3 c 1
4 a 2
5 b 2
6 c 2
7 a 3
8 b 3
9 c 3

so that I'd get a record set like:

ID CLIENTID PRODID
1 a 1
4 a 2
7 a 3

Thanks for the help, gurus.

SELECT t1.ID, t1.CLIENTID, t1.PRODID
FROM table t1
INNER JOIN (
    SELECT MIN(ID) AS ID, PRODID
    FROM table
    ORDER BY PRODID) t2 ON t1.ID = t2.ID
AND t1.PRODID = t2.PRODID

You did mean GROUP BY, not ORDER BY, right?

SELECT t1.ID, t1.CLIENTID, t1.PRODID
FROM table t1
INNER JOIN (
    SELECT MIN(ID) AS ID, PRODID
    FROM table
    GROUP BY PRODID) t2 ON t1.ID = t2.ID
AND t1.PRODID = t2.PRODID

Yep. Been sniped. (grin)

Thanks guys, exactly what I was after. :)
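
For anyone who wants to try the GROUP BY version against the sample data from the question, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `products`, the function name `first_row_per_product` and the use of SQLite are assumptions for the demo; the thread itself uses a placeholder table called `table` and does not say which database engine it targets. A final ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (ID, CLIENTID, PRODID).
SAMPLE_ROWS = [
    (1, "a", 1), (2, "b", 1), (3, "c", 1),
    (4, "a", 2), (5, "b", 2), (6, "c", 2),
    (7, "a", 3), (8, "b", 3), (9, "c", 3),
]


def first_row_per_product(rows):
    """Return the row with the lowest ID for each PRODID."""
    conn = sqlite3.connect(":memory:")
    try:
        # `products` is a stand-in name; TABLE is a reserved word in SQL.
        conn.execute(
            "CREATE TABLE products (ID INTEGER, CLIENTID TEXT, PRODID INTEGER)"
        )
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
        query = """
            SELECT t1.ID, t1.CLIENTID, t1.PRODID
            FROM products t1
            INNER JOIN (
                SELECT MIN(ID) AS ID, PRODID
                FROM products
                GROUP BY PRODID
            ) t2 ON t1.ID = t2.ID AND t1.PRODID = t2.PRODID
            ORDER BY t1.PRODID
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_row_per_product(SAMPLE_ROWS):
        print(row)
```

The subquery finds the smallest ID per PRODID, and the join back to the table recovers the other columns (CLIENTID here) for those rows.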