Saturday, July 25, 2009

Data Mining with C# and ADO.NET - 'Creating Predictions'

Doing Predictions on the Model

Now we can use our model to predict an outcome for new customer data. For example, imagine that you have a Web site where users create profiles. You could overlay this profile data on a model and predict which banner advertisements these users would find most appealing.

For our example, let's say we want to determine which customers over the age of 30 will choose a gold membership, with a probability of 75% or better. In short, the model would give each customer we find at least a 75% chance of choosing the gold membership card.

As you can probably guess, the DMX SELECT statement is what you will use most often to run predictions against a model. A DMX call that performs our prediction follows:

// Requires: using System.Data.OleDb;
// Assumes conn is an already-open OleDbConnection to the mining database.
String PredictModel =
"Select T.CustomerID, MemberCard_Prediction.MemberCard From MemberCard_Prediction" +
" Natural Prediction Join OpenQuery(Customers, 'select * from NewCustomers') As T" +
" Where T.Age > 30" +
" And PredictProbability(MemberCard, 'Gold') > 0.75";
OleDbCommand CMD = new OleDbCommand(PredictModel, conn);
OleDbDataReader myReader = CMD.ExecuteReader();
while (myReader.Read()) {
    // Write out data here
}
myReader.Close();
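
The "Write out data here" placeholder can be filled in many ways. Here is one minimal sketch, assuming you simply want to dump every returned column as text. The helper name WriteRows and the in-memory demo table are my own inventions for illustration; the table only stands in for the reader that CMD.ExecuteReader() would return, so the sketch can run without a server.

```csharp
using System;
using System.Data;
using System.IO;

static class PredictionOutput
{
    // Hypothetical helper: writes one "Name=Value, ..." line per row
    // and returns the number of rows written. Works with any IDataReader,
    // which includes OleDbDataReader.
    public static int WriteRows(IDataReader reader, TextWriter output)
    {
        int rows = 0;
        while (reader.Read())
        {
            string[] parts = new string[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
            {
                parts[i] = reader.GetName(i) + "=" + reader.GetValue(i);
            }
            output.WriteLine(string.Join(", ", parts));
            rows++;
        }
        return rows;
    }

    static void Main()
    {
        // Stand-in data (made up) in place of a real prediction result.
        DataTable demo = new DataTable();
        demo.Columns.Add("CustomerID", typeof(int));
        demo.Columns.Add("MemberCard", typeof(string));
        demo.Rows.Add(42, "Gold");

        using (IDataReader reader = demo.CreateDataReader())
        {
            WriteRows(reader, Console.Out);
        }
    }
}
```

With the real query you would call PredictionOutput.WriteRows(myReader, Console.Out) in place of the while loop, then close the reader as before.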

This query introduces new cases to our model from a data source called Customers, which contains the table NewCustomers. The DMX Natural Prediction Join clause lets us join the data from the NewCustomers table to our model without any further specification, because it matches the table's columns to the model's columns by name, and here both have the same columns.
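
For contrast, when the column names do not line up, DMX offers a plain Prediction Join with an On clause that spells out the mapping. The sketch below assumes, purely for illustration, that the source table called its age column CustomerAge instead of Age; that column name is hypothetical and does not come from our example database.

```csharp
using System;

static class ExplicitJoinQuery
{
    // Same prediction as before, but with the column mapping written out.
    // CustomerAge is a made-up source column name used only to show the On clause.
    public const string Text =
        "Select T.CustomerID, MemberCard_Prediction.MemberCard From MemberCard_Prediction" +
        " Prediction Join OpenQuery(Customers, 'select * from NewCustomers') As T" +
        " On MemberCard_Prediction.Age = T.CustomerAge" +
        " Where T.CustomerAge > 30" +
        " And PredictProbability(MemberCard, 'Gold') > 0.75";

    static void Main()
    {
        // Print the statement; pass it to an OleDbCommand exactly as in the main example.
        Console.WriteLine(Text);
    }
}
```

Every model input column you want the prediction to use needs its own equality in the On clause, joined with And, which is why the Natural form is more convenient when the names already match.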

The PredictProbability function is used in the Where clause to keep only the cases that meet our 75% threshold. Use this approach when you want to run new cases through the model without making that data part of the model. You may also just want to do predictions against the model itself, in which case you wouldn't need an alternate source, as all the cases are already in the model.
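
If you want to vary the card type or the threshold, or see the probability itself in the results, the query string can be built by a small helper. This is only a sketch: the names ThresholdQuery, Build, and the CardProbability alias are my own, and the letters-only check on the card type is a simple assumption to keep stray quotes out of the statement.

```csharp
using System;
using System.Globalization;

static class ThresholdQuery
{
    // Hypothetical helper: builds the prediction query for a given card type
    // and minimum probability, and also returns the probability as a column.
    public static string Build(string cardType, double minProbability)
    {
        if (string.IsNullOrEmpty(cardType))
            throw new ArgumentException("cardType is required", "cardType");
        foreach (char c in cardType)
        {
            // Letters only, so the value cannot break out of the quoted literal.
            if (!char.IsLetter(c))
                throw new ArgumentException("cardType must contain letters only", "cardType");
        }
        if (double.IsNaN(minProbability) || minProbability < 0.0 || minProbability > 1.0)
            throw new ArgumentOutOfRangeException("minProbability");

        // Invariant culture keeps the decimal point a '.' on every machine.
        string threshold = minProbability.ToString("0.####", CultureInfo.InvariantCulture);

        return
            "Select T.CustomerID, PredictProbability(MemberCard, '" + cardType + "') As CardProbability" +
            " From MemberCard_Prediction" +
            " Natural Prediction Join OpenQuery(Customers, 'select * from NewCustomers') As T" +
            " Where T.Age > 30" +
            " And PredictProbability(MemberCard, '" + cardType + "') > " + threshold;
    }

    static void Main()
    {
        Console.WriteLine(Build("Gold", 0.75));
    }
}
```

The returned string can be handed to an OleDbCommand in the same way as PredictModel above.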

If you want to explore this subject further, I highly recommend the book by ZhaoHui Tang and Jamie MacLennan, Data Mining with SQL Server 2005, which will help you understand other functions and data mining concepts not mentioned here. This article should be a nice addition to the book, as the book includes no C# ADO.NET code.
