Data Mining Tools, Banking Data Warehouse with C# Sharepoint, and Data Mining Software: Predicting future steps when you do not have enough history

No History? No Worries!

Say you’re launching a new product and you want to predict what sales might look like in the next few months. Classical time series prediction does not work in this scenario because you don’t have historical sales data for the product. However, new SQL Server 2008 extensions to the Microsoft_Time_Series algorithm and DMX allow you to easily apply the patterns from another similarly-behaving time series to solve this conundrum.

Predicting Post-IPO Stock Values for VISA

In this example, we illustrate how to use the PredictTimeSeries method with the parameter REPLACE_MODEL_CASES to obtain predictions for a time series for which we do not have enough historic data. The power of this combination comes into play when we have a time series with not enough historic data points to build a realistic model, but we know that this series follows a pattern similar to another time series for which we have enough historic data to build a model.

Here’s an Excel 2007 workbook that has 73 historic data points representing post-IPO daily closing values for the MasterCard stock and just 8 corresponding values for Visa (since we’re doing this 8 days after the Visa IPO). Our goal is to use the MasterCard stock history to derive better predictions for the Visa stock.

We will use the SQL Server 2008 Data Mining Client Add-in for Excel to build and query the time series model:

1. Make sure you have Excel 2007 with the DM Client add-in installed.

2. Save the workbook with the MasterCard/Visa stock data to your local disk and open in Excel 2007.

3. To create a model for the MasterCard stock data on the first worksheet, click on the “Data Mining” tab and select the “Forecast” task.

4. Select “Next” on the first page of the forecast wizard “Getting Started with the Forecast Wizard”. (Note: This page might not appear if you previously selected the option to skip the welcome page.)

5. On the second page “Select Data Source”, select the table we created previously and click on “Next” button.

6. On the “Forecasting” page, select the time stamp column to be the first column, named “TimeStamp”.

7. In the input columns grid, de-select the “TimeStamp” column and select the “MasterCard” column, then click “Next”.

8. On the last page of the wizard, rename the structure “MasterCardStructure” and the model “MasterCardModel”, leave the default selections to browse the model after it is created and to allow drill through, and click “Finish” to end the wizard and proceed to build the model.

The MasterCard model historic data and the first 10 predicted values are illustrated in the following graph:

Now, use the same steps to create a time series model for the Visa stock using the 8 historical data points on the second workbook sheet. You will see right away that the model will not generate meaningful predictions due to the lack of sufficient historic data points. The VisaModel historic data and the next 10 predicted values are illustrated in the following graph:

Better Predictions Using REPLACE_MODEL_CASES

A better approach is to use the knowledge that the Visa and MasterCard stocks have a similar pattern and to use the model built for MasterCard to obtain predictions for the Visa stock values. Here’s how (again using the Data Mining Client Add-in for Excel):

1. Select the “Query” task from the “Data Mining” ribbon and click "Next" on the introductory page.

2. Select the “MasterCardModel” model and click the “Advanced” button.

3. On the “Data Mining Advance Query Editor” page, click on the button “Edit Query”, select Yes on the dialog asking to confirm that “Any changes to the query text will not be preserved when you switch back to the design mode.”

4. Type the following query:

SELECT

(SELECT $Time, [MasterCard] as [Visa] FROM

PredictTimeSeries([MasterCardModel].[MasterCard], 10, REPLACE_MODEL_CASES)) as Predictions

From [MasterCardModel]

NATURAL PREDICTION JOIN

(SELECT 1 AS [TimeStamp], 64.35 as [MasterCard]

UNION

SELECT 2 AS [TimeStamp], 62.76 as [MasterCard]

UNION

SELECT 3 AS [TimeStamp], 64.48 as [MasterCard]

UNION

SELECT 4 AS [TimeStamp], 66.11 as [MasterCard]

UNION

SELECT 5 AS [TimeStamp], 69 as [MasterCard]

UNION

SELECT 6 AS [TimeStamp], 75.1 as [MasterCard]

UNION

SELECT 7 AS [TimeStamp], 82.75 as [MasterCard]

UNION

SELECT 8 AS [TimeStamp], 82.86 as [MasterCard])

as t

5. Click “Finish” and select the results of the query to be copied into a new worksheet.

The results should look like this:

When the REPLACE_MODEL_CASES parameter is used, the PredictTimeSeries method will return the requested number of prediction obtained by replacing the last historic points of the given model with the new values provided in the query. In our case, the last 8 data points for the MasterCardModel are replaced with the values we generate on the fly using the SELECT and UNION options in the input set specified after the “NATURAL PREDICTION JOIN” keywords. Then, the MasterCardModel equations are used to predict the next 10 values for the Visa stock series.

To see the power of this method, we can compare the predictions obtained using the MasterCard model (Predictions.Visa), with the predictions generated by the VisaModel model obtained using only the limit sets of 8 data points of the Visa stock values (Predictions.Visa2). The results are illustrated in the following graph:

So there you go - you have a new tool in your arsenal when you don’t have enough data to make accurate time series predictions. Enjoy!

Data Mining Tools, Banking Data Warehouse with C# Sharepoint, and Data Mining Software

Thursday, August 20, 2009

Predicting future steps when you do not have enough history

No comments:

Post a Comment

Blog Archive

FEEDJIT Live Traffic Feed