Thursday, August 20, 2009

Data Mining Dimensions

What is a data mining dimension?

A data mining (DM) dimension is a dimension with a special parent-child hierarchy that's based on relationships discovered in your data by applying data mining, as opposed to a regular dimension, where the hierarchies are user-defined. For example, you might discover interesting groups of customers by building a mining model that applies the Microsoft_Clustering algorithm to demographic data in your Customers dimension. A DM dimension based on this mining model can be used to browse your customer sales data and slice it by the customer groups that the model found.

How do I create and use a data mining dimension?

When you build a mining model based on an OLAP cube using the Data Mining Wizard in Business Intelligence Development Studio, the last dialog in the wizard allows you to create an associated data mining dimension as well as a new cube that links to the measure groups in the source cube and includes the DM dimension. When you browse the new cube, you can slice the data in the original cube using the new hierarchy discovered by the mining model.

You can also create a data mining dimension (and a cube that uses it) outside of the Data Mining Wizard by selecting an existing OLAP mining model in the mining model editor and picking "Create a Data Mining Dimension" from either the Mining Model menu or the context (right-click) menu.

How does it work?

A data mining dimension is processed with a data source view that points to a DMX query, which fetches data from an OLAP-specific view of the source mining model's content. You can run this query yourself to see what it returns, replacing the bracketed placeholder with the name of your mining model:

SELECT * FROM [Your Mining Model].DIMENSION_CONTENT

As part of the data mining dimension processing, a special index is built that maps cases in the mining model's source OLAP dimension to members in the data mining dimension (which represent a hierarchical view of nodes in the mining model content). This index is used when querying fact data using the data mining dimension.
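
To make the role of that index more concrete, here is a minimal Python sketch of the idea. It is only an illustration, not how Analysis Services implements it: the node IDs, customer keys, fact rows and the `sales_by_member` helper are all made up for this example. Each case (a customer) maps to one member of the data mining dimension, and a fact is rolled up through that member's ancestors in the parent-child hierarchy.

```python
from collections import defaultdict

# Hypothetical mining model content: node id -> (caption, parent id).
# In a real model these would come from the model's content, not be hard-coded.
NODES = {
    "root": ("All customers", None),
    "c1": ("Cluster 1", "root"),
    "c2": ("Cluster 2", "root"),
}

# Hypothetical index built at processing time: case (customer key) -> member.
CASE_TO_MEMBER = {"alice": "c1", "bob": "c2", "carol": "c1"}

# Hypothetical fact rows: (customer key, sales amount).
FACTS = [("alice", 100.0), ("bob", 40.0), ("carol", 60.0), ("alice", 25.0)]


def sales_by_member(facts, case_to_member, nodes):
    """Sum sales per member, rolling each fact up through its ancestors."""
    totals = defaultdict(float)
    for customer, amount in facts:
        # Cases with no entry in the index contribute to no member.
        member = case_to_member.get(customer)
        # Walk from the mapped member up to the root of the hierarchy.
        while member is not None:
            totals[member] += amount
            member = nodes[member][1]
    return dict(totals)


totals = sales_by_member(FACTS, CASE_TO_MEMBER, NODES)
for member_id, total in sorted(totals.items()):
    print(f"{NODES[member_id][0]}: {total}")
```

The lookup from case to member is what lets fact rows keyed by customer be grouped by a hierarchy the customer table itself never contained.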

The data mining dimension and its source mining model must reside in the same Analysis Services database.

Which algorithms support data mining dimensions?

You can build data mining dimensions based on OLAP mining models that use the Microsoft_Decision_Trees, Microsoft_Clustering, Microsoft_Association_Rules or Microsoft_Sequence_Clustering algorithms. In addition, third-party plug-in algorithms may choose to support data mining dimensions.

