Sunday, November 15, 2009

Hello

Hello guys, I'm back.

Wednesday, September 9, 2009

Data Mining Tools: From SAS to R/Java

After a few months of using SAS, I find it a powerful and interesting tool. It has its own programming language (SAS Base), which allows you to be very specific. On the other side, you have a strong GUI (Enterprise Guide), which lets you perform several tasks using drag and drop. One of the main issues with SAS is, of course, its price. Not every company can afford a SAS license.


The question then is: when you have SAS code, how can you convert it to a free programming language such as Java or R? After some searching on the web, I found the following solutions:

  • From SAS to Java: At first glance it seems difficult to compare SAS with Java code, since they are very different (Java is a general-purpose language, widely used for web applications). However, it seems that DullesOpen has a tool, named Carolina, to convert SAS into Java code. You can test their tool online, and I have contacted them to get a trial version. I will let you know when I have more information about this tool.
  • From SAS to R: The SAS system and the R language (the free version of the S programming language) are quite close, since they are both statistics oriented. Of course, the syntax is completely different. Up to now, I haven't found any tool for automatic SAS to R code conversion. I will comment on this post as soon as I find an interesting tool.

Ajay from DecisionStats told me about MineQuest. It seems that they have a tool to convert SAS code to WPS (World Programming System), a SAS Base clone. If you have suggestions or know a way of converting SAS to any other language, you're welcome to comment on this post.

Thursday, August 20, 2009

Banking Data Warehousing

Large U.S. retail banks are building data warehouses and centralizing disconnected data marts. But warehouse-supported marts are proliferating, promising continued challenges to achieving a single enterprise customer view.

Despite widely reported disappointments, and some outright failures, of data warehouse initiatives, Gartner research shows that a data-warehouse-based architecture is the architecture of choice for customer information analysis and decision support among large U.S. retail banks. In a survey of U.S. retail banks with deposits of more than $1 billion, more than one-half of those with deposits of more than $4 billion said they use a data warehouse or data warehouse with associated data marts, with the percentage even higher for the very largest banks. In our research, the smaller banks (deposits between $1 billion and $4 billion) were more likely to say they use a series of unlinked data marts or rely on an operational customer information file (CIF) for analysis and decision support.

What to install to use Data Mining

SQL Server 2005 comes with many components. This document describes what components are necessary to perform data mining once you have the SQL Server 2005 beta.

Analysis Services

Analysis Services is the only required component to install on the server. If you want to do data mining against existing SQL Server 2000 databases or other data sources (DB2, Oracle, Access, etc), this is the component you need to install.

Reporting Services

Install Reporting Services if you want to be able to create reports that work against your data mining models.

SQL Server Database Services

You only need to install the SQL Server relational engine if you want to use it as a data source, or if you want to use the Data Mining samples and tutorials.

Data Transformation Services

Installing DTS causes the DTS service to be installed on your server, allowing the running of scheduled packages. Install this if you want to use the integrated data mining tasks and transforms on your server.

Workstation Components, etc.

Install the Workstation Components, etc, on any client machine that will be creating mining models, authoring reports and DTS packages, or managing Analysis Services. The Workstation Components work equally well when installed on the same machine as the server.

Advanced

You will need to click the Advanced button to install the samples and sample databases. All the samples are located under Client Components/Documentation and Samples. The pertinent samples and databases to install are:

Databases:

AdventureWorksDW Sample Data Warehouse (requires you install SQL Server Database Services)

AdventureWorks Sample OLAP

Samples

Analysis Services Samples

Data Transformation Services Samples

Note that installing Samples installs the Samples installation packages. Links to run these are found under Start/Microsoft SQL Server 2005/Install Samples

Data Mining Dimensions

What is a data mining dimension?

A DM dimension is a dimension with a special parent-child hierarchy that's based on relationships discovered in your data by applying data mining, as opposed to a regular dimension where the hierarchies are user-defined. For example, you might discover interesting groups of customers by building a mining model that applies the Microsoft_Clustering algorithm on demographic data in your Customers dimension. A DM dimension based on this mining model can be used to browse your customer sales data and slice it by the customer groups found by the mining model.

How do I create and use a data mining dimension?

When you build a mining model based on an OLAP cube using the Data Mining Wizard in Business Intelligence Development Studio, the last dialog in the wizard allows you to create an associated data mining dimension as well as a new cube that links to the measure groups in the source cube and includes the DM dimension. When you browse the new cube, you can slice the data in the original cube using the new hierarchy discovered by the mining model.

You can also create a data mining dimension (and a cube that uses it) outside of the Data Mining wizard by selecting an existing OLAP mining model in the mining model editor and picking "Create a Data Mining Dimension" from either the Mining Model menu or the context (right-click) menu.

How does it work?

A data mining dimension is processed with a data source view that points to a DMX query which fetches data from an OLAP-specific view of the source mining model's content. You can run this query yourself to see what it returns:

SELECT * FROM <mining model name>.DIMENSION_CONTENT

As part of the data mining dimension processing, a special index is built that maps cases in the mining model's source OLAP dimension to members in the data mining dimension (which represent a hierarchical view of nodes in the mining model content). This index is used when querying fact data using the data mining dimension.

The data mining dimension and its source mining model have to reside on the same Analysis Server database.
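
To make the idea of the case-to-member index more concrete, here is a minimal Python sketch. It is not how Analysis Services implements the index; it only illustrates the principle that each case (a customer) maps to one member of the data mining dimension (a cluster), and that fact data can then be sliced by that member. The customer keys, cluster names and sales amounts are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical result of a clustering model: customer key -> cluster member.
case_to_member = {
    "C001": "Cluster 1",
    "C002": "Cluster 2",
    "C003": "Cluster 1",
    "C004": "Cluster 3",
}

# Hypothetical fact rows: (customer key, sales amount).
sales_facts = [
    ("C001", 120.0),
    ("C002", 80.0),
    ("C003", 45.5),
    ("C001", 30.0),
    ("C004", 210.0),
]


def slice_by_member(facts, index):
    """Sum fact amounts per data mining dimension member, using the case index."""
    totals = defaultdict(float)
    for customer, amount in facts:
        totals[index[customer]] += amount
    return dict(totals)


sales_by_cluster = slice_by_member(sales_facts, case_to_member)
print(sales_by_cluster)
```

The lookup table plays the role of the index: once it exists, slicing the facts by discovered group is a plain aggregation.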

Which algorithms support data mining dimensions?

You can build data mining dimensions based on OLAP mining models that use the Microsoft_Decision_Trees, Microsoft_Clustering, Microsoft_Association_Rules or Microsoft_Sequence_Clustering algorithms. In addition, third-party plug-in algorithms may choose to support data mining dimensions.


Predicting future steps when you do not have enough history

No History? No Worries!

Say you’re launching a new product and you want to predict what sales might look like in the next few months. Classical time series prediction does not work in this scenario because you don’t have historical sales data for the product. However, new SQL Server 2008 extensions to the Microsoft_Time_Series algorithm and DMX allow you to easily apply the patterns from another similarly-behaving time series to solve this conundrum.

Predicting Post-IPO Stock Values for VISA

In this example, we illustrate how to use the PredictTimeSeries method with the parameter REPLACE_MODEL_CASES to obtain predictions for a time series for which we do not have enough historic data. The power of this combination comes into play when we have a time series with not enough historic data points to build a realistic model, but we know that this series follows a pattern similar to another time series for which we have enough historic data to build a model.

Here’s an Excel 2007 workbook that has 73 historic data points representing post-IPO daily closing values for the MasterCard stock and just 8 corresponding values for Visa (since we’re doing this 8 days after the Visa IPO). Our goal is to use the MasterCard stock history to derive better predictions for the Visa stock.

We will use the SQL Server 2008 Data Mining Client Add-in for Excel to build and query the time series model:

1. Make sure you have Excel 2007 with the DM Client add-in installed.

2. Save the workbook with the MasterCard/Visa stock data to your local disk and open in Excel 2007.

3. To create a model for the MasterCard stock data on the first worksheet, click on the “Data Mining” tab and select the “Forecast” task.

4. Select “Next” on the first page of the forecast wizard “Getting Started with the Forecast Wizard”. (Note: This page might not appear if you previously selected the option to skip the welcome page.)

5. On the second page “Select Data Source”, select the table we created previously and click on “Next” button.

6. On the “Forecasting” page, select the time stamp column to be the first column, named “TimeStamp”.

7. In the input columns grid, de-select the “TimeStamp” column and select the “MasterCard” column, then click “Next”.

8. On the last page of the wizard, rename the structure “MasterCardStructure” and the model “MasterCardModel”, leave the default selections to browse the model after it is created and to allow drill through, and click “Finish” to end the wizard and proceed to build the model.

The MasterCard model historic data and the first 10 predicted values are illustrated in the following graph:

Now, use the same steps to create a time series model for the Visa stock using the 8 historical data points on the second workbook sheet. You will see right away that the model will not generate meaningful predictions due to the lack of sufficient historic data points. The VisaModel historic data and the next 10 predicted values are illustrated in the following graph:

Better Predictions Using REPLACE_MODEL_CASES

A better approach is to use the knowledge that the Visa and MasterCard stocks have a similar pattern and to use the model built for MasterCard to obtain predictions for the Visa stock values. Here’s how (again using the Data Mining Client Add-in for Excel):

1. Select the “Query” task from the “Data Mining” ribbon and click "Next" on the introductory page.

2. Select the “MasterCardModel” model and click the “Advanced” button.

3. On the “Data Mining Advanced Query Editor” page, click the “Edit Query” button, then select “Yes” in the dialog asking you to confirm that “Any changes to the query text will not be preserved when you switch back to the design mode.”

4. Type the following query:

SELECT
  (SELECT $Time, [MasterCard] AS [Visa]
   FROM PredictTimeSeries([MasterCardModel].[MasterCard], 10, REPLACE_MODEL_CASES)) AS Predictions
FROM [MasterCardModel]
NATURAL PREDICTION JOIN
  (SELECT 1 AS [TimeStamp], 64.35 AS [MasterCard]
   UNION
   SELECT 2 AS [TimeStamp], 62.76 AS [MasterCard]
   UNION
   SELECT 3 AS [TimeStamp], 64.48 AS [MasterCard]
   UNION
   SELECT 4 AS [TimeStamp], 66.11 AS [MasterCard]
   UNION
   SELECT 5 AS [TimeStamp], 69 AS [MasterCard]
   UNION
   SELECT 6 AS [TimeStamp], 75.1 AS [MasterCard]
   UNION
   SELECT 7 AS [TimeStamp], 82.75 AS [MasterCard]
   UNION
   SELECT 8 AS [TimeStamp], 82.86 AS [MasterCard]) AS t

5. Click “Finish” and select the results of the query to be copied into a new worksheet.

The results should look like this:

When the REPLACE_MODEL_CASES parameter is used, the PredictTimeSeries method returns the requested number of predictions, obtained by replacing the last historic points of the given model with the new values provided in the query. In our case, the last 8 data points for the MasterCardModel are replaced with the values we generate on the fly using SELECT and UNION in the input set specified after the “NATURAL PREDICTION JOIN” keywords. Then, the MasterCardModel equations are used to predict the next 10 values for the Visa stock series.
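
The idea of reusing one series' equations on another series' recent cases can be sketched in a few lines of Python. This is only an illustration under strong assumptions: it swaps in a simple first-order autoregressive model fitted by least squares, which is not the algorithm Microsoft_Time_Series actually uses, and the long "history" below is a synthetic series, not the MasterCard data. Only the 8 replacement cases are the values from the query above.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    xs = series[:-1]
    ys = series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(a, b, recent_cases, steps):
    """Apply the fitted equation starting from the last of the replacement cases."""
    last = recent_cases[-1]
    predictions = []
    for _ in range(steps):
        last = a + b * last
        predictions.append(last)
    return predictions


# Synthetic stand-in for a long history (73 points growing 1% per step).
long_history = [100.0 * 1.01 ** t for t in range(73)]
a, b = fit_ar1(long_history)

# The 8 replacement cases used in the DMX query.
visa_cases = [64.35, 62.76, 64.48, 66.11, 69, 75.1, 82.75, 82.86]
predictions = forecast(a, b, visa_cases, 10)
print([round(p, 2) for p in predictions])
```

The model parameters come entirely from the long series; the short series only supplies the starting point, which is the essence of what REPLACE_MODEL_CASES does.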

To see the power of this method, we can compare the predictions obtained using the MasterCard model (Predictions.Visa) with the predictions generated by the VisaModel, which was built using only the limited set of 8 data points of the Visa stock values (Predictions.Visa2). The results are illustrated in the following graph:

So there you go - you have a new tool in your arsenal when you don’t have enough data to make accurate time series predictions. Enjoy!


Wednesday, August 19, 2009

Quick Visualization of irs.gov Search Queries

Here is a quick visualization I did in honor of April 15th to investigate what people are looking for on tax day…

This “query tree” shows the most frequent searches starting with the term “irs”. Each branch in the tree represents a query where the words are sized according to frequency of occurrence. I like how you can see at a glance what the most popular tax forms are by following the “irs tax form …” branch. Apparently form 8868, Application for Extension of Time To File, is in high demand.

It was created by uploading search queries from AOL users leading to clicks on irs.gov during Spring 2006 to Concentrate, which generated the query tree. This image is a snapshot of an interactive flash visualization in Concentrate, where the focus term was “irs”. Looking at query patterns like this can help you get an idea of what people are looking for and how to better organize your site so they can find it quickly.

The interactive flash visualization was developed by Chris Gemignani using Flare with some input from Zach Gemignani and myself and inspiration from the Many Eyes WordTree.

The raw data is from the released AOL Search data sample, and consists of the subset of unique queries leading to clicks on irs.gov from March to May 2006. The IRS queries used to make the visualization can be downloaded here: irs.gov.queries.csv (191K)

Here are the top queries in the file:

Query                      Searches
irs                        4787
irs.gov                    2282
www.irs.gov                1975
internal revenue service   1154
irs forms                  608
tax forms                  361
irs tax forms              196
internal revenue           158
taxes                      142
wheres my refund           139
federal tax forms          125
irs refunds                106
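
For readers who want to experiment with the data, here is a rough Python sketch of how a query tree could be built from query counts. It is not the Concentrate or Flare implementation; it assumes that each node's weight is simply the total number of searches for all queries passing through that word, and it uses only the queries from the list above that start with the word “irs”.

```python
def build_query_tree(query_counts):
    """Build a nested word tree; each node stores the searches passing through it."""
    root = {"count": 0, "children": {}}
    for query, searches in query_counts.items():
        node = root
        node["count"] += searches
        for word in query.split():
            node = node["children"].setdefault(word, {"count": 0, "children": {}})
            node["count"] += searches
    return root


# Queries starting with the word "irs", with their search counts.
irs_queries = {
    "irs": 4787,
    "irs forms": 608,
    "irs tax forms": 196,
    "irs refunds": 106,
}

tree = build_query_tree(irs_queries)
irs_node = tree["children"]["irs"]

# List the branches under "irs", most frequent first.
branches = sorted(irs_node["children"].items(), key=lambda kv: -kv[1]["count"])
for word, child in branches:
    print(word, child["count"])
```

In a visualization, the count on each node would drive the font size of the word on that branch.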

Research on Data Mining

It’s time for some holidays at DMR. I will be back in the blogosphere at the beginning of August. In the meantime, here are some links that may interest you:

Hope to see you soon on Data Mining Research.

Tuesday, August 11, 2009

SAS tutorial - generation of a sample business datawarehouse scenario

This tutorial shows how to use SAS to implement an ETL process that generates a star schema datawarehouse architecture.

We assume that you already have basic SAS/BASE knowledge and are familiar with the SAS environment, assigning libraries and running SAS programs.

The aim of this tutorial is to generate a datawarehouse structure which would help monitor performance of a sample business scenario described here:
A manufacturing company Data Warehouse Business Scenario - requirements from a palm and tropical plants nursery which implements an analysis and reporting system to track sales, costs, forecasts and business performance management figures.

Tutorial overview:
- The first step will be to read the dimensions and populate sample dimension data.
- Then a fact table will be created.
- In the next step we will randomly generate transactions for the fact table with sales data for three years. To generate the numbers we will use SAS random number generators with uniform and random distributions.
- The final tasks will be to validate and extract the generated data and feed the reporting application.
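
The random generation step can be sketched outside SAS as well. The following Python snippet is only an illustration, not the tutorial's SAS code: the dimension keys, the three years, the price parameters and the choice of a uniform quantity with a normally distributed unit price are all assumptions made for the example.

```python
import random

# Hypothetical dimension keys; the tutorial's real dimensions are not shown here.
products = ["PALM_S", "PALM_L", "FERN", "ORCHID"]
customers = ["CUST_01", "CUST_02", "CUST_03"]
years = [2007, 2008, 2009]  # three years of sales, specific years assumed


def generate_sales_facts(n_rows, seed=42):
    """Generate random sales fact rows keyed by the dimensions above."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    facts = []
    for _ in range(n_rows):
        quantity = rng.randint(1, 20)                # uniform integer quantity
        unit_price = max(1.0, rng.gauss(25.0, 5.0))  # normal price, floored at 1.0
        facts.append({
            "product": rng.choice(products),
            "customer": rng.choice(customers),
            "year": rng.choice(years),
            "month": rng.randint(1, 12),
            "quantity": quantity,
            "amount": round(quantity * unit_price, 2),
        })
    return facts


facts = generate_sales_facts(1000)
print(facts[0])
```

Seeding the generator keeps the generated warehouse stable between runs, which makes the later validation step easier to reason about.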

Please also be aware that SAS is a very powerful and flexible system, and the things we show in this tutorial can be done in many different ways. This is just one way to get the expected results.

SAS tutorial chapters


  • 1. Load extracts into SAS - A couple of programs responsible for loading the dimensions extracts into SAS and an example on how to create dynamically additional dimension tables
  • 2. Populate dimensions - Example of how to populate random dimensions in a fact table
  • 3. Generate measures - Random generation of a given set of measures. The measures are generated randomly, however they apply business rules described in a business scenario
  • 4. Sales fact table - In that lesson we create a fact table with sales figures designed in a star schema datawarehouse architecture. Additionally, we perform a statistical analysis of the newly generated data
  • 5. Costs fact table - We randomly generate a fact table with costs. Costs are divided into fixed and variable costs and allocated on year and month detail level
  • 6. SAS ETL Process - Run the whole process in a sequence which may be considered as a representation of ETL Process in SAS. The process could also be set up in SAS ETL Studio or SAS Warehouse Administrator

  • Header and trailer structured textfile processing in SAS - Example from the Data Warehousing Tutorial

  • Data Warehousing ETL tutorial

    The ETL and Data Warehousing tutorial is organized into lessons representing various business intelligence scenarios, each of which describes a typical data warehousing challenge.
    This guide might be considered as an ETL process and Data Warehousing knowledge base with a series of examples illustrating how to manage and implement the ETL process in a data warehouse environment.

    The purpose of this tutorial is to outline and analyze the most widely encountered real life datawarehousing problems and challenges that need to be taken during the design and architecture phases of a successful data warehouse project deployment.

    The DW tutorial shows how to feed data warehouses in organizations operating in a wide range of industries.
    Each provided topic is thoroughly analyzed, discussed and a recommended approach is presented to help understand how to implement the ETL process.

    Going through the sample implementations of the business scenarios is also a good way to compare BI and ETL tools and get to know the different approaches to designing the data integration process. This also gives an idea and helps identify strong and weak points of various ETL and data warehousing applications.

    This tutorial shows how to use the following ETL and datawarehousing tools: Datastage, SAS, Pentaho, Cognos and Teradata.


    Data Warehousing & ETL Tutorial lessons

    Etl Tools Info portal

    ETL-Tools.Info portal provides information about different business intelligence tools and datawarehousing solutions, with a main focus on ETL process and tools. On our pages you will find both general articles with high-level information on various Business Intelligence applications and architectures, as well as technical documents, with a low-level description of the presented solutions and detailed tutorials.
    Great attention is paid to the Datastage ETL tool and we provide a number of Datastage examples, Datastage tutorials, best practices and resolved problems with real-life examples.
    There is also a wide range of information on a rapidly growing Open Source Business Intelligence market (OSBI), with emphasis on applications from the Pentaho BI family, including a Pentaho tutorial.
    We also provide a SAS Guide with tutorial, which illustrates the vision of SAS on Business Intelligence, Data Warehousing and ETL process.
    We have recently added a new ETL case study (ETL course with examples) section which presents a set of business cases, each of which illustrates a typical data warehousing problem. We analyze the cases thoroughly and propose the most efficient and appropriate approach to solving those problems by showing sample ETL process designs and DW architectures.
    Microsoft users may be very interested in exploring our Excel BI crosstabs section with FAQ and sample solutions.

    What is Business Intelligence?

    Business intelligence is a broad set of applications, technologies and knowledge for gathering and analyzing data for the purpose of helping users make better business decisions.
    The main challenge of Business Intelligence is to gather and serve organized information regarding all relevant factors that drive the business and enable end-users to access that knowledge easily and efficiently and in effect maximize the success of an organization.

    Business intelligence produces analysis and provides in-depth knowledge about performance indicators such as a company's customers, competitors, business counterparts, economic environment and internal operations, to help make effective, good-quality business decisions.

    From a technical standpoint, the most important areas that Business Intelligence (BI) covers are:

    Data Warehouse, Data Mart, Data Mining, and Decision Support Resources

    Featured Resources

  • First Place Learning : Data Warehouse, Data Mart, Data Mining, and Decision Support

    Additional Resources

  • KDNuggets : Data Mining and Knowledge Discovery Resource center
  • Alacrity, Inc. : Integrated Data Intelligence Software
  • Allen, Davis, and Associates : Data Warehousing Career Newsletter
  • AlphaBlox : Data Analysis Software
  • AltaPlan : OLAP Links
  • Aonix : Object Oriented Modeling Tool and Cleansing Software
  • Attar Software : Data Mining / Neural Nets
  • Bill Inmon : Leading Data Management and Data Warehouse Speaker and Writer
  • Brio Technologies : Brio Web Warehouse and Decision Support Suite
  • Bull : Data Warehousing Solutions
  • Business Intelligence : The OLAP Report -- Richard Creeth
  • Business Objects, Incorporated : WebIntelligence for enterprise decision support
  • Cognos, Incorporated : Data Warehousing Software Tool Suite
  • CDI : Creative Data Inc : Data Warehouse Consulting and Training -- good Data Warehouse Links
  • D2K, Incorporated : "Turning Data into Knowledge"
  • DataFlux : Data Quality and Integration Software
  • DataMirror : Data Integration, Data Protection, Data Audit Solutions
  • datawarehouseconsulting.com : Data Warehouse Consulting
  • datawarehousing.com : Data Warehousing Portal
  • Data Warehousing Institute : Conferences and whitepapers
  • Decision Point Applications : Packaged Data Warehouse Solutions
  • Decision Technology : DecisionCentric® Server
  • Decision Works : Data Warehouse Consulting and Education
  • Decision Works Ltd : TRACK Objects software and consulting
  • Dimensional Insight, Inc. : Reporting and analysis software
  • DM Review : Leading Data Management Industry Publication
  • Don Meyer & Associates : Data Warehousing Consultants
  • DW Soft : Data Warehouse Software and Service using Microsoft Repository
  • Epsilon Data Management : Databased Marketing Services and Training
  • Evolutionary Technologies, Inc. : ETI*EXTRACT(r) Tool Suite for Data Warehousing and Data Migration
  • FileTek : Software for managing massive amounts of atomic data
  • First Logic : Customer Data Management Software
  • Hyperion : ESSBASE - High Speed OLAP Processor
  • IBM : DataGuide
  • Informatica : The Data Mart Company
  • Information Builders : Data management software
    Data Warehousing, Decision Support, Middleware, Data Access, ...
  • Intelligent Solutions, Inc. : Claudia Imhoff / Data Warehousing and Data Modeling
  • IRI : CoSORT ETL Software
  • Kalido : Software for adaptive enterprise data warehousing and master data management
  • Kenan Systems Corporation : Market Analysis Software
  • Megaputer Intelligence : Data Mining and Warehousing
  • Micro Strategy : Relational OLAP (ROLAP) Software and Services
  • MiningCo : Data Mining plus excellent data management articles
  • Nautilus Systems, Inc. : Data Warehousing, Data Mining, and Data Visualization software
  • netcarve Technologies GmbH : Data Warehousing and Data Mining Solutions
  • NetScheme Solutions Inc.
  • Open Technologies, Inc. : Data Warehouse Consulting and Recruiting Specialists
  • OSMC : Data Warehousing Consultants
  • Paralogic, Inc. : Data Warehousing and Decision Support Consulting / Lexington, MA
  • Perl.Com : Perl is a scripting language useful for extracting and loading data
  • Pervasive : Data Integration
  • Pilot Software : Customer and Market Data Analysis Software
  • Poinpoint Solutions Inc. : Data Warehousing solutions for the insurance industry
  • PLATINUM technology, inc. : Data management software and consulting
  • Princeton Softech : Database Management and Data Warehouse Software
  • Query Object Systems Corp : Business Solution Components
  • Ralph Kimball Associates : A pioneer and leader in the Data Warehouse field
  • Redbrick Systems : Multidimensional database software
  • Redbrick Whitepaper : Server Requirements
  • Retek : Data Warehousing for Retail Industry
  • Rocket Software : Business Intelligence
  • Rulequest Research : Data Mining Tools
  • saleslobby.com : Whitepaper - Building the Customer Data Warehouse
  • Salford Systems : CART software for tree-structure, non-parametric data analysis
  • SAS Institute : Data Warehouse and Data Mining Software
  • Seagate Software : Crystal Reports
  • Silvon Software : Supply Chain Data Warehousing
  • SolutionsIQ : Data Warehousing Solutions
  • Speedware Corp : Business Intelligence Software
  • Sybase, Inc. : Data Warehousing Database Software
  • Teleran : Data Warehousing and eCommerce Solutions
  • Teradata / NCR : Database machine
  • Software AG : Data modeling and data warehouse course outlines
  • Thinking Machines Corporation : Data mining software for loyalty management systems
  • Trillium Software : Data Cleansing and Data Reengineering
  • Universal Data Solutions, LLC : Len Silverston - Data Modeling and Data Warehouse - Coauthor of 'The Data Model Resource Book'

    Thursday, August 6, 2009

    Datasets for Association Rule Mining

    A normal transaction consists of a transaction ID and a list of items in every row. Sometimes the items are represented as Boolean values: 0 if the item is not bought, 1 if the item is bought. But the commonly used format for Market Basket data is a list of numeric item values without any other information:
    1 3 5 9 11 20 31 45 49
    3 7 11 12 15 20 43...
    This format has to be converted before it can be used by ARMiner and ARtool, since those tools can only evaluate binary data. ARMiner and ARtool have a special converter for that purpose, which has to be run before analyzing the data. WEKA needs a special ASCII data format (*.arff) for data analysis, containing information about the attributes and a Boolean representation of the items. Since there is no single format for input data, it is impossible to evaluate the same dataset in one format with different tools. In this paper, we present a dataset generator that is able to generate datasets that are readable by ARMiner, ARtool, WEKA and other data mining tools. Additionally, the generator has the ability to produce large Market Basket datasets with timestamps to simulate transactions in both retail and e-commerce environments.
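
To illustrate the kind of conversion involved, here is a small Python sketch that turns numeric-item transactions into a Boolean matrix and then into ARFF text. It is not the generator presented in the paper and not the ARMiner/ARtool converter; the baskets, the relation name and the attribute naming scheme (item1, item3, ...) are made up for the example.

```python
def to_binary_rows(transactions):
    """Return the sorted item list and one 0/1 row per transaction."""
    items = sorted({item for basket in transactions for item in basket})
    rows = [[1 if item in basket else 0 for item in items] for basket in transactions]
    return items, rows


def to_arff(transactions, relation="baskets"):
    """Render transactions as ARFF text with one nominal {0,1} attribute per item."""
    items, rows = to_binary_rows(transactions)
    lines = ["@relation " + relation, ""]
    lines += ["@attribute item%d {0,1}" % item for item in items]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


# Hypothetical baskets, each a set of numeric item IDs.
baskets = [{1, 3, 5}, {3, 7}, {1, 7}]
items, rows = to_binary_rows(baskets)
arff_text = to_arff(baskets)
print(arff_text)
```

The Boolean matrix grows with the number of distinct items, which is one reason the compact numeric format is the more common way to store large Market Basket datasets.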

    Datasets for Market Basket Analysis

    Since Market Basket Analysis is an important tool for improved marketing, many software solutions are available for this purpose. Business tools like Microsoft SQL Server 2005 or IBM DB2 Data Warehouse focus on the analysis of relational business data. Software tools which are available for free download, like ARMiner, ARtool or WEKA, are more dedicated to research. They not only analyze data but also give additional information on the effectiveness of the algorithms used. So, in order to generate data to be used by those tools, we have to investigate which kinds of datasets can be generated.

    Wednesday, July 29, 2009

    Popular Data Mining Software


    Software Name Details
    ACTwo Software AC2 http://www.alice soft.com/products/ac2.html AC2 is a set of C/C libraries allowing developers and IT professionals to embed data mining functionalities into their ...
    ARtool Software ARtool http://www.cs.umb.edu/~laur/ARtool/ ARtool contains several implementations of algorithms for mining frequent itemsets and association rules. ARtool and its ...
    ASOCAGHeidelberg The Knowledge Processing Company ASOC AG Heidelberg The Knowledge Processing Company http://www.asoc.de Contributed by: ASOC AG Heidelberg The Knowledge Processing Company http://www ...
    Aimm Software AIMM http://www.brandmarc.nl/ Contributed by: m.derksen #64;brandmarcSPAM BLOCKER.nl Note: This info converted from the original "The Data Mine" pages and pre dates ...
    Alice Software Alice http://www.alice soft.com/products Alice is a powerful and easy to use Data Mining Tool. Use decision trees to explore exploit your data. Textual reports, ...
    Auto Class C AutoClass C http://ic www.arc.nasa.gov/ic/projects/bayes group/group/autoclass/autoclass c program.html AutoClass C is a public domain version of AutoClass III ...
    Bayesian Knowledge Discoverer Bayesian Knowledge Discoverer http://kmi.open.ac.uk/projects/bkd/ Bayesian Knowledge Discoverer (BKD) is a computer program able to learn Bayesian Belief Networks ...
    Business 3 D IgorMalinka 22 Feb 2006 We are looking for interested parties for commercialization
    CARTFrom Salford Systems CART from Salford Systems http://www.salford systems.com Commercial Contributed by: mlipsey #64;salford systemsSPAM BLOCKER.com Note: This info converted from ...
    CARTRBy Salford Systems CART(r) by Salford Systems http://www.salford systems.com Robust decision tree technology for data mining, predictive modeling, and data pre processing. Contributed ...
    CViz Software CViz http://www.alphaWorks.ibm.com/formula/cviz CViz is a visualization tool designed for analyzing high dimensional data (data with many elements) in large, complex ...
    Castaneda DMS Castaneda DMS http://www.girgese.com/ A data mining suite for use on personal computers. It provides association rules, FOIL algorithm, clustering and decision ...
    Clementine Software N.B. see page " SPSS Clementine " to edit this information
    Codework Three Way Tangram Codework 3 way TANGRAM http://www.codework it.com/tangram/ 3 way TANGRAM is a desktop OLAP for the Windows platform. Write in APL code. Contributed by: codework ...
    Commercial Tools For Data Mining Commercial tools for data mining http://www.cs.bham.ac.uk/~anp/dm docs/oudshoff.tools.posting Posting by Sandra Oudshoff on comp.ai summarizing information on a ...
    Cygron Data Scope Cygron DataScope http://www.cygron.com Visual data mining and decision support tool with ODBC import capability, html export, 3D interactive graphs, automatic relation ...
    Cypress The Integrated Document And Knowledge Server Cypress, The Integrated Document and Knowledge Server® http://www.cypressdelivers.com Commercial Cypress® is a document and knowledge management system that captures ...
    Dat Gen Dataset Generator (DatGen) http://www.datasetgenerator.com One important way to test learning from example algorithms is to evaluate their performance against well ...
    Data Analysis Software For Scientific Analysis Of Experimental Data Data analysis software for scientific analysis of experimental data gopher://calypso.oit.unc.edu:70/11/../.pub/academic/data analysis Misc software (editors note ...
    Data Engine DataEngine http://www.mitgmbh.de DataEngine is a software product for data analysis using fuzzy technologies, neural networks, and conventional statistics. It has ...
    Data Engine 3 1 DataEngine 3.1 http://www.mitgmbh.de DataEngine is the software for intelligent data analysis and data mining. By using neural networks, fuzzy logic and statistical ...
    Data Intelligence Add In AIComponents Data Mining Add In for Excel http://www.aicomponents.com This tool allows you to apply clustering, decision trees, neural networks, and association ...
    Data Miner Maximzier Inc Data Miner Maximzier Inc http://www.dmmax.com DMM is predictive modeling software developed to maximize profit in business applications (e.g ...
    Data Mining Software By PMSI Data mining software by PMSI http://www.altern.org/pmsi/home gb.htm Lots of tutorials and shareware in English and French. Contributed by: pmsi@altern ...
    Data Mining Suite Data Mining Suite http://www.datamining.com The Data Mining Suite™ is an integrated set of products that provide a powerful, complete and comprehensive solution ...
    Data Mining Tool Easy Miner Data Mining Tool Easy Miner http://www.co.umist.ac.uk/~koundour/index.html A data mining tool (Easy Miner) for the areas of: association rules, classification ...
    Data Mining Tools AGILE8 INSIGHT© PROCESS ANALYSIS TOOL KIT (from website www.agile8consulting.com) Most of the value added in today's organisations is not in systems, but in the minds ...
    Data Mite DataMite http://www.lpa.co.uk/dtm.html DataMite enables rules and knowledge to be discovered in ODBC compliant relational databases. DataMite requires neither programming ...
    Data Sage Datasage no longer exists. They were acquired by Vignette. Thanks to GaborMelli for the information AndyPryke 28 Oct 2001
    Data Scope DataScope http://www.tiszanet.hu/cygron/DATASCP.HTM The key to knowledge is to display and manage your data in the most 'understandable' form. As you may have experienced ...
    Data Surveyor Data Surveyor http://www.ddi.nl Data Surveyor is a data mining tool for expert users. It consists of a suite of powerful algorithms and provides support for all ...
    Data XTm DataX(tm) http://www.zaptron.com/datax Contributed by: Scott Ivan, scott@zaptron.com Note: This info converted from the original "The Data Mine" ...
    Dataset Generator See: DatGen AndyPryke 28 Oct 2001
    Db Bridge dbBridge http://www.internetivity.com/ Contributed by: Note: This info converted from the original "The Data Mine" pages and pre dates June 2001. Please remove this ...
    Db Bridge Universal Remote Data Connectivity dbBridge Universal Remote Data Connectivity http://www.dbBridge.com Dalco Technologies dbBridge is a client side driver similar to other OLEDB/ODBC driver with ...
    Db Probe dbProbe http://www.internetivity.com/ Contributed by: glenn@nonlinear.ca Note: This info converted from the original "The Data Mine" pages and pre ...
    Db Prophet Neural Network Data Mining Tool By Trajecta dbProphet: neural network data mining tool by Trajecta http://www.trajecta.com Utilizing sophisticated neural network technologies, Trajecta offers a broad range ...
    Decision Tree Decision Tree http://www.creditscore.co.nz Builds decision trees and logarithmic scorecards on any dataset (automatically handles discrete and continuous data) ...
    Decisionhouse Software Decisionhouse http://www.quadstone.co.uk Contributed by: ANP Note: This info converted from the original "The Data Mine" pages and pre dates June 2001. Please remove ...
    Dimensional Insight Inc Dimensional Insight, Inc. http://www.dimins.com Dimensional Insight offers business intelligence solutions, putting you in command of your business. Companies use ...
    Explora Software Explora http://orgwis.gmd.de:80/explora/ A freely available, FTP-able Macintosh KDD package. Note: This info converted from the original "The Data Mine" pages ...
    FTPAble Machine Learning Software FTP able machine learning software http://www.cs.bham.ac.uk/~anp/dm docs/machine learning.software From comp.ai faq/part4. List of Ftpable machine learning software ...
    Fast Mind
    Gain Smarts GainSmarts http://www.urbanscience.com GainSmarts is an expert system using profiling and predictive modelling algorithms. The software is platform independent ...
    Gornik System Górnik System Tool for advanced Data Mining and analysis including classification, segmentation, survival methods etc. and data processing tools. Runs on Windows ...
    Graf FXGraphical Data Mining Shareware Graf FX Graphical Data Mining Shareware http://www.gr fx.com/graf fx.htm Contributed by: fx@bigpond.com Note: This info converted from the original ...
    Graf Fx Graf fx ... The Data Mining Tool For Microsoft Access http://www.gr fx.com/graf fx.htm Commercial Data mining shareware written entirely in all current versions ...
    Guiding Inductive Learning With AQualitative Model Guiding Inductive Learning with a Qualitative Model http://www.cs.utexas.edu/users/pclark/software.html This package allows a qualitative model to bias induction ...
    IBMIntelligent Miner For Data IBM Intelligent Miner for Data http://www 4.ibm.com/software/data/iminer/fordata/ Use the IBM DB2 Intelligent Miner for Data to gain new business insights and to ...
    IBMVisualization Data Explorer IBM Visualization Data Explorer http://www.almaden.ibm.com/dx/ IBM Visualization Data Explorer is an interactive software program that allows scientists, engineers ...
    ISoft Alice ALICE d'ISoft http://www.isoft.fr/ Alice is a powerful and easy to use Data Mining Tool. Use decision trees to explore and exploit your data. Textual reports, SQL ...
    IXLAnd IDISSoftware IXL and IDIS software http://www.cs.bham.ac.uk/~anp/dm docs/ixl/intern1.txt IXL was one of the first commercial discovery and data mining programs, which was followed ...
    Inlen Project INLEN http://www.mli.gmu.edu/projects/inlen.html This project is concerned with the development of a large scale multi type reasoning system, called INLEN, for extracting ...
    Insightful Miner InsightfulMiner, an affordable, scalable full life cycle data mining software. More info at http://www.insightful.com. JudyM 07 Mar 2002
    Iris Software IRIS http://allanon.gmd.de/and/java/iris/Iris.html IRIS is a prototype system supporting visual analysis of spatially referenced data. IRIS automatically produces ...
    Java Drill Down Demo Java Drill Down Demo http://www.itivity.com You need a Java enhanced browser to see this demo. It shows a demo of data access via "drill down". Note: This info ...
    KDNuggets Software List Gregory Piatetsky Shapiro's KD Nuggets Software List: http://www.kdnuggets.com/software/index.html AndyPryke 28 Oct 2001
    KXEN KXEN provides next generation business analytics software to drive better corporate decisions. KXEN's unmatched speed, ease of use and scalability enable leading companies ...
    Knowledge Access Suite Knowledge Access Suite http://www.datamining.com The Knowledge Access Suite™ has delivered the first and only set of products ever to provide business users with ...
    Knowledge Miner KnowledgeMiner http://www.scriptsoftware.com/km/ It discovers relationships in data and forecasts using the self organizing GMDH approach. Contributed by: Gregory Ivakhnenko ...
    Knowledge Sync Alert Messaging By Vineyardsoft KnowledgeSync Alert Messaging by Vineyardsoft http://www.vineyardsoft.com/ KnowledgeSync 2000 identifies potential business problems (e.g., a pending order for ...
    Kovach Computing Services Kovach Computing Services http://www.kovcomp.co.uk/ Contains information about their shareware statistical software as well as links to other sites with statistical ...
    Kxen Components KXEN components www.kxen.com KXEN components can be described as: Vapnik based algorithm; robust models; open architecture; speed of modeling; ease of use ...
    Level Five Quest LEVEL5 Quest http://www.l5r.com We at Level Five Research have developed an interesting twist in data mining which fills what we perceive to be a gap between heavy ...
    MLCLibrary Utilities MLC++ Library / Utilities http://www.sgi.com/Technology/mlc/ MLC++ is a machine learning library developed in C++. MLC++ is public domain and can be used free of charge ...
    Maestro Software Maestro http://www.jjt.com Maestro is a metadata-driven, SAS-based statistical analysis tool particularly suited to the semiconductor and flat panel display industries. However ...
    Magnum Opus Magnum Opus http://www.giwebb.com/ Established software for fast effective discovery of real associations. Designed by data miners for data miners. Incorporates ...
    Managed Reporting Environment MRE Managed Reporting Environment (MRE) http://www.SolutionsIQ.com/consulting/mre.html SolutionsIQ’s Managed Reporting Environment (MRE) is a centralized reporting ...
    Method And System For Electronic Exchange Of Tax Information Method and system for Electronic Exchange of Tax Information www.cpa network.com Commercial Century Process Associates Patent Pending on a Method and system for ...
    Mine Set MineSet http://mineset.sgi.com MineSet 2.5 released in May 1998. It is a fully integrated, comprehensive suite of easy to use analytical and visual data mining tools ...
    Mine Set SGI MineSet (SGI) http://www.sgi.com/software/mineset/ the second release of SGI's product for exploratory data analysis. Combining powerful integrated, interactive ...
    Model Quest Enterprise ModelQuest Enterprise http://www.abtech.com Highly automated predictive data mining software that includes Expert Mining Strategies, new proprietary modeling techniques ...
    Model Quest Enterprise Ab Tech Corporation ModelQuest Enterprise AbTech Corporation http://www.abtech.com Contributed by: Christine Gresser (updated), sales@abtech.com Note: This info ...
    Model Quest Market Miner Ab Tech Corporation ModelQuest MarketMiner AbTech Corporation http://www.abtech.com Contributed by: Christine Gresser, sales@abtech.com Note: This ...
    Monarch Software You need information but how do you get at it? As a professional working in today's competitive world, you'll be very aware of the importance of concise and relevant ...
    Most Popular Data Mining Software Most Popular Data Mining Software Surveys conducted by Nuggets and Analytics have asked people involved in data mining what software they use. While it's not necessarily ...
    Neural Net And Genetic Based DMSoftware Neural net and genetic based DM software http://www.altern.org/pmsi/home gb.htm Lots of classification/prediction/time series tutorials working demos. Contributed ...
    Nuggets TM Nuggets(TM) http://www.Data mine.com Nuggets uses proprietary search algorithms called SiftAgents(TM) to develop English "if then" rules. These algorithms use genetic ...
    ODBCMINE ODBCMINE http://www.intsysr.com/odbcmine.htm ODBCMINE analyzes ODBC data sources using the C4.5 algorithm, and outputs graphical decision trees in Scalable Vector ...
    Old List Of Data Mining Software For up to date listings, see AllDataMiningSoftware. The table below lists all data mining software whose details have not been checked. You can sort the table below ...
    Oracle Con Text Option Technical Oracle ConText Option technical http://technet.oracle.com/doc/context1x/CO11QCK/ch1.htm Oracle ConText Option is an option to Oracle, providing powerful search ...
    Oracle Context Option Oracle Context Option http://technet.oracle.com/doc/context200/CO20APP/intro.htm This chapter provides an overview of the Oracle ConText Option. Contributed by ...
    Orchestrate Software Orchestrate http://www.torrent.com Torrent’s Orchestrate simplifies and accelerates the development, deployment, and management of enterprise scale ...
    Partek Software Partek http://www.partek.com Software for data mining and knowledge discovery based on statistical methods, data visualization, neural networks, fuzzy logic and genetic ...
    Piping Systems Fluid Flow Software Piping Systems Fluid Flow Software http://www.fluidflowinfo.com Piping Systems Fluid Flow has been developed to provide the engineer with a total working environment ...
    Pmsi New URL PMSI new URL: pmsi.nfrance.com Thanks for updating your page! Contributed by: Note: This info converted from the original "The Data Mine" pages and pre ...
    Prediction Works PredictionWorks http://www.predictionworks.com/analyze/ A free on line data mining service for smaller files. The service automatically tests several algorithms including ...
    Pv Wave PV WAVE http://www.vni.com PV WAVE is a Rapid Application Development Environment for the visualization and analysis of data. Note: This info converted from the ...
    QTMSQuantitative Target Marketing System QTMS : Quantitative Target Marketing System http://www.multivariate.com An expert system of multivariate modeling that highlights a new technique called " All Possible ...
    QYield Software Q YIELD http://www.quadrillion.com/ Software for data mining semiconductor fab production data to determine possible production problems.
    Real Time Stock Market Predictions From Textual News Real time stock market predictions from textual news http://www.cs.ust.hk/~beat/Predict Beat Wuthrich, beat@cs.ust.hk; for more info see www.cs.ust ...
    Recent Contributions Data Mining Software Data Mining Software The information about the packages on this page has been taken from README files, and other information provided on the web ...
    Ro CRobust Bayesian Classifier RoC (Robust Bayesian Classifier) http://kmi.open.ac.uk/projects/bkd/ RoC is a Bayesian supervised classifier able to handle incomplete databases with no assumption ...
    Rosetta Toolkit Rosetta A Rough Set Toolkit for Analysis of Data http://www.idi.ntnu.no/~aleks/rosetta/ Contributed by: Note: This info converted from the original "The Data ...
    SASInstitute Launches Enterprise Miner Software SAS Institute Launches Enterprise Miner software http://www.sas.com/software/data mining/ Commercial The respected French analysts Yphise have evaluated the Enterprise ...
    Sav ZServer Sav Z Server http://sites.netscape.net/savtechno/ Sav Z (Web Data) Server is a Web based object relational database server implemented in Java™. Server generates ...
    Sector Computing SOLAPOn The Web Sector Computing's OLAP On The Web http://ourworld.compuserve.com/homepages/SMGSecor/BI/Biindex.htm Commercial Contributed by: Note: This info converted from ...
    See Five Software C5.0 / See5 http://www.rulequest.com Contributed by: quinlan@rulequest.com Note: This info converted from the original "The Data Mine" pages and ...
    Set Enumeration Learn SE Learn http://www.isp.pitt.edu/~rymon/SE Learn.html An SE tree based induction and classification tool. Set Enumeration (SE) trees provide the basis for an induction ...
    Silicon Graphics Mine Set Data Mining Silicon Graphics MineSet Data Mining http://www.sgi.com/Products/software/MineSet/ Commercial Commercial Software for data mining. Contributed by: Note: This ...
    Sipina Pro Sipina W v2.0 and Sipina Pro http://eric.univ lyon2.fr/~ricco/sipina.html SIPINA W is software for Knowledge Discovery in Databases. This version v2.0 contains ...
    Snob Software Snob http://www.cs.monash.edu.au/~dld/Snob.html Snob (Wallace and Boulton, 1968) was probably the first (Bayesian) program to do clustering (or unsupervised learning ...
    Source Forge HonweiMo 02 Nov 2003
    Sphinx Vision By ASOC sphinxVision by ASOC http://www.asoc.com SOM neural network Contributed by: hans peter.neeb #64;ffm2SPAM BLOCKER.siemens.de Note: This info converted from the ...
    Stat Soft KyleMiller 26 Jan 2009 StatSoft, Inc. was founded in 1984 and is now one of the largest providers of analytic software worldwide. StatSoft is also the largest ...
    Stat Soft Inc StatSoft, Inc. http://www.statsoft.com StatSoft, Inc., founded in 1984, is now one of the largest developers of enterprise and single user software for data analysis ...
    Stat Soft STATISTICAData Mining StatSoft STATISTICA Data Mining http://www.statsoft.com/datamining.html Contributed by StatSoft.com info@statsoft.com Contributed by: Note: This info converted ...
    Super Query SuperQuery http://www.azmy.com SuperQuery: database analysis software with a knowledge discovery engine. You can download a free trial version. You can also ...
    Svm Light SVMlight is an implementation of Vapnik's Support Vector Machine (Vapnik, 1995) for the problem of pattern recognition, for the problem of regression, and for the ...
    Synthetic Classification Data Sets Synthetic Classification Data Sets program SCDS has been renamed DatGen
    TMiner Personal Edition TMiner Personal Edition http://frontdb.ugr.es Free Java Data Mining software downloadable from http://frontdb.ugr.es (Research section). TMiner collects some algorithms ...
    Tetralogie Software Tetralogie http://atlas.irit.fr Techniques and Technologies for Information Retrieval and Resource Discovery. Contributed by: Taoufik Dkaki, Bernard Dousset, Said ...
    Text Analyst TextAnalyst http://www.megaputer.com TextAnalyst performs semantic analysis of texts in an arbitrary application domain. It is based on proprietary neural net technology ...
    The Data Mining Suite The Data Mining Suite http://www.datamining.com Contributed by: datamine ...
    The Knowledge Access Suite The Knowledge Access Suite http://www.datamining.com Contributed by: Note: This info converted from the original "The Data Mine" pages and pre dates June 2001 ...
    The Knowledge Access Suite And The Data Mining Suite The Knowledge Access Suite and The Data Mining Suite (Information Discovery, Inc.) http://www.datamining.com/ The Knowledge Access Suite™ has delivered the first ...
    Thinkbase Data Mining Product Thinkbase's Data Mining Product http://www.ThinkBase.com/ Note: This info converted from the original "The Data Mine" pages and pre dates June 2001. Please remove ...
    Thinking Machine Data Mining Product Thinking Machine's Data Mining Product http://www.think.com/html/products/products.htm It includes Neural Networks, Classification and Regression Trees (CART), ...
    Ti MBLTilburg Memory Based Learner TiMBL Tilburg Memory Based Learner http://ilk.kub.nl/software.html Contributed by: Jakub Zavrel Note: This info converted from the original "The Data Mine" pages ...
    Tooldiag Software Tooldiag http://documents.cfar.umd.edu/resources/source/tooldiag.html A software toolbox for the analysis of multidimensional data. C source and documentation included ...
    Visua Links VisuaLinks http://www.visualanalytics.com VisuaLinks is state of the art Java technology supporting link analyses and data visualization. VisuaLinks uses an intuitive ...
    Visual Text VisualText http://www.textai.com VisualText is a comprehensive GUI development environment for creating text analyzers. Resulting analyzers can run as C executables ...
    Web Mining web mining Date: Location: Final Date For Submissions: Contributed by: Note: This info converted from the original "The Data Mine" pages and pre dates ...
    Weka Software N.B. see page " Weka " to edit this information
    Win Viz WinViz http://www.iti.gov.sg/iti RnD/infosheet/is/winviz.html WinViz is a Visual Data Analysis tool designed to complement spreadsheets, databases, executive information ...
    Winrosa Software WINROSA http://www.mitgmbh.de WINROSA is a software tool which generates automatically Fuzzy If Then Rules from your data. The generated data set can be run by most ...
    Wiz Rule For Windows WizRule for Windows http://www.wizsoft.com Discovers rules and identifies exceptions to those rules. A demo version of the software is available online. Note ...
    Wiz Why WizWhy http://www.wizsoft.com WizWhy reveals all if then rules (with no limit as to the number of clauses) and mathematical formula rules, and predicts the value of ...
    Wonder Owl Wonder Owl Commercial Wonder Owl is the leading data mining and personalization package for managers and business people. Simple and intuitive to use, yet powerful ...
    Xmdv Tool XmdvTool http://wwwcip.informatik.uni erlangen.de/user/tntimm/XmdvTool.html The XmdvTool allows users to visually explore multivariate data in a variety of methods ...
    Xpert Rule XpertRule http://www.attar.com Data Mining using high performance parallel SQL technology. Knowledge Induction can be achieved by a Windows PC client being able to ...
    Yphise Software Evaluation Reports Yphise Software Evaluation Reports http://www.yphise.com Commercial Yphise provides software evaluation of interest to IT managers. Yphise software evaluation Report ...